Let me tell you about the last time I used RAID-5 in a 4-disk array:
When one of my disks died a sudden death, I swiftly popped in a new one and kicked off the resync process, which took about 48 hours to complete.
14 minutes(!) after the resync finished a second disk died.
Lesson 1: RAID-5 is pretty useless. When a disk dies you have no more redundancy during the extremely stressful RAID recovery.
Lesson 2: RAID is not an alternative to backups.
@fribbledom This is why I only use raid 0. RIDE OR DIE
@fribbledom i’m feeling kind of called out here and i don’t like it!
@fribbledom raid 6 is way better imho
It certainly makes it a lot more likely your array will survive in the worst case scenario.
@fribbledom oh yeah. It's not a backup but it helps, especially since the two-disk failure mode is actually super common and kills a lot of people's RAIDs
@fribbledom very true. "RAID is not a backup" is an oft-repeated mantra that still doesn't seem to sink in for some people. Honestly I just use RAID because I can't be bothered to manage multiple mount points haha
@fribbledom Very true. To me, RAID is uptime enhancement. I back my NAS up locally, my PCs to the cloud, and my cloud drives to the NAS.
@fribbledom last time I used RAID-5 and a disk died, I replaced the dead disk and a second disk died during the resync. that's the only time I've ever lost data that I was specifically trying to protect.
@walruslifestyle @fribbledom I remember reading a long article that basically amounted to "For disks bigger than a TB and without special low failure rates (read: expensive), RAID5 almost guarantees another disk will break during rebuild"
Basically, big "normal" disks' error rates are too high for their capacity. Expensive "storage" HDDs have an order of magnitude better survivability. But bad luck can still bite you in the ass.
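That claim is easy to sanity-check with a rough back-of-envelope calculation. This sketch assumes the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits read, and treats errors as independent; both are simplifications, not measurements:

```python
# Back-of-envelope: chance of completing a RAID-5 rebuild without
# hitting a single unrecoverable read error (URE).
# Assumes the oft-quoted consumer spec of 1 URE per 1e14 bits read.

URE_RATE = 1e-14          # errors per bit read (consumer-drive spec)
DISK_TB = 4               # capacity per disk, in terabytes
SURVIVING_DISKS = 3       # disks read in full during a 4-disk rebuild

bits_read = SURVIVING_DISKS * DISK_TB * 1e12 * 8
p_clean_rebuild = (1 - URE_RATE) ** bits_read

print(f"bits read during rebuild: {bits_read:.2e}")
print(f"probability of a clean rebuild: {p_clean_rebuild:.1%}")
```

With 4 TB consumer disks the result lands under 40%, which is why the article's "almost guarantees another failure during rebuild" framing isn't as hyperbolic as it sounds.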
@fribbledom I'd recommend this article from @sir, which reasons about some pros and cons of RAID: https://drewdevault.com/2020/04/22/How-to-store-data-forever.html
@fribbledom RAID5 is a trap. It's a pet peeve of mine that people talk down btrfs because it doesn't support RAID5/6. Of course it doesn't; in any situation requiring RAID you'd be better served by RAID 0, 1, or 10
@fribbledom Yeah, I know a place that suffered a major data loss because they had no backups. Or rather, they were using RAID as the backup, with nothing else besides. Also, they were using the kind of RAID where if you lose one disk you lose everything... so even less of a backup.
@fribbledom Holy shiiiiit. D:
@fribbledom I run a 3-disk raid1e (or raid10-near2, depending on who you ask)
it seems like a good balance to me: it gives you essentially the same stats as 4-disk raid10 but with fewer disks:
- up to one drive failure*
- 50% capacity
- at least 2x the speed of the slowest drive
*yes, 4-disk raid10 can survive 2 drive failures, but only as long as they're the correct drives, and I don't trust that
@fribbledom (and then you potentially have a 4th disk available for backups)
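The asterisk above can be quantified with a quick enumeration. This sketch assumes a standard 4-disk RAID 10 with mirror pairs (0,1) and (2,3); the layout is the usual one, but treat it as illustrative:

```python
from itertools import combinations

# 4-disk RAID 10: two mirror pairs. The array survives as long as
# no mirror pair loses both of its members.
pairs = [(0, 1), (2, 3)]

def survives(failed):
    """True if no mirror pair is entirely contained in the failed set."""
    return not any(set(p) <= set(failed) for p in pairs)

two_failures = list(combinations(range(4), 2))
fatal = [f for f in two_failures if not survives(f)]

print(f"{len(fatal)} of {len(two_failures)} two-disk failure combos are fatal")
# 2 of the 6 combinations kill the array, so a random second failure
# has a 1-in-3 chance of landing on the "wrong" drive.
```

So "survives 2 drive failures" really means "survives 2 of the 6 possible failure pairs with certainty, and trusts luck for the rest", which is the distrust the footnote expresses.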
@fribbledom I agree, sir.
@fribbledom Rackspace lost my data that way once. The guy who called me was surprised I had backups.
"The biggest difference between RAID 5 and RAID 10 is how it rebuilds the disks. RAID 10 only reads the surviving mirror and stores the copy to the new drive you replaced. Your usual read and write operations are virtually unchanged from normal operations."
"However, if a drive fails with RAID 5, it needs to read everything on all the remaining drives to rebuild the new, replaced disk. Compared to RAID 10 operations, which reads only the surviving mirror, this extreme load means you have a much higher chance of a second disk failure and data loss."
@fribbledom I've always just kept all my stuff on one disk and occasionally backed it up to another, or in the last 10 years or so to a cloud thing. I've never messed around with RAID because someone described it to me once and I went "OK so you buy two hard drives, same sort, probably from the same batch, put 'em in the same box, with the same vibration, the same heat cycles, doing the same wear to each... that's not backup, that's figuring out the tolerances at the hard drive factory."
@fribbledom So true! 3 years ago one of my friends was not so lucky and lost a second disk before the resync could complete.
@fribbledom 6 is bad too.
@fribbledom We had a 3-disk RAID failure at work once.
@fribbledom I just back my important personal data up regularly and assume I'll lose the rest every few years.
@fribbledom Here's my story from just this past week. I've had a 2-bay Synology with the same 2 disks in RAID-1 for the last ~9yrs. I got an alert on Weds that disk 2 had failed. I immediately made a backup to an external hard drive and ordered a replacement disk. I installed the new disk on Saturday and while the array was rebuilding, disk 1 failed.
RAID saved my neck in that it gave me a short window to make a real backup before the entire array crashed. RAID itself is not a backup.
@fribbledom which drives did you use? When did you buy them?
WD Reds 3TB, from 2012, iirc.
@fribbledom meaning: Same brand, bought at the same time?
Personal painful experience: Lifetime variance correlates with the production batch. Most likely the source of stories like yours ("Second HD failed during RAID rebuild").
I try to buy at different times from different vendors...
@fribbledom I'm using a RAID-Z2, and bought drives from different suppliers, to minimize chances of multiple drives failing
@fribbledom I really hope that will be enough, otherwise ~20TB of data would need to be recovered from various sources. (No, I don't have a "one contains everything" backup.)
I can recommend this article.
1) Mirrors are faster and easier to maintain than other (z)raid-levels.
2) do backups
@fribbledom I had 4 failing disks (non-zero hardware ECC error counts) running with 4 redundant copies on btrfs for ~4 years non-stop, with a monthly scrub. Still no data lost, but the ext2 /boot partition is gone.
@fribbledom we lost a server that was on Raid 5 because the lazy and incompetent admin didn't know a disk had failed. So we lost about 6 hours of customer data plus the downtime.
I looked into it, as I know nothing about such things, realised raid 5 was pants and told him to instruct our IT contractors to rebuild the server with a different configuration.
His reply: aw but I've already told em to do it exactly how it was before and I don't want to email them again.
Spoiler alert: he didn't. The same thing happened again six months later.
@fribbledom RAID increases availability by (usually) sparing you a tedious restore from backups. If you're paranoid, you should proactively replace drives on a rolling schedule rather than wait for one to fail.
Backups allow you to restore after an admin accidentally the whole /
RAID 60 means that no amount of disk failure will stop an admin from accidentally the whole /
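The rolling-replacement idea above is simple to plan out. This is a hypothetical sketch, not any vendor's tool: the function name, the 5-year service life, and the dates are all made up for illustration:

```python
from datetime import date, timedelta

def replacement_schedule(install: date, n_drives: int, service_years: float):
    """Stagger drive replacements evenly, so no two drives in the
    array reach old age (and peak failure risk) at the same time."""
    interval = timedelta(days=int(service_years * 365.25 / n_drives))
    last = install + timedelta(days=int(service_years * 365.25))
    first = last - interval * (n_drives - 1)
    return [first + i * interval for i in range(n_drives)]

# Example: a 4-drive array installed 2020-01-01, planned 5-year life.
for i, d in enumerate(replacement_schedule(date(2020, 1, 1), 4, 5.0)):
    print(f"drive {i}: replace by {d}")
```

The point of the stagger is the same one made elsewhere in this thread: drives bought and installed together tend to age together, so spreading replacements out also spreads out the production batches in the array.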
I can only recommend RAID 5 for disks up to 4 TB, and this is for data you either have backed up off site, or you can fetch again (torrents)
@fribbledom If you bought the disks at the same time it might suggest another thing: the life expectancy of modern disks is very predictable!
Definitely not an alternative for backups (availability, not safety), but I've had several RAID 5 arrays, and several rebuilds, and it's fine.
I do think, though, that at around 6 disks at the latest, RAID 6 is the much safer option.
I guess it also depends on what disks you use and how old they are. The more probable the next failure, the higher the danger...
Trying btrfs RAID on my main machine now, but it's already got checksum failures and I haven't worked out how to fix them...
Just had this convo come up today with a coworker, asking why I don't have the same problems in servers. I thought you might be at least interested.
Higher-end RAID controllers do more than RAID.
There is a feature called "scrubbing", mainly there to find and fix bit errors.
But a side effect is that the controller prioritizes host requests, and when the host is idle it spends its free time scrubbing in the background.
This means the drives are always under 100% load.
There's zero difference between the high load of scrubbing and the high load of a raid rebuild.
I also have a drive swap schedule based on the low end MTBF.
On RAID controllers without this, disk load varies, so there's a higher chance of another disk failing during a rebuild.
Owners also often ignore MTBF because "it's still working", so you have older drives, usually past their life expectancy, suddenly seeing a huge load difference.
It's the most likely time a second drive may die.
Raid 6 can only help so much there.
A hotspare is always good, but to use it still triggers a rebuild onto it.
But best practice is RAID 6, hotspares, coldspares, background scrubbing, and battery-backed cache.
Plus backups backups backups! With extra backups sprinkled on top!