Using RAID-5 Means The Sky Is Falling

HardOCP News

Today's "must read" editorial is brought to you by Olin and the crew at Benchmark Reviews.

Some writers build their reputation by making audacious claims that create controversy, done solely to help propel traffic onto the website they write for. Common sense and real-world experience be damned; let the lack of evidence to the contrary and the use of complex math help prove their confusing point!
 
I will take the advice of my storage vendors when they tell me that over a certain size (1.2TB+), RAID 5 should not be used due to the risk of losing additional drives during a rebuild, which takes longer as drive sizes climb.

Case in point: I lost a drive in a 24-drive array (2TB drives) two months after it was installed. Another drive died halfway through the rebuild. RAID 6 saved me from a very painful recovery. This had never happened to me in 15 years, but it did that day.

I use RAID 5 in many other things, just not with large drives.
 
It goes both ways if you ask me. It doesn't matter what RAID level you choose; RAID is and always will be a dangerous way of pooling data. You're counting on statistics to save your ass and deliver reliability. As long as you know and understand the dangers of each RAID level, you can take the necessary steps to protect yourself from its faults.

The author does point out something I've often thought myself over the past 5-6 years: URE rates magically jumping for drives rated "enterprise" or "high-end consumer." It does come across as something that was spun into existence, with drive manufacturers capitalizing on the fear to sell more disks. With capacities reaching incredibly large sizes, prices forever falling, people buying fewer hard drives per system, and SSDs, they are desperate to sell more.

Makes complete sense, with all these rainbow colors coming from Western Digital in recent years and all the computer-coded variants from Seagate. At this point, selling more drives is all they have left, and RAID-6 oddly enough helps that.

A danger is clearly there, but how much of it is over-hyped, and how much is because people don't take the appropriate steps to secure their data? I think the future lies in technologies at the file system level such as ReFS and ZFS. It's a technology whose time has come.
 
I would like to see a study of a large number of 4+ TB consumer and/or enterprise HDDs that attempts to determine the average URE rate. I wonder if Backblaze has data that would let them do it.

I'd also like to see all RAID controllers and software allow an option to NOT automatically fail an HDD for a single URE (I know some RAID systems already have this option or default behavior, but I think it should be available for all RAID systems). Perhaps have a configurable number of UREs necessary to fail the drive. This would be beneficial not only for RAID 5, but also for higher parity levels.

For example, with RAID 6, if you had a system that would automatically fail any drive after a single URE, then if you had two HDDs with UREs during rebuilding (after a single disk failure), the rebuild would fail. But if you just skip the URE on each HDD and keep going, then assuming the UREs are not in the same stripe, you would be able to complete the rebuild without data loss (since you would have at most two missing chunks per stripe, and RAID 6 can rebuild a stripe with 2 missing chunks).
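As a sanity check of that reasoning, here is a minimal sketch (hypothetical stripe counts and URE positions, not any real controller's logic) showing why per-stripe accounting lets a RAID 6 rebuild complete where a fail-drive-on-first-URE policy would not:

Code:
# Python sketch: a RAID 6 stripe tolerates up to 2 missing chunks.
# Model a rebuild after one full drive failure, where two surviving
# drives each hit a URE on a *different* stripe.
def stripe_recoverable(missing_chunks, parity=2):
    # A stripe can be reconstructed if missing chunks <= parity count.
    return missing_chunks <= parity

num_stripes = 1_000_000            # hypothetical array size in stripes
ure_stripes = {12_345, 678_901}    # stripes where a surviving drive hit a URE

rebuild_ok = True
for stripe in range(num_stripes):
    missing = 1                    # the chunk lost with the failed drive
    if stripe in ure_stripes:
        missing += 1               # one more unreadable chunk on this stripe
    if not stripe_recoverable(missing):
        rebuild_ok = False         # only happens if UREs pile into one stripe
        break

print("rebuild completed:", rebuild_ok)   # True: at most 2 missing per stripe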
 
Not all drives are rated at 1 URE in 10^14 bits:
Consumer SATA drives: 1 in 10^14
Nearline enterprise: 1 in 10^15
Enterprise SAS/FC: 1 in 10^16
Some enterprise SAS SSDs: 1 in 10^17
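Taken at face value, those ratings translate directly into the odds of reading through a rebuild without hitting a URE. A back-of-the-envelope sketch, assuming the rating is a literal independent per-bit probability (a reading later posts in this thread dispute) and a hypothetical 12TB rebuild read:

Code:
import math

# Hypothetical rebuild: read 12 TB from the surviving drives (e.g. three
# 4 TB disks left in a degraded 4-drive RAID 5).
bits_read = 12e12 * 8

for exp in (14, 15, 16, 17):
    rate = 10.0 ** exp                      # one URE per 10^exp bits read
    p_clean = math.exp(-bits_read / rate)   # ~ (1 - 1/rate)^bits_read
    print(f"1 in 10^{exp}: {p_clean:.1%} chance of a URE-free rebuild")

# Prints roughly 38.3%, 90.8%, 99.0%, 99.9% -- which is why the spec-sheet
# numbers make large RAID 5 rebuilds look so scary on paper.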
 
I think the future lies in technologies at the file system level such as ReFS and ZFS. It's a technology whose time has come.

Yes it does. I can see NTFS going bye-bye on the horizon, with ReFS and DAC becoming the controlling method.
 
Yes it does. I can see NTFS going bye-bye on the horizon, with ReFS and DAC becoming the controlling method.

I recently implemented NTFS on a 27TB LUN. I was planning on ReFS, but at the last minute saw some amazing results with Server 2012 Dedup and Veeam, with many people getting in excess of 70% space savings from the Server 2012 Dedup alone.

Veeam has file-level dedup that is perfectly compatible with the Server 2012 block-level dedup; in my case it took ~18TB of data down to less than 2TB.

Anyway, so far this is only supported on NTFS. I hope that with Server 10/"Next" it will be available on ReFS and we can make the switch.
 
Some writers build their reputation by making audacious claims that create controversy, done solely to help propel traffic onto the website they write for.

That's why [H] should stop linking to Kotaku.
 
What does this have to do with Kotaku? Kotaku is a good site and fun for gamer/manga news. Get over one bad article, geez. Don't be a hemorrhoid; my ass itches enough. If it wasn't for Steve and Kyle I wouldn't know half of the tech news on the internet in the last 14 years. +1 for all the reading and surfing they do. Somehow they are not as fat as me.
 
I understand their point, but I also feel the sentiment that RAID 5 has outlived its usefulness is true.

There is the URE problem that they mention, in which you are pretty much guaranteed some sort of trouble during a rebuild (even if it may just be a flipped bit somewhere you never notice), but there is also the rebuild risk: you have no redundancy during a rebuild, and if a second drive fails, BAM, you just lost everything.

For my own home storage I use the ZFS equivalent of a 12-disk RAID 60. It might be overkill, but I like overkill, especially when my data is on the line. Better safe than sorry.

(Oh, and I'd also use ECC RAM and a UPS on any storage server.)
 
BS complaint article, plain and simple. All he has to back his argument up with is, "It hasn't happened to me in all my years, so it must never happen and I don't feel like it will happen!"

A perfect rebuttal, in my mind at least, is his own analogy with cars: if 96% of people will not experience a car accident in a given year, why bother wearing seat belts? :rolleyes: RAID 5 is intended to increase performance over RAID 1 while providing additional redundancy in the event of a drive failure. If something comes along that puts that purpose at greater risk, AND there is a better alternative, such as RAID 6 or 10, why continue to use it?
 
There is the URE problem that they mention, in which you are pretty much guaranteed some sort of trouble during a rebuild

If you have enough disks and do regular scrubs you will know that this "guaranteed" URE is absolute garbage. For healthy drives the actual URE rates I see are at least an order of magnitude better than the stated rates (maybe more); for drives on their way out, the stated rates understate the errors.
 
First I should say: RAID is for uptime and NOT a substitute for backups. But I can see the author's point. At the data center I work at, HD failures are few and very far between, so much so that scrubbing old disks for disposal is a very regular occurrence. At home, though, things are different. I could never get consumer-grade stuff to do RAID 5. That's like asking for a disk to drop and a crawling system for up to 48 hours. The added wear and tear of rebuilds eats at the MTBF too.
 
A danger is clearly there, but how much of it is over-hyped, and how much is because people don't take the appropriate steps to secure their data? I think the future lies in technologies at the file system level such as ReFS and ZFS. It's a technology whose time has come.

Neither ReFS nor ZFS corrects errors on its own; they simply provide better FS-level detection of errors. They still rely on some underlying resiliency method to maintain working storage, which is largely double parity and mirroring.

Large-scale storage implementations have discarded RAID entirely and are using straight replication/mirroring and/or erasure coding.
 
I use RAIDZ3 and would go RAIDZ4 if it were available.

I considered going RAIDz3, but concluded it wasn't worth it. I ran ZFS in a 6-disk, then 8-disk single RAIDz2 vdev for almost 5 years with a mix of bad old drives, many of them WD Greens not really suited to this application. I never came even close to having more than one drive fail at the same time; the smallest gap between failures was a year. If the unlikely occurred and two drives failed at once, I could still have rebuilt (though with the risk of UREs). Three disks failing at the same time is just incredibly unlikely.

Since then I have moved to all new, more reliable WD Red drives. I have 12 of them in the pool, in two six-drive RAIDz2 vdevs, and have yet to have a failure (but the drives are new).

I'd imagine that as you increase vdev size to more and more drives it makes sense to increase the parity drive count: with more drives spinning at the same time, the risk of any one failing goes up, so RAIDz3 may make sense. But ZFS isn't recommended for vdevs above 10 drives anyway; instead the recommendation is to do what I have done and combine smaller vdevs into one larger pool.

In my setup, up to 4 drives can fail without catastrophic data loss; it's just not the "any 4 drives" you'd get with a hypothetical RAIDz4. Any two drives in either of the two vdevs can fail without catastrophic data loss, but if three fail in one vdev, all data is lost. I consider this VERY unlikely though.
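To put a number on that "not any 4 drives" caveat, here is a quick enumeration (just illustrative combinatorics for the 12-drive, two-vdev layout described above):

Code:
from itertools import combinations
from math import comb

# 12 drives in two 6-drive RAIDz2 vdevs: the pool survives as long as
# neither vdev loses more than 2 drives.
vdev_a = set(range(6))             # drives 0-5; drives 6-11 are vdev B
PARITY = 2

survivable = sum(
    1 for failed in combinations(range(12), 4)
    if len(set(failed) & vdev_a) <= PARITY      # losses in vdev A
    and len(set(failed) - vdev_a) <= PARITY     # losses in vdev B
)

print(f"{survivable}/{comb(12, 4)} four-drive failure sets survivable")
# 225/495, about 45% -- versus 100% for a hypothetical 12-drive RAIDz4.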

If it DOES happen, I have everything backed up on CrashPlan, though I'd rather not have to download everything from there, as it would likely take a while...
 
As others have pointed out, the article is garbage. The guy is basing everything on his opinion and experience, and everyone has different experiences. I've seen too many RAID 5 sets fail during rebuild to ever trust it, and that was before TB drives were commonplace.

The issue with RAID 5 goes beyond potential UREs. Even if you never see one, you can still lose the whole set during a rebuild. It's very common for people to buy all of their drives from the same vendor at the same time. If there was an issue with that particular production run of drives, and all of them are in the same RAID set, experiencing the same environment and the same amount of wear, the chances of 2 or more drives all failing at the same time are high. When one drive goes in a RAID 5 set and you start a rebuild, the remaining drives see a lot more abuse, and this can accelerate the failure of one or more additional drives. And since RAID 5 takes forever to rebuild, you are left in a very vulnerable state for a long time. You are pretty much running a RAID 0 set at that point, as there is NO redundancy.

On top of that, performance of the set during the rebuild is going to be terrible, much more so than with RAID 10. A RAID 5 set that experiences a drive failure really should be taken offline and rebuilt out of production. And if you are doing that, then you are losing the main benefit of RAID in the first place: uptime. So really, the only time I would ever use RAID 5 (actually, I never would, but if I HAD to) would be in a backup server, and only if it is one of several backup servers that can be taken offline at any time without issue. I would never use RAID 5 or 6 in a production server that wasn't part of some sort of cluster to ensure continuous operation if the whole set fails. People use RAID 5 in production for only two reasons: it's cheaper than RAID 10, and they're ignorant of the implications. That's it. No one who really knows what they are doing would use it on purpose.
 
Not all drives are rated at 1 URE in 10^14 bits:
Consumer SATA drives: 1 in 10^14
Nearline enterprise: 1 in 10^15
Enterprise SAS/FC: 1 in 10^16
Some enterprise SAS SSDs: 1 in 10^17

Correction.

Consumer SATA drives are rated < 1 in 10^14, not equal to it. Or at least all the datasheets I've seen on consumer disks say "less than" or use a < sign.

I have 12 4TB WD Red consumer drives and have scrubbed my zpool many times, well over 100TB of data read, without a single URE. So my consumer drives are currently performing at better than 1 in 10^15, which still satisfies the < 1 in 10^14 the datasheet promises.

If you have enough disks and do regular scrubs you will know that this "guaranteed" URE is absolute garbage. For healthy drives the actual URE rates I see are at least an order of magnitude better than the stated rates (maybe more); for drives on their way out, the stated rates understate the errors.

Yep, this is what I have found in my experience: I see about 1 URE per 100-200TB of data read on my consumer WD disks.
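Converting that observation into an implied rating is simple arithmetic. A small sketch, using nothing more precise than the rough 1-URE-per-100-200TB figure above:

Code:
import math

# Implied URE rate from scrub observations (rough figures from this post).
for tb_per_ure in (100, 200):
    bits_per_ure = tb_per_ure * 1e12 * 8
    print(f"1 URE per {tb_per_ure} TB read = 1 in 10^"
          f"{math.log10(bits_per_ure):.1f} bits")

# ~1 in 10^14.9 to 10^15.2 -- about an order of magnitude better than the
# 1-in-10^14 consumer spec, matching the posts above.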
 
It's very common for people to buy all of their drives from the same vendor at the same time. If there was an issue with that particular production run of drives, and all of them are in the same RAID set, experiencing the same environment and the same amount of wear, the chances of 2 or more drives all failing at the same time are high.

Agreed. Not a big issue for me, as I tend to trickle drives in piece by piece, so they are almost certainly from different lots, shipped in different boxes (but usually from the same online retailer :p).

If you do buy a bunch at once though, doing some drive testing before using them is a pretty good idea.

I also make sure to replace drives quickly. I don't keep hot (or even cold) spares, but if a drive goes down, I amazon prime a replacement instantly.

Even so, I don't trust RAID5. I do two striped RAIDz2 vdevs, the ZFS equivalent of RAID 60.
 
This is old news and I have been talking about this for years, on this forum too. I've posted this link from 2007:
http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

In the beginning, people said I was dumb, stupid, etc. - that this was not a problem, that hardware RAID was safe, RAID-5 was safe, ZFS was totally unnecessary, and so on.

That is why I have talked about ZFS: there is a reason Sun Microsystems developed it, because of problems like these. Sun also predicted the need for raidz3 several years ago, which might come in handy.

I suspect that in the future, if you have 1000s of disks, there will always be a few disks resilvering, so there is always some repair going on. Maybe you will need raidz4 soon?
 
That is why I have talked about ZFS: there is a reason Sun Microsystems developed it, because of problems like these. Sun also predicted the need for raidz3 several years ago, which might come in handy.

I'm curious. What is the argument behind the need for RAIDz3?

In ZFS having even a single parity drive left will result in automatic healing of any URE on read, so you wouldn't have this protection if your RAIDz1 vdev lost a disk and you had to rebuild.

This, and the risk of having a second drive fail during rebuild are the main arguments against RAIDz1 and RAID5.

RAIDz2 solves this: as long as you are prompt in swapping disks upon failure, the chance of a second drive failing in the same vdev (leaving you exposed to UREs), or of a third drive failing and losing everything, is very small.

So RAIDz3. What does it buy you?

I can see it being useful if you have a deployment that is difficult to access and thus a disk replacement may not be immediate, or if you have a reason to believe that your drives are particularly unreliable and you have an elevated risk of failing two at the same time, but these seem like corner cases at best.

This calculation doesn't change as disk sizes go up, like it did with RAID5/RAIDz1.

The risk of having a URE strike both the main block and the parity data for that block at the same time seems infinitesimal.

I mean, it's a play on risk. We shouldn't be talking about "losing all your data" when you have a hard drive/RAID failure. If that were the case, I could understand going a step higher.

We SHOULD be talking about "the inconvenience of having to restore from backup" if you have a drive/raid failure.

To safeguard against this, getting to the 99.99% level of protection against drive data loss should be sufficient. We shouldn't have to take it to the 99.9999% level, and all the costs that come with that.

Because, after all, RAID IS NOT BACKUP. Having a full separate copy (also redundant) preferably in a separate location is backup.

RAID protects against drive hardware failure. It will never (no matter how much parity you throw at it) protect against accidental deletions, file system errors, flipped bits in failing non-ECC RAM, corruption due to power loss, corruption due to hard crashes, etc. etc.


Based on that, I'd be curious about what Sun saw as the arguments behind the need for RAIDz3, and why you think there might be a future need for RAIDz4?
 
Well, maybe the paper by the man who implemented RAIDZ3 himself will help explain why he added it.

http://queue.acm.org/detail.cfm?id=1670144

I'm not sure RAIDZ4 will ever be a thing. There is a point where you outgrow parity RAID in general from a performance perspective. Eventually your resilvers will be so long and your array so large that you will always have a disk resilvering, and that is not optimal for performance compared to alternative data storage solutions.

I've seen many organizations switch to mirrors rather than RAIDZ as mirrors are higher performance, easier to expand, and vastly faster to rebuild.
http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
 
19-drive RAIDZ3 vdevs are what I use. Because I can, and also because it costs only a few more drives. A second backup, which is what I'd like to have, would cost me a lot more.
 
But adding more parity drives to a single pool only increases redundancy. It provides no additional protection against the whole slew of other possible failures that could cause pool or data loss.
 
I've seen many organizations switch to mirrors rather than RAIDZ as mirrors are higher performance, easier to expand, and vastly faster to rebuild.
http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/

IMHO it depends on the performance metric and what you are comparing to.

For sequential reads and writes, a mirror will never write faster than the slowest single drive, and will never read faster than two drives (and rarely gets even that fast).

I can go to my pool with 12 drives in two RAIDz2's right now and pull close to a GB/s under some circumstances.

I gather the performance advantage of mirrors is in IOPS/seek-time metrics? I think for that I'd rather use SSDs anyway :p
 
19-drive RAIDZ3 vdevs are what I use. Because I can, and also because it costs only a few more drives. A second backup, which is what I'd like to have, would cost me a lot more.

And by doing so you go explicitly against the ZFS recommendations, which generally state not to use more than 10 drives in any single vdev.

It might work fine, but I see violating the project's recommendations as an unacceptable risk when dealing with my data :p
 
I'm not sure RAIDZ4 will ever be a thing. There is a point where you outgrow parity RAID in general from a performance perspective. Eventually your resilvers will be so long and your array so large that you will always have a disk resilvering, and that is not optimal for performance compared to alternative data storage solutions.

That is not true from a general perspective, especially when you take hardware advances into account.

Reasons why a ZFS resilver might be slow:

1) Poor ZFS implementation of 3- or 4-parity RAIDs
I have not seen any benchmarks of degraded read performance for ZFS RAIDZ3, but I'm pretty sure the implementation is far from optimized, especially for modern CPUs with special instruction sets like SSSE3 or AVX2. Why am I so sure? More below.

2) Insufficient I/O bandwidth
If you stick an HBA in a PCIe v1.0 slot (or worse) and run 24 drives or whatever off the HBA (with an expander), you will certainly have an I/O bottleneck. But with PCIe 3 HBAs (and/or multiple slots and multiple HBAs) you can eliminate any I/O bottlenecks.

3) Outdated hardware
CPU and I/O bus improvement has been significant since RAIDZ3 was first implemented. Old benchmarks on old hardware are not representative of what is possible today, let alone what may be possible with future hardware improvements.

4) Heavy access to the array from other programs during the resilver
While this can be a significant factor, I don't think it is relevant to most usage cases. If you expect heavy and sustained access to the array that cannot be curtailed during a resilver, then you likely are not contemplating a multi-parity RAID in the first place.

Note that SnapRAID, although it is snapshot RAID and not realtime RAID like ZFS, still has to make the same sorts of computations to recover data and still needs sufficient I/O bandwidth to read the drives to compute the recovery data. And SnapRAID has implemented 1- through 6-parity. How fast is it?

http://snapraid.sourceforge.net/faq.html#speed

With SSSE3, or especially AVX2, it is quite fast. Note that the last table is the one to look at for data recovery (aka resilvering) speed.

Code:
RAID functions used for recovering with 'fix':
            best    int8   ssse3    avx2
    rec1    avx2    1158    2916    3019
    rec2    avx2     517    1220    1633
    rec3    avx2     110     611     951
    rec4    avx2      71     395     631
    rec5    avx2      49     264     421
    rec6    avx2      36     194     316

That table shows the throughput, in MiB/s, that SnapRAID can achieve when computing the recovery data for various numbers of failed drives. For example, if your CPU has AVX2, and you have 2 failed drives, then SnapRAID can compute the missing data at a rate of 1633 MiB/s.

Now, you might hastily conclude that this shows that RAIDZ4 would be slow to resilver, even with AVX2, since you see the 631 MiB/s number and compare it to the aggregate read bandwidth of the 20 HDDs that might be left in a 24-drive RAIDZ4 with 4 drive failures. While it is true that 631 MiB/s is likely to be less than the aggregate bandwidth of 20 HDDs, you also need to recognize that it is very unlikely that you will have 4 complete drive failures, even with 24 drives in a vdev. More likely is that you might have, say, 2 complete drive failures, and then the occasional 1 or 2 read errors during rebuild. So most of the time your recovered data could be computed at 1633 MiB/s, slowing to 951 or 631 MiB/s only briefly when a stripe has a URE or two. And 1633 MiB/s is close to the aggregate bandwidth of 20 HDDs.

So the resilver times could be reasonably fast, assuming a decent ZFS implementation, modern hardware, and no I/O bandwidth bottlenecks. I don't know about the first, but the latter two are certainly feasible for anyone contemplating a RAID with that many HDDs.
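To put rough numbers on that, here is a back-of-the-envelope sketch. The recovery throughputs are the AVX2 'fix' figures quoted above; the 150 MiB/s sustained drive rate and the fully pipelined resilver model are assumptions:

Code:
# Compare the two possible bottlenecks in a pipelined resilver: streaming
# each drive once at sustained HDD speed vs. pushing the stripes through
# the parity-recovery function.
drive_mib_s = 150                        # assumed sustained HDD read/write
drive_capacity_mib = 4e12 / 2**20        # one 4 TB drive, in MiB

# SnapRAID AVX2 'fix' throughputs from the table above, in MiB/s
recovery_mib_s = {1: 3019, 2: 1633, 3: 951, 4: 631}

stream_hours = drive_capacity_mib / drive_mib_s / 3600   # ~7.1 h per drive
for failed, rec in recovery_mib_s.items():
    compute_hours = drive_capacity_mib / rec / 3600
    bottleneck = "disks" if stream_hours > compute_hours else "parity math"
    print(f"{failed} failed: stream ~{stream_hours:.1f} h, "
          f"compute ~{compute_hours:.1f} h -> bottleneck: {bottleneck}")

Under those assumptions the disks, not the parity math, remain the limiting factor even with 4 failures, which is the point being argued.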
 
Zarathustra[H] said:
IMHO it depends on the performance metric and what you are comparing to.

For sequential reads and writes, a mirror will never write faster than the slowest single drive, and will never read faster than two drives (and rarely gets even that fast).

I can go to my pool with 12 drives in two RAIDz2's right now and pull close to a GB/s under some circumstances.

I gather the performance advantage of mirrors is in IOPS/seek-time metrics? I think for that I'd rather use SSDs anyway :p

Well, I was more referring to RAID 10, stripes of mirrors. I know several companies where friends/colleagues work that have migrated away from parity RAID in favor of striped mirrors because rebuild times were far too long and degraded performance during them was not good enough.

That is not true from a general perspective, especially when you take hardware advances into account.

I'm referring to real companies that I know who use ZFS and their resilver times are measured in days and even weeks.

This is not due to any of the factors you have listed, as their hardware is all up to par and configured correctly. It is due solely to the fact that the system is constantly under a moderate to heavy load from its users. This is a business and these are production systems; resilver gets a lower disk priority because it cannot be allowed to affect production performance.

I don't really see how SnapRAID is relevant to this discussion point, as it's not something that could be used in any production system running high-performance applications for many users.

See this article for some examples of the performance I am referring to, as it explains it in much more detail.

http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/

What solution would you offer those users who are experiencing this kind of situation? Surely RAIDZ4 is not the answer. RAID 10 has already been proven to help immensely: not only are the resilvers orders of magnitude quicker, but they only slow down a single other disk during the operation, so the rest of the spindles can keep providing the performance they need to.

It's a problem exacerbated by the very limited IOPS of HDDs and the fact that HDDs are increasing in capacity more quickly than they are increasing in throughput and IOPS.
 
I'm referring to real companies that I know who use ZFS and their resilver times are measured in days and even weeks.

No, you were not. You were referring to RAIDZ4, which does not exist, and you were talking about whether RAIDZ4 will be useful in the future. So, no changing your story now.

And heavy access during resilver is not really relevant anyway, since with that kind of access pattern, most people are not considering multi-parity RAID anyway. And besides, if you heavily access the other drive in a mirror all the time, that is going to drastically slow down copying the data to a second drive anyway. So you are really talking only about a specific case where the heavy access is on other drives in the RAID 10, but not on the drive in the mirror that failed. While I agree that RAID 10 would rebuild faster in that case, that is hardly a good argument for the general case of 4+ parity RAID not being useful.

You make a bogus conclusion that just because 4-parity RAID is not useful for some usage patterns, that there are no situations where 4-parity RAID is useful. That is fallacious thinking. You made the extraordinary claim that 4-parity RAID would never be useful because of performance reasons. To support that extraordinary claim, it is not sufficient to point out some usage cases that 4-parity RAID would not work. You need to show that there are NO usage cases at all where 4-parity RAID would be useful.

Lastly, I already explained why the SnapRAID benchmarks are relevant. As I said, the computations are the same, and the I/O bandwidth required is the same.
 
How about just using 2 WD 4TB RE drives in a RAID 1 configuration? I figured that would be good enough for my personal needs.
 
No, you were not. You were referring to RAIDZ4, which does not exist, and you were talking about whether RAIDZ4 will be useful in the future. So, no changing your story now.

And heavy access during resilver is not really relevant anyway, since with that kind of access pattern, most people are not considering multi-parity RAID anyway. And besides, if you heavily access the other drive in a mirror all the time, that is going to drastically slow down copying the data to a second drive anyway. So you are really talking only about a specific case where the heavy access is on other drives in the RAID 10, but not on the drive in the mirror that failed. While I agree that RAID 10 would rebuild faster in that case, that is hardly a good argument for the general case of 4+ parity RAID not being useful.

Lastly, I already explained why the SnapRAID benchmarks are relevant. As I said, the computations are the same, and the I/O bandwidth required is the same.


But the problems seen by these companies have only gotten worse from RAIDZ1 to RAIDZ2 to RAIDZ3.

The point of an increased Z-level is that you can maintain a larger array and larger disks, so you can sustain longer resilvers and have a lower chance of needing to restore. (I'm not making this claim; the developer of RAIDZ3 at Sun makes this claim.)

This will only work and scale up to a certain point: the point where disk failures are happening more often than resilvers can finish. This is a real problem, and there are some papers on the topic that I can dig up.

As arrays and disks have gotten larger, resilver times have also gotten longer. RAIDZ3 already succeeds in facilitating arrays that take weeks to resilver, and systems of this size also see disk failures nearly as often as the disks finish resilvering, sometimes more often (which is the reason for RAIDZ3). This is the threshold of parity RAID.

So tell me how RAIDZ4 would improve this, or what point it would have for systems this large and the larger systems it would attempt to facilitate.

As disks have gotten larger they have not gotten more reliable, which is why we have been creeping toward this threshold. Mirrors also have a threshold (where a rebuild takes longer than the time between failures), but we are farther away from that, whereas for parity RAID, RAIDZ3 already lets us reach it.
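That threshold can be sketched with simple arithmetic. Every input here is an assumption, chosen only to be in the ballpark of the large, busy deployments described above:

Code:
# Expected additional drive failures while a single resilver runs.
n_drives = 1000              # disks in the deployment (assumption)
mtbf_hours = 300_000         # per-drive MTBF, ~3% annualized failure rate
resilver_hours = 24 * 14     # a two-week resilver, as reported for busy pools

expected_failures = n_drives * resilver_hours / mtbf_hours
print(f"~{expected_failures:.2f} expected failures per resilver window")

# At ~1.1 failures per two-week window, some disk is effectively always
# resilvering; past that point another parity level stops keeping up.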
 
No, you were not. You were referring to RAIDZ4, which does not exist, and you were talking about whether RAIDZ4 will be useful in the future. So, no changing your story now.

And heavy access during resilver is not really relevant anyway, since with that kind of access pattern, most people are not considering multi-parity RAID anyway. And besides, if you heavily access the other drive in a mirror all the time, that is going to drastically slow down copying the data to a second drive anyway. So you are really talking only about a specific case where the heavy access is on other drives in the RAID 10, but not on the drive in the mirror that failed. While I agree that RAID 10 would rebuild faster in that case, that is hardly a good argument for the general case of 4+ parity RAID not being useful.

You make a bogus conclusion that just because 4-parity RAID is not useful for some usage patterns, that there are no situations where 4-parity RAID is useful. That is fallacious thinking. You made the extraordinary claim that 4-parity RAID would never be useful because of performance reasons. To support that extraordinary claim, it is not sufficient to point out some usage cases that 4-parity RAID would not work. You need to show that there are NO usage cases at all where 4-parity RAID would be useful.

Lastly, I already explained why the SnapRAID benchmarks are relevant. As I said, the computations are the same, and the I/O bandwidth required is the same.

From my perspective there is nothing necessarily wrong with RAIDZ4 if it were created. Resilver times could be made manageable for most implementations.

I just question what benefit having that level of parity truly adds.
 
So tell me how RAIDZ4 would improve this, or what point it would have for systems this large and the larger systems it would attempt to facilitate.

I already explained how 4-parity RAID could be useful and why your arguments are fallacious.
 
I already explained how 4-parity RAID could be useful and why your arguments are fallacious.

So backup systems are the only use case for RAIDZ4? (Since they don't always need to carry a load.)

The vast majority of real systems would have loads that would cause the resilvers to crawl, and then we have the problems that I outlined.

Note that this argument is not exclusively my own; I have had, and read, discussions with ZFS developers themselves about issues such as this, and they have said they would probably not take the time to implement RAIDZ4, not least because none of their customers were asking for such a feature, as it wouldn't be useful to them.

They told me it was not impossible, but unlikely that it would be implemented.

That's really the primary reason I said I didn't think it would be implemented: because the developers told me so. But it also made sense to me, and I thought it was relevant to bring up since I saw it mentioned.
 
Zarathustra[H] said:
From my perspective there is nothing necessarily wrong with RAIDZ4 if it were created. Resilver times could be made manageable for most implementations.

I just question what benefit having that level of parity truly adds.

Space usage is more efficient. For example, a RAID 60 with three 8-drive RAID 6 arrays is probably roughly equivalent in fault tolerance to a 4-parity 24-drive array (still slightly inferior, though). But the 4-parity array has 83.3% of the drive space for data, while the RAID 60 only has 75%.

Now, you could argue that a RAID 60 with two 12-drive RAID 6 arrays matches the 4-parity 24-drive array for storage efficiency, while fault tolerance is only slightly lower. While that may be true, I'd still rather use 4-parity RAID if the performance is good enough for my usage, especially since it has the advantage that, with two complete drive failures, the RAID 60 solution could lose data if a URE is encountered during the rebuild (if it is in the same sub-array in which the two drives failed), while the 4-parity solution will still be able to recover the stripe.
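The percentages are easy to verify; a few lines of arithmetic for the layouts being compared:

Code:
# Usable capacity = data drives / total drives for each 24-drive layout.
layouts = {
    "RAID 60, 3 x (8-drive RAID 6)":  (24, 3 * 2),
    "RAID 60, 2 x (12-drive RAID 6)": (24, 2 * 2),
    "24-drive 4-parity array":        (24, 4),
}
for name, (total, parity) in layouts.items():
    print(f"{name}: {(total - parity) / total:.1%} usable")
# 75.0%, 83.3%, 83.3% -- the percentages quoted above.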
 