IBM Building 120 Petabyte Drive

First, it's not a single disk.
Second, that's the reason for more complex array solutions.

For example, if you build a simple RAID 5 array of twelve 2TB drives (22TB usable), you'd likely have a rebuild failure if you had to replace one of the disks. The reason is that the number of bits you can expect to read before hitting an unrecoverable read error is lower than the total number of bits you have to read to rebuild the array. Using RAID 6 would reduce the usable size to 20TB, but would reduce the likelihood of a rebuild failure to almost zero.
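To put a rough number on that, here's a quick back-of-envelope sketch in Python, assuming a consumer-class rating of roughly one unrecoverable read error (URE) per 10^14 bits read (the exact rating varies by drive model; the drive size and count are the ones from the example above):

```python
# Back-of-envelope estimate: expected unrecoverable read errors (UREs) while
# rebuilding a RAID 5 array of twelve 2TB drives, assuming a consumer-class
# rating of 1 URE per 1e14 bits read (an assumption, not a measured figure).

URE_RATE_BITS = 1e14            # rated bits read per unrecoverable error
DRIVE_BITS    = 2e12 * 8        # one 2TB drive (decimal TB), in bits
DRIVES        = 12              # RAID 5: 11 data + 1 parity

# Rebuilding one failed drive means reading every surviving drive in full.
bits_read = (DRIVES - 1) * DRIVE_BITS

expected_ures = bits_read / URE_RATE_BITS
print(f"bits read during rebuild: {bits_read:.2e}")
print(f"expected UREs during rebuild: {expected_ures:.2f}")   # ~1.76
```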

Incorrect. The theoretical failure threshold for RAID rebuilds is greater than 2TB, and it isn't measured against the same limit hard drive manufacturers use, because the 2.2TB ceiling for RAID rebuilds is a binary 2.2TB, not a decimal 2TB. It isn't until 2.5TB hard drives that this becomes an issue in theory. And only in theory, because the theory states there are probable odds that you will encounter a fault in the rebuild when dealing with that much data, but it doesn't take into account code written to avoid such problems during a rebuild, or the ways data is stored to prevent this failure.

Also incorrect is the claim that RAID 6 would, in theory, make this problem go away or bring the rebuild failure rate to almost zero, because the failure rate is tied to the rebuild size of the drive. In theory, if this problem occurred with 100% likelihood on drives larger than 2.2TB, then having to rebuild 2 drives would only mean your RAID would theoretically fail once two hard drives failed instead of one. It would have no effect on the ability to perform the rebuild itself.

In reality this doesn't happen much at all, and it doesn't constitute a complete RAID failure; it only means you need to do single-drive rebuilds, which take much longer but are still possible. That is, you take each drive and rebuild the data one drive at a time, assuming you have enough valid drives to reconstitute the data. It also doesn't stop you from sending the hard drive off for recovery, with the possibility of retrieving 100% of the data.

Also, in reality, large enterprise storage systems like those from IBM, EMC, NetApp, HP, etc. don't work like the RAID most people would use on their PCs. The data is normally, at the very least, double striped. Why is something like double striping important? Because these vendors already know the limitations of RAID rebuilds and have long since devised ways around the problem. The drives aren't written as single member drives; they are written as multiple drives. Imagine you had 2 hard drives and wanted to make a RAID 5. Most people know you can't, because you need at least three drives to create the RAID volume, but on a large data storage system you can. Why? Because each drive can hold multiple stripe members. Think of it as cutting each drive up into 4 or more logical drives, then striping those into a RAID 5 volume. This is just a basic explanation to show how it is done; in practice no one would ever do it exactly this way on an enterprise storage system.
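Purely as an illustration of that idea (a made-up layout, not how IBM, EMC or NetApp actually implement anything), here's a few lines of Python that carve two physical drives into four logical segments and lay RAID 5 stripes across those segments, rotating the parity:

```python
# Toy illustration only: carve 2 physical drives into 2 segments each, then
# treat the 4 segments as the member "drives" of a RAID 5 stripe set.
# Real enterprise systems spread stripes across far more drives and segments.

PHYSICAL_DRIVES    = ["drive0", "drive1"]
SEGMENTS_PER_DRIVE = 2

# Each logical member is (physical drive, segment index).
members = [(d, s) for d in PHYSICAL_DRIVES for s in range(SEGMENTS_PER_DRIVE)]

# Print a few stripes; parity (P) rotates across the members, the rest hold data.
for stripe in range(4):
    parity = stripe % len(members)
    row = [
        f"{d}.seg{s}=" + ("P" if i == parity else f"D{stripe}.{i}")
        for i, (d, s) in enumerate(members)
    ]
    print(" | ".join(row))
```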

Also, data on these storage systems is almost never saved in one place; no one in their right mind would keep production data on a storage system that didn't have a backup. In reality, the same site normally has a redundant array that is EXACTLY the same as the primary array and syncs back and forth. If a drive failed and for some reason couldn't rebuild, the system could copy the drive from the secondary array, possibly sitting right NEXT to it on a dedicated 10Gb fibre multipath connection, and rebuild the drive as fast as the drive could possibly write. But what about access to the data? Wouldn't it be down while the drive rebuilds? Most of these enterprise data storage systems have gigabytes and even terabytes of "RAM". I have worked on storage systems that had over 16TB of RAM, and there are larger ones. That cache could hold all the data of several drives if it wanted to; not that it usually does, since the system normally writes the data immediately to a "spare" drive while the drive rebuilds, then resumes access to the replaced drive once it's ready. Also remember that the fault theory talks about RAID REBUILDS; it doesn't apply to simply copying a drive from a known good data source.

And drives almost never fail completely. The tolerances on these systems are so tight that a replaced drive usually still works fine; the storage system has simply determined that the drive will, with high probability, fail soon, or that it no longer meets speed requirements. Drives that get replaced are very rarely completely dead and unreadable.

Basically I just want to say that the enterprise data storage systems from the big guys have long since figured out how to do RAID properly, and they continue to innovate better and faster ways to retrieve that data. Think of any company you wouldn't expect to have large storage systems, and they likely do. Even Walmart has large data storage systems in multi-redundant configurations, because if that data ever went away, it could mean the end of the business as they know it.

For the home user, protection from this will likely require similar measures in the future, or new software/hardware that can work around this potential problem. Alternatively, it might even take something like RAID 50 to get around it.
 
For reference, I have had to perform rebuilds on two separate occasions over the last 1.5 years on my now-48TB RAID 5 volume of 2TB drives. No issues so far. Part of the secret, though, has been to buy identical drives from the same lot. I purchased 30 hard drives all at the same time and have a couple set aside for my 3-year life-cycle replacement.
 

Just make sure you test your "spares" before you set them aside, not when you're trying to use one to rebuild another failed disk! Learned that one the hard way! haha
 
Incorrect. The theoretical failure threshold for RAID rebuilds is greater than 2TB, and it isn't measured against the same limit hard drive manufacturers use, because the 2.2TB ceiling for RAID rebuilds is a binary 2.2TB, not a decimal 2TB. ... Alternatively, it might even take something like RAID 50 to get around it.

This has nothing to do with some theoretical 2.2TB limit, but with the actual error rates of the drives.

There have been several in-depth articles on the web over the last few years about the problems with large RAIDs built from SATA drives. This wasn't a problem with smaller drives; they had the same error rate, but it wasn't possible to build a large enough RAID to run into the problem.
Specifically, the problem is the "non-recoverable read errors per bits read" specification, which describes how often you can expect the drive to return a hard read error.

On consumer-level SATA drives, it's usually 1 in 10^14.
On enterprise-level SATA drives, it's usually 1 in 10^15.
Some newer enterprise drives (usually smaller SAS drives) are now rated at 1 in 10^16.

Generally this is not a problem for a standard, non-RAID drive (like on a desktop), as the OS will just retry the read. However, many RAID systems will NOT retry the read, but will instead report the error to the RAID controller/software. If this happens during a rebuild, the controller marks the entire array as failed, resulting in a loss of data.

So, how many bits are on a 2TB drive?
3,907,029,168 sectors × 512 bytes/sector × 8 bits/byte = 16,003,191,472,128 bits, or about 1.6x10^13

Now, assuming your drives are still at 100% quality, on a consumer-level 2TB drive you should expect one read error for every 6.25 full reads of the drive.
So, what happens on a 6-drive RAID 5 when a disk fails? You have to read 0.8x10^14 bits to rebuild the drive, from drives that have an average error rate of 1 in 10^14.
If the RAID is in use during the rebuild, what are the odds that the number of reads will be even higher? That's a little too close to a guaranteed failure for me.
How about a 20TB RAID 5? Good luck, because the numbers say you should expect about 2 errors during the rebuild.

As for the Enterprise drives, you have a much better chance, since the error rate is 10 times better, but there is still a small chance of a failed rebuild.
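For anyone who wants to play with the numbers, here's a small Python sketch of the arithmetic above, under the simplifying assumptions that errors are independent and that each surviving drive is read in full exactly once during the rebuild (real controllers and real error behaviour will differ):

```python
from math import exp

# Expected UREs, and the chance of hitting at least one, while rebuilding a
# 6-drive RAID 5 of 2TB disks (one failed drive, so 5 survivors are read in full).
# Assumes independent errors spread evenly over the bits read -- a simplification.

SECTORS    = 3_907_029_168          # sectors on a 2TB drive
DRIVE_BITS = SECTORS * 512 * 8      # ~1.6e13 bits per drive
SURVIVORS  = 5

def rebuild_risk(bits_per_error):
    bits_read = SURVIVORS * DRIVE_BITS
    expected  = bits_read / bits_per_error
    p_at_least_one = 1 - exp(-expected)    # Poisson approximation
    return expected, p_at_least_one

for label, rate in [("consumer SATA, 1 in 10^14", 1e14),
                    ("enterprise SATA, 1 in 10^15", 1e15),
                    ("newer SAS, 1 in 10^16", 1e16)]:
    expected, p = rebuild_risk(rate)
    print(f"{label}: {expected:.2f} expected UREs, {p:.0%} chance of at least one")
```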

If a drive has a read error during a rebuild of a single failed drive in a RAID 6, the error is corrected and the rebuild continues. You would have to have a 2nd disk fail, or have 2 read errors at the same spot on 2 different drives, for the rebuild to fail. Very unlikely. The same holds true for more elaborate RAID levels such as RAID 50, 60, etc.
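Here's a rough sketch of why that is, under the same simplifying assumptions as before (independent errors, a consumer-class 1-in-10^14 rate, and an assumed 4KB chunk read per drive per stripe): a RAID 5 rebuild dies on the first URE anywhere, while a RAID 6 rebuild of one failed drive only dies if two surviving drives hit a URE on the same stripe.

```python
from math import comb

# Same model as above, but per stripe. With one drive already failed:
#  - RAID 5: any single URE on any surviving drive aborts the rebuild.
#  - RAID 6: a single URE is repaired from the second parity; the rebuild only
#    fails if two different surviving drives hit a URE on the SAME stripe.
# Chunk size and drive count are illustrative assumptions.

SECTORS    = 3_907_029_168          # 2TB drive
DRIVE_BITS = SECTORS * 512 * 8
CHUNK_BITS = 4096 * 8               # data read per drive per stripe (assumed)
URE_RATE   = 1e14                   # consumer-class: 1 error per 1e14 bits
SURVIVORS  = 5                      # e.g. a 6-drive set with one drive dead

stripes = DRIVE_BITS // CHUNK_BITS
p_chunk = CHUNK_BITS / URE_RATE     # chance that one chunk read hits a URE

p_raid5_fail = 1 - (1 - p_chunk) ** (stripes * SURVIVORS)

# The per-stripe double-error probability is tiny, so sum it over all stripes
# instead of exponentiating (avoids floating-point underflow).
p_raid6_fail = stripes * comb(SURVIVORS, 2) * p_chunk ** 2

print(f"RAID 5 rebuild failure chance: {p_raid5_fail:.0%}")     # ~55%
print(f"RAID 6 rebuild failure chance: {p_raid6_fail:.1e}")     # ~5e-10
```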

There is also the performance penalty when a drive in a RAID 5 fails and the array is running in recovery mode. RAID 6 gets around this problem too.

As for backup, of course I have a backup, but how long will it take to restore 10TB of data? My LTO-3 tape drive has a transfer rate of 270GB/hour. I don't want to have to explain why a server is down for 2 days while I restore all the files because I didn't set up the server with the proper RAID settings.
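Quick sanity check on that restore time, using the rate quoted above (pure streaming time only; real restores add verification and other overhead):

```python
# Restore-time estimate: 10TB of data from an LTO-3 drive at roughly
# 270GB/hour (the figure quoted above; actual throughput depends on
# compression and how well the data streams).
data_gb   = 10 * 1000      # 10TB, decimal
rate_gb_h = 270

hours = data_gb / rate_gb_h
print(f"~{hours:.0f} hours, i.e. about {hours / 24:.1f} days")   # ~37 h, ~1.5 days
```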
 