Check Data Integrity of Backups?

Silhouette · Oct 18, 2011

danswartz said:
I don't recall you mentioning that originally. If you did, my apologies...

No problem, it was in my first post in this thread:

Silhouette said:
ZFS is good at detecting several types of problems, but I like to keep separate checksums (.sfv, .par or similar). ZFS won't catch all instances of data corruption, e.g., when you're transferring data between systems.

brutalizer · Oct 19, 2011

Silhouette said:
Just a small observation from my systems: I regularly copy large amounts of data (10-30 TB) between computers and I very, very rarely have any problems with data corruption. I use a variety of HDDs, controllers and other components.

Did you read this thread at all?

The creator of this thread says he copied lot of data and everything was fine. Then some year later he tried to open some files, and only then discovered data corruption - this is SILENT corruption. You get no notification that data was corrupted, because the hardware did not notice it.

To do just a copy of all your data is no problem. The copy might be corrupted though. Have you checked every bit, that it has not been randomly flipped? Thus: you need to do a checksum of every file. MD5 is a good checksum. SHA-256 is also a good checksum (which ZFS supports)

And after doing checksum, you need to catch bit rot that might show up sometimes later. Thus you need to control the checksums are correct, regularly, maybe every week.

Trepidati0n said:
Be very careful putting PhD's and researchers on pedestals. Remember, they are academics by nature and thus tend to ignore (forget) reality in order to fit a hypothesis. It isn't intentional, it is the nature of the their efforts.

Fuck, we just had some of the best and brightest in the world say Einstein was wrong and then forgot to include, what should be to them, a rudimentary error source. For the past decade we had people believing that vaccines were evil due to "bad research" that was propped up as fact.

More simply, math can always be done to meet an expectation where as time always reveals the truth.

Yes, I am careful about PhDs.

But my point is this: I am not religious about ZFS. Religion to me (hope I am not offending anyone here) is a blind belief. It is based on nothing, or in best case anecdotical evidence (some Saint did a miracle one single time). The opposite to blind belief, is hard science where you can verify experiments again and again. And every time you get the same result.

Now, there are lot of research on data corruption and ZFS. Thus, I base my trust on ZFS because of science and research. Not on blind belief "I am an avid Oracle supporter". I am not blindingly trusting ZFS, with no reason. There is a reason I trust ZFS - research. If someone bases his beliefs on Research - is he religious then? By the very definition: No. He is the opposite: scientific. Thus, I do not really agree that ZFS users are "religious".

Would you prefer to trust on product XXX because of blind belief, or because of hard science?

Regarding your examples with Einstein and vaccines, sure you are right on that something that scientists believe today, might not be valid tomorrow. But I draw a very distinct line between sciences. Mathematics is the only deductive science, all other sciences are inductive: you observe something many times and draw a conclusion. For instance, all Swanes I have ever seen, are white birds. Thus, here is a truth: "all swanes are white". But there is a lake where the swanes are black! So induction is not really great in my view because the conclusion might not be correct.

OTOH, math uses logical deductions, things that are always TRUE. For instance, physics 2000 years ago, is not valid today. Not much in any science is valid today. Maybe in 2000 years, Einstein will be totally wrong? This applies to all sciences: physics, biology, psychiatry, chemistry, economics, etc.

But mathematics 2000 years old, is valid even today. Pythagoreas theorem is valid even today, and it will be valid 2000 years from now. It will be valid for eternity. Because it has been PROVED mathematically. If you prove something mathematically, then it is forever true. It is a true fact.

So, I am careful putting researchers on a piedestal, because I know that 2000 years from now, we might have changed our knowledge. But when you prove something deductively in math - it is forever true. And math, I put on piedestal.

Now, certain parts of Computer Science belong to math, and it is safe to trust them researchers. The research on ZFS, was not 100% deductively proven, so it might be proven later that ZFS is not safe. But initial research gives good support for ZFS.

I am not religious. I base my beliefs on hard science and research. That is why I am excited about ZFS, and do heavy ZFS masturbation every time I use my Solaris PC.

Rectal Prolapse said:
brutalizer: I already know all that - no need to _lecture_ me when I already came to the same conclusion in my earlier post.

Yes, I know, that you know. But others here dont know as much as you. It was more, for them.

tormentum · Oct 19, 2011

Trepidati0n said:
One thing I have been disapointed in, is that files do not have MD5 checksums built in by default. It is so eff'ing eay to do.

Yes and no. File's in and of themselves simply contain data. Some applications do have the ability to detect corruption in their data streams (ever come across a corrupted ZIP file?), however the vast majority do not. There are different types of checksum/verification that can apply here.

A simple file level checksum hashes the file and creates a string of characters that form a representation of the data within the file. Some common examples are MS5 and SHA1. This is all they do. At an application level this is next to useless, except for being able to tell the application that perhaps the file is corrupt.

File formats such as ZIP, 7z and CAB do internal checkumming on their data streams to verify sections or blocks within a file itself. This can be more useful as the application can determine which part of the file is corrupt, and potentially what data was within that section of the file. The problem is EVERY file format is different, and is used for different things (ZIP for archive/compression, DOC for word document, EXE for executables, etc). No one method of across the board checksumming would be necessarily useful except for being able to verify purely at a file level that the file is as it should be.

Step in the file system. Without making this a ZFS "masterbatorium" of a post, let me quote Jeff Bonwick on the subject of filesystems:

The job of any filesystem boils down to this: when asked to read a block, it should return the same data that was previously written to that block. If it can't do that -- because the disk is offline or the data has been damaged or tampered with -- it should detect this and return an error.

Incredibly, most filesystems fail this test. They depend on the underlying hardware to detect and report errors. If a disk simply returns bad data, the average filesystem won't even detect it.

Source: http://blogs.oracle.com/bonwick/en_US/entry/zfs_end_to_end_data

And this is true, most file systems do fail this; NTFS, EXT3, ReiserFS, etc. Now Jeff was the team leader of the ZFS development team, so he's biased toward ZFS of course, however his point is valid, and yes ZFS does deal with this quite well. However it would be great to see more universal adoption of this concept across the board. Imagine NTFS being able to detect silent corruption and actually notifying you of it? Don't get me wrong NTFS is very good, but there's room for improvement!

At the end of the day, it should not be the application's responsibility to verify whether what it wrote to disk is the same as what it is reading back. Good practice, but not a requirement!

Aesma · Oct 19, 2011

I use syncbackpro. It offers lots of options, and data integrity checking is offered in several flavors, basically any kind of hash method you can think of (you can add your own). For my use, it's useless, because I backup large quantities of data, so hashing everything would take forever, each time. I only use comparisons based on size and date.

Silhouette · Oct 19, 2011

brutalizer said:
Did you read this thread at all?

I have read this thread. On the other hand, you have clearly not read or understood my posts. I keep checksums of all my data, and I have more data stored online at home than the sum of the top 2-3 in the (perhaps outdated?) most total storage list in the 10+TB thread .

I think zfs is a step in the right direction, but it does not solve all problems related to data corruption. Online and offline backups with separate checksums is still necessary, especially if your job depends on not losing any data (e.g., people working with raw film material for movie productions).

ChrisBenn · Oct 19, 2011

Silhouette said:
I think zfs is a step in the right direction, but it does not solve all problems related to data corruption. Online and offline backups with separate checksums is still necessary, especially if your job depends on not losing any data (e.g., people working with raw film material for movie productions).

In that case wouldn't it be better to store your data online & offline against a ZFS filesystem? Assuming the online system has parity data you would also get real-time correction of the data - so accessibility of it wouldn't be compromised either.

Silhouette · Oct 19, 2011

ChrisBenn said:
In that case wouldn't it be better to store your data online & offline against a ZFS filesystem? Assuming the online system has parity data you would also get real-time correction of the data - so accessibility of it wouldn't be compromised either.

Yes, there are several good reasons for using zfs. I have used zfs on a couple of my systems for a long time now, but it has not been without practical problems either. It's getting better, but IMO it's still not quite where I want it to be.

My most problem free servers use linux/xfs, and as I stated I see very little bit rot.

IMO there are several more pressing issues, including controller/hdd failure, incompatibilities causing dropped drives, and various types of corruption that is not detected by zfs. Feed zfs garbage in and you get garbage out.

There is no substitute for backups.

ChrisBenn · Oct 19, 2011

Silhouette said:
Yes, there are several good reasons for using zfs. I have used zfs on a couple of my systems for a long time now, but it has not been without practical problems either. It's getting better, but IMO it's still not quite where I want it to be.

My most problem free servers use linux/xfs, and as I stated I see very little bit rot.

IMO there are several more pressing issues, including controller/hdd failure, incompatibilities causing dropped drives, and various types of corruption that is not detected by zfs. Feed zfs garbage in and you get garbage out.

There is no substitute for backups.

I don't think anyone is advocating not using a separate backup - that protects from a different set of problems (physical destruction, data corruption via a valid source (someone issued a delete command, etc.)).

But if, as you say, your job depends on maintaining the integrity of your original data would "very little bit rot" still be too much?

Of the issues you list
*controller/hdd failure - I don't think this has anything to do with ZFS - unless you are saying that either ZFS based systems are causing the controllers & hard drives to fail faster, or that ZFS is worse at recovering from a failure (than XFS in this case)

*incompatabilities causing dropped drives - I would need more information on this - is this with regard to hardware that works with XFS but not with ZFS? (or more aptly a linux vs. zfs hardware support?) - that is one big caveat of the solaris world - it's harder to swap to you, you need to really vet your hardware ahead of time. Are you actually having issues with hardware on the Solaris HCL?

*various types of corruption not detected by zfs - can you give me examples of what you are talking about here?

danswartz · Oct 19, 2011

The controller issue has nothing to do with zfs, rather ths host OS. Usually that would be some flavor of OpenSolaris.

Silhouette · Oct 19, 2011

ChrisBenn said:
But if, as you say, your job depends on maintaining the integrity of your original data would "very little bit rot" still be too much?

Of the issues you list

The things I listed were not issues caused by zfs, rather issues that I consider to be greater problems than bit rot. Major issues directly caused by zfs that I have experienced are a couple bugs causing pools to become inaccessible, along wth some other minor bugs that sometimes appear.

I expect to see around 0-1 checksum errors (usually sector sized) when I verify 200+ TB of data. That is a minor inconvenience. When I consider zfs to be more mature I expect to move all my systems over to that file system. But recovering a file or two from backups is less painful than recovering 40-60 TB.

As for problems not detected by zfs, a lot can happen to your data before/during the inital copy to your file system. A lot can also happen on the way from one server to another, and zfs won't help you in either of those cases. Every time I archive new data I compute checksums, verify the data is valid according to incoming crc, then I recheck the checksums.

tormentum · Oct 19, 2011

Silhouette said:
Every time I archive new data I compute checksums, verify the data is valid according to incoming crc, then I recheck the checksums.

A very wise practice. md5sum or using "rsync --checksum" is very useful in this regard.

danswartz · Oct 19, 2011

Silhouette, when you say zfs won't help you with corruption of data transmitted to another system, how are you copying the data? If you use 'zfs send', the stream is checksummed and verified at the other end.

Silhouette · Oct 19, 2011

danswartz said:
Silhouette, when you say zfs won't help you with corruption of data transmitted to another system, how are you copying the data? If you use 'zfs send', the stream is checksummed and verified at the other end.

I was very unimpressed by zfs send performance the last time I tried to use it to transfer data over the network. Also, it had some limitations that made it not very suitable for my situation.

That was a while ago though, maybe it's working better now?

danswartz · Oct 19, 2011

Were you using ssh? If so, that would kill performance pretty badly, IIRC. Also, from your earlier comment, does that mean you were not using zfs send? If so, how were you transferring?

Silhouette · Oct 19, 2011

No, ssh/sftp is pretty much ruled out when you care about performance. I tried a couple of things, I think netcat worked the best.

Keep in mind that I have 10 GbE in most of my servers, and my definition of slow is not necessarily considered very slow by others. But 1 Gb is slow when you have a lot of data.

I mostly use ftp for transfers.

Silhouette · Oct 19, 2011

Anyway, this thread got derailed.

My advice: Make sure that you have several copies of your data, preferably as independent as possible. Important stuff should exist somewhere else away from home, or at the very least in some form of offline backup. Use checksums to verify your data, keep backups uptodate and reverify/copy to new media at regular intervals (but not weekly).

tormentum · Oct 19, 2011

Silhouette said:
Anyway, this thread got derailed.

My advice: Make sure that you have several copies of your data, preferably as independent as possible. Important stuff should exist somewhere else away from home, or at the very least in some form of offline backup. Use checksums to verify your data, keep backups uptodate and reverify/copy to new media at regular intervals (but not weekly).

But derailing is fun

Back on topic: speaking of checksuming, is there a good windows based util for creating MD5's for reverification? Whilst I use rsync for large copies, my wife could use something more pointy-clickie.

BENN0 · Oct 20, 2011

I use JDigest:
http://code.google.com/p/jdigest/

It integrates nicely in the Windows explorer. Requires Java though.
To calculate checksums for a complete directory structure I also use Total Commander:
http://www.ghisler.com/

tormentum · Oct 20, 2011

Just found this: http://www.exactfile.com/

Looks exactly what I was after.

georgexi · Oct 20, 2011

I thought I read an article that the new WHS2011 was going to protect against bit-rot (at an additional cost of 12% disk space) but now I can't find any information about this. I guess MS dropped this feature.

Do RAID setups protect against bit-rot? Particularly if you do regular data scrubs?

drescherjm · Oct 20, 2011

Do RAID setups protect against bit-rot?

Yes, although the protection against bit-rot breaks down when during a rebuild to replace a dead drive.

danswartz · Oct 20, 2011

I think we're saying the same thing, but a deficiency in raid schemes is this: you suffer bit rot on one or more sectors of a hard drive in the array. You haven't read that sector since it went bad, so it hasn't been corrected. A drive other than the bit-rot one fails. You replace it and start the rebuild. You discover (to your chagrin) that you have data that is pooched. This is the scenario? I know at least some raid controllers can be set up to periodically verify the array - I assume that addresses this concern?

Aesma · Oct 20, 2011

ZFS is no different in that aspect, unless you use the option to have several copies on the same drive.

danswartz · Oct 20, 2011

True, except for zfs, if you are worried about this, you have a job that runs weekly or whatever that does a scrub. Also, I don't recall having said anything about zfs in that regard...

UhClem · Oct 20, 2011

brutalizer said:
As you have noticed, English is not my first language, and I am not very good at it. Some subtleties I miss. Like "QPing Patriot"?

Actually, I didn't notice (and I've always been pretty sharp at that). Your English is (damn close to) excellent. Even moreso when compared to the typical (USAn) college-educated netizen of today. (Had I noticed, I would not have chosen a euphemism rooted in American football. The language issue might also explain you missing the intended nuance of "schizoid role-reversal".)

But it can't excuse your failure to distinguish between [error] detection and correction; nor the blatant misuse of bit-rot. They are specifically germane to the discussion, and you introduced the latter to this thread.

For a moment, I was thinking you might be an A.I. program, but even ELIZA (40+ years ago) had the sense (and humility) to respond "I'm not sure I understand you ..." when appropriate.

Alas, you are just a well-meaning (but occasionally annoying) proselytizer. And, just as with the krishnas in the airports, I shall find it best to ignore you. Sincerely though, good luck!

--UhClem "I think I'll take one of those sacred tablets .... damn, I lost my mescaline."

ChrisBenn · Oct 20, 2011

drescherjm said:
Yes, although the protection against bit-rot breaks down when during a rebuild to replace a dead drive.

I'm not quite convinced that raid-5/6 setup gives you bitrot protection. The key difference is going to be that ZFS *also* has a per block checksum so it knows verifiably if a block is bad or not. If you don't have this checksum you can only tell if two blocks differ, or if I can't read one block.

So with a raid-5 setup we have 2 sets of data (normal and parity). A scrub will just verify I can read all the data, and if I can't it will re-allocte that block somewhere else. If parity and main data don't match it doesn't have any way of telling which is "correct"

Raid-6 could probabalistically guess since it has 3 sets of datay -main and 2 parity - so if 2 match and one is different it is more likely that the two that match are correct. I'm not sure if any raid-6 systems (hardware or software) actually do this though - and I guess there still is a minute chance that you had 2 bit flips.

It seems to me that you need the end to end checksum to identify which set of data is "correct" - without that you can't really have bit rot protection. (Why many SANS use 520byte block sizes - a standard 512byte block + 8 byte checksum).

UhClem · Oct 20, 2011

tormentum said:
Just found this: http://www.exactfile.com/

Looks exactly what I was after.

(I was curious, so I dl'ed the command-line version of that. I'm sure it uses the same hashing engine as the GUI mode.)

It's slow!! Unless you're just verifying a few photos, I suggest you extend your search. For example, the md5sum on an older CygWin is significantly faster (15 sec. vs. 25 sec). And, I know there are one or two seriously hardcore implementations out there, but I haven't undertaken that quest yet.

My focus is on finding the fastest engine; scripting the tires and steering wheel is trivial.

--UhClem

drescherjm · Oct 20, 2011

ChrisBenn said:
I'm not quite convinced that raid-5/6 setup gives you bitrot protection. The key difference is going to be that ZFS *also* has a per block checksum so it knows verifiably if a block is bad or not. If you don't have this checksum you can only tell if two blocks differ, or if I can't read one block.

So with a raid-5 setup we have 2 sets of data (normal and parity). A scrub will just verify I can read all the data, and if I can't it will re-allocte that block somewhere else. If parity and main data don't match it doesn't have any way of telling which is "correct"

Raid-6 could probabalistically guess since it has 3 sets of datay -main and 2 parity - so if 2 match and one is different it is more likely that the two that match are correct. I'm not sure if any raid-6 systems (hardware or software) actually do this though - and I guess there still is a minute chance that you had 2 bit flips.

In the case for raid 5/6 the raid does not have to guess since the drive will report an error on the sectors that have bit-rot. Remember that the drive keeps a checksum on every single sector. These will show up as CRC errors so the raid subsystem will know which one is bad.

danswartz · Oct 20, 2011

What are the odds of a silent error? Supposedly it's non-trivial, but I don't know the odds...

ChrisBenn · Oct 20, 2011

drescherjm said:
In the case for raid 5/6 the raid does not have to guess since the drive will report an error on the sectors that have bit-rot. Remember that the drive keeps a checksum on every single sector. These will show up as CRC errors so the raid subsystem will know which one is bad.

Good point, that will catch some - but there are ways (head positioning issues, etc.) that can still cause bad data with a good checksum (mostly stemming from keeping the checksum metadata in the same "place" as the data) - and because of where the checksumming happens in ZFS it covers more of the system (hard drive crc obviously just covers the physical reading/writing of the hard drive.

But this obviously becomes an incremental thing, and given that you depend on the crc based error detection of the drives I can see how you get some bit rot protection in raid-5/raid-6. It's just not as complete in implementation and effect as ZFS's.

BENN0 · Oct 21, 2011

ChrisBenn said:
Raid-6 could probabalistically guess since it has 3 sets of datay -main and 2 parity - so if 2 match and one is different it is more likely that the two that match are correct. I'm not sure if any raid-6 systems (hardware or software) actually do this though - and I guess there still is a minute chance that you had 2 bit flips.

HP's implementation of RAID6 / RAID60 on its 'Smart Array' controllers has an option called "alternate inconsistency repair policy" that can detect and correct errors. Obviously this is on block/sector level, not file system level.

From the Smart Array configuration utility help:
"RAID 6/60 Alternate Inconsistency Repair Policy
An inconsistency arises when, during a surface analysis scan, the controller detects that the parity information does not match the data present on the drives. Disabling the alternate repair policy directs the controller to always update the parity information, leaving the data untouched. Enabling the alternate repair policy allows the controller to update the data on the drives based on the parity information. This behavior applies to RAID 6 and RAID 60 volumes only."

brutalizer · Oct 21, 2011

georgexi said:
Do RAID setups protect against bit-rot? Particularly if you do regular data scrubs?

The answer is: No.

Hardware raid do not protect against data corruption, such as bit rot. Hw raid does parity calculations but not checksum for data corruption calculations. No normal raid setup protect against data corruption (including bit rot).

Parity calculations are mainly used to reconstruct a crashed disk. Those parity "checksums" are not monitored to see if they have been altered by external factors, such as cosmic radiation, current spikes, etc. The raid assumes the parity XOR checksums are unmodified and safe. There are no thorough checks to see if they have been modified.

Checksums against data corruption is a different design. Then you need double check things such as the parity checksum and then check that the parity checksum has not been compromised by additional checksums, unless you have another solution that incorporates parity checksums with data corruption checksums.

Someting like: "confirm that you understood the order I gave you, private!" and then the private repeats and you can check that the private understood everything correctly. But this repetition requires another procedure, than you just shouting the order. If you are going to verify that each order that is sent, is correct, then you need a new design and new working procedures than the old ones.

ZFS has this new design. Ordinary raid has not. Ordinary raid is old. Filesystems are old, and stems from decades ago, when 20MB disks were large. The basis of all filesystems is the same. The filesystems have only been patched and patched. There was a limit of 4GB, then they patched the filesystem and increased the limit, and then again, and again, etc. Instead of redesigning everything from scratch they patch and patch the sinking ship.

But Sun ZFS designers said they wanted to get rid of old misconceptions that are not valid today anymore and start from scratch, and ZFS was born. ZFS was completely different from other filesystems, for instance Linux kernel developers called ZFS "rampant layering violation" and mocked ZFS for being a big huge monolithic piece of code. Linux instead has several layers, filesystem layer, raid layer, etc. But having only one single layer as ZFS has, gives ZFS knowledge of everything, the entire chain, from start to end: end to end checksums. If you have layered solution, like all old legacy Linux solutions, then you have problem implementing end-to-end checksums. There are different tight layers, which dont know of each other - no end to end can pass unless you heavily modify the different layers.

Another design that ZFS has been mocked for, is its invulnerability to Write-Hole-Error which hw raid suffers from. This design choice leads to ZFS having bad IOPS unless you configure ZFS accordingly (for instance use SSD as caches). For this bad IOPS ZFS has been mocked a lot, but people have not understood it is because ZFS is now invulnerable to Write-Hole-Errors.

So there are a number of weird design solutions that ZFS have, and for that others mock ZFS. But there is a reason ZFS has those quirks. Sun has long Enterprise Storage experience and wanted to counter most common problems. They know what they are doing. I am hesitant to trust a single developer wanting to best ZFS with his new awesome filesystem ZFS wannabe, with no experience of Server Enterprise halls.

Here is an interesting paper about data corruption and they discuss raid:
http://www.raidinc.com/assets/documents/Whitepaper-Silent-Data-Corruption.pdf
They explain some common design flaws that raid have:
misdirected writes, torn writes, data path correction, parity pollution, etc.

For instance, high end storage vendor NetApp and some universities did a study, and examined 1.5 million disks during three years. It turned out that 8.5% of all SATA disks developed Silent Corruption - according to researchers and NetApp in conjunction.

"When you put those statistics together, you find that on average 1 in 90 SATA drives will have silent data corruption that is not caught by the background verification process. So when those data blocks are read, the data returned to the application would be corrupt, but nobody would know. For a RAID-5 (4+P) configuration at 930 GB usable per 1 TB SATA drive, that calculates to an undetected error for every 67 TB of data, or 15 errors for every petabyte of data. If a system were constantly reading all that data at 200 MB/sec, it would encounter the error in less than 100 hours.

Another very large academic study [2] looked at failure characteristics for entire storage systems, not just the disks. In the 39,000 storage systems that were analyzed, the protocol stack (firmware) accounted for between 5% and 10% of storage failures. These are the kinds of failures brought on by faulty code in the firmware.

Thus firmware is not error-free and may introduce silent corruption. Its clear that the introduction of data corruption in the data path is a very real scenario."

This paper above claims to reach a much better data integrity, by using the new standard EDIF. I am not convinced yet, as EDIT is not end-to-end checksum as I have understood it, after skimming the paper. But I do believe EDIF solution is far much safer than ordinary raid. So maybe you should look for an EDIF solution until data corruption protected file systems are becoming mainstream?

More on data corruption
http://www.raidinc.com/assets/documents/Whitepaper-Dirty-Little-Secret-of-the-Data-Center.pdf

I apologize for this heavy ZFS masturbation, but until I see another solution that is as well researched as ZFS, I will continue to use ZFS. When I see a better solution, I will switch of course. But then I need academic research support, I do not blindly trust company marketing. My data is far to precious to me, for me to rely on religion. I prefer to justify my tech switch by relying on hard science instead of blind faith on, say, IBM: "our new filesystem is safer than ZFS!" - where is the research? Unless there are research, I am not going to switch. I do not blindly trust IBM.

brutalizer · Oct 21, 2011

BENN0 said:
HP's implementation of RAID6 / RAID60 on its 'Smart Array' controllers has an option called "alternate inconsistency repair policy" that can detect and correct errors. Obviously this is on block/sector level, not file system level.

From the Smart Array configuration utility help:
"RAID 6/60 Alternate Inconsistency Repair Policy
An inconsistency arises when, during a surface analysis scan, the controller detects that the parity information does not match the data present on the drives. Disabling the alternate repair policy directs the controller to always update the parity information, leaving the data untouched. Enabling the alternate repair policy allows the controller to update the data on the drives based on the parity information. This behavior applies to RAID 6 and RAID 60 volumes only."

Regarding raid-6:

http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
"The paper explains that the best RAID-6 can do is use probabilistic methods to distinguish between single and dual-disk corruption, eg. "there are 95% chances it is single-disk corruption so I am going to fix it assuming that, but there are 5% chances I am going to actually corrupt more data, I just can't tell". I wouldn't want to rely on a RAID controller that takes gambles

"

drescherjm · Oct 21, 2011

Hw raid does parity calculations but not checksum for data corruption calculations.

You are forgetting that every single drive does checksum calculations for every single sector it reads and writes. If the checksum is wrong it will return an error. When a raid5/6 gets an error while reading a stripe it knows to ignore the device for that stripe. I am not saying this is fool proof.

There are certainly a few ways where this breaks down. The easiest case that this breaks down is if your array is in a degraded state such that there is no redundancy (raid 5 with 1 drive out / raid 6 with 2 drives out) then you hit some bit-rot there will be no way to recover the stripe with the bit-rot. If raid 6 has 1 drive degraded and 2 disks in the stripe have bit-rot (wow you are unlucky...) there is no way to recover. Also if as ChrisBenn said the drive returned the wrong sector and that sector had a correct checksum. And also mentioned firmware bugs. I actually have seen 1 case of this with my samsung F4s although I do not use them in raid. I detected the bug in my initial badblocks testing that I now put every drive through.

BTW, I highly recommend this (4 pass badblocks read / write) even on drives that come back from RMA. The last 2 RMAs I sent out I received 1 defective drive back out of 3 both times.

brutalizer · Oct 21, 2011

drescherjm said:
You are forgetting that every single drive does checksum calculations for every single sector it reads and writes. If the checksum is wrong it will return an error. When a raid5/6 gets an error while reading a stripe it knows to ignore the device for that stripe. I am not saying this is fool proof.

I mean that checksum calculations are not for data corruption, and hence are not safe. And as we have seen, there are numerous studies where hardware raid does get data corruption, and also Silent Corruption. For instance, the large study on hw raid by CERN.

ChrisBenn · Oct 21, 2011

brutalizer said:
I mean that checksum calculations are not for data corruption, and hence are not safe. And as we have seen, there are numerous studies where hardware raid does get data corruption, and also Silent Corruption. For instance, the large study on hw raid by CERN.

I think we are all on the same page now - basically

(a) We have a superset of data error type conditions - ZFS will protect from all of these
(b) A raid-5 setup running a scrub can handle a subset of those conditions via relying on drive CRC's. Only a subset though.
(c) A raid-6 setup covers the same subset as raid-5 and can, via the probabilistic methods referenced previously sometimes correct a larger subset of the errors that ZFS handles (if the firmware supports it). As was referenced this is a probability based guess so could, 5% of the time (based on the paper above) make matters worse.

That sound correct?

drescherjm · Oct 21, 2011

That sound correct?

Sounds fine to me.

Rectal Prolapse · Oct 22, 2011

brutalizer, it is great you are sharing your research results with us. And it provoked some interesting discussion by others.

A few weeks ago I was thinking about some of these issues and decided to put a large amount of my data into the "cloud" as a large offline backup with a 3 year expiry (USENET)...

Check Data Integrity of Backups?

Limp Gawd

[H]ard|Gawd

Limp Gawd

[H]ard|Gawd

Limp Gawd

Limp Gawd

Limp Gawd

Limp Gawd

2[H]4U

Limp Gawd

Limp Gawd

2[H]4U

Limp Gawd

2[H]4U

Limp Gawd

Limp Gawd

Limp Gawd

Weaksauce

Limp Gawd

Limp Gawd

[H]F Junkie

2[H]4U

[H]ard|Gawd

2[H]4U

Limp Gawd

Limp Gawd

Limp Gawd

[H]F Junkie

2[H]4U

Limp Gawd

Weaksauce

[H]ard|Gawd

[H]ard|Gawd

[H]F Junkie

[H]ard|Gawd

Limp Gawd

[H]F Junkie

Gawd