Check Data Integrity of Backups?

Luzer

Weaksauce
Joined
Sep 11, 2007
Messages
113
I just found out that one folder that I copied over manually a few years ago to my server has half-viewable images and a few .avi files that VLC has issues playing. My question to you is: what do you do to verify your backups?

I know that I can open each folder and check. The issue is that I have about 190 GB of media (photos and videos), 300 GB of virtual machines, and about 10 GB of documents that I would like to keep safe.

I back up to CrashPlan, and it would suck to find out that I'm backing up corrupt data. I'm going through all my important folders right now and checking the data.
 
This is silent data corruption, or bit rot. This is the reason I use ZFS, which protects against this shit.

I don't know what to do, man. I hope you solve it. If you do, please post the solution here.
 
I just found out that one folder that I copied over manually a few years ago to my server has half-viewable images and a few .avi files that VLC has issues playing. My question to you is: what do you do to verify your backups?

Use a program to compare the backup against the online copy. On Linux one such utility is rsync; on Windows there are several similar utilities. Remember to always keep at least two copies of everything you want to keep, each on a different storage device.
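A concrete sketch of the rsync approach, for anyone who wants to try it. The paths are placeholders (the demo builds a throwaway source/backup pair); -c forces full-content checksums instead of the usual size/date check, and -n makes it a dry run, so nothing is modified:

```shell
# Build a tiny source/backup pair just for the demo; in real use,
# point these at your live folder and your backup folder.
mkdir -p /tmp/verify-demo/src /tmp/verify-demo/bak
printf 'intact\n'   > /tmp/verify-demo/src/a.txt
printf 'intact\n'   > /tmp/verify-demo/bak/a.txt
printf 'original\n' > /tmp/verify-demo/src/b.txt
printf 'bitrot!!\n' > /tmp/verify-demo/bak/b.txt   # simulated corrupt copy (same size!)

# -r recurse, -c compare by checksum, -n dry run:
# prints the name of every file whose content differs.
rsync -rcn --out-format='%n' /tmp/verify-demo/src/ /tmp/verify-demo/bak/
```

Note that the corrupt copy has the same size and only differs in content, which a plain `rsync -rn` would miss. Drop the -n to actually repair the differing files from the source.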
 
I decided to use SyncToy with a schedule task on my main desktop.

Every Sunday and Wednesday night it runs a job that backs up the current year's folder (I have my photos organized by year) and my documents to my Home Server. From the Home Server, CrashPlan checks for changes every 15 minutes and backs them up to the web. I figure that if I overwrite anything from now on, the old revision will be in CrashPlan. I don't normally edit older pictures/videos, and if I do, I have another task that runs once a month to "echo" the whole folder.

As for my old data, I'll have to go through it little by little to make sure there isn't any bit rot. The good news is that all my other data looks good.

How does ZFS protect against bit rot?
 
How does ZFS protect against bit rot?

ZFS uses end-to-end checksums on all data blocks. Coupled with ECC memory, there is a very low chance of any bit rot creeping in.

On a non-redundant storage pool, ZFS can report that the data being read is not the same as what was written. On a redundant pool (mirror, raidz or better) it can correct the error and return the data to the state it was in when it was written to storage.

http://en.wikipedia.org/wiki/ZFS
 
For error detection, ZFS uses multiple checksums; for data protection, ZFS uses copy-on-write (which gives you cheap snapshots, among other things) and can keep multiple copies of data via mirroring or raidz{1,2,3}. The checksums are verified whenever you access data, and you can use the scrub command to have the computer go through all the data in a given pool. See http://en.wikipedia.org/wiki/ZFS#Data_Integrity for more details.

Personal opinion: setting up a ZFS based file server is non-trivial, but the results are worth the effort if you really want to protect your data.
 
Heh, great minds. Christopher's point about ECC RAM is a good one: ZFS is designed to deal with unreliable hard disks, not unreliable servers, so don't skimp on the rest of the box. Use a good CPU and ECC RAM if you can, and use as much RAM as you can afford.
 
ZFS is good at detecting several types of problems, but I like to keep separate checksums as well (.sfv, .par or similar). ZFS won't catch all instances of data corruption, e.g. when you're transferring data between systems.
 
I just found out that one folder that I copied over manually a few years ago to my server has half-viewable images and a few .avi files that VLC has issues playing. My question to you is: what do you do to verify your backups?

I know that I can open each folder and check. The issue is that I have about 190 GB of media (photos and videos), 300 GB of virtual machines, and about 10 GB of documents that I would like to keep safe.

I back up to CrashPlan, and it would suck to find out that I'm backing up corrupt data. I'm going through all my important folders right now and checking the data.

If you already copied over corrupt data, you're hosed unless you still have the original.

However, if you do still have a good copy of the data, use XCOPY to back it up.

Initial Backup command:

xcopy X:\source Y:\destination /s /v /c /f /h /o /y /z >> C:\LOGFILE.txt

/s Copy subfolders too, unless they're empty
/v Verify each file as it is copied (slows the process down, though)
/c Continue even if an error is encountered
/f Display full source and destination paths
/h Copy hidden and system files too (if needed)
/o Copy file ownership and ACL information (if needed)
/y Suppress the confirmation prompt when overwriting an existing file
/z Restartable network mode (if needed)

LOGFILE.txt captures the output of the command, letting you comb through it to see whether any errors occurred during the copy. Delete it if no errors were encountered, or use it to locate the problem files and copy them manually.

Subsequent Backups command:
xcopy X:\source Y:\destination /d /s /v /c >>c:\Backup2.log

/d (with no date given) only copies files from the X: drive to the Y: drive if they have changed since the last backup.

Put the second command into task scheduler and assign it to run daily/weekly/monthly/however you want...

You can get really creative: build a file listing only the files you want to copy, exclude certain files or file types, or use a text file as an input file. There are plenty of options, and plenty of sites describing more complex uses.

I use the following:

for /f "delims=" %%i in (Input.txt) do echo F|xcopy "L:\%%i" "F:\Backup\%%i" /c /i /z /y

It looks at the files listed in a file called Input.txt, copies the first one, goes to the next line and copies that file, then the next, and so on until it has processed every file listed in Input.txt.

In the last 2 weeks, I have archived over 800,000 files with a 99%+ success rate.

MS-DOS is still useful. :)
 
@notarat,
how do you know your backups are correct and not subject to data corruption? XCOPY does not detect data corruption the way ZFS does.
 
rsync has a --checksum flag that checksums both the source and destination files to verify your copies. You can also run rsync in client/server mode so that the checksum calculations are handled by each machine's local CPU, which speeds things up dramatically when copying from one computer to another.

Without client/server mode, rsync has to read the source or destination (depending on where you run it from) back across the network in order to verify. Slower, but still useful.

Also agreed on ZFS. Once the files have been copied to ZFS and you've verified the checksums, ZFS will prevent silent bit rot from happening, so long as you have some sort of redundancy (mirror/raidz/raidz2).

If you don't have a client/server configuration, your best bet is a recursive md5sum or sha1sum of the file/folder structure on both sides, comparing the results. This is probably the most accurate way to verify.
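For anyone who wants to try the recursive-checksum approach, here is a minimal sketch. The directory and file names are made up for the example; point the `find` at your real media tree:

```shell
# Create a throwaway folder standing in for your media/photo tree.
mkdir -p /tmp/checksum-demo && cd /tmp/checksum-demo
printf 'holiday photo bytes\n' > img001.jpg

# 1) Build a manifest: one md5 line per file, excluding the manifest itself.
find . -type f ! -name MANIFEST.md5 -exec md5sum {} + > MANIFEST.md5

# 2) Re-verify the whole tree against the manifest at any later time.
md5sum -c MANIFEST.md5          # prints "./img001.jpg: OK"

# 3) Simulate bit rot (flip the first byte) and watch the check fail.
printf 'X' | dd of=img001.jpg bs=1 conv=notrunc 2>/dev/null
md5sum -c MANIFEST.md5 || echo 'corruption detected'
```

sha1sum or sha256sum are drop-in replacements if you want a stronger hash, and keeping a copy of the manifest alongside the backup lets you run the same check on both sides.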
 
Agreed. Beware of bit rot, though. Once you have verified with checksums that the data is correct, bits on the disk may still flip at random some time later: cosmic radiation, current spikes, bugs in firmware, etc.

Even though the data checksummed correctly at the moment you copied it to disk, it might be corrupted on disk later. It happens quite frequently. I remember my Amiga disks from when I played games: the copy worked fine, but several years later I tried to boot up a game and it did not work. The disk had suffered bit rot, and the data was no longer valid. No more game. Bit rot. :(



Regarding doing a recursive md5sum or sha1sum of your folder structure: that is a good suggestion, but again, beware of bit rot later. You need to re-checksum all the data every week or so to catch bit rot, and if you find any errors, you need to locate a good backup copy and restore the file manually. Of course, the backup copy must not have suffered bit rot or any other problems itself.

Alternatively, you can use ZFS and run an automatic "zpool scrub" every week, which catches all sorts of data corruption (including bit rot) and corrects all errors (if you use redundancy such as ZFS raid/mirror/etc). You can scrub while using the zpool as normal, so you don't need to stop all disk activity. With a Linux "fsck" you have to stop all disk activity and wait until all the data has been checked before you can use the raid again. How about Windows "chkdsk"?

ZFS has the ability to checksum every block with SHA-256. The ZFS default is the fletcher4 checksum, which is faster than SHA-256; you can request SHA-256 checksums of every data block simply by setting a property on the pool or filesystem.
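For reference, the weekly scrub and the stronger checksum boil down to one property and one cron job. This is an untested sketch; the pool name `tank` is a placeholder for your own pool:

```shell
# Switch to SHA-256 checksums instead of the default fletcher4.
# (Only newly written blocks get the new checksum; old blocks keep
# theirs until they are rewritten.)
zfs set checksum=sha256 tank

# crontab fragment: scrub the pool every Sunday night, then report
# anything unhealthy later that morning.
# 30 3 * * 0  /sbin/zpool scrub tank
# 0  8 * * 0  /sbin/zpool status -x tank
```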

There is more information on this, on the wikipedia article about ZFS and "data integrity".
 
Agreed. Beware of bit rot though. ... [lots of good stuff]
And I agree completely. [In summary: "When your adversary is Murphy, you must take C.Y.A. to the sixth dimension." :)]

But, there is one point, that, while not really incorrect, warrants further examination.
You need to manually do a checksum of all data every week or so, to catch bit rot.
(The frequency [weekly?] of this ongoing CYA is a personal choice, but) once you have initially confirmed that the destination file was written correctly the first time, subsequent checksums are essentially a waste of cycles. [Bear with me; resist that "reply" button.] When bit rot has occurred, a straight read of the data will suffice to identify that it has occurred. Your disk's ECC et al. will detect that fact (the rot).
I seriously doubt there has EVER been a documented case of bit rot occurring in disk data which went completely undetected because the relevant ECC had also rotted such that "all looked normal".

Once you have confirmed the presence of rot, you can then burn your checksumming cycles to locate the rot and correct for it.

But isn't it true that if this data is stored in a parity-protected array (not even ZFS, just, e.g., Raid[456]), the (unavoidable) ReadError itself will set in motion a self-repair (culminating in a Realloc_Sector)? And even if you had just done the suggested checksumming under similar conditions, you would never get a checksum mismatch (because of the intervening self-repair), right?

Now, in the case of just a "straight" filesystem, you can rely either on kernel error messages, or on the value of Current_Pending_Sector before vs. after the CYA run, to dictate the need, or [99% of the time] the non-necessity, of doing the actual checksumming to identify the file(s) containing rot and then take corrective action. No point in scheduling surgery before any sign of a tumor.

Comments?

PS I still do the checksum-check, via a script invoking md5sum; cycles are abundant. But, I hope to hear from anyone who has had similar (or contradictory) thoughts.

--UhClem
 
I just found out that one folder that I copied over manually a few years ago to my server has half-viewable images and a few .avi files that VLC has issues playing. My question to you is: what do you do to verify your backups?

You'll have to forgive all the violent ZFS masturbation that invariably creeps into threads like these. People are excited.

To actually answer your question about how to compare two sets of files: it sounds like you're looking for a quick and easy Windows tool, and for that I'd recommend Beyond Compare, which will analyze two folder trees and point out differences, including CRC mismatches. That would be the fastest approach, rather than checking files one by one. There are other tools that are a little more hardcore and actually read the entire files being compared, generating checksums for comparison, but Beyond Compare will be fine in most cases, and it's absolutely the fastest folder-comparison tool in general.
 
(The frequency [weekly?] of this ongoing CYA is personal choice, but)
It is recommended to do a ZFS scrub every week when using commodity disks. If you use enterprise SAS disks, a scrub every month is recommended, because enterprise server disks have much higher reliability.

That official once-a-week recommendation from Sun/Oracle is the reason I suggest checksumming all of your data every week. My suggestion simply follows the recommendations of Oracle/Sun, who certainly know how to build enterprise high-end storage servers.



When bit rot has occurred, a straight read of the data will suffice to identify that it has occurred. Your disk's ECC et al will detect that fact (the rot).
Actually, I don't believe disk ECC will detect all bit rot. See the links below.



Once you have confirmed the presence of rot, you can then burn your checksumming cycles to locate the rot and correct for it.
I don't believe it is easy to confirm the presence of bit rot. See below for more information.



I seriously doubt there has EVER been a documented case of bit rot occurring in disk data which went completely undetected because the relevant ECC had also rotted such that "all looked normal".
If disk ECC were perfect, there would be no data corruption. Real-life studies (see below) show that there is corrupted data on disks. A disk devotes large parts of its surface to error-correcting codes, and during use lots of errors are corrected on the fly, but some errors are not correctable; the error-correcting codes are not perfect.

For instance, here is the specification of a high end Enterprise Fibre Channel disk:
http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/10K.7/FC/100260916d.pdf
On page 17, we see some error rates:
-Unrecoverable errors (some errors are not correctable by the disk)
-Miscorrected errors (some errors are even miscorrected)
-Interface errors (interface also causes errors)
...

Here is a SAS disk, with similar error rates on chapter 5.1.2
http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15K.6/SAS/100466193b.pdf

Both disks use the special 520-byte sectors, which contain an enlarged checksum area to catch data corruption, and still they cannot handle all errors. Ordinary SATA disks will of course have a much higher error incidence than high-end enterprise storage disks.



Apart from these weaknesses in all disks, there are of course weaknesses at other levels in the chain as well.


For instance, bugs in disk firmware.


Badly designed filesystems: NTFS, ext3, XFS, JFS, ReiserFS, etc. See below for research on this.


Shaky power supplies:
http://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta
ZFS immediately detected errors that had gone unnoticed for years.


Flaky port in Fibre Channel switches:
http://jforonda.blogspot.com/2007/01/faulty-fc-port-meets-zfs.html
ZFS was the first to detect this faulty switch, which had gone unnoticed earlier.
Question: how can ZFS even detect a faulty switch? Isn't ZFS a filesystem?

Answer: did you ever play this game as a kid? You whisper a word in someone's ear, he whispers it to someone else, and when the word has traveled the entire chain you compare the two words. They are never identical. You need to compare both ends of the chain to be sure the data is correct. Only ZFS does this; no one else does. If the data passes through a faulty switch, ZFS will notice.

The reason ZFS can detect all sorts of problems is that ZFS has end-to-end checksums. Data travels a long chain: RAM - controller - disk. Each domain might have its own checksums (checksums in RAM, checksums on disk, etc.), but errors might be introduced when passing from one domain to another. Thus you need to compare the data in RAM with what actually ended up on disk: is it the same? Every solution on the market has checksums, but only ZFS compares end to end. That is the ZFS magic: end-to-end.


Also, ZFS has gotten complaints about not working well with SAS expanders; people think ZFS + expanders is a fragile solution. The problem is actually not in ZFS but in low-quality expanders. People have never used a filesystem with end-to-end checksums, which detects all errors immediately, so they believe the errors lie within ZFS. But no, the problem is in the hardware. If people switch to other solutions, the data corruption is no longer reported, but the problems are still there. There are even people who first check new hardware with ZFS to see whether any errors are detected, and only later switch to Linux once they know the hardware is healthy.


Microsoft did a study which showed that 20% of all Windows crashes are caused by non-ECC RAM. Such a crash is not Windows' fault; the fault actually lies with the bad RAM stick. Windows is fragile and exposes the bad hardware: Windows crashes, but the problem does not necessarily lie within Windows.





Here are some studies on the worst kind of data corruption: SILENT data corruption, where neither the hardware nor the software even detects the corruption. (For instance, ghost writes: the data was never actually written and disappeared into limbo, but the disk thinks everything was written and reports "success".)
http://en.wikipedia.org/wiki/Data_corruption



Here is some research on data corruption, from the physics centre CERN:
http://www.zdnet.com/blog/storage/data-corruption-is-worse-than-you-know/191
Conclusion: silent data corruption is a fact of life, and you need end-to-end checksums. CERN is now deploying ZFS for long-term storage of all particle data:
http://blogs.oracle.com/simons/entry/hpc_consortium_big_science_means

"Having conducted testing and analysis of ZFS, it is felt that the combination of ZFS and Solaris solves the critical data integrity issues that have been seen with other approaches. They feel the problem has been solved completely with the use of this technology. There is currently about one Petabyte of Thumper [ZFS] storage deployed across Tier1 and Tier2 sites. That number is expected to rise to approximately four Petabytes by the end of this summer. "



Here is research showing that filesystems such as NTFS, XFS and JFS are very bad at detecting data corruption and do not cut it:
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169?tag=content;siu-container

". . . ad hoc failure handling and a great deal of illogical inconsistency in failure policy . . . such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.. . We observe little tolerance to transient failures; . . . . none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy."



Lastly, here is research about ZFS vs data corruption which actually shows that ZFS does indeed gives good protection:
http://www.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf


But, isn't it true, that if this data is stored in a parity-protected array (not even ZFS, just, e.g., Raid[456]) the (unavoidable) ReadError itself will put into motion a self-repair (culminating in a Realloc_Sector)? And, even if, you had just done the suggested checksumming under similar conditions, you would never get a checksum mismatch (error) (because of the intervening self-repair), right?
Only ZFS has end-to-end checksums. Garbage in, garbage out: if the controller hands over faulty data, faulty data will be written to disk.

Hardware raid has no checksums; instead it has parity calculations, and parity calculations are not the same as checksums against data corruption. Thus hardware raid does not give good data protection:
http://en.wikipedia.org/wiki/RAID#Problems_with_RAID



Now, in the case of just a "straight" filesystem, you can rely, either on kernel error messages, or the value of Current_pending_sector before vs after the CYA run, to dictate the need, or [99% of the time]non-necessity, to do the actual checksumming to identify the file(s) containing rot, and then take corrective action. No point in scheduling surgery prior to any sign of a tumor.
You need end-to-end checksums to catch all errors.


PS I still do the checksum-check, via a script invoking md5sum; cycles are abundant. But, I hope to hear from anyone who has had similar (or contradictory) thoughts.
--UhClem
This shows you are open-minded and hungry for knowledge. Without knowledge, mankind would stagnate; there would be no research. I suggest you catch up on the research on data corruption: just read the research papers by computer scientists that I linked to above. There are links to more research inside those links, too.


Of course I am not claiming that ZFS is bulletproof. There are still bugs in ZFS. But ZFS is not broken by design the way hardware raid is (see, for instance, the "write hole" error), and only ZFS is designed to catch data corruption. And ZFS is free. Maybe there is a good reason for all the ZFS hype?






You'll have to forgive all the violent ZFS masturbation that invariably creeps into threads like these. People are excited.
It is well known that you know much about storage, odditory, but I think I prefer to learn from you by reading your texts instead. I am new to storage and only quite recently became active here, and I remember when I asked a noob question last month asking for help, and I remember your response.

I am glad there are people like "Drescherim"(?) here who remember what it was like to be a beginner and help people out with any questions, even stupid ones. Not everybody is an expert.
 
You'll have to forgive all the violent ZFS masturbation that invariably creeps into threads like these. People are excited.

Oh ZFS.... oooooh..... :eek:.....:p

Now, back to reality.

One option for guarding against bit rot without using ZFS is QuickPar (http://www.quickpar.org.uk/). It calculates file-level parity (instead of disk-level), generating some additional "parity" files. If you have some highly important data (or are exceptionally paranoid), this could be useful to you.

This has been used a lot on Usenet groups etc. If your original file is corrupt, it can be reconstructed from the relevant parity files. I'm not sure whether it can be used on a folder structure, however.

Again, the only surefire way to verify that your data is the same on both source and destination is to checksum it (md5sum, sha1sum, etc.). To actually recover your data, you need to:

a) be able to detect the corruption (ZFS, md5sum, sha1sum)
b) be able to repair the corruption through parity (ZFS, QuickPar, multiple backups on different devices)
 
This is silent data corruption, or bit rot. This is the reason I use ZFS, which protects against this shit.

I don't know what to do, man. I hope you solve it. If you do, please post the solution here.

Use QuickPar to create a parity set of around 20%.
 
brutalizer: wow, interesting post. I actually learned something from a "noob". :)

With that said, as great as ZFS is, it isn't bulletproof. Case in point: I have an Intel Q9200 CPU in an Asus P5Q-E motherboard. It had only 4 GB of RAM on 2x2GB OCZ Vista Platinum sticks. A year later I decided to add 2 more identical sticks. A couple of days later I noticed odd things: I was getting corrupted RAR files!

How can this be? ZFS should have saved me! But alas, this did not happen.

What happened was that the BIOS on the P5Q-E was out of date and silently corrupted RAM when all 4 slots were used. ZFS could NOT save me from this, as the data I copied onto the pools was corrupted BEFORE the ZFS checksum stage.

Luckily I was able to recover my data using the PAR2 files I had created.

ZFS had no chance in this situation. RAR + PAR was what saved my data in the end.

So my next ZFS build will have ECC RAM.
 
Without trying to start another ECC debate: I recommend always testing RAM for a few days on any system before putting important data on it.
 
Seeing as there seem to be some pretty knowledgeable people in this thread about ZFS and data storage, and without trying to hijack the thread, I was wondering what people's thoughts are on btrfs?

I run a mirrored ZFS system now but was thinking about maybe moving over to btrfs once they get the official filesystem check going.

If you were to compare/contrast btrfs and ZFS, what would be the big differences? And is anyone else looking at switching to btrfs in the future, or not?
 
With that said, as great as ZFS is, it isn't bulletproof. Case in point: I have an Intel Q9200 CPU in an Asus P5Q-E motherboard. It had only 4 GB of RAM on 2x2GB OCZ Vista Platinum sticks. A year later I decided to add 2 more identical sticks. A couple of days later I noticed odd things: I was getting corrupted RAR files!

How can this be? ZFS should have saved me! But alas, this did not happen.

What happened was that the BIOS on the P5Q-E was out of date and silently corrupted RAM when all 4 slots were used. ZFS could NOT save me from this, as the data I copied onto the pools was corrupted BEFORE the ZFS checksum stage.

Luckily I was able to recover my data using the PAR2 files I had created.

ZFS had no chance in this situation. RAR + PAR was what saved my data in the end.

So my next ZFS build will have ECC RAM.
In this case it is not ZFS's fault. If you have faulty RAM, ZFS can do nothing; ZFS does not promise to fix RAM problems, nor CPU or GPU problems. Garbage in, garbage out.

ZFS promises that the data it is given will be saved correctly down to disk. In your case ZFS kept its promise, so ZFS did what it was supposed to do.



In fact, in one of the links above, a team of computer science researchers discuss this. They stressed ZFS and injected artificial errors directly on disk, and ZFS recovered from all errors (as long as one correct copy of the block was stored somewhere; hence you must use redundancy such as raid, mirror, etc.).

The research team also corrupted the data before ZFS got it, and observed that ZFS did not correct it. This is obvious: for ZFS to correct faulty input data, it would need knowledge about every input file, and such a database would be huge. To be able to correct corrupted Windows binaries, for instance, ZFS would need information about what those binaries should look like; only then could it restore the original state.

If I give you a painting that is half destroyed, can I ask you to restore it to its original state? No; you would need to know how the painting looked in the beginning, which means a huge database with lots of paintings to be able to restore them. That is too much to ask of you. What I can ask is that you keep every painting I give you safe in a vault: unmodified, unscratched and uncorrupted.

Thus, research shows that ZFS correctly saves to disk whatever data it is given. If ZFS is given corrupted data, ZFS will save the corrupted data correctly.

Thus, you need ECC RAM if you are interested in ZFS; a chain is no stronger than its weakest link. Either you use ZFS together with ECC RAM, or you can scrap data integrity altogether and might as well use normal RAM and a normal hardware raid card such as an Areca. There is no point in using heavy data protection like ZFS if you don't use ECC.



But remember: many motherboards support ECC RAM, yet that does not mean errors in RAM will actually be corrected. According to the link below, the motherboard also needs special SECDED functionality for the ECC to be active:
http://cr.yp.to/hardware/ecc.html
Ordinary ECC RAM in the motherboard alone will not do; you need the SECDED functionality in the motherboard as well as the ECC RAM. Read the link.

Does anyone know more about this and can give additional info on ECC and SECDED?



Without trying to start another ECC debate: I recommend always testing RAM for a few days on any system before putting important data on it.
As usual, you give good advice. I just want to chime in with the following:
if you DON'T use ECC RAM, then there is a 96% probability you will have corrupted RAM within 3 days of uptime. If you use ECC RAM, the risk drops to an almost negligible level:
http://lambda-diode.com/opinion/ecc-memory
Thus ECC is important if you care about data integrity, as is checksum-based disk storage.



Use QuickPar to create a parity set of around 20%.
But remember to do a full checksum every week, to catch ongoing bit rot that might show up later.
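If you maintain the md5sum manifests suggested earlier in the thread, that weekly re-check is easy to automate. An untested crontab sketch; the path and manifest name are placeholders:

```shell
# crontab fragment: re-verify the manifest under /srv/media every Sunday.
# --quiet prints only the failures, and cron mails any output by default,
# so a silent week means a clean week.
# 0 4 * * 0  cd /srv/media && md5sum -c --quiet MANIFEST.md5
```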



If you were to compare/contrast btrfs and ZFS, what would be the big differences? And is anyone else looking at switching to btrfs in the future, or not?
First of all, there is research on ZFS and data corruption, and ZFS passed the researchers' scrutiny. The creator of ZFS explains:
http://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data

"The job of any filesystem boils down to this: when asked to read a block, it should return the same data that was previously written to that block. If it can't do that -- because the disk is offline or the data has been damaged or tampered with -- it should detect this and return an error. Incredibly, most filesystems fail this test. They depend on the underlying hardware to detect and report errors. If a disk simply returns bad data, the average filesystem won't even detect it."

As a result, it is slowly dawning on others that end-to-end checksums are great. However, it is non-trivial to do a good implementation. I don't know of any research on btrfs, so I would be a bit hesitant to switch to it.

The Sun Solaris team has vast experience with enterprise server storage; they observed a lot of data corruption and tried to solve the problem: ZFS. Now more and more people understand this and try to copy ZFS's functionality, which is a good thing. I don't know how much enterprise server storage experience the btrfs developers have, though. It is one thing to know what you are doing, and another to copy those who know and believe it will lead to success.

If there is sloppy source code in your kernel and the kernel crashes, you lose a day's worth of work. If your filesystem crashes, you lose many years of work. Thus a filesystem has much higher requirements than a kernel. The Linux kernel might contain sloppy code (according to Linux kernel developers themselves); that is fine, you just lose a day of work. But if a Linux filesystem contains sloppy code... Be careful and always keep backups (even when you use ZFS, of course).

Many sysadmins in the enterprise refuse to use anything less than a decade old. ZFS is 10 years old and still has bugs. When btrfs reaches v1.0, it will take another 10 years before it is trusted in production.




However, the trend we can see is that software raid such as ZFS, btrfs and HAMMER is coming, and hardware raid will slowly be phased out. Maybe you should sell your hardware raid while there is still a market for it, and switch to software raid?
 
I thought I read somewhere that btrfs was created because it wasn't possible to easily port ZFS to Linux. I could be wrong, but I thought that was the motivation.
 
I thought I read somewhere that btrfs was created because it wasn't possible to easily port ZFS to Linux. I could be wrong, but I thought that was the motivation.
Or the motivation was "Not Invented Here" syndrome. Linux people like to develop their own version instead of porting. There is lots of FreeBSD tech that Linux people could have ported, but instead they developed their own versions.
 
To clarify: the difficulty was that Sun's CDDL is incompatible with the GPL, so a complete source rewrite was necessary. That was not the case with FreeBSD, which is why they had ZFS so much sooner...
 
To clarify: the difficulty was that Sun's CDDL is incompatible with the GPL, so a complete source rewrite was necessary. That was not the case with FreeBSD, which is why they had ZFS so much sooner...

Correct. Unfortunately there will probably never be an official ZFS port to Linux due to the license incompatibility. However, there are some unofficial ZFS ports to Linux, one of the major ones being http://zfsonlinux.org/. I cannot vouch for their stability, though.
 
Hi brutalizer,

I hope I can express this reply adequately, and I hope you're able to accept it, in the proper spirit.

First, I am in no way offended. In fact, the whole thing is bizarrely amusing. But, I am not poking fun at you. Please try to understand that.

Somehow, and I hope you can help me understand how, you completely "misread" me. Not in the sense of reading comprehension (though I'll try to point out a few such tidbits below), but in the sense of having mis-assessed my background, technical experience, and understanding of the subject matter. [That's not unreasonable; we (and certainly I) don't put our résumé in our profile, or even a real name you could Google. This recent thread [H]ere might provide a good example of how mistakes can be handled humorously, and constructively. Take a moment, please; it's short and apropos. Tnx]

Onward ... (but Tarantino style -- out of order)
I am new to storage and quite recently became active on storage ...
And I am very old to storage:), with a fair amount of hardcore hands-on experience (drivers, filesystem enhancements, etc.) That "old" does have potential downside(s): there is the possibility that concepts and/or technologies that I know, are no longer relevant, and, well ... just being old (but, thankfully, I'm still pretty sharp [I discovered, and just documented, a firmware bug in a drive many here are using; no alarm, though, it is totally benign (well, a wasted millisecond here and there), but still a flaw]).
This shows you are open minded and hungry for knowledge. Without knowledge, mankind would stagnate. There would be no research. I suggest you catch up on research on data corruption. Just read the research papers by computer scientists, I linked to above. There are other links to more research in the links, too.
A kind, generous, and sincere response. But, aside from the very general fact that everyone can improve their knowledge, it's (no offense intended, please) almost like some weird schizoid role-reversal:) [Consider: there are at least 5 world-renowned computer scientists who know me professionally; without question, they don't, and shouldn't, consider me their equal, but they know my skills and have either offered me jobs/work or licensed my software. But, that was then ...]

Let me respond to a couple of things. Please note that I very specifically addressed only your sub-topic of bit rot.
UhClem said:
When bit rot has occurred, a straight read of the data will suffice to identify that it has occurred. Your disk's ECC et al will detect that fact (the rot).
brutalizer said:
Actually, I don't believe it is the case that disk ECC will detect bit rot. See links below.
(Important point: It most certainly detects it; and, in most cases, it just silently corrects it before satisfying your read request. In those unfortunate cases, where the rot is toxic, it detects that fact, and returns an UNC error. If you meant: it doesn't, without being asked, just go sleuthing around looking for rot, on its own. So true, and I ain't QBing the Patriots either.)

From the wiki link (not always the almighty gospel, but generally reliable ... agreed?):
[More commonly,] bit rot refers to the decay of physical storage media.
and, while it doesn't actually discuss disk drives specifically, it seems that
Bit rot is often defined as the event in which the small electric charge of a bit in memory disperses, possibly altering program code.
and
Bit rot can also be used to describe the phenomenon of data stored in EPROMs and flash memory gradually decaying over the duration of many years,
are both precisely applicable to a modern disk drive. But, do note the following general factum (I don't have a cite handy): Today's disk drives are so bleeding-edge, with amazing data density (>3 Gbits/sq. inch) and super-tight tolerances (200+K tracks/(radial)inch), that errors in the stored data are an anticipated, and expected, given.

If one had access to some proprietary, or otherwise locked-out, commands and could read the raw bits on a track, and knew the layout such that you could identify the data bits for each sector (and knew the proprietary RLL-type encoding so that you could actually recreate the uncorrected data), you'd crap (as would I). But, in addition to these mundane corrections, there are the more precarious ones. Like, when, even with this sophisticated correction technology, the firmware needs to make several (2-10) attempts (with a wasted revolution between each) just to get a properly-corrected sector to return. Those are your bit rot on the precipitous edge situations. Just a smidgen more, and you get an UNC[orrectable] error.

You did allude to some of that, and gave some links. Trust me, I already knew, and understood, all of it.

Hence, while ECC is not able to Correct such an error, it still allowed the error to be detected, and the driver was informed. And, that is the stage that one hopes their A is C, via parity, backups, ... prayer.
UhClem said:
I seriously doubt there has EVER been a documented case of bit rot occurring in disk data which went completely undetected because the relevant ECC had also rotted such that "all looked normal".
brutalizer said:
If disk ECC were perfect there would be no data corruption.
You've missed my point. (And, I wish I had a link to a paper for you which could explain it in layman's terms. Am I correct in assuming that you are not a (hardcore) programmer? I could explain it to one [at least, in person, with just a little hand-waving], but odds are good they already get it:).)

Your Seagate manual reference (pg17) does mention the term miscorrected error, but says nothing more about it (other than a rate). Oh, it does discuss, and define, the other error types, but not this one. While I respect Seagate, this almost sounds like something their lawyers would, wisely, have them add [the ultimate CYA?].

Almost everything else that it seemed you were addressing to me, was inapplicable, since I had right up front, laid out the premise
Once you have initially confirmed that the destination file was written correctly the first time, ...
which was an additional focusing on bit rot. I have my own procedure for attaining the end-to-end business. I understand that you are enthusiastic about ZFS. Fine--I think it is very nifty technology also. But, given my "neighborhood" and the "prevailing conditions" I have my own plan for getting to the market and back, safely, without driving an M4 tank. And, I have a backup.

You can have your religion, and others can have theirs:).
Come to think of it, backups is not a bad faith to follow.

Are we having fun, yet, brutalizer? :)

Lastly,
Microsoft did a study which showed that 20% of all Windows crashes are due to non-ECC RAM.
I'm surprised that's not a higher number. But I am sure I'm not properly attributing the full range, and quantity, of [L]user errors.

In closing, brutalizer, I'm sincere when I say I am interested to understand what led to this mix-up. I am not looking to assign blame, or fault. If there was something I wrote, or didn't write, or the way I wrote it, let me know. Or, if you've got any ideas, upon reflection, how you might have gotten off-track, I'm curious to hear that also.

PM if you prefer--there's no dirty laundry, but it's not necessary to air it, regardless.

--UhClem "I think we're all bozos on this bus."
 
Many enterprise sysadmins refuse to use anything less than a decade old. ZFS is 10 years old and still has bugs. When BTRFS reaches v1.0, it will take another 10 years before it is trusted in production.


Thanks for explaining all that, it made a lot of sense.
 
Interesting thread here and I have some questions to ask.

How does Windows Home Server 2011 deal with bit rot?

Does anyone here use SyncBackPro to transfer their data? The software claims to have a data integrity check option.
 
How does Windows Home Server 2011 deal with bit rot?

Does anyone here use SyncBackPro to transfer their data? The software claims to have a data integrity check option.
Windows Home Server 2011 uses NTFS, I presume, and NTFS neither detects nor protects against bit rot. I would not be surprised if the creator of this thread, who reported bit rot, used NTFS. Read this link about NTFS and bit rot:
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169


Regarding SyncBackPro and its data integrity check option: it sounds good that they are at least aware of the bit rot problem. However, I have not seen any research on SyncBackPro, so I don't know how safe their solution is. Everybody uses checksums, but not many (no one?) use end-to-end checksums. Just using checksums does not give data integrity; you must compare checksums end-to-end to reach data integrity.

We store more and more data today, and there is a tiny probability of error for each bit. The spec sheets on SAS disks say 1 unrecoverable error for every 10^15 bits read; if we read 10^15 bits, the SAS disk will show one error on average. Long ago, disks were small and it took a long time to read 10^15 bits. Today that is easy with fast, large RAIDs. And there are many error sources other than the disk itself, each with its own error rate on the order of 10^14 or so: disk firmware, controller card BIOS, cables not seated correctly, faulty ECC RAM, a flaky power supply, bugs in the OS, low-quality hardware, etc.
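The spec-sheet arithmetic above is easy to work through. Back-of-the-envelope numbers only, taking the 1-in-10^15 figure at face value and ignoring every other error source in the list:

```python
# Unrecoverable read errors per bit, from a typical SAS spec sheet.
URE_RATE = 1e-15

def expected_errors(bytes_read):
    """Expected number of unrecoverable errors when reading this many bytes."""
    return bytes_read * 8 * URE_RATE

# Reading one 4 TB disk end to end: ~0.03 expected errors.
one_disk = expected_errors(4e12)

# Scrubbing a 12-disk array of 4 TB disks: ~0.4 expected errors per pass,
# i.e. roughly one unrecoverable error every two or three scrubs.
full_array = expected_errors(12 * 4e12)
```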

For instance, new SAS disks have 520-byte sectors instead of 512 bytes; the extra 8 bytes are checksums supporting the DIF (Data Integrity Field) standard. But STILL these SAS disks specify unrecoverable errors, miscorrected errors, etc.: even though the disks use DIF, they can have errors. So there are checksums everywhere, but that does not cut it. Errors can creep in when data passes from one link to the next in the chain, even though each level has its own checksums.

http://www.exlibrisgroup.com/files/Customer_Center/NorthAmerica/Rosetta-Alberta-clarke-1.ppt
"Data can be encoded more than five times from RAM to disk". And each encoding changes data in some way, introducing additional potential error sources.







Hi brutalizer,

I hope I can express this reply adequately, and I hope you're able to accept it, in the proper spirit.
Of course, I think this is an interesting discussion that many are interested to follow.

First, I am in no way offended.
I hope not!

In fact, the whole thing is bizarrely amusing.
And a bit interesting for geeks, I hope ;)

[I discovered, and just documented, a firmware bug in a drive many here are using; no alarm, though, it is totally benign (well, a wasted millisecond here and there), but still a flaw]).
It sounds as if I have much to learn from you! :)

Let me respond to a couple of things. Please note that I very specifically addressed only your sub-topic of bit rot.
OK, I am talking about data corruption in general, not only bit rot, and I am most worried about silent data corruption. Much data corruption is noticed, so we can deal with it; it is not a great worry. But some data corruption is not noticed at all, and that is what worries me, and, I presume, the creator of this thread, because he encountered silent data corruption.

It seems to me that you mean ordinary bit rot errors will get corrected, and if the disk is not able to correct one, it will report an UNC error. Is this correct?

If that is your viewpoint, it may very well be true.

However (please understand that I am not trying to offend you; this is just my way of discussing), if we talk about data corruption in general, not only bit rot: there exists data corruption that the disk will not notice. Thus you will get no notification nor a UNC error from the disk, because the disk does not know. For instance, ghost writes: writes that the disk believed were committed to the platters, when in fact the write disappeared into thin air. This is clearly silent data corruption, and the disk will not know.

Here is research from the physics centre CERN, which examined silent data corruption. The experiment went something like this: CERN wrote a specific bit pattern (say "010101010101") to 3,000 hardware RAID racks over two weeks. At the end, they examined all data on the RAIDs and checked whether it was still "010101010101". In roughly 500 cases the bit pattern had changed. This was silent corruption, because the rack servers did not detect nor report these errors; the computers were oblivious to the data corruption.
http://indico.cern.ch/getFile.py/ac...ionId=0&resId=1&materialId=paper&confId=13797

http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf


Your Seagate manual reference (pg17) does mention the term miscorrected error, but says nothing more about it (other than a rate). Oh, it does discuss, and define, the other error types, but not this one. While I respect Seagate, this almost sounds like something their lawyers would, wisely, have them add [the ultimate CYA?].
Those error-correcting codes are not perfect, and there may be corner cases where they cannot correct, or even miscorrect.


Almost everything else that it seemed you were addressing to me, was inapplicable, since I had right up front, laid out the premise
Very possible. As you have noticed, English is not my first language, and I am not very good at it. Some subtleties I miss, like "QBing the Patriots"?


I understand that you are enthusiastic about ZFS. Fine--I think it is very nifty technology also....You can have your religion, and others can have theirs:).
Yes, I am very enthusiastic. "Please forgive my ZFS masturbation. I am excited". :)

The reason I am excited is that many computer science researchers have looked into data corruption and also into ZFS. Current research shows all solutions to be inadequate except ZFS, it seems. Thus my excitement feels justifiable, because of all this research.

If there were other solutions on the market that provided equally good protection, I would look at them as well. If researchers proved that BTRFS is superior to ZFS, then of course I would switch. My data is precious to me, and I would not use an unsafe solution because of politics; I want to use the safest solution. According to researchers, the answer is ZFS. Of course, if you can show me another solution XXX that is safer, then I will switch, as would many others here. Then we would all talk about XXX instead, and do "XXX masturbation" and be very excited, of course.

Until a superior product "XXX" comes out, I, and many others, will continue to use ZFS, and be excited. Thus, I think my excitement is nothing weird. Researchers say ZFS is safe. It is your choice if you want to reject the research done by comp sci PhDs, but I tend to trust researchers.

Are we having fun, yet, brutalizer? :)
Certainly! :)

In closing, brutalizer, I'm sincere when I say I am interested to understand what led to this mix-up. I am not looking to assign blame, or fault. If there was something I wrote, or didn't write, or the way I wrote it, let me know. Or, if you've got any ideas, upon reflection, how you might have gotten off-track, I'm curious to hear that also.
One explanation is that English is not my first language, so I misinterpret people. Another is that I get excited when I read this data corruption research. We don't have to pay millions anymore to get good data protection. :)

People sometimes report that they see no issues with SAS expanders, but when they switch to ZFS they see lots of problems. The reason is that their first solution had no end-to-end checksums and did not observe all the transient errors that actually occurred. I like the sensitivity of ZFS, which exposes all underlying problems at once. :)

Again, I am not trying to offend you, this is just my way of discussing. Please forgive me if I have.

PS. I like this! :)
http://www.youtube.com/watch?v=fvnmb6JkFyg&feature=related
http://www.youtube.com/watch?v=NZfKXOlA7L0
http://www.youtube.com/watch?v=W_jrOF042Cs&NR=1
http://www.youtube.com/watch?v=2IoKYLqGaPc&feature=relmfu
http://www.youtube.com/watch?v=z2UMUOkEzF8&ob=av2e
http://www.youtube.com/watch?v=rVbbx5P4PNA&feature=related
 
Just a small observation from my systems: I regularly copy large amounts of data (10-30 TB) between computers and I very, very rarely have any problems with data corruption. I use a variety of HDDs, controllers and other components.
 
Just a small observation from my systems: I regularly copy large amounts of data (10-30 TB) between computers and I very, very rarely have any problems with data corruption. I use a variety of HDDs, controllers and other components.

Unfortunately your evidence is the most dangerous kind, since it is... anecdotal. Statistics are a bitch: one person may never see the issue while another just can't get a break.
 
One also wonders how he knows the problem is so small. Most filesystems will not detect corruption inside data blocks; if it is, say, a movie or other video clip, you could have a bad section of a few bytes and never even see it.
 
It is your choice if you want to reject the research done by comp sci PhDs, but I tend to trust researchers.

Be very careful putting PhDs and researchers on pedestals. Remember, they are academics by nature and thus tend to ignore (or forget) reality in order to fit a hypothesis. It isn't intentional; it is the nature of their efforts.

Fuck, we just had some of the best and brightest in the world say Einstein was wrong, and then forget to account for what should be, to them, a rudimentary error source. For the past decade we had people believing vaccines were evil due to "bad research" that was propped up as fact.

More simply: math can always be done to meet an expectation, whereas time always reveals the truth.
 
It is your choice if you want to reject the research done by comp sci PhDs, but I tend to trust researchers.

Having been involved in research myself for the last 15 years, I can say one thing: you cannot always trust a study or paper. We have from time to time contradicted and disproved the published results of others and backed it up with verified clinical results. I cannot say much more than that, since I am not anonymous.
 
For my critical data, I archive on multiple devices/mediums and use a 20% PAR set. It's no ZFS ;) but it gets the job done.
 
One thing I have been disappointed by is that files do not have MD5 checksums built in by default. It is so eff'ing easy to do.
 
Unfortunately your evidence is the most dangerous kind, since it is... anecdotal. Statistics are a bitch: one person may never see the issue while another just can't get a break.

Which is why you need to test your setup thoroughly.

The difference between anecdotal evidence and an objective study is not as great as you might think. I should know, I have a PhD :eek:

Also, the study referenced earlier explained that 80% of the observed errors came from a specific incompatibility between certain HDDs and one brand of controller. It doesn't help to have a lot of data points if they are limited to one configuration. That actually makes my anecdotal evidence a broader analysis, although not broad enough to draw any general conclusions.

One also wonders how he knows he has such a small problem? Most filesystems will not detect corruption internal to data blocks - if it is say, a movie or other video clip, you could have a bad section of a few bytes and not even see it.

I make (and test) checksums for all of my data.
 
brutalizer: I already know all that - no need to _lecture_ me when I already came to the same conclusion in my earlier post. :)
 