Data integrity: the risks of not using ECC with ZFS (a study)

iroc409

So, a while back in one of the ZFS build threads, I suggested that someone building a new system should not purchase parts that do not include ECC RAM for their ZFS system. The discussion continued, and someone asked me to produce any real, tangible evidence that ZFS is at risk without ECC. I mentioned a study, but didn't have a link as I didn't have access to where I stored it. I found it again (the PDF, anyway), and figured it was worth sharing some details.

The study I found is titled "End-to-end Data Integrity for File Systems: A ZFS Case Study", by Yupu Zhang, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau (Computer Sciences Department, University of Wisconsin-Madison).

I did not re-read it in its entirety this time (I did when I originally found it), but they have a few interesting gems. The study is primarily about ZFS error recovery on disk, but they tested RAM as well. They even calculated the probability of flipped bits based on workload type and type of bit flip.

Probably the best memory-related summation from the study is:

5 In-memory data integrity in ZFS

In the last section we showed the robustness of ZFS to disk corruptions. Although ZFS was not specifically designed to tolerate memory corruptions, we still would like to know how ZFS reacts to memory corruptions, i.e., whether ZFS can detect and recover from a single bit flip in data and metadata blocks. Our fault injection experiments indicate that ZFS has no precautions for memory corruptions: bad data blocks are returned to the user or written to disk, file system operations fail, and many times the whole system crashes. This section is organized as follows. First, we briefly describe ZFS in-memory structures. Then, we discuss the test methodology and workloads we used to conduct the analysis. Finally, we present the experimental results and our observations.

Anyway, I thought it was worth sharing, since I finally found it. I'd be interested to hear others' thoughts about the study.
 
That's what I've been saying all along - running a server or any machine 24/7, without ECC, is just asking for trouble, even if it's just a personal file server.
 
running a server or any machine 24/7, without ECC, is just asking for trouble

I totally disagree with that. If your memory is good and you are not overclocking, you should never ever experience a bit flip short of being hit by a cosmic ray, which I believe is less likely than winning the lottery. ECC does help you detect bad memory (and I do buy systems with ECC when possible), but you should have tested it before you installed it. As for memory going bad after it has been tested and installed: in the 15+ years I have administered machines for my work department, I have yet to see a single DIMM go bad after being properly tested. All of my memory RMAs (on 200 to 300 machines) have been DOA situations.
 
If your memory is good and you are not overclocking, you should never ever experience a bit flip short of being hit by a cosmic ray, which I believe is less likely than winning the lottery.

Not according to Google's RAM study. The study I referenced above also indicates that your chances of a random bit flip from a cosmic ray go up substantially at higher elevations, so people in Denver, beware! :)
 
I totally disagree with that. If your memory is good and you are not overclocking, you should never ever experience a bit flip short of being hit by a cosmic ray, which I believe is less likely than winning the lottery. ECC does help you detect bad memory (and I do buy systems with ECC when possible), but you should have tested it before you installed it. As for memory going bad after it has been tested and installed: in the 15+ years I have administered machines for my work department, I have yet to see a single DIMM go bad after being properly tested. All of my memory RMAs (on 200 to 300 machines) have been DOA situations.

I disagree, but only to the extent that you care about the integrity of your data....
If you look at an actual scientific study: http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

According to this study a DIMM has an 8% chance per year of getting a correctable error.
Okay... so only an 8% chance in a year. Not bad odds, right? Wrong... how many DIMMs do you have in your computer? 2? 4? 6 or more?
If you have 4 DIMMs, that becomes a 28.3% chance of it happening in a year on your system. Now consider a ZFS file server (the original question in this post): the majority of its function is to read/write files, so if you do get a flipped bit, chances are it will be in your data. Now, if I were building a system for ZFS storage (which I have, and used ECC), I would definitely want my files not to become corrupt during the time I have them stored, especially mission-critical data or precious memories like pictures and videos. If the useful life of your file server is, say, 6 years, you are looking at that 28% chance on a 4-DIMM system becoming an 86% chance of having an error over the 6 years...
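For anyone who wants to check that arithmetic, here is a quick sketch. It simply treats each DIMM-year as an independent 8% chance of a correctable error (the figure from the paper linked above); it is an illustration of the math, not a claim about any particular hardware.

Code:
# Probability of at least one correctable error, assuming each DIMM-year
# is an independent 8% chance (figure from the sigmetrics09 paper above).
p_dimm_year = 0.08

def p_any_error(n_dimms, years):
    return 1 - (1 - p_dimm_year) ** (n_dimms * years)

print(f"4 DIMMs, 1 year:  {p_any_error(4, 1):.0%}")  # ~28%
print(f"4 DIMMs, 6 years: {p_any_error(4, 6):.0%}")  # ~86%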

But you are right that there is a point of diminishing returns... if you really wouldn't mind a file or two becoming corrupt over the course of a few years, because your data is either replaceable or not all that important, going without ECC will save you a lot of $$$ on a server build.
 
The study about ZFS and ECC can also be found cited here as [17]:
http://en.wikipedia.org/wiki/ZFS#Data_Integrity
There are a lot of other studies on data corruption and ZFS cited there. Read them.

Here is some information about ECC:
http://en.wikipedia.org/wiki/ECC_memory
For instance, one satellite reports 280 bit errors in RAM every day. Google reports that more than 8% of DIMMs get errors each year. Google found that the error rates were several orders of magnitude larger than small-scale studies had shown.

http://en.wikipedia.org/wiki/Cosmic_ray#Effect_on_electronics
Studies in the 1990s showed that every DIMM gets one cosmic-ray-induced error in RAM every month. Because of this, Intel suggested that CPUs should have cosmic radiation detectors built in; when radiation is detected, the CPU should redo its calculations.

There is not much point in using ZFS if you are using non-ECC RAM. A chain is no stronger than its weakest link, and if you are using ZFS, then you have a new weakest link: the RAM.

When you strengthen one link in your chain, other links become the weak ones, but the whole chain becomes stronger. For instance, the network card has a small amount of memory; some NICs have ECC on that small memory, others don't.
http://www.servethehome.com/intel-ethernet-controller-buffer-ecc-comparison/

My next storage raid server will use ZFS and ECC.
 
I totally disagree with that. If your memory is good and you are not overclocking, you should never ever experience a bit flip short of being hit by a cosmic ray, which I believe is less likely than winning the lottery. ECC does help you detect bad memory (and I do buy systems with ECC when possible), but you should have tested it before you installed it. As for memory going bad after it has been tested and installed: in the 15+ years I have administered machines for my work department, I have yet to see a single DIMM go bad after being properly tested. All of my memory RMAs (on 200 to 300 machines) have been DOA situations.

In my 12 years in IT at large installations I've seen countless memory modules go bad in service; in fact, we sell them by the pound to a chip recycler. It's not just generic stuff either: I've seen Crucial/Micron, Hynix, Samsung, Infineon, etc. You name it, I've seen it go bad.

However ECC isn't about going bad, it's about bit flips, which can and DO happen more than most realize. It won't always be the bit that was heading to your disks and thus affect your storage, but someday it will. Using ECC adds another layer of protection to storage, so when combined with ZFS it greatly increases data integrity. In fact if you go look at every Sun server and appliance they sell, they all come standard with ECC memory. When it comes to the utmost reliability of data, cheaping out and using non-ECC is just installing a weak link.
 
I've used servers with both ECC and non-ECC memory. The ECC ones are, in the long run, more reliable. Today, when the price gap is so small, it would be just plain dumb not to choose ECC memory.

I would even say that for regular home use, ECC is a much better choice, unless you only deal with read-only data (like an iPad).
 
The question is almost becoming moot, as ECC really isn't that much more expensive, and if you are going to the trouble of using ZFS for the extra data security you might as well get ECC. This can be done quite inexpensively with an AMD setup or a low-end Intel Xeon (even some non-Xeon CPUs support ECC; just read the forums). Unless you are running an Atom/E-350, you should get ECC; it provides added protection at only a very small additional cost.
 
if you are going to the trouble of using ZFS for the extra data security you might as well get ECC

There's the key point. ZFS gives a false sense of security when not used with ECC RAM, as RAM is a pretty common source of corruption. There are, of course, still other sources of corruption but I think they are less common (and hopefully will be addressed in the future).

Granted, the raid and drive pooling features are still very robust, and if that is your main purpose ZFS is still very strong in that area.
 
Just an addition:

ECC RAM is an important key to software-RAID-style protection, for example ZFS, mdadm, and others.
 
There's the key point. ZFS gives a false sense of security when not used with ECC RAM

Almost anyone I know who is versed enough to understand what ZFS is doesn't think that ZFS supplants or mitigates errors that ECC would catch. Maybe there are some people out there, but I don't think I've ever come across someone who thinks that using ZFS means there is no use for ECC RAM in that build, based on the implementation of ZFS alone.

Now, the debates over ECC's usefulness in general, sure - plenty of material there - but as it relates to ZFS, I think most people who have any experience or have done any appreciable research prior to building would not have some false sense of security.
 
For me, ECC is not just about the potential savings to my data; it's also the savings in my time when troubleshooting. For example, if something goes wrong and it turns out to be cables, with ECC I'm probably not dedicating hours of my time to running a memory checker on the RAM before I figure that out. I'm going to put memory at the bottom of the failure-scenario list and focus on other possible factors first, potentially decreasing MTTR.
 
One point I would like to make is that using non-ECC RAM with ZFS is no more unsafe than it is with any other file system setup (for example, a Windows server doing SMB file sharing, etc.).
 
One point I would like to make is that using non-ECC RAM with ZFS is no more unsafe than it is with any other file system setup (for example, a Windows server doing SMB file sharing, etc.).

I don't know if this statement is true or not..... I do know that ZFS calculates and writes checksum data in addition to writing the data itself, so that is two points where non-ECC memory could cause an issue rather than one. Also, with ZFS you might be using ZIL or L2ARC cache devices... the point being that there are a lot more operations happening in a ZFS file system than in a simple write to an NTFS-formatted drive. I'm not an expert, but I think ZFS may be more vulnerable to memory errors when using non-ECC memory than other file systems for those reasons.
 
ZFS will do checksums, yes. But there is little overhead in checksums; the data itself is much more to write. Thus, ZFS is hardly more sensitive to RAM errors. The ZIL and L2ARC are protected by checksums, so no worry for them.
 
ZFS will do checksums, yes. But there is little overhead in checksums; the data itself is much more to write. Thus, ZFS is hardly more sensitive to RAM errors. The ZIL and L2ARC are protected by checksums, so no worry for them.

I realize that there is little overhead in doing checksums; that's not the point though. If you have checksums in memory in ADDITION to the data itself, and then again data and checksums being temporarily stored in memory while the data flows through the ZIL and L2ARC, that is a lot more opportunities in RAM for something to go wrong than with a dumb drive system that goes straight from memory to disk...

It doesn't matter if the data is protected by checksums when using non-ECC memory, because the memory bit flip could be in the checksums themselves...

I think the fact alone that ZFS thrives on and uses much more memory than other file systems means it has that much more exposure to memory errors than other file systems.
 
One point I would like to make is that using non-ECC RAM with ZFS is no more unsafe than it is with any other file system setup (for example, a Windows server doing SMB file sharing, etc.).

The sharing level sits on top of the software RAID and is not related to the software RAID level.

Whether sharing is safe or unsafe depends on the software RAID below the sharing level.
 
ZFS will do checksums, yes. But there is little overhead in checksums; the data itself is much more to write. Thus, ZFS is hardly more sensitive to RAM errors. The ZIL and L2ARC are protected by checksums, so no worry for them.

Once a bit flips in RAM, any software RAID (ZFS or others) can hardly detect it, since the write to disk comes from memory, especially with any software RAID that really likes a big memory allocation. :)

Data corrupted by a RAM bit flip is hard to detect with software RAID. The common issue is "why does the data I copied months ago not match its copy?" or "I can't load this data in program XYZ; I copied it from an exact replica last year."
Note: assuming the data is really big, for example 10 GB or bigger.
I faced that situation myself before realizing ECC RAM is very important for software RAID.
 
Here is some information about ECC:
http://en.wikipedia.org/wiki/ECC_memory
For instance, one satellite reports 280 bit errors in RAM every day. Google reports that more than 8% of DIMMs get errors each year.
This isn't 100% correct.

According to the paper, it is "more than 8% of DIMMs affected by errors per year", and these are correctable errors.

Looking into the paper:
For all platforms, the top 20% of DIMMs with errors make up over 94% of all observed errors
An important takeaway from this is that most DIMMs don't have any errors, but without ECC you are going to have a hell of a time isolating which DIMM is going bad.

Google found that the error rates were several orders of magnitude larger than small-scale studies had shown.
Another very important aspect is how strongly age correlates with errors: despite having no moving parts, DIMMs do degrade over time.
 
I realize that there is little overhead in doing checksums; that's not the point though. If you have checksums in memory in ADDITION to the data itself, and then again data and checksums being temporarily stored in memory while the data flows through the ZIL and L2ARC, that is a lot more opportunities in RAM for something to go wrong than with a dumb drive system that goes straight from memory to disk...

It doesn't matter if the data is protected by checksums when using non-ECC memory, because the memory bit flip could be in the checksums themselves...
Let me try to explain again. ZFS does checksums, yes, but those checksums are tiny in comparison to the data. If you have 1,000,000 bytes that you need to store with the ext3 filesystem, then ZFS might use, say, 1,000,000 + 2,000 bytes. That extra 2,000 bytes of checksums is a very small percentage of all the data that needs to be stored, and therefore the risk of corruption hitting the checksums is small. Thus, ZFS uses only a tiny bit more memory than, say, ext3.

If ZFS needed to store 1,000,000 + 500,000 bytes, then the risk of corruption would increase much more.
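To put rough numbers on that ratio (purely illustrative figures: a 128 KiB record with one 32-byte checksum kept for it; the real ZFS layout stores checksums in block pointers, but the proportion is the point):

Code:
# Illustrative checksum-to-data ratio; the numbers are assumptions,
# not a description of ZFS internals.
record_bytes = 128 * 1024   # one 128 KiB record
checksum_bytes = 32         # one 256-bit checksum for that record

print(f"Checksum overhead: {checksum_bytes / record_bytes:.4%}")  # ~0.02%

So even if a random bit flip lands somewhere in that working set, it is overwhelmingly more likely to land in the data than in the checksum.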


I think the fact alone that ZFS thrives on and uses much more memory than other file systems would mean that it has that much more exposure to memory errors than other file systems.
Why do you believe that ZFS uses much more memory than other file systems? It does not.

Sure, ZFS has a very advanced disk cache, but that is not unique; other filesystems also have disk caches. ext3 can also use GBs of RAM as disk cache. But ZFS in itself does not use much more RAM. All disk caches use lots of RAM. Disk cache is not a unique ZFS design vulnerability.

If you turn off the disk cache, then ZFS will use slightly more RAM than other filesystems, but not much more.
 
.............

Why do you believe that ZFS uses much more memory than other file systems? It does not.

Sure, ZFS has a very advanced disk cache, but that is not unique; other filesystems also have disk caches. ext3 can also use GBs of RAM as disk cache. But ZFS in itself does not use much more RAM. All disk caches use lots of RAM. Disk cache is not a unique ZFS design vulnerability.

If you turn off the disk cache, then ZFS will use slightly more RAM than other filesystems, but not much more.

ext3 is totally different and old compared with ZFS.
ext3 does not have the features ZFS has. ext3 uses GBs of RAM for cache? I did not realize that; could you point to references showing ext3 using GBs of RAM?

ext3 does not support software RAID; you have to use mdadm together with ext3, and that is not at the same level as ext3. XFS has a different approach.

ext3's write cache is mostly not big, and it relies on the HD write cache (the barrier=X option).

Comparing ext3 and ZFS is like comparing apples and oranges.
On native Linux, there is no such filesystem as ZFS, unless the btrfs filesystem (Oracle again) is willing to push development hard on Linux.

ext4 just adds a refresh and enhancements, for example checksums for journaling, among others.

The question is: is using RAM as a holding place for checking, calculating, and temporarily holding big data safe without ECC?
 
As for memory going bad after it has been tested and installed: in the 15+ years I have administered machines for my work department, I have yet to see a single DIMM go bad after being properly tested.

REALLY?! :confused:
 
Why do you believe that ZFS uses much more memory than other file systems? It does not.

Sure, ZFS has a very advanced disk cache, but that is not unique; other filesystems also have disk caches. ext3 can also use GBs of RAM as disk cache. But ZFS in itself does not use much more RAM. All disk caches use lots of RAM. Disk cache is not a unique ZFS design vulnerability.

If you turn off the disk cache, then ZFS will use slightly more RAM than other filesystems, but not much more.

Everything I have read about ZFS indicates that it absolutely loves RAM and will eat up as much RAM as it can:

http://wiki.freebsd.org/ZFSTuningGuide
First statement: "ZFS needs *lots* of memory."

http://constantin.glez.de/blog/2010...oracle-solaris-zfs-filesystem-performance#ram
"Add More RAM"

http://www.trainsignal.com/blog/zfs-nas-setup-guide
"ZFS system needs a minimum of 6 GB of RAM to achieve decent read/write performance and 8GB is preferred. For systems with more than 6TB of storage it is best to add 1GB of RAM for every extra 1TB of storage."

I don't know of any other file system for which many people, including _Gea, recommend 6GB of RAM or more....
If I copy a bunch of data to my ZFS file server, I can see the RAM being totally used up. That's not a bad thing, it is doing its job, but if I make the same writes to a Windows server box, I don't see anything close to the RAM usage of ZFS.
Because of this I would say that ZFS is much more exposed to RAM bit-flip errors on non-ECC memory, since it is using more memory for the "same" operation (storing files).
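Just to put the rule of thumb quoted from that guide into numbers (a hypothetical helper for illustration, not an official sizing formula):

Code:
def suggested_ram_gb(storage_tb):
    # Rule of thumb from the guide quoted above: at least 6 GB,
    # plus 1 GB per TB of storage beyond 6 TB.
    return 6 + max(0, storage_tb - 6)

print(suggested_ram_gb(4))   # small pool -> 6 GB
print(suggested_ram_gb(12))  # 12 TB pool -> 12 GB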
 
I've recently built an ESX+ZFS all-in-one system for home. I looked at going ECC; however, the cost is not that small: if you want a high-end system you need to go with a Xeon, a Xeon motherboard, and ECC RAM.

The Xeon itself was +$200, the motherboard was +$350, and the RAM was +$250 for 32GB (4x8GB); that's $800 on a $2,000 system. I decided that the cost didn't justify it for me. I know it's BETTER to go with ECC, but I will not suffer a massive issue if I lose a picture or a song stored on my NAS, or if a VM I'm running crashes. And I've been running Hyper-V on a non-ECC system and it's been very stable.

I've worked in a computer store, and cheap RAM tends to die pretty much at any time, brand new and old alike. However, higher-quality non-ECC RAM has been very reliable; usually it dies in the first month, or it doesn't die at all.

There are better places for a home user to put the money, such as a true sine-wave UPS, but that's just my 2 cents.

If your money is infinite, or someone would die if you lose some data, go ECC; otherwise, for home use, I would not be too worried.
 
Hmm, on my ext3/4 systems, I have been running 4-8 GB of free RAM dedicated just to filesystem buffers. I also have several more systems that just won't perform if I have <2 GB of cache/buffers for ext4.

I think the main difference is the tree structure of ZFS: when you update a block of data, the checksum gets updated in the tree level above that block, all the way up to the root. If all of those metadata items aren't cached in RAM, it's going to add more load to the system. Whereas ext3/4 just updates the journal entry, then updates the filesystem.
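For anyone who hasn't seen the idea, here is a toy sketch of that kind of hash tree. It is not ZFS code, just the general Merkle-tree pattern: changing one data block forces the hashes above it to be recomputed all the way to the root.

Code:
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_hash(blocks):
    # Toy Merkle tree; assumes a power-of-two number of blocks to keep it short.
    level = [h(b) for b in blocks]            # leaf checksums
    while len(level) > 1:
        level = [h(level[i] + level[i + 1])   # parent = hash of its two children
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
before = root_hash(blocks)
blocks[2] = b"block-2-modified"               # update a single data block...
print(root_hash(blocks) != before)            # ...and the root hash changes: True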
 
Any storage system really should have ECC, it's not just limited to ZFS. However, most people focus on ZFS because they want the ultimate in storage security (mostly bit rot)--at least from discussions here. If you're that concerned with it, and you're building a *new* system (not re-purposing an old desktop), then it should properly be ECC, that's all I'm saying. From build threads here I've read, I absolutely think people have a false sense of security with ZFS (some--not all people).

For me, converting to a ZFS-based operating system for my server is a pain in the rear due to limitations in the OS compared to what my server does for me. If I'm going to go to all the trouble to make sure my data is safe on ZFS, there's no sense in sparing a minor detail like ECC RAM. I've had a test OI+Napp-It running in practically minutes, but there's a lot of back-end work that needs to be done to get to the same setup I have now.
 
The sharing level sits on top of the software RAID and is not related to the software RAID level.

Whether sharing is safe or unsafe depends on the software RAID below the sharing level.

What in the world are you talking about? Your post doesn't even make sense, and I didn't mention anything to do with raid, much less software raid.


ext3 is totally different and old compared with ZFS.
ext3 does not have the features ZFS has. ext3 uses GBs of RAM for cache? I did not realize that; could you point to references showing ext3 using GBs of RAM?

ext3 does not support software RAID; you have to use mdadm together with ext3, and that is not at the same level as ext3. XFS has a different approach.

ext3's write cache is mostly not big, and it relies on the HD write cache (the barrier=X option).

Comparing ext3 and ZFS is like comparing apples and oranges.
On native Linux, there is no such filesystem as ZFS, unless the btrfs filesystem (Oracle again) is willing to push development hard on Linux.

ext4 just adds a refresh and enhancements, for example checksums for journaling, among others.

The question is: is using RAM as a holding place for checking, calculating, and temporarily holding big data safe without ECC?

Both in Windows and Linux, on ANY filesystem, any RAM that is unused by processes is allocated to disk cache. This is why going from 4GB to 8GB can speed up a Windows box quite a bit, even if you aren't maxing out the 4GB in the first place.
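On a Linux box you can watch this directly in /proc/meminfo; the Buffers and Cached lines are RAM the kernel is lending to the disk cache and will give back to processes on demand. A minimal sketch:

Code:
# Print page-cache usage from /proc/meminfo (values are reported in kB).
def meminfo_kb(field):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return 0

for field in ("MemTotal", "MemFree", "Buffers", "Cached"):
    print(f"{field}: {meminfo_kb(field) // 1024} MiB")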
 
The Xeon itself was +$200, the motherboard was +$350, and the RAM was +$250 for 32GB (4x8GB); that's $800 on a $2,000 system. I decided that the cost didn't justify it for me. I know it's BETTER to go with ECC, but I will not suffer a massive issue if I lose a picture or a song stored on my NAS, or if a VM I'm running crashes. And I've been running Hyper-V on a non-ECC system and it's been very stable.

I know, I think it is ridiculous that you have to bump up to a Xeon in order to get ECC with Intel... almost all AMD processors, even most of the cheaper ones, have supported ECC since the Athlon 64 days, as long as the motherboard supports it as well.... that's at least one reason to go AMD: you can build an ECC box for a bit cheaper.
 
I know it's BETTER to go with ECC, but I will not suffer a massive issue if I lose a picture or a song stored on my NAS

It'd have to be some pretty drastic memory problems before you got to that stage! :)


Also, how do people get content/data onto their server/NAS?
If it's a copy over the network, from a desktop PC (I suspect the majority of home users do it this way) then is that PC itself ECC protected? Is it protected against on-disk corruption?

I'm not saying ECC memory isn't worthwhile, but you do have to keep it all in perspective!
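One cheap way to take the sending PC (and the network) out of the equation is to checksum the file on both ends after the copy. A minimal sketch, with purely hypothetical paths:

Code:
import hashlib

def sha256_of(path, chunk=1 << 20):
    # Stream the file in 1 MiB chunks and return its SHA-256 hex digest.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            digest.update(data)
    return digest.hexdigest()

# Hypothetical paths: the local original vs. the copy on the NAS share.
src = "C:/photos/img_0001.jpg"
dst = "//nas/photos/img_0001.jpg"
print("match" if sha256_of(src) == sha256_of(dst) else "MISMATCH - copy again")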
 
What in the world are you talking about? Your post doesn't even make sense, and I didn't mention anything to do with raid, much less software raid.




Both in Windows and Linux, on ANY filesystem, any RAM that is unused by processes is allocated to disk cache. This is why going from 4GB to 8GB can speed up a Windows box quite a bit, even if you aren't maxing out the 4GB in the first place.

Your first posting was not clear to me; I did my best to interpret it.

---------
On Linux, the OS will try to claim available memory, which does not mean it is using all of the claimed memory; we have a swap file/partition when needed.
Speeding up on Linux does not mean the OS is using that RAM for write cache.
ext3 is not memory-hungry as far as I know, since ext3 is already dated and memory was a main concern back then.
ext4 has enhancements over ext3.
As I said, Linux mdadm is used together with ext3 in this scenario.
Could you point to any article showing ext3 using GBs of RAM?...


I did not mention Windows in my posting. :p
 
I know, I think it is ridiculous that you have to bump up to a Xeon in order to get ECC with Intel... almost all AMD processors, even most of the cheaper ones, have supported ECC since the Athlon 64 days, as long as the motherboard supports it as well.... that's at least one reason to go AMD: you can build an ECC box for a bit cheaper.

Ditto!
Going AMD is a lot cheaper. :) I did it three months ago with an FX-4100 and unbuffered ECC RAM.

I wish Intel would support ECC on the Ivy Bridge platform. (update)
 
For me, ECC is not just about the potential savings to my data; it's also the savings in my time when troubleshooting. For example, if something goes wrong and it turns out to be cables, with ECC I'm probably not dedicating hours of my time to running a memory checker on the RAM before I figure that out. I'm going to put memory at the bottom of the failure-scenario list and focus on other possible factors first, potentially decreasing MTTR.

Yep, having and using ECC RAM almost completely removes it from the troubleshooting equation, and saves you from the joys of running 1-3 day memtest sessions to try to identify marginal DIMMs for RMA.

Intel needs to stop turning off ECC in their consumer grade CPUs and chipsets. It's really annoying.
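On the Linux side, once you do have ECC, the kernel's EDAC subsystem exposes per-memory-controller error counters, so you can spot a marginal DIMM from its corrected-error count instead of booting memtest. A minimal sketch (assumes an EDAC driver for your memory controller is loaded):

Code:
import glob, os

# Read corrected (ce) and uncorrected (ue) error counts from EDAC sysfs.
for mc in sorted(glob.glob("/sys/devices/system/edac/mc/mc*")):
    with open(os.path.join(mc, "ce_count")) as f:
        ce = f.read().strip()
    with open(os.path.join(mc, "ue_count")) as f:
        ue = f.read().strip()
    print(f"{os.path.basename(mc)}: corrected={ce} uncorrected={ue}")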
 
Is it true that if ZFS metadata gets corrupted, your entire pool is fubar'd? Because I wouldn't want that. That seems undesirable.
 
Intel hardly charges any more for the equivalent Xeons, and often the workstation/server motherboards are cheaper than the good enthusiast boards.

For instance, LGA 2011:
i7-3820: $310 vs e5-1620: $320
i7-3930k: $570 vs e5-1650: $580 (estimated, couldn't find it at newegg or blt)
i7-3960x: $1040 vs e5-1660: $1070

And on the LGA1155 side of things, oftentimes, if you aren't interested in OCing, a Xeon E3-12xx can be a better deal than the comparable i5 or i7.

Basically the i5s and i7s sometimes give you overclocking, and the Xeons always give you ECC. I want both, and would even pay a slight price premium for both. Though if I can only have one, I'll usually choose the ECC.

Also, this is a recent development; you could have both with LGA775 and the X38 chipsets. It's only with the move to on-die memory controllers that they've started fusing off the ECC circuitry on the consumer CPUs.
 
Yup, a Xeon E3 + C206 board is a great choice for a low-budget server. If you have an even lower budget you can always go AMD.
 