So, a while back in one of the ZFS build threads, I suggested that someone building a new ZFS system should not buy parts that lack ECC RAM support. The discussion continued, and someone asked me to produce real, tangible evidence that ZFS is at risk without ECC RAM. I mentioned a study but didn't have a link, as I didn't have access to where I had stored it. I found it again (the PDF, anyway) and figured it was worth sharing some details.
The study I found is titled "End-to-end Data Integrity for File Systems: A ZFS Case Study", by Yupu Zhang, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau (Computer Sciences Department, University of Wisconsin-Madison).
I did not re-read it in its entirety (I did when I originally found it), but it has a few interesting gems. The study is primarily about how ZFS recovers from on-disk errors, but the authors tested RAM as well. They even calculated the probability of flipped bits based on workload type and type of bit flip.
Probably the best memory-related summary from the study is:
5 In-memory data integrity in ZFS
In the last section we showed the robustness of ZFS to disk corruptions. Although ZFS was not specifically designed to tolerate memory corruptions, we still would like to know how ZFS reacts to memory corruptions, i.e., whether ZFS can detect and recover from a single bit flip in data and metadata blocks. Our fault injection experiments indicate that ZFS has no precautions for memory corruptions: bad data blocks are returned to the user or written to disk, file system operations fail, and many times the whole system crashes. This section is organized as follows. First, we briefly describe ZFS in-memory structures. Then, we discuss the test methodology and workloads we used to conduct the analysis. Finally, we present the experimental results and our observations.
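To make that failure mode concrete, here is a toy Python sketch of my own (it is not from the paper and is not ZFS code). It assumes a simplified model in which a block's checksum is computed when the block is written out; SHA-256 is just a stand-in checksum, and `checksum` and `flip_bit` are hypothetical helper names. The point is only that a checksum catches a bit that flips after the checksum was taken, but cannot catch one that flipped in RAM beforehand.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for a block checksum (assumption: SHA-256 for illustration only).
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Return a copy of the block with exactly one bit inverted.
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"important user data" * 4

# Case 1: the bit flips on disk, after the checksum was stored.
stored_sum = checksum(original)
on_disk = flip_bit(original, 10)
disk_corruption_detected = checksum(on_disk) != stored_sum

# Case 2: the bit flips in RAM before the checksum is computed,
# so the stored checksum faithfully describes the already-bad data.
in_ram = flip_bit(original, 10)
stored_sum = checksum(in_ram)
memory_corruption_detected = checksum(in_ram) != stored_sum

print("disk corruption detected:", disk_corruption_detected)
print("memory corruption detected:", memory_corruption_detected)
```

In the second case the bad block verifies cleanly on every later read, which is one way "bad data blocks are returned to the user or written to disk" can happen without any error being raised.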
Anyway, I thought it was worth sharing, since I finally found it. I'd be interested to hear others' thoughts about the study.