Quiet high-end ZFS NAS home file server, Dec '09 edition

All the hardware RAID controllers have an option to do scheduled scans of the stripe integrity. It is not necessary to check on every read if you do a scan, for example, once a week.
 
If you care so much about your data to use ZFS, why would you then trust it to a barely maintained fuse plugin?
True. So far the implementation in FreeBSD 8 is working very well for me.

Let me tell you guys a bit about my setup, and perhaps you'll understand my love for ZFS:

I have 5 workstations running Ubuntu Linux, but none of them have internal hard drives. They all use iSCSI-on-root, meaning they mount the system drive over the network and also boot from the network. That works just fine with the motherboard's NIC; no special hardware needed.

The server runs FreeBSD 8 and, of course, ZFS. I actually have two servers, with the second being passive and offline for the most part; every night it switches on to create a snapshot, sync all ZFS filesystems with the master fileserver, and shut down again. This is a very safe solution in my eyes.
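A sync like that is typically just a recursive snapshot plus an incremental zfs send piped over ssh into zfs receive on the slave; a minimal sketch with made-up pool, host and snapshot names (the real script obviously needs a little more error handling):

# zfs snapshot -r tank@2009-12-07
# zfs send -R -i tank@2009-12-06 tank@2009-12-07 | ssh slave zfs receive -d -F backup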

So back to iSCSI: the system disks of my 5 workstations reside on a RAID-Z of HDDs, so read latency would normally be quite high. But by adding a RAID0 of two small SSDs as a cache device, the most frequently accessed data stays on the SSDs and gets their very low read latency. It's like having a RAM file cache, except my 2x30GB of SSDs hold much more than my system's 8GB of RAM.

So essentially, all my system drives run from the SSDs on the FreeBSD server. Writes are buffered anyway, so they have no latency. The SSDs are not very fast, but the low read latency is still a huge improvement in responsiveness. The cache device is mostly read-only and only gets updated as my usage pattern changes, which also means it automatically adapts to my usage pattern. The system drives on the workstations are only 8GB, and they mount the larger (shared) filesystem through NFS.
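For reference, turning the two SSDs into a cache device (L2ARC) and exporting the bulk filesystem over NFS are both one-liners; the device and dataset names below are placeholders, and on FreeBSD the NFS server itself still has to be enabled in rc.conf:

# zpool add tank cache ad4 ad6
# zfs set sharenfs=on tank/shared

Multiple cache devices added like this are simply striped, which is the "RAID0 of two small SSDs" described above.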

So far, I'm very happy with this setup. ZFS can correct any corruption, and I have snapshots so I can always go back to an earlier date. And, paranoid as I am, I also have a complete mirrored slave fileserver as a true backup, only without the SSDs.

I would very much like an upgrade to 10Gbps though, as this bandwidth limit now also applies to the system drives. But affordable products that are supported by FreeBSD will probably take a while; although a bunch of 10Gbps adapters are already supported, none of them use the 10GBase-T ("copper") standard. Still, I hope 2010 will see some cheaper products implementing 10GBase-T, which works on my existing Cat6 cabling.
 
Neat setup! You are basically exporting iscsi luns from the ZFS server, correct?
 
All the hardware RAID controllers have an option to do scheduled scans of the stripe integrity. It is not necessary to check on every read if you do a scan, for example, once a week.
Still doesn't help RAID5. Still doesn't help with RAM bit flips either, as the hardware RAID controller assumes the data that it's been given is valid.
 
Neat setup! You are basically exporting iscsi luns from the ZFS server, correct?
Yes, they live on the RAID-Z array as zvols - so there is no ZFS filesystem on top, just the zpool (the RAID part of ZFS); the actual filesystem is ext4, as that's what Ubuntu uses by default. That means I have 5 zvols of 8GB reserved space each, exposed as block devices at /dev/zvol/<name>. Each device is then handed to the iSCSI target daemon ("istgt") on FreeBSD, which is not part of ZFS itself; I believe in OpenSolaris the iSCSI target is integrated. So that's it: Ubuntu workstation -> network -> FreeBSD server -> SSD cache -> zvol on the pool, containing an ext4 filesystem.
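For the curious: creating one of those 8GB zvols is a single command, and the resulting device node is what gets handed to istgt. The names are illustrative and the istgt.conf fragment is from memory, so compare it against the sample config that ships with istgt:

# zfs create -V 8G tank/ws1
(the block device then appears as /dev/zvol/tank/ws1)

[LogicalUnit1]
  TargetName ws1
  Mapping PortalGroup1 InitiatorGroup1
  UnitType Disk
  LUN0 Storage /dev/zvol/tank/ws1 Auto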

And it works great; having the active data stored on the SSDs is clearly noticeable. Before, when using only HDDs, everything I did had a huge latency, and I'm not known as a patient person. After I added the SSDs as a cache device (in a RAID0), it runs very smoothly, and all my storage is centralized. The actual HDDs are just 5400rpm WD Green drives, with TLER disabled.
 
True. So far the implementation in FreeBSD 8 is working very well for me.
....
Lovely setup. If you have the time, why don't you write a small how-to for it? I would love to replicate something like that, but I lack the time to scrape together all the knowledge I would need for such a project.
 
I really like ZFS. I wish Sun had not taken such a proprietary path with it, but instead fully open-sourced it so it could be made part of the Linux kernel.
The reason ZFS isn't in Linux is that the Linux license doesn't permit linking to non-GPL code (i.e., the CDDL-licensed ZFS code). Now, this raises the question "Why didn't Sun license ZFS under GPL", I guess, but as it stands it's Linux's license that stops its integration, not ZFS'. IIRC Windows would be on firm ground to include ZFS in their OS if they wanted to, provided that they release any changes they made to it (and no other part of their kernel!).
If that had happened, the community would have added reshaping support to it - there is nothing in the ZFS architecture that prevents it.
That doesn't make it easy. The problem is basically this: one block can be referenced by a bunch of datasets. You make a filesystem, put a file in it, snapshot it, clone that, take another snapshot, and so forth, and now a thousand pointers know where that block is supposed to be. Now you want to change where that block is located. Well, you can't just change the pointers one at a time to point to the new location. If something happens while this is going on that changes the block or some copies of it (remember, all the filesystem operations are online: you can keep using the pool while they happen), everything should still see a consistent view of things. That means you need to find all the pointers to a given block and change them all in a single transaction.

So it's technically feasible, but computationally expensive, at least done the naive way. You can play tricks like looking at the other pointers near the one you want to change, so you can move a whole bunch of blocks at once, but it's still going to be a fairly arduous process. I wouldn't be surprised to see something like the dedup tables used to speed up the process of moving all the data to its new locations.
The other issue is that OpenSolaris has limited driver support for a lot of hardware that folks like us like to use.
Like what? OpenSolaris doesn't have support for my integrated graphics card, but it runs fine in console mode. Support for disk and network controllers is fairly good, in my experience.
This makes it non-trivial to build a system that can run ZFS well.
Intel or AMD CPUs, with ECC memory if you can get it. Intel NICs, LSI SAS controllers. Whatever disks you like.
Plus there is a lack of GUI tools for management that folks here like.

I know senior people at Sun, and they know what needs to change to make it work really well for our type of use.
GUI tools aren't everything. Being able to write down concisely how you got where you are ("zpool create foo raidz2 c1t0d0 ...", say) is quite nice. Remote administration over ssh is the most convenient system I've ever used... but I'll admit it's not easy to get used to.
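As a small illustration (hypothetical host and pool names), the whole "management interface" can be a couple of lines of shell history, run locally or over ssh:

# ssh root@filer zpool status -x
# ssh root@filer zpool scrub tank

(status -x prints just "all pools are healthy" unless something actually needs attention.)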
However, Sun's fortunes were less than bright, and the Oracle purchase has basically gutted ZFS development. Oracle has a next gen system called BTRFS that isn't bad, but the conflict means that ZFS will be EOL'd at some point.
... or maybe the reason to buy Sun wasn't to spend $7.4 billion on things they wanted to throw out the window, but to get the existing, working, tested code.
Maybe they can get the whole ZFS code base open sourced and then it can live on past Sun, but Oracle isn't exactly that type of organization...
Wikipedia said:
ZFS is implemented as open-source software, licensed under the Common Development and Distribution License (CDDL).
Here, look at the source!
 
No, with RAID 6 there is enough information to correct a bit flip on one drive, as I already explained. Perhaps the problem is that you do not understand RAID 6. RAID 6 does not use a simple parity checksum, but rather a kind of Reed-Solomon error correcting code with Galois fields math. The computation is complicated, but I think I explained clearly in my previous post how an error could be detected and corrected with RAID 6.
Saw this post today.

I just wanted to add: no, you are wrong. RAID-6 is not safe and does not give you data integrity. Here is a link with lots of research papers; one of them is about RAID-6. It says:

"The paper explains that the best RAID-6 can do is use probabilistic methods to distinguish between single and dual-disk corruption, eg. "there are 95% chances it is single-disk corruption so I am going to fix it assuming that, but there are 5% chances I am going to actually corrupt more data, I just can't tell". I wouldn't want to rely on a RAID controller that takes gambles :)"

http://opensolaris.org/jive/message.jspa?messageID=502969#502969
If you are really interested in ZFS and file corruption, you should read this post.
 
No matter what RAID you use, you're still running a legacy filesystem that provides no additional protection beyond what the hardware can provide (and a RAID array is just hardware from the filesystem's point of view). ZFS, by contrast, uses replicated metadata; in essence all metadata is backed up, even on a single-disk pool.

If you value data integrity, the only real option is a combination of solid backups and checksumming filesystems, like ZFS and Btrfs.
 
I just wanted to add: no, you are wrong.

No, you need to read what I wrote (and you quoted!). I said that RAID 6 has enough information to correct one bit flip. That is correct. Your point that RAID 6 cannot correct 2 or more bit flips is true, but does not invalidate my statement.
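For readers following along, the standard RAID-6 P/Q syndrome argument goes roughly like this (in the style of H. Peter Anvin's "The mathematics of RAID-6"; it describes the general scheme, not any particular controller's firmware):

P = D_0 \oplus D_1 \oplus \dots \oplus D_{n-1}
Q = g^0 D_0 \oplus g^1 D_1 \oplus \dots \oplus g^{n-1} D_{n-1}    (arithmetic in GF(2^8))

If exactly one data strip D_j is silently corrupted by an error e, recomputing both parities gives the syndromes S_P = e and S_Q = g^j \cdot e, so g^j = S_Q / S_P identifies the bad strip and D_j \oplus S_P restores it. With two or more corrupted strips (or a corrupted strip plus a stale parity) the syndromes no longer point to a unique culprit, which is the probabilistic gamble described in the paper quoted above.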
 
No, you need to read what I wrote (and you quoted!). I said that RAID 6 has enough information to correct one bit flip. That is correct. Your point that RAID 6 cannot correct 2 or more bit flips is true, but does not invalidate my statement.
What I am trying to say is that neither RAID-5 nor RAID-6 is safe. In my link there are lots of research papers that show hardware RAID is not safe. Read my link.
 
What I am trying to say is that neither RAID-5 nor RAID-6 is safe. In my link there are lots of research papers that show hardware RAID is not safe. Read my link.

Why? You write like there is some secret revelation in those papers. I am aware of the properties of various types of RAID. Nothing is completely safe. Yes, ZFS adds checksums. Good for ZFS.
 
Well, the point here is that legacy filesystems like NTFS cannot protect against BER; you rely on the RAID layer to protect you against that. Traditional RAID5 and RAID6 are limited in theory in what they can do, and limited even more by the safeguards actual implementations bother to use. A hardware RAID5 of 8 disks with NTFS on top is just not reliable these days with 2TB disks and their BER issues: if you lose one disk, you are already in trouble if one of the remaining disks hits an unrecoverable read error during the rebuild. (Rough numbers: rebuilding from the 7 remaining 2TB disks means reading about 1.1x10^14 bits; at the commonly quoted consumer-drive spec of one unrecoverable error per 10^14 bits read, you expect roughly one such error per rebuild, so hitting at least one is more likely than not.) RAID6 can protect you from some of these headaches, but it remains unsafe by design as long as the filesystem does not handle redundancy itself. The design of ZFS is a logical next step in filesystem design, where the filesystem uses multiple disks directly and protects against many more dangers than traditional filesystems offer today.

If you value data integrity, traditional RAID and NTFS just aren't your best choice; those people would be served better by ZFS. But other users have just casual data and prefer convenience over serious data protection. I don't disapprove of that, but problems start arising when people thought they had good protection with hardware RAID5 and later discover it's unsafe as hell.

Those people might end up running something like ZFS, but they already bought an expensive hardware RAID controller because they thought it was a good investment, and now it ends up being an annoyance, since the RAID controller still drops disks on its own, even in JBOD/non-RAID mode. So those people would need to buy a new HBA as well.

These kinds of learning experiences are painful. That's why forums are here: so we can learn from each other's experiences. Everyone has a different viewpoint, and that's alright, as long as we respect people's different desires and priorities.
 
As much as people might like to run ZFS, it is still constrained because it is not available on mainstream OSs. You are tied to FreeBSD, a somewhat archaic Unix derivative with limited hardware support and still saddled with bugs/issues long since fixed in mainstream Linux. Or OpenSolaris, an OS that is not really open and - thanks to Oracle - is no longer even Solaris.

It also puts demands on your system that are by no means "free". It has obnoxious memory hogging behavior - and bugs with serious consequences if you don't feed the pig by giving it all the memory it wants. It also consumes significant processing resources doing its somewhat complex math. No disagreement that "modern" processors can "handle this", but you are definitely competing for resources which are not free. If you are running it on a dedicated NAS, yeah, OK, its probably no problem. But if you are running it on a workstation doing other work (video editing, for example) and need large quantities of local disk with very high performance and reasonable levels of protection then ZFS is absolutely the wrong answer. Hardware raid is - and will remain for a long time - the right answer.

While many might agree with the concept of ZFS, unless and until it makes its way into the mainstream OSs it will remain an interesting niche solution with a few relatively zealous proponents. And even then, it will not be the only solution for large protected datasets. It will remain one solution to a targeted set of problems, with other solutions - like hardware raid - remaining valuable solutions to other problem sets (problem sets that often overlap and cause confusion, BTW).
 
Then keep using NTFS. :D

No, seriously: I do think many people are realising that RAID solutions on Windows just aren't very reliable and that silent corruption is a problem. There's nothing zealous about looking for good alternatives that solve this problem, especially if they can be used without learning the underlying operating system itself (Linux/Solaris/BSD).

And FreeBSD is far from archaic. Its superior SMP implementation has benefited quite a few projects, among them Firefox, the Linux kernel, and MySQL. It just has the problem that although technology can flow from BSD to Linux, there can't be any flow from Linux to BSD. On the other hand, its license does allow it to import technologies like ZFS, DTrace, the pf firewall and many more foreign technologies. Even Windows uses BSD technology; parts of its network stack were taken from BSD. FreeBSD has traditionally been the progenitor of network stacks. It's anything but archaic.

The Debian GNU/kFreeBSD project also aims at replacing the Linux kernel with a FreeBSD kernel, and it is taking shape nicely. So perhaps Ubuntu 12.04 will support ZFS via a FreeBSD kernel. :D

So I think there will be plenty of opportunities to use ZFS for those who want to. And that group of people will just keep growing until Microsoft addresses its archaic support for modern storage.
 
Silent corruption isn't as much of a real problem as it is a theoretical one...
 
Seriously? Well, there is performance, convenience, and OS compatibility.

There are numerous ZFS benchmarks in this thread alone. Sometime after the new Intel SSDs drop, I'll be building a ZFS RAID-Z SSD array with at least 5 drives...

Unless I'm willing to add another $XXXX for a decent hardware RAID card and battery backup unit .... :(

If you haven't even tried sub.mesa's ZFSguru, you have no idea how easy it is to set a ZFS array up.... and I don't know what OS you are running that doesn't have Samba (Windows file sharing). My Win7 and Mac OS X laptops connect to the ZFSguru shares at home (and even the OpenSolaris and OpenIndiana shares at work).
 
When something as complex as ZFS gets easy enough for a technically challenged person to set up, and when they (the technically challenged) go live with ZFS (the unbreakable, never buggy, fixes-the-unfixable, magical FS) holding their data, that is when you are begging for massive data loss on that person's behalf, because they disregard the most important thing, backups, since they don't see why backups would be needed with a magical filesystem. Period.
 
There are numerous ZFS benchmarks in this thread alone. Sometime after the new Intel SSDs drop, I'll be building a ZFS RAID-Z SSD array with at least 5 drives...

Unless I'm willing to add another $XXXX for a decent hardware RAID card and battery backup unit .... :(

If you haven't even tried sub.mesa's ZFSguru, you have no idea how easy it is to set a ZFS array up.... and I don't know what OS you are running that doesn't have Samba (Windows file sharing). My Win7 and Mac OS X laptops connect to the ZFSguru shares at home (and even the OpenSolaris and OpenIndiana shares at work).

I've never seen a ZFS benchmark where the performance is as high as I have seen from several good hardware RAID cards. This is not surprising, since the ZFS architecture (variable stripe size, etc.) was not built for performance.

Apparently you have limited experience with RAID applications. Some applications require local RAID with an OS that is not compatible with ZFS.

And even with a GUI, ZFS cannot compete with hardware RAID for convenience. Particularly with replacing failed drives, where the HW RAID card just lights up an LED next to the failed drive, and you swap it out, and everything proceeds automatically. This is helpful if you have racks full of hot swap bays.

Just because your application does not benefit from hardware RAID does not mean that there are no applications where hardware RAID is the best choice.
 
When something as complex as ZFS gets easy enough for a technically challenged person to set up, and when they (the technically challenged) go live with ZFS (the unbreakable, never buggy, fixes-the-unfixable, magical FS) holding their data, that is when you are begging for massive data loss on that person's behalf, because they disregard the most important thing, backups, since they don't see why backups would be needed with a magical filesystem. Period.

+1. Can't be said better.
 
The thing about ZFS is that, theoretically, it is bulletproof. But in reality, not all storage devices behave as the theory assumes. One issue for ZFS is that if a storage device reports it has written data to non-volatile storage (rather than cache) when it has not actually completed the write, then ZFS can suffer massive data corruption on power loss or certain other failures (this problem is not unique to ZFS, obviously). Theoretically, of course, the storage device should not do that. But in the real world, some do. The ZFS zealots claim that ZFS does not need an fsck.zfs-type program, since theoretically a ZFS filesystem cannot become corrupted. But in reality, it can and has.

Bottom line is that nothing is perfect. If you start assuming it is, then you are just setting yourself up for failure.
 
I've never seen a ZFS benchmark where the performance is as high as I have seen from several good hardware RAID cards.
Still not bad for something you get for free; even while utilizing all its protections, which add overhead, it still manages to perform very decently, well above gigabit limits.

If you want more ZFS performance, a SLOG and L2ARC can help considerably, for random reads and for writes (random or not). The money you save by not investing in a full-blown RAID controller, BBU and TLER-capable hard drives could be spent on a good SSD, especially when the new batches of supercapacitor-equipped SSDs arrive; that is a hot feature!
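For completeness, adding a separate log device (SLOG) or another cache device to an existing pool is again a one-liner; gpt/slog0 and gpt/l2arc0 are just example GPT labels:

# zpool add tank log gpt/slog0
# zpool add tank cache gpt/l2arc0

Note that the SLOG only accelerates synchronous writes, but that happens to be exactly the NFS/iSCSI kind of traffic discussed earlier in this thread.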

This is not surprising, since the ZFS architecture (variable stripe size, etc.) was not built for performance.
If this is your conclusion from our earlier discussion, then we still have a lot to talk about, I guess. Variable stripe sizes are not what makes ZFS slow; they are what makes random writes to RAID-Z/RAID-Z2 much faster than is possible with traditional RAID5 and RAID6, and they solve the write hole too: two wins in one blow.

If ZFS is slow or doesn't scale linearly, it is because it is a complex piece of software. Some code, like RAID-Z, is really simple and tidy, but other code needs some maturing before ZFS is more streamlined, especially around transaction groups. Scaling will improve with a SLOG, though.

Still, I'm not disappointed with the results; I'd say ZFS performs great!

And even with a GUI, ZFS cannot compete with hardware RAID for convenience. Particularly with replacing failed drives, where the HW RAID card just lights up an LED next to the failed drive, and you swap it out, and everything proceeds automatically. This is helpful if you have racks full of hot swap bays.
You can implement something like that easily: light up the LEDs of all disks in a pool, and you will know which disk has failed because it is the one that does not light up. You can also give names to your disks, so you know beforehand which disk sits in which enclosure, and you can write labels near the HDD swap bays.
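As a sketch of that labeling approach on FreeBSD (bay and device names are hypothetical): label each disk after the bay it sits in and build the pool from those labels, and zpool status will then report the physical bay of a failed disk directly:

# glabel label bay01 /dev/ada1
# glabel label bay02 /dev/ada2
# glabel label bay03 /dev/ada3
# zpool create tank raidz label/bay01 label/bay02 label/bay03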

Just because your application does not benefit from hardware RAID does not mean that there are no applications where hardware RAID is the best choice.
I agree. :)

If you're on the Windows platform, regardless of the reasons, and you want high-performance mass storage, then hardware RAID can be a good solution. I still think SSDs might be more suitable, and they work well with onboard RAID0. It all depends on your situation; some people need high-performance I/O but don't care if the data gets destroyed, since they can re-create or re-download it. Other people need high-performance local storage so huge that SSDs are not an option.

In the future with 10 gigabit ethernet or similar LAN technologies, NAS may compete with local hardware/onboard RAID Windows solutions, but most people would be limited by gigabit speeds, and thus may depend on other solutions to get local I/O performance, depending on their setup.

So there is no one-size-fits-all solution; it's not a case of hardware RAID versus ZFS, just a case of different needs, different solutions and common sense.
 
You can implement something like that easily: light up the LEDs of all disks in a pool, and you will know which disk has failed because it is the one that does not light up. You can also give names to your disks, so you know beforehand which disk sits in which enclosure, and you can write labels near the HDD swap bays.

On hardware RAID, the failure LED comes on automatically, no need to implement anything, or even touch a keyboard. And no need to play 'which LED is not on' in a bank of hundreds of bays. And labeling each disk individually? That is not convenient. Hardware RAID cards are more convenient than ZFS for some applications, as I already said.
 
Still not bad for something you get for free...

Nothing is free. ZFS comes with a huge memory footprint, an almost insatiable desire for more memory, and consumes a significant number of cycles on your CPU that you may need for other purposes.

Perhaps you could get away with "something you get for less than hardware raid costs today". But the line above just calls you out as a Zealot and not objective.
 
On hardware RAID, the failure LED comes on automatically, no need to implement anything, or even touch a keyboard.
If that is a priority to you, to save a few minutes of your precious time, then by all means opt for the Hardware RAID solution.

Nothing is free. ZFS comes with a huge memory footprint, an almost insatiable desire for more memory, and consumes a significant number of cycles on your CPU that you may need for other purposes.
Well, from all the money you save (TLER disks, hardware RAID controller, BBU) you can surely spend 120 euro on 8GiB of DDR3 ECC DRAM, or 180 euro on 12GiB. And depending on which features you use, ZFS will run decently on low-end CPUs like an Atom. If you are going to do a lot of things in the background on your multi-purpose NAS, then you would want something more powerful.

Perhaps you could get away with "something you get for less than hardware raid costs today". But the line above just calls you out as a Zealot and not objective.
It's amusing to see you desperately trying to discredit ZFS, while ZFS is just so innovative and great that it basically sells itself! No pro-arguments needed; people are quickly convinced this is the next-generation storage that they want. Their only problem is accessibility: getting it to run on their systems, and the lack of information sources and support. All these are valid arguments, but as each user's needs are different, those arguments may be compelling or worthless.

But just look at this forum; many people want to run ZFS, as it provides a lot of benefits to these home users, giving them reliable network storage with a great cost per gigabyte as well.

If you want to call me a zealot, that's fine. I'd rather be a zealot than a hater.

Oh, and another thing: you don't find me preaching ZFS in topics that are clearly Windows or WHS storage territory. It is somewhat appalling to see a witch hunt against ZFS in a thread that clearly started in a ZFS context, not with the desire to make it a ZFS-versus-whatever discussion. Wouldn't you consider it fair to create your own thread to 'discuss the inferior ZFS filesystem' and suck all the negative air out of this thread into yours? What do you say?
 
Well, from all the money you save (TLER disks, hardware RAID controller, BBU) you can surely spend 120 euro on 8GiB of DDR3 ECC DRAM, or 180 euro on 12GiB. And depending on which features you use, ZFS will run decently on low-end CPUs like an Atom. If you are going to do a lot of things in the background on your multi-purpose NAS, then you would want something more powerful.
Contrary to your claims here and elsewhere, TLER drives are not really needed for raid. TLER is an invention of WD in response to the fact that they overdid their retry algorithm. Many people have discussed this at length and shown that drives exist that do not suffer from this problem (e.g., Hitachi) and consumer grade disks are just fine. You conveniently ignore this every time you get onto your anti-raid crusade, even though it has been pointed out in some detail by the most respected members of this forum (odditory, nitro and others). Also, you really should actually read my posts before you respond to them. At no point have I said that ZFS is not appropriate for a dedicated NAS. As for the Atom, might be OK (although most Atom systems are too memory limited to actually run it safely - but that's quite another topic).

It's amusing to see you desperately trying to discredit ZFS, while ZFS is just so innovative and great that it basically sells itself! No pro-arguments needed; people are quickly convinced this is the next-generation storage that they want. Their only problem is accessibility: getting it to run on their systems, and the lack of information sources and support. All these are valid arguments, but as each user's needs are different, those arguments may be compelling or worthless.
If ZFS "sold itself" it would be everywhere. It isn't. Why? Because it does is not supported on mainstream OSs, it is a PITA to administer, the OSs that it does run on have limited hardware support, it is a memory hog, on and on. It is a great filesystem with some interesting features. But it is far from perfect for all users.

But just look at this forum; many people want to run ZFS, as it provides a lot of benefits to these home users, giving them reliable network storage with a great cost per gigabyte as well.
Yup. The same is true of RAID - even though you spout off at every turn about how horrible that is. The same is true of WHS. In fact, reading posts here, it's pretty clear that there are lots of good choices out there - one size does not fit all.

If you want to call me a zealot, that's fine. I'd rather be a zealot than a hater.
Zealots need a counterweight. I don't hate ZFS, but it is also not the "one true religion".

Oh, and another thing: you don't find me preaching ZFS in topics that are clearly Windows or WHS storage territory. It is somewhat appalling to see a witch hunt against ZFS in a thread that clearly started in a ZFS context, not with the desire to make it a ZFS-versus-whatever discussion. Wouldn't you consider it fair to create your own thread to 'discuss the inferior ZFS filesystem' and suck all the negative air out of this thread into yours? What do you say?
I think you need to say "you don't find me preaching ZFS in topics that are clearly windows or WHS anymore". But it hasn't been that long...

Having said my piece, I'll leave your holy ground to your bare feet. Just please tell the ZFS story and lighten up on the "RAID is evil" BS. You'll draw fewer critics.
 
Silent corruption isn't as much of a real problem as it is a theoretical one...

Oh, it's quite real. I've hit it at home, and we've hit it at work in the past as well. It's just that in most cases people don't notice things are wrong because they aren't checking whether they are wrong. It's like saying you don't need ECC memory because you've never noticed a memory error on a machine without ECC.

Unless you have some sort of checksum on your data, you generally won't notice silent data corruption (SDC), even though you will be getting incorrect results.
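A low-tech way to check for it on any filesystem (paths are hypothetical; GNU coreutils assumed) is to keep your own checksum list and re-verify it periodically; a mismatch on a file you have not modified is exactly the silent corruption being discussed:

# find /data -type f -exec sha256sum {} + > /root/data.sha256
# sha256sum -c --quiet /root/data.sha256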
 
On hardware RAID, the failure LED comes on automatically, no need to implement anything, or even touch a keyboard. And no need to play 'which LED is not on' in a bank of hundreds of bays. And labeling each disk individually? That is not convenient. Hardware RAID cards are more convenient than ZFS for some applications, as I already said.

To be fair, there are some implementations of ZFS on machines which already do this. Both the Nexenta and Sun/Oracle ZFS-based storage lines support this.
 
Why? You write like there is some secret revelation in those papers. I am aware of the properties of various types of RAID. Nothing is completely safe. Yes, ZFS adds checksums. Good for ZFS.
I see that you didn't read my OpenSolaris link that shows lots of research papers. The point of ZFS is not that it adds checksums - every storage solution adds checksums. You don't understand ZFS.

The point is that a lot of research by computer scientists shows that hardware RAID does not give you data integrity. There might be corrupted data without the hardware RAID even knowing it!

For instance, CERN did a study on their hardware RAID storage solutions. CERN will store petabytes of data from their big new LHC collider, which took decades and cost billions of USD to build. What happens if some bits get corrupted? Then they might not find the Higgs boson! Therefore CERN did a study: they repeatedly wrote a predetermined bit pattern to their storage solutions and checked it after 3 weeks. It turned out that the bit pattern was not intact; there was silent corruption, and the hardware did not even notice. In 3,000 cases they found 152 errors. Silent corruption, that is.
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/

In another CERN paper, they say "such data corruption is found in all solutions, no matter the price (even very expensive Enterprise solutions)"!!! From that paper (I cannot find the link now):
"Conclusions
-silent corruptions are a fact of life
-first step towards a solution is detection
-elimination seems impossible
-existing datasets are at the mercy of Murphy
-correction will cost time AND money
-effort has to start now (if not started already)
-multiple cost-schemes exist
--trade time and storage space (à la Google)
--trade time and CPU power (correction codes)"

CERN writes: "checksumming - not necessarily enough" you need to use "end-to-end checksumming (ZFS has a point)".

There is a vast difference between ordinary checksums and end-to-end checksums. End-to-end: is the data still the same all the way from RAM, through the controller card, down to the disk?

Ordinary checksums: there might be errors when data passes across those boundaries, but there is no check.

This explains why only ZFS could detect this error in a switch between servers. Hardware RAID cannot do this:

http://jforonda.blogspot.com/2007/01/faulty-fc-port-meets-zfs.html
"As it turns out our trusted SAN was silently corrupting data due to a bad/flaky FC port in the switch. DMX3500 faithfully wrote the bad data and returned normal ACKs back to the server, thus all our servers reported no storage problems.
...
ZFS was the first one to pick up on the silent corruption"


The same goes for XFS, JFS, ReiserFS, NTFS, etc. - they are all unsafe. See this PhD thesis:
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169


Researchers have shown that none of the common storage solutions can detect, let alone repair, all errors. They all lack data integrity.

On the other hand, researchers have recently examined ZFS, and it seems to be safe: it detected all errors. The first step is detection, then repair. In that research paper ZFS could not repair all errors, because they did not use ZFS RAID; they only used a single disk.

Now CERN is migrating to ZFS based machines.




Silent corruption isn't as much of a real problem as it is a theoretical one...
See the CERN study, among others. Everything I said above is taken from the OpenSolaris link I posted earlier. There are lots of research papers, which I have summarized in this post on the OpenSolaris forum:
http://opensolaris.org/jive/message.jspa?messageID=502969#502969



So, for those of you who believe XFS, JFS, NTFS, or hardware RAID 5 or 6 are safe: forget it. Research shows the data is not safe. Read the research papers in that OpenSolaris link.

OTOH, research shows ZFS to be safe. Read the research in that OpenSolaris link.




BTW, it is a very dumb idea to combine ZFS with hardware RAID, because the hardware RAID will interfere and ZFS can no longer guarantee data integrity. ZFS needs exclusive access to the drives.

At the top of the OpenSolaris link I posted there is a guy, the OP, who tries to run ZFS on top of a SAN. He created the thread asking why ZFS was corrupting his data. It turned out that ZFS was detecting data corruption on his SAN. He never knew before that his SAN corrupted data - ZFS was the first to detect it. The problem was that ZFS did not have direct access to the drives, so it could not repair those errors.

Moral of the story: never ever run ZFS on top of hardware RAID.

Sell your HW RAID card; before long everyone will run software RAID. HW RAID is just software running on a dedicated CPU, and it is better to run that software on the main CPU.




And for those of you who claim ZFS is a PITA to administer: you have clearly never tried other RAID solutions. Most ZFS commands are one-liners. I have heard that creating one RAID array with Linux LVM takes some 25 lines of complex commands. In ZFS you use one line:
# zpool create MySafeZFSraid raidz disc0 disc1 disc2 disc3

And you are done. No formatting is needed; you can immediately start backing up your data to it. How difficult was that? With other RAID solutions you then have to format your array, which can take hours. Have you tried the other commands? No, you have never tried ZFS. ZFS is meant to be easier than other solutions. "ZFS is difficult to administer"? That sounds to me like someone who has never tried it.
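And the day-to-day commands stay just as short; a few illustrative examples (pool, dataset and disk names made up):

# zpool status MySafeZFSraid                    (health and per-device error counters)
# zpool scrub MySafeZFSraid                     (read and verify every block in the pool)
# zfs snapshot MySafeZFSraid/data@before-upgrade
# zpool replace MySafeZFSraid disc2 disc4       (swap a failing disk for a new one)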



And for those of you who claim ZFS has a huge RAM footprint: no, it does not. ZFS grabs all free RAM as cache, but that is configurable; you can, for instance, say that ZFS is not allowed to use more than 512MB as cache. I myself used ZFS RAID on a 1GB OpenSolaris machine as a desktop for a year without problems. Sure, it was not fast, because I used a Pentium 4, which is a 32-bit CPU (ZFS wants a 64-bit CPU for performance) - but my data was SAFE. I prefer safe and slow over fast and corrupted.
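For example, capping the ARC at 512MB is a one-line tunable. The exact location depends on the OS; these are the usual knobs, so double-check them against your release's documentation:

On OpenSolaris, in /etc/system:
  set zfs:zfs_arc_max = 0x20000000

On FreeBSD, in /boot/loader.conf:
  vfs.zfs.arc_max="512M"

(0x20000000 bytes = 512MiB; both require a reboot to take effect.)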

The reason ZFS is slow is that it does extensive error checking. No other filesystem does that. ZFS scales well: just add some discs and performance will shoot through the roof; 2-3GB/sec and several hundred thousand IOPS are possible. That is hardly low performance.
 
Hey guys, reading through this very interesting thread today, I wonder why no one has mentioned Nexenta Core. It is based on Ubuntu and therefore compatible with Debian, and it includes the OpenSolaris kernel.

I admit I was also unsure whether this would work, but I have been running it since Sunday and I think it's very impressive. Great read & write speeds, a nearly current ZFS version (currently v26 in Nexenta Core v3), and it's what you were looking for: a "mainstream OS". Or do you guys only count Windows as mainstream? ;)
 
I see that you didn't read my OpenSolaris link that shows lots of research papers. The point of ZFS is not that it adds checksums - every storage solution adds checksums. You don't understand ZFS.

Come on! Of course I understand ZFS, and I am not going to read your links or everything you wrote below the line I have quoted. Yes, comprehensive checksums are the big feature of ZFS for preventing silent data corruption. No, not all storage systems do comprehensive checksums like ZFS does. No need to make this simple fact so complicated.
 
To be fair, there are some implementations of ZFS on machines which already do this. Both the Nexenta and Sun/Oracle ZFS-based storage lines support this.

Have you used such a setup? If so, what specific hardware was required to get it working? Did it need any special software configuration?
 
Why? You write like there is some secret revelation in those papers. I am aware of the properties of various types of RAID. Nothing is completely safe. Yes, ZFS adds checksums. Good for ZFS.

Come on! Of course I understand ZFS, and I am not going to read your links or everything you wrote below the line I have quoted. Yes, comprehensive checksums are the big feature of ZFS for preventing silent data corruption. No, not all storage systems do comprehensive checksums like ZFS does. No need to make this simple fact so complicated.
I just wanted to point out that ALL storage solutions add checksums in one way or another. But no one did it right until ZFS. Computer science researchers have examined RAID-5, RAID-6 and filesystems and shown that they are all flawed. On the other hand, they show that ZFS seems to be safe.

If you want to read the research papers, they are posted in my link.
 
I just wanted to point out that ALL storage solutions add checksums in one way or another. But no one did it right until ZFS. Computer science researchers have examined RAID-5, RAID-6 and filesystems and shown that they are all flawed. On the other hand, they show that ZFS seems to be safe.

I want to point out that you should be more careful with your claims. The ZFS system of checksums is indeed a useful feature, but nothing is perfect. If you claimed that ZFS was safer than certain alternatives, then that would be a true statement (for many alternatives you could name). But to claim that there is no possibility of data corruption with ZFS is ludicrous. There is always the possibility of data corruption, both theoretically and in practice. It has happened before, even to ZFS data, and it will happen again.
 
I do not claim that. Why do you think I do? I suggest you read again what I said: that researchers showed ZFS detected all of their artificially introduced errors, whereas other solutions did not.

The keyword is "ZFS seems to be safe", not "ZFS is 100% safe". Of course you can still loose data with ZFS. For instance, if the hardware is cheap, it will sometimes fool ZFS by confirming certain actions - when the action did not occur. Some hardware does not obey standards, they are cheating and only reports they did a certain action. In that case, when ZFS is cheated and fooled, ZFS can have problems.

But that is hardly ZFS's fault. All solutions will have problems if the underlying hardware lies and does not obey standards. Enterprise gear does obey the standards, and that is one reason it is safer.
 