the link you provided is from 2009. in the 3 years since then, any bugs directly related to ZFS have likely been squashed, and any HBA and/or hard drive firmware bugs that may or may not have been referenced have likely been squashed too.
however, if you try to build a 100TB+ system out of consumer desktop SATA drives you're going to have serious problems. no manufacturer will recommend their consumer drives for RAID arrays, for good reasons. even enterprise-class SATA is discouraged for large deployments, because SATA is single-channel and interposers aren't made from magic unicorn horn.
bottom line: no filesystem is truly 'safe', which is why you have backups. if, for instance, one of my large NAS boxes gets corrupted, i can't run fsck on it; i don't care if it might fix the problem, i CAN'T be down that long. Joe, have you ever tried running fsck on even 1TB of data? it takes FOREVER. fsck on, say, 55PB worth of data would take MONTHS to finish. you really think Lawrence Livermore would ever say "well shit guys, we're going to be down for the next X months while we run fsck"?
No, they wouldn't, they can't.
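the "months" claim is easy to sanity-check with back-of-envelope math. the 55PB figure is from the post; the throughput numbers below are illustrative assumptions (and optimistic, since fsck is metadata-heavy random I/O, not a pure sequential scan):

```python
# Rough lower bound: time just to read every block once at a sustained rate.
PB = 1024 ** 5  # bytes in a pebibyte

def scan_days(size_bytes, throughput_bytes_per_sec):
    """Days needed to scan `size_bytes` at a sustained throughput."""
    return size_bytes / throughput_bytes_per_sec / 86_400  # seconds per day

data = 55 * PB
for gbps in (1, 10):  # assumed sustained GB/s; real fsck would be slower
    print(f"{gbps:>2} GB/s -> {scan_days(data, gbps * 10**9):,.0f} days")
# 1 GB/s  ->  717 days
# 10 GB/s ->   72 days
```

even at an unrealistically generous 10 GB/s sustained, that's over two months just to touch every block once.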
oh, btw, fsck isn't a magical fairy duster either. i've had plenty of unrecoverable errors from fsck/chkdsk in the past.
if you don't have backups, you're fucked. end of story. if you do have backups, there isn't a single case where rebuilding/restoring from backup isn't faster than running fsck once your data is measured in TB.
*edit* oh, also forgot to mention something. if you took the time to read the comments on your own link you would see that ZFS added something called PSARC 2009/479. this lets you roll back to one of (in theory) the last 127 transaction states, so if you have a pool that can't be imported, presumably one of those earlier states will be importable, at which point the combination of replication, self-healing, and scrubbing should fix everything. if it doesn't, stop trying to use horrible desktop-class hardware.
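for reference, the PSARC 2009/479 recovery path is exposed through `zpool import`'s rewind flags. a minimal sketch ("tank" is a placeholder pool name):

```shell
# Dry run: report what a rewind would discard, without changing anything.
zpool import -Fn tank

# If the report looks acceptable, rewind to the most recent importable
# transaction group (this may discard the last few seconds of writes).
zpool import -F tank

# Then let ZFS verify every block against its checksums and self-heal
# from redundancy where it can.
zpool scrub tank
zpool status -v tank
```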