BTRFS broken:

Finally got it to load after a few attempts...

Btrfs RAID 5/6 Code Found To Be Very Unsafe & Will Likely Require A Rewrite
Written by Michael Larabel in Linux Storage on 5 August 2016 at 11:00 AM EDT. 111 Comments

It turns out the RAID5 and RAID6 code for the Btrfs file-system's built-in RAID support is faulty and users should not be making use of it if you care about your data.

There has been this mailing list thread since the end of July about Btrfs scrub recalculating the wrong parity in RAID5. The wrong parity and unrecoverable errors has been confirmed by multiple parties. The Btrfs RAID 5/6 code has been called as much as fatally flawed -- "more or less fatally flawed, and a full scrap and rewrite to an entirely different raid56 mode on-disk format may be necessary to fix it. And what's even clearer is that people /really/ shouldn't be using raid56 mode for anything but testing with throw-away data, at this point. Anything else is simply irresponsible."

So hopefully you aren't making use of any Btrfs RAID 5/6 support as it turns out to be in very bad shape and may even be ifdef'ed out of the mkfs code. Unfortunately it could take some time to fix especially with the potential for a format change being necessary to address the problem.


Coincidentally, I'm in the middle of some Btrfs RAID tests right now but will now be limited to 0/1/10 for the four SSDs.

Specifically, it's just the built-in RAID feature of Btrfs that is currently broken. That doesn't surprise me, but that's why I use mdadm for all my software RAID needs anyway; I trust it much more.

I just read the email trail, and basically it's a parity corruption issue that fixes itself for the most part. But if any parity is corrupt when the user has a disk failure, the corrupt parity will be used in the disk rebuild, leaving the user with some data corruption.
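
To make that concrete, here is a minimal sketch of a single-parity (RAID5-style) rebuild using plain XOR. It is not the Btrfs code; the block contents and the helper name `xor_blocks` are made up for illustration, and the "corruption" is just one flipped byte in the parity block.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (illustrative contents only).
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
good_parity = xor_blocks([d0, d1, d2])

# The disk holding d1 dies: rebuild it from the survivors plus parity.
rebuilt_ok = xor_blocks([d0, d2, good_parity])
print(rebuilt_ok == d1)   # True

# Same failure, but the first parity byte was silently corrupted earlier.
bad_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]
rebuilt_bad = xor_blocks([d0, d2, bad_parity])
print(rebuilt_bad == d1)  # False
```

The rebuild itself has no way to tell the two cases apart. With bad parity it just hands back wrong data for the missing disk.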

Fun,

-- Dave
 
Also, it should be mentioned that the devs point out that their raid implementation is currently broken, so this really shouldn't come as a surprise:

RAID56

Status
The parity RAID code has multiple serious data-loss bugs in it. It should not be used for anything other than testing purposes.
From 3.19, the recovery and rebuild code was integrated. The one missing piece, from a reliability point of view, is that it is still vulnerable to the parity RAID "write hole", where a partial write as a result of a power failure will result in inconsistent parity data.

  • Parity may be inconsistent after a crash (the "write hole")
  • Parity data is not checksummed
  • No support for discard? (possibly -- needs confirmation with cmason)
  • The algorithm uses as many devices as are available: No support for a fixed-width stripe (see note, below)
The first two of these problems mean that the parity RAID code is not suitable for any system which might encounter unplanned shutdowns (power failure, kernel lock-up), and it should not be considered production-ready.
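
The "write hole" described above can be shown with a toy two-data-block stripe. This is only a sketch under simple assumptions (XOR parity, a dict standing in for the disks, hypothetical block contents), not how Btrfs actually lays out or writes stripes.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# A consistent stripe: two data blocks plus their parity.
stripe = {"d0": b"old0", "d1": b"keep"}
stripe["parity"] = xor_bytes(stripe["d0"], stripe["d1"])

# Partial write: d0 reaches the disk, then power is lost
# before the matching parity update is written.
stripe["d0"] = b"new0"
# stripe["parity"] still describes the OLD d0, so it is now stale.

# Later the disk holding d1 fails and d1 is rebuilt from d0 and parity.
rebuilt_d1 = xor_bytes(stripe["d0"], stripe["parity"])
print(rebuilt_d1 == b"keep")  # False
```

Note that d1 was never being written at the time of the crash, yet it is the block that comes back damaged. And since the parity is not checksummed (second bullet above), nothing flags the stale parity before it is used.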

If you'd like to learn btrfs raid5/6 and rebuilds by example (based on kernel 3.14), you can look at Marc MERLIN's page about btrfs raid 5/6.

Note
Using as many devices as are available means that there will be a performance issue for filesystems with large numbers of devices. It also means that filesystems with different-sized devices will end up with differing-width stripes as the filesystem fills up, and some space may be wasted when the smaller devices are full.

Both of these issues could be addressed by specifying a fixed-width stripe, always running over exactly the same number of devices. This capability is not yet implemented, though.
RAID56 - btrfs Wiki
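
A rough way to picture the note about differing-width stripes: the toy model below allocates one unit from every device that still has free space, so the stripe narrows as the small devices fill up. The function name, the unit sizes, and the two-device minimum are assumptions for illustration; the real allocator works in chunks and is more involved.

```python
def simulate_raid5_allocation(device_sizes, min_width=2):
    """Toy model: each stripe takes one unit from every device with free space.

    Returns (stripe widths, usable data units, units left unusable).
    """
    free = list(device_sizes)
    widths = []
    while True:
        members = [i for i, f in enumerate(free) if f > 0]
        if len(members) < min_width:
            break  # too few devices left to form a parity stripe
        for i in members:
            free[i] -= 1
        widths.append(len(members))
    usable = sum(w - 1 for w in widths)  # one unit per stripe goes to parity
    wasted = sum(free)
    return widths, usable, wasted

# Mixed sizes: stripes get narrower as the small devices fill.
print(simulate_raid5_allocation([4, 4, 2, 1]))  # ([4, 3, 2, 2], 7, 0)

# One big device, two small ones: the tail of the big device is stranded.
print(simulate_raid5_allocation([5, 2, 2]))     # ([3, 3], 4, 3)
```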
 
Also, it should be mentioned that the devs point out that their raid implementation is currently broken, so this really shouldn't come as a surprise:
I'm not surprised; the btrfs raid 6 on my test server has failed every time I've tried to test rebuild scenarios over the last few years. That said, the wiki warning was added fairly recently, and I know people who have lost data to these bugs.
 
As people say in the many comments in my top link:
"Software that is designed/ intended to be reliable should not go through large periods of instability only to be written off as "prepubescence". BTRFS been in development for almost a decade and it STILL isn't ready. Sorry, we're not talking about a mission to the moon (which btw was done in less time)."

Kent Overstreet (author of bcachefs, a promising next-gen filesystem for Linux) explains his rationale for creating bcachefs (no existing filesystem for Linux is good enough):
Kent Overstreet is creating bcachefs - a next generation Linux filesystem | Patreon
"btrfs, which was supposed to be Linux's next generation COW filesystem - Linux's answer to zfs. Unfortunately, too much code was written too quickly without focusing on getting the core design correct first, and now it has too many design mistakes baked into the on disk format and an enormous, messy codebase - bigger that xfs. It's taken far too long to stabilize as well - poisoning the well for future filesystems because too many people were burned on btrfs, repeatedly (e.g. Fedora's tried to switch to btrfs multiple times and had to switch at the last minute, and server vendors who years ago hoped to one day roll out btrfs are now quietly migrating to xfs instead)."

So it seems that BTRFS has some problems: the raid5/raid6 functionality is broken and needs a complete rewrite, after a decade of development. Basic raid functionality is mandatory for a "next-gen" filesystem, yes? It will probably take another 3 years or so before BTRFS has a stable raid5/raid6 implementation. And after that, it will take another 5-8 years before BTRFS is deemed stable enough by companies. Or what do you think: how many years of development will it take before the next-gen filesystem BTRFS is stable?

Yes, Facebook uses BTRFS on its webserver clusters, but they don't use BTRFS for anything critical. If a webserver goes down, they just fail over to another server, so an unstable filesystem is no problem there. BTW, FB has hired Chris Mason, the author of BTRFS, and that is why they are trying BTRFS.
 