XFS Troubles and the quest for a reliable FS

Joined
Oct 28, 2004
Messages
722
So my workstation has been plenty fine for a long time. Incredibly stable; I've had multiple 30+ day uptimes, usually having to shut down because I'm going on vacation and not because the system has issues. I've lost power to the machine maybe twice in the past year and nothing ever directly cropped up as an issue... until now. For some reason, one of my 900GB XFS partitions (there are two) decided to die (/), and it causes all sorts of issues. I get maybe a day of uptime before the system dies, and I don't know exactly what triggers it. fsck displays some error, but the machine proceeds to boot up anyway. I'm currently offloading all my data to my new fileserver, which is using JFS.

Once everything is verified to be sane on the new system and the new system itself is stable, I plan on reformatting with the following setup (rough specs):

/boot - 256m ext2
swap - 2g
/ - 80G - ext3
/usr/portage - reiserfs? ~10-20gb so I don't have to clean distfiles often
/home - remaining jfs
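
For what it's worth, a hypothetical /etc/fstab matching that layout might look something like this (the /dev/sda* names are just placeholders; adjust for your actual partition table):

```
# <fs>      <mountpoint>   <type>     <opts>           <dump> <pass>
/dev/sda1   /boot          ext2       noauto,noatime   1      2
/dev/sda2   none           swap       sw               0      0
/dev/sda3   /              ext3       noatime          0      1
/dev/sda4   /usr/portage   reiserfs   noatime          0      0
/dev/sda5   /home          jfs        noatime          0      2
```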

Any ideas or suggestions concerning this setup? Is JFS reliability all it's cracked up to be? I know it isn't good for /, which is why I'm going to opt for a smaller ext3 root partition to avoid those troubles. I was interested for a bit in maybe using NFS to a Solaris machine running ZFS over iSCSI, or something similar - but Solaris does not work so well with large arrays, and the iSCSI driver is horrible.

The error (Google wasn't all that helpful; XFS seems to just corrupt and suck sometimes):
Code:
XFS mounting filesystem sda4
Starting XFS recovery on filesystem: sda4 (dev: sda4)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff8023a821

Call Trace:<ffffffff8023899b>{xfs_free_ag_extent+427} <ffffffff8023a821>{xfs_free_extent+193}
       <ffffffff802640e4>{xfs_efd_init+68} <ffffffff8027f49b>{xfs_trans_get_efd+43}
       <ffffffff80279bfd>{xlog_recover_finish+397} <ffffffff8029fe71>{__up_write+49}
       <ffffffff8027236b>{xfs_log_mount_finish+27} <ffffffff8027b9cd>{xfs_mountfs+3229}
       <ffffffff8028d76d>{.text.lock.xfs_buf+5} <ffffffff8028d05d>{xfs_setsize_buftarg_flags+61}
       <ffffffff802811dd>{xfs_mount+2445} <ffffffff80292e80>{linvfs_fill_super+0}
       <ffffffff80292f38>{linvfs_fill_super+184} <ffffffff80443553>{__down_write+51}
       <ffffffff802a016e>{strlcpy+78} <ffffffff8018365b>{sget+955}
       <ffffffff80183eb0>{set_bdev_super+0} <ffffffff80292e80>{linvfs_fill_super+0}
       <ffffffff8018402e>{get_sb_bdev+286} <ffffffff801842cb>{do_kern_mount+107}
       <ffffffff8019b0d8>{do_mount+1672} <ffffffff80194c4f>{dput+47}
       <ffffffff80195dd0>{__d_lookup+288} <ffffffff8015c062>{buffered_rmqueue+530}
       <ffffffff8018afc7>{do_lookup+103} <ffffffff8015c062>{buffered_rmqueue+530}
       <ffffffff8017614b>{alloc_page_interleave+59} <ffffffff8015c66e>{__get_free_pages+14}
       <ffffffff8019b4fc>{sys_mount+156} <ffffffff806229b4>{mount_block_root+244}
       <ffffffff80622df8>{prepare_namespace+216} <ffffffff8010b2e5>{init+661}
       <ffffffff8010e76e>{child_rip+8} <ffffffff8010b050>{init+0}
       <ffffffff8010e766>{child_rip+0}
Ending XFS recovery on filesystem: sda4 (dev: sda4)
VFS: Mounted root (xfs filesystem) readonly.
 

unhappy_mage

[H]ard|DCer of the Month - October 2005
Joined
Jun 29, 2004
Messages
11,455
What does xfs_repair say? Are you on a UPS?

I've been using XFS for a long time; it hasn't failed me yet.

Where do you see reports of problems with ZFS on large volumes? I've been interested in trying that out one of these days...
 
Joined
Oct 28, 2004
Messages
722
Where do you see reports of problems with ZFS on large volumes? I've been interested in trying that out one of these days...

ZFS itself is fine - Solaris on 32-bit platforms has no large block device support (>2TB), but it's kind of neat otherwise. I didn't have long to really test the performance, but it didn't seem as fast as when I was running software RAID on Linux. That's probably because everything went over iSCSI (and was limited to 1Gbit) and the CPU wasn't as fast - plus ZFS is many orders of magnitude more reliable on paper than what I'm using. Oh, and Solaris support for anything but Sun hardware is horrendous; aside from that it's not bad at all, really.
 
Joined
Oct 28, 2004
Messages
722
I do have a UPS. From what I've heard, losing power during a write on XFS is not nearly as catastrophic as it used to be; they've improved that a lot. It seems, though, as if XFS can slowly corrupt or just run into issues over time. Googling "xfs corruption" yields quite a few different posts which seem to mainly concern the 2.6 kernel tree. Not sure what causes the problems, but I don't have time to deal with an FS that I'll have to repair or that will slowly eat my files (as it already has eaten a few). Unacceptable :(
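
On the power-loss point: if I recall correctly, recent 2.6 kernels turn write barriers on for XFS by default, which makes the journal flush the drive's write cache at commit, so a power cut mid-write is far less likely to trash the log. The fstab line below is only an illustration (device name and mountpoint assumed); "nobarrier" is the opt-out, and is only sane behind a UPS or a battery-backed controller cache:

```
# barriers are the default for XFS on recent 2.6 kernels;
# switch to "nobarrier" only if a UPS / battery-backed cache covers you
/dev/sda4   /home   xfs   defaults,barrier   0 2
```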
 

unhappy_mage

[H]ard|DCer of the Month - October 2005
Joined
Jun 29, 2004
Messages
11,455
Googling "xfs corruption" yields quite a few different posts which seem to mainly concern the 2.6 kernel tree. Not sure what causes the problems, but I don't have time to deal with an FS that I'll have to repair or that will slowly eat my files (as it already has eaten a few). Unacceptable :(

Using new 2.6 for reliability is asking for disaster - anything over 2.6.8 has some serious problems with stability, I'm given to understand; my dad works with big boxen and they're on 2.6.8 unless they absolutely need newer kernels for hardware support. They really, really need to go back to the even/odd dichotomy - even for stable, odd for unstable - and get the unstable features and new hardware support away from my kernel.

If you really want stable, use ext3. If you want fast, use XFS and a backup power supply.

drizzt81: ext3 is all I can recommend. Reiserfs isn't terribly stable (some compare reiserfsck to mkfs.reiser :p), ext2 doesn't journal, and I don't think anyone uses JFS.
 
Joined
Oct 28, 2004
Messages
722
Using new 2.6 for reliability is asking for disaster - anything over 2.6.8 has some serious problems with stability, I'm given to understand; my dad works with big boxen and they're on 2.6.8 unless they absolutely need newer kernels for hardware support.

The 2.6 series gets a lot more of its testing done in the stable tree than I think people are used to (compared to before, when there was a separate 2.5 development branch), so it's a lot harder to get a firm idea of how stable any given release really is. I've had really good luck up to about 2.6.14-2.6.15, and I've had some 2.6.11s and 2.6.12s running stable for a pretty long time. The most recent kernels (2.6.18, 2.6.19, 2.6.20) in general have a lot more new experimental features than others. 2.6.19 brought the latest libata support and libata IDE chipset support, so there's all sorts of weird things going on in that area. I try to stay away from the latest kernels if I can avoid it, though I always test new kernels on a test server before I upgrade everything anyway. I think Debian is still around 2.6.8 or 2.6.9 for "stable" IIRC, so if your dad is following Debian's definition of stable (which is likely, given the big boxen info) it makes a lot of sense.
 

unhappy_mage

[H]ard|DCer of the Month - October 2005
Joined
Jun 29, 2004
Messages
11,455
I think Debian is still around 2.6.8 or 2.6.9 for "stable" iirc, so if your dad is following Debian's definition of stable (which is likely, given the big boxen info) it makes a lot of sense.

Well, Redhat's definition of stable, but basically. And installing old kernels where the new ones aren't needed.

Meanwhile, at home, I'm running 2.6.19.1 :p New libata FTW.
 
Joined
Oct 28, 2004
Messages
722
Well, Redhat's definition of stable, but basically. And installing old kernels where the new ones aren't needed.

I would think that your dad probably has a stable definition closer to Debian, and that Red Hat keeps claiming to be in that area when we all know it isn't ;) We had a lot of issues with Red Hat on some of the systems at my old work. I think they ended up using Rocks with a bunch of custom stuff though.

And I ended up using the following layout:
128M /boot ext2
2G swap
10G /usr/portage reiserfs
remaining ~1.7TB / ext3

ext3 took forever to mkfs, but it mounts pretty quickly, which is fine with me. Maybe someday I'll move to ext4 once it becomes stable, but for now I guess ext3 will have to do :cool:
 

DiceMann

n00b
Joined
Jan 6, 2007
Messages
46
XFS does a good job, but lord knows what happens if you lose power while you're compiling something or doing file transfers. XFS is a good thing if you have proper power protection, but it is a dangerous monkey without a UPS.

My laptop will no longer use ReiserFS. ReiserFS, especially v4, is a gamble nowadays. I quit. Who cares about the performance!


However, I cannot wait until ZFS becomes available for stable kernels! ;)
 
Joined
Oct 28, 2004
Messages
722
However, I cannot wait until ZFS becomes available for stable kernels! ;)

ZFS apparently requires a bunch of infrastructure that is pretty much only found in Solaris; I read somewhere that you will probably never see it in Linux. Also, the memory-hog factor and the fact that you currently can't even boot from it on its "native" platform would probably heavily limit any interest in porting it anywhere else. I'm pretty happy with ext3; it seems quite a bit slower, but as long as it never, ever crashes on me I'll be happy.
 

Bones

[H]ard|Gawd
Joined
Mar 11, 2000
Messages
1,220
I think that the major hurdle to ZFS in Linux is its license, which is not compatible with the GPL. It may never end up in the kernel, but there is an ongoing project to implement ZFS as a FUSE module in userspace. Never tried it myself, but I heard that it works (albeit slowly).

Ext3 isn't too bad as long as dir_index is turned on. I started using it for my Gentoo portage tree, and its performance is good enough. It hasn't slowed to a crawl from fragmentation like reiser did after a dozen or so tree syncs.
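
If anyone wants to check or enable dir_index without touching a real disk, here's a quick sketch against a scratch image file (assumes e2fsprogs is installed; the paths are made up, and no root is needed since it's only a file):

```shell
# build a 64MB scratch file to stand in for a partition
dd if=/dev/zero of=/tmp/ext3test.img bs=1M count=64 2>/dev/null

# ext3 (-j = journal) with hashed directory indexes enabled explicitly
mke2fs -q -F -j -O dir_index /tmp/ext3test.img

# the feature list should now include dir_index
tune2fs -l /tmp/ext3test.img | grep 'Filesystem features'

# on a live filesystem it's: tune2fs -O dir_index /dev/XXX
# followed by e2fsck -fD /dev/XXX to index existing directories
rm /tmp/ext3test.img
```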

I've heard that some work might be done to backport mballoc and delayed allocation from ext4 to ext3, which would be a big performance win.
 