mhddfs write performance issue/limit, or low-powered CPU?

GioF_71

Hello everybody, I read these forums a lot, but this is the first time I'm posting something.

I built my home server with these parts:

- Intel Atom 330 Little Falls 2
- 2GB RAM
- Promise TX4 SATA300 (PCI card, 4 SATA II ports)

I am using various hard drives: one old PATA Maxtor 300GB for the system, 2 x shiny new WD 2TB (4K-sector models, partitioned correctly, so this is not the issue), 1 x WD 1.5TB, 2 x Seagate 1.5TB 7200.11, and 1 x Samsung 1TB.

Everything works perfectly with Ubuntu Server 9.10 (upgraded from 9.04), 32-bit.

After much thought, and after trying LVM (I don't use it anymore, as in my case it reduces the reliability of the system, and I don't want to use any kind of RAID), unionfs, and aufs, I finally discovered mhddfs.

I set it up with a huge mlimit, so it balances new files onto the disk with the most free space.
It works like a charm, especially for reads. For writes, though, it limits performance to roughly 27MB/s (from Windows 7 through Samba), while writing to the same disk without mhddfs (through a dedicated share) gives me about 60MB/s.
I also noticed that mhddfs consumes a lot of CPU (over 50%) across 4 threads.
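
For reference, the mount line is roughly like this (the branch paths are placeholders for my real mount points, and the mlimit value is just an example of "huge"):

$ mhddfs /mnt/disk1,/mnt/disk2,/mnt/disk3 /unions/union -o mlimit=100G -o allow_other

With mlimit set larger than any single drive's free space, mhddfs should always pick the branch with the most free space for new files, which is what makes it balance them onto the emptiest disk.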

Did anybody try a similar config with a more powerful CPU? Unfortunately, I can't run that test myself at the moment, so I wonder if anyone can help.

Thanks a lot
 
So are you happy with those speeds? Is it enough for you?

The CPU usage may also be attributed to the PCI interface. In my testing, using PCI for just one disk in a RAID array pushes interrupt usage towards ~40%. Once I moved the storage from PCI to PCIe, it instantly ran much faster, even though the PCI bus never bottlenecked at 133MB/s.

But if performance and an upgrade path are not important, and it works, I would keep it this way.
 
Well, I am quite happy with the single-disk write performance (about 60MB/s, as I wrote), but I am not happy with the write performance through mhddfs (on the SAME disk!), as it drops to 27MB/s.
I know about the limited PCI bandwidth, but I can't believe it is the limiting factor: I think it is the CPU, so I asked whether anybody else uses mhddfs and gets different results.

At the moment, the system fits my needs. When I upgrade it, I will surely opt for a more powerful CPU and PCIe slots (which I do not have at the moment: the Atom 330 mainboard has only one PCI slot).

Thanks!
 
You are using bare disks; thus no RAID on the Promise PCI controller?

I know about the limited PCI bandwidth, but I can't believe it is the limiting factor: I think it is the CPU
Your CPU has to work very hard to transfer data over PCI; the interrupts also cause really nasty CPU usage that is 'pulled away' from normal applications. PCI is terrible for performance, so I would not dismiss it as a cause so easily. In fact, I think it's more than likely that PCI is your bottleneck. If you used PCIe, the CPU usage would be much lower.

Since you use Ubuntu, could you open a terminal, run "top", and watch the interrupt usage while you transfer files? If it goes higher than 10%, I'd say PCI is limiting your performance by hogging your CPU.
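
If top's summary line doesn't make it obvious, something along these lines might also give an idea (mpstat comes from the sysstat package; adjust as needed):

$ mpstat -P ALL 1                    # watch the %irq and %soft columns per CPU
$ watch -n1 cat /proc/interrupts     # raw per-device interrupt counts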
 
PCIe slots (which I do not have at the moment: the Atom 330 mainboard has only one PCI slot).
Yes, but only because Intel doesn't want you to have PCI Express on Atom. That's why nVidia and Intel are becoming enemies, especially now that the ION2 platform pairs PCI Express with Intel's Atom chip; Intel is not pleased about this at all. After all, Intel knows that only a few people actually need a powerful CPU, and this combo might just undermine their empire: by making low-end products that satisfy the needs of many, the higher-end products end up less used.

Also note that Intel dumps its old chipset stock on the Atom platform; that's why you see the CPU using 1-2W and the chipset using 10 times as much - and needing a large heatsink and fan; LOL. That's undercutting your own product, IMO.
 
You are using bare disks; thus no RAID on the Promise PCI controller?

True, NO RAID at all


Your CPU has to work very hard to transfer data over PCI; the interrupts also cause really nasty CPU usage that is 'pulled away' from normal applications. PCI is terrible for performance, so I would not dismiss it as a cause so easily. In fact, I think it's more than likely that PCI is your bottleneck. If you used PCIe, the CPU usage would be much lower.

Well, yes, it could be... but that does not really explain why I can write to the same disk at 60MB/s through a dedicated Samba share (the directory points to that specific disk), while I can only write at 27MB/s to the same disk through the mhddfs "virtual" share (and I know it is writing to that disk, as it is the emptiest one; anyway, I check where the file actually ends up).


Since you use Ubuntu, could you open a terminal, run "top", and watch the interrupt usage while you transfer files? If it goes higher than 10%, I'd say PCI is limiting your performance by hogging your CPU.

Can you please point out which indicator I should monitor? System CPU%?

Anyway, mhddfs is a FUSE filesystem, so it *for sure* introduces some overhead. I'd like to know whether I can reduce this overhead (to zero or nearly so) with a new setup, or whether it is a software limitation.
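
For the record, so far I've just been watching the mhddfs process itself in top, roughly like this:

$ top -H -p $(pgrep mhddfs)     # -H shows the individual threads' CPU usage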
 
Yes, but only because Intel doesn't want you to have PCI Express on Atom. That's why nVidia and Intel are becoming enemies, especially now that the ION2 platform pairs PCI Express with Intel's Atom chip; Intel is not pleased about this at all. After all, Intel knows that only a few people actually need a powerful CPU, and this combo might just undermine their empire: by making low-end products that satisfy the needs of many, the higher-end products end up less used.

Also note that Intel dumps its old chipset stock on the Atom platform; that's why you see the CPU using 1-2W and the chipset using 10 times as much - and needing a large heatsink and fan; LOL. That's undercutting your own product, IMO.

Yes, I know all of that (the limitations: 2 SATA ports, VGA and no DVI... single-channel memory: all limitations imposed by Intel).

BUT it is really cheap... even if now I think I should have bought a different mainboard.
 
FUSE? Yuck... so not even kernel level. Are you sure you want to store your files that way?

Anyway, I looked, but Ubuntu does not list interrupt usage separately like FreeBSD does. So I guess you'll never know exactly what your bottleneck is.

Anyway, could you test with some real filesystems, like ext4? If that gives you proper performance, then the higher CPU overhead of FUSE, combined with the higher CPU usage of PCI, may in the end explain your low write speeds.
 
Let me explain: I'm using XFS on all data partitions.
mhddfs sits "on top" of the filesystems... it basically redirects reads and writes to the actual drives.

I use mhddfs only to join the drives into one "virtual drive" without having to use RAID and/or LVM, as those systems make upgrades expensive and slow. I handle backups in other ways (a dedicated system).
 
I see, but that still means you have FUSE in your storage layer; all I/O will be handled by FUSE, not by your kernel anymore. This must cost CPU and reduce performance; how much, I don't know.

If you wanted just one volume, why did you not opt for software RAID instead?
 
As I pointed out before, I'd be happy to avoid LVM and/or RAID at home, as these subsystems make upgrades really complicated. I don't want to buy big Norco gear with plenty of hard drive cages; I want to be able to expand my system by simply adding one new disk at a time (possibly removing an older one). I basically need to keep at least one SATA port free for the upgrade phase.

Also, LVM in linear mode (JBOD) reduces reliability, as ONE disk failure causes the failure of the whole array, and that is not an option.
For similar reasons I do not want RAID, at any level.

So mhddfs looked definitely good for me, except for this performance issue.
I partially solved it by keeping a dedicated share for writing (it points to one disk) and the "virtual" share for reading (from my HTPC, especially).
But this is sort of sub-optimal...
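
Roughly, the two shares look like this in smb.conf (share names and paths here are placeholders, not my exact config):

# dedicated share for writing: points straight at one physical disk, bypassing mhddfs
[incoming]
    path = /mnt/disk2/media
    read only = no

# "virtual" share for reading: points at the mhddfs union
[media]
    path = /unions/union/media
    read only = no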
 
Personally, I would feel much more at home with ZFS; it creates a single volume out of your multiple drives (which is what you want) but doesn't rely on any userland drivers like FUSE does.

As you use another backup solution, the reliability of the NAS is not your primary concern, correct? Then what exactly does mhddfs provide that other solutions don't?

You said you didn't want RAID because it makes your storage setup more complicated. You think that doesn't apply to FUSE? If there's anything that is bloated, slow, and sub-optimal, it's FUSE.
 
Personally, I would feel much more at home with ZFS; it creates a single volume out of your multiple drives (which is what you want) but doesn't rely on any userland drivers like FUSE does.

As you use another backup solution, the reliability of the NAS is not your primary concern, correct? Then what exactly does mhddfs provide that other solutions don't?

You said you didn't want RAID because it makes your storage setup more complicated. You think that doesn't apply to FUSE? If there's anything that is bloated, slow, and sub-optimal, it's FUSE.

I understand your point; you're right.
But the thing mhddfs gives me is reversibility: the disks remain independent. So if one fails, I only have to restore that one.

Another option could be to build a Samba module that does the same as mhddfs, but at a higher level. Unfortunately, I'm not a Samba developer :-(
Not yet, at least
 
Well, assuming that mhddfs/FUSE causes the lower performance, your options would be:
  • Buying more powerful hardware, so that mhddfs performance becomes acceptable
  • Switching to a different storage setup, not involving FUSE or mhddfs
  • Keeping things as they are, and living with the lower performance

So it's a tradeoff between money, features, and performance.

Since I don't know mhddfs, I can't comment on its speed. You may want to start a thread on that project's support forum/mailing list.
 
Again, you're right.
I think I have to re-evaluate RAID and LVM, but I certainly won't do RAID on this setup (Intel Atom 330 Little Falls 2); such a limited system is horrible to deal with when you need to upgrade. The lack of expandability is a major issue.

I don't see a forum about mhddfs anywhere, which is an issue too.
Another option is to test mhddfs on a more powerful system; I will try that at some point.

At the moment I can live with this issue (it's not a major one, since I found the "last disk" workaround), and read performance is just "enough" for my needs. :)
 
FUSE does add some overhead, but I don't think it should be quite that significant. I took a very quick look at the source code for mhddfs, and the read and write paths look nearly identical, so it's curious that the performance is so different.

Also, PCI in itself isn't CPU intensive, but the quality of hardware and drivers varies widely, as always. What do CPU usage and interrupt traffic look like with and without mhddfs? Have you tried a different FUSE filesystem (like the example one it comes with, which just binds the root directory)? Perhaps do some more in-depth benchmarks on the local machine and take the network out of the equation (bonnie++, for example).
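
For example, something along these lines, run as a normal user directly on the server, once against the mhddfs mountpoint and once against one of the underlying disks (the paths are placeholders, and -s should be bigger than your RAM so caching doesn't skew the result):

$ bonnie++ -d /unions/union/media -s 4096 -n 0
$ bonnie++ -d /mnt/disk2/media -s 4096 -n 0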
 
I will try to run some benchmarks soon.
Anyway, I looked a bit more deeply into the source code, and I think I found something interesting.
It *looks* like there is a global list that is scanned sequentially. It is consulted on every read/write operation, so when dealing with a lot of files this could be a major slowdown. I'm talking about the function flist_item_by_id in particular.
Also, I'd like to point out that I experienced slowdowns only while writing, but only because that has been my usage pattern. Since I installed mhddfs, all reading has been done through Samba, streaming large files, so a slowdown there would not be noticeable. But I think it is there (another thing for me to check).

Back to the code: I think I could patch it to keep a std::map ordered by ID and give it a try. And eventually another one, ordered by filename, if needed. But I think the most critical part is the ID lookup being resolved sequentially.
 
I'm not sure that will be a performance problem; it only maintains state information about open files, of which there should only be a few, and the array scan will be faster than using a more complex data type. The read()/write() path basically comes down to allocating space on the stack, scanning that hopefully very short list, and then calling the underlying syscall and checking its return value. There's nothing else to it, really; I don't see where there'd be a big bottleneck.

If you're opening lots of small files I could see the FUSE overhead being pretty significant, but for large reads/writes I'm not sure; it seems to perform fairly well for me with ntfs-3g, as long as the application reads from it intelligently. However, I'd benchmark FUSE itself more closely; all that extra context switching and memory copying will certainly take a toll, but how big is hard to say.
 
Yes, the list definitely stays short.
I gave it a try (a sorted array) and it performed exactly the same.
So it's not that.
Another thing worth checking could be the continuous lock/unlock in the read/write routines.

I access the filesystem from Samba; can this be an issue?
Specifically, which benchmark would you run?
 
Run bonnie++ on the local machine, or even just a dd. Samba might or might not be the issue, but when you have problems it's a good idea to isolate as many variables as possible...
 
I tried bonnie++, but it takes too long, so for the moment I'll stick with dd.
The results are painful: mhddfs introduces a huge overhead.


power@mini:/mnt/wd_1.5tb_01/media$ dd if=/dev/zero of=zero.dat count=1000000 bs=1k
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 15.8115 s, 64.8 MB/s

power@mini:/mnt/wd_1.5tb_01/media$ dd if=/dev/zero of=/unions/union/media/zeroUnion.dat count=1000000 bs=1k
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 152.672 s, 6.7 MB/s
 
Run bonnie++ with the -f switch to skip the 'dumb' per-character read/write tests that take forever.

Ouch that is bad indeed. Try with the FUSE example FS.
 
With your dd commands, you used a block size of just 1024 bytes (1KiB). You may want to retry with 1MiB:

$ dd if=/dev/zero of=zero.dat count=8000 bs=1M

$ dd if=zero.dat of=/dev/null bs=1M

Please note that an 8GB test file is kind of small; caching may produce numbers that are too high, especially for the second (read) test. Use a larger file (at least 8 times your RAM size) to get more accurate results.
Another trick is to unmount and then remount the filesystem after writing the test file; that clears any file cache for that volume. After remounting, perform the read test. The benefit is that you can run the tests more quickly, since you don't need such a large file to get accurate results.
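
Something like this, assuming the union is mounted on /unions/union (adjust the paths and the mount options to your real setup):

$ dd if=/dev/zero of=/unions/union/media/test.dat bs=1M count=8000     # write the test file
$ fusermount -u /unions/union                                          # unmount the mhddfs union
$ mhddfs /mnt/disk1,/mnt/disk2,/mnt/disk3 /unions/union -o mlimit=100G -o allow_other   # remount
$ dd if=/unions/union/media/test.dat of=/dev/null bs=1M                # read test
# note: the underlying filesystems may still have the file cached; on Linux,
# "sync; echo 3 > /proc/sys/vm/drop_caches" (as root) flushes that cache too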
 
@sub.mesa
Yeah, I know, but the results are so far apart that I didn't try anything else for now.
I will try again with a bigger block size and count=8000, thanks.

@keenan
I don't know about the FUSE example FS; where can I find some info?
 
Well, I ran a test here and it wasn't much of a problem for mhddfs. Somehow using mhddfs increased the write speed, lol.

Writing 10GB to disk with dd:
No mhddfs, diskA3 (XFS) = 85.7MB/s
No mhddfs, diskA4 (ext4) = 140MB/s (err, this is too high lol; it's a 2TB WD GP drive)
Two drives with mhddfs, writes going to diskA3: 105MB/s

I'm guessing the kernel was caching something; I should have run sync after dd exited and included that in the benchmark.

So mhddfs didn't really slow anything down. It actually magically increased performance ;). I did keep an eye on top, and mhddfs was sitting at 100% usage, but it was all I/O wait time. It wasn't actually doing anything.

BTW, this system has two Xeon 5410s (quad-core, 2.33GHz), 8GB of RAM, and the disks are both WD GP drives on an LSI SAS controller. The system is running Ubuntu 10.04 with the mhddfs packaged with Ubuntu.

bexamous@nine:~$ time dd if=/dev/zero of=/mnt/diskA3/zero bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 122.358 s, 85.7 MB/s

real 2m2.361s
user 0m0.000s
sys 0m9.840s
bexamous@nine:~$ time dd if=/dev/zero of=/mnt/diskA4/zero bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 75.1051 s, 140 MB/s

real 1m15.125s
user 0m0.000s
sys 0m13.590s
bexamous@nine:~$ mhddfs /mnt/diskA3,/mnt/diskA4 /data/bling -o mlimit=20Gb -o allow_other
mhddfs: directory '/mnt/diskA3' added to list
mhddfs: directory '/mnt/diskA4' added to list
mhddfs: mount to: /data/bling
mhddfs: move size limit 52428800 bytes
bexamous@nine:~$ time dd if=/dev/zero of=/data/bling/zero2 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 99.8065 s, 105 MB/s

real 1m39.810s
user 0m0.010s
sys 0m23.140s
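
For what it's worth, a variant that forces the data out to disk before dd reports the rate would be something like this (GNU dd; I haven't re-run it this way, and the output name is arbitrary):

$ time dd if=/dev/zero of=/data/bling/zero3 bs=1M count=10000 conv=fdatasync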
 
Oddly, performance seems to suffer with Samba. Sharing the mhddfs mount with Samba and using Win7 to copy one of the 10GB files to the desktop (Intel G2), it was only able to read at 40MB/s. Copying the file back to the Samba share goes at about 55MB/s. I don't know why this is so low; usually I get 100MB/s or so between these same systems.

I only recently upgraded to Ubuntu 10.04; I'd been using Ubuntu 9.04 for the last year with a copy of mhddfs I built myself from SVN (the one included in Ubuntu 9.04 had some bug that annoyed me; I forget what, though).
 
Oops... the board lost some messages.
Anyway, I tried reducing the number of disks joined and tuning Samba.
Also, Samba does not look like the culprit, as the performance hit shows up locally (with dd) as well.
 
There's an example filesystem that comes with the FUSE distribution. I'm not sure where it'd be packaged (if at all) if you're using your distro's package system.
 
I tried some searching with Google, but I didn't find it.
Is it a sort of "transparent" filesystem, just to check performance/functionality?

Thanks
 
I think you're CPU limited. I've been using mhddfs for a while now and it's perfect for my needs. While there is a performance hit with mhddfs, it's not as severe as yours.

The source drive is an Intel G2 connected via GigE, running Win7.
The target is a Linux machine (Q6600) with a software RAID5 and a hardware RAID5 merged/presented seamlessly to my family with mhddfs.

Everything is untuned.

To the mhddfs filesystem:
ftp: 886243328 bytes sent in 12.05Seconds 73571.59Kbytes/sec.
Windows 7 writing to the Samba share seems to level off at ~50MB/s.

To the underlying filesystem itself:
ftp: 886243328 bytes sent in 11.10Seconds 79820.17Kbytes/sec.
Windows 7 writing to the Samba share seems to level off at ~60MB/s.
 
Thanks, that was my thought too.
What I don't understand is where this heavy load comes from. It could be a lot of context switching, because, as I wrote before, writes are fast on the single drive.

Thanks!
 
Hello, I'm posting some new info.
I tried mhddfs on a beefier system, and it's just the same: very low write performance.

I took another look at the code. Yes, the list is definitely short, but it is locked for EVERY read/write operation. Accordingly, if I run a dd if=/dev/zero ... I get higher performance with a bigger block size (due to fewer synchronization events, I suppose).
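
For example, with the same target through the union and only the block size changing (the file name is just an example; I'm not pasting the numbers here, but the gap is obvious):

$ dd if=/dev/zero of=/unions/union/media/bs_test.dat bs=4k count=256000     # ~1GB in 4KiB writes
$ dd if=/dev/zero of=/unions/union/media/bs_test.dat bs=1M count=1000       # ~1GB in 1MiB writes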

I wrote an e-mail to the author; I'm not completely sure of what I found, but at least it could be a hint.
By the way, I'm using aufs now and I'm very happy with it. No slowdown at all, and similar features.
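
Roughly what I'm mounting now (the branch paths are placeholders; create=mfs is the aufs create policy that sends new files to the branch with the most free space, which is the behaviour I wanted from mhddfs):

$ sudo mount -t aufs -o br=/mnt/disk1=rw:/mnt/disk2=rw:/mnt/disk3=rw,create=mfs none /unions/union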

Bye
 
I finally solved my issues by switching to aufs.
Similar functionality, no slowdowns, no high spikes on the CPU(s).
Thanks for your help.
 