ZFS Performance Issues

OldSchool (Limp Gawd, joined Jul 6, 2010, 477 messages)
So recently I have been working on building a second ZFS storage box, and so far its performance is seriously underwhelming. This is certainly not my first rodeo; there is already a 65TB (usable) storage box on the network, encrypted with GELI, that performs faster than this one. I am just at a loss for why the new build is so sluggish. Here are the specs:

Gigabyte GA-870A-UD3 (one PCIe x16 slot, one PCIe x4 slot)
AMD Phenom II X4 965 Black Edition (3.4GHz)
4 x 2GB Corsair XMS3 DDR3-1333
Passively cooled ATI video card in the x16 slot
LSI 9211-8i HBA in the x4 slot
Norco RPC-4220
Antec TruePower 850W PSU
12 x 2TB Hitachi 0F10311, 7200 RPM, 32MB cache
1 x 500GB Seagate SSHD (hybrid) for the OS
FreeBSD 10.1

Eight of the drives are from an old ZFS array that was encrypted with GELI; as far as I can tell, I have cleared all remnants of metadata from both the previous encryption and the old ZFS labels. Eight drives are connected to the LSI HBA, and the remaining four are connected to four of the motherboard ports via a reverse breakout cable.

It seems that no matter what I do, the performance is always the same. I tried building a zpool with all 12 of the drives, as well as a zpool using just the drives connected to the HBA, and the speed is always exactly the same.

Here is what I am getting when benchmarking with dd:

Write:

Code:
1024000000 bytes transferred in 6.611905 secs (154872156 bytes/sec)
        6.61s real              0.14s user              6.20s sys

Read:

Code:
1024000000 bytes transferred in 2.796616 secs (366156802 bytes/sec)
        2.79s real              0.11s user              2.67s sys
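For reference, a dd invocation of roughly this shape would produce output like the above; the file name and blocksize here are placeholders for illustration, not the exact command that was run:

Code:
/usr/bin/time -h dd if=/dev/zero of=sometestfile bs=8k count=125000   # hypothetical: ~1GB written in small blocks
/usr/bin/time -h dd if=sometestfile of=/dev/null bs=8k                # hypothetical read-back of the same file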


As you can see, this is way under what I would expect an array of 12 drives to be capable of, even with the not-so-cutting-edge core system hardware. I have tried a few things so far just to eliminate the obvious potential issues: I moved the HBA to the x16 PCIe slot (since it is an x8 card sitting in an x4 slot) and ran headless, and the speeds were exactly the same. I was also watching top while speed testing, and I have never seen processor utilization exceed 33%.

I intend to do some more testing tonight, such as starting dd tests against some if not all of the drives simultaneously and checking iostat to see what the total throughput is (rough sketch below). I also intend to do some firmware flashing, particularly the HBA and possibly the motherboard, but this should be faster than it is regardless.
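Something along these lines, assuming the raw devices show up as da0 through da11 (this writes directly to the disks, so only for drives that are not in a pool):

Code:
# write to several raw disks at once, then watch the aggregate throughput
for d in da0 da1 da2 da3; do
    dd if=/dev/zero of=/dev/$d bs=1m count=2000 &
done
iostat -x -w 1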

Anyone have any other ideas?

*EDIT* - I forgot to mention that this is raidz2, and just to clarify in case it was missed, the OS is FreeBSD 10.1.
 
I assume you are running the dd test on the box itself? Possibly one or more drives are having issues and dragging down the aggregate data rate. Try rebuilding the pool with a subset of drives (say, raidz2 with 6 drives). Maybe make two pools, each with half the drives, and test both. Try other combinations too, like a single pool with two raidz2 vdevs, each containing half the drives; a sketch is below.
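Roughly like this, with assumed device names (adjust da0..da11 to whatever your system reports):

Code:
# single 6-disk raidz2 pool
zpool create testpool1 raidz2 da0 da1 da2 da3 da4 da5
# one pool with two raidz2 vdevs of 6 disks each
zpool create testpool2 raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11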
 
Yes, I am running dd as root from the console. Actually, it's funny that you mention that: during testing I noticed smartd reporting that one of the drives connected to the HBA had bad/unreadable sectors. I have since removed that drive and ordered a replacement. A thorough check of all the other drives with smartctl turns up no reported errors. However, even after removing the bad drive, the benchmark speeds remained unchanged.

I will try to create a few different zpools with different arrangements of disks to see if that changes anything. I am also going to try just creating a basic stripe to see how that performs (rough sketch below).
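Roughly what I have in mind, with assumed device names:

Code:
# plain striped pool across four disks (no redundancy, test only)
zpool create striptest da0 da1 da2 da3
# quick SMART attribute check across the remaining drives
for d in da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10; do
    echo "== $d =="
    smartctl -A /dev/$d | egrep 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'
done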
 
That's a small blocksize. Try bs=1048576.

I see if=/dev/zero, which should compress really well. You're using lz4? Different results with random data?

Also, 33% CPU could still max 1 core. Worth a look.
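A quick way to check all three, assuming the dataset is zpool/zfs as in the later posts:

Code:
# is compression enabled on the dataset?
zfs get compression zpool/zfs
# repeat the write test with incompressible data
# (note: /dev/random itself may be the bottleneck; if so, pre-generate a random file and copy it instead)
dd if=/dev/random of=/zpool/zfs/randtest bs=1048576 count=10000
# per-CPU view, to see whether a single core is pegged
top -P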
 
Oh snap, what are the odds my dd benchmark was just a bad choice? :D

Code:
root@boxname:/zpool/zfs # /usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1048576 count=10000
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 14.765670 secs (710144553 bytes/sec)
        14.77s real             0.00s user              7.11s sys
root@boxname:/zpool/zfs # /usr/bin/time -h dd if=sometestfile of=/dev/null bs=1048576
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 14.679164 secs (714329511 bytes/sec)
        14.68s real             0.00s user              4.42s sys

Also, I'm not using compression.

So that leads me to the question: why was I ending up with virtually the same numbers with the smaller block size, no matter the pool configuration?
 
Because with that small a blocksize it's most likely never writing to more than one spindle at a time? Also, you do NOT want compression on when doing these tests, as compressing zeros will give grossly misleading write stats... I always use a 1M blocksize.
 
dd is not a benchmark; you have no control over syncs and queue depths. There is a post from me in this subforum on how to use fio to properly benchmark sequential write speed.

Are you sure you should expect more from an encrypted block device? The processor is somewhat old and does not have hardware AES support. Please test a plain encrypted vs. unencrypted block device.
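Not the exact post referenced, but a fio sequential test looks roughly like this (fio is in ports/pkg; the dataset path is assumed):

Code:
# sequential write, 1M blocks, fsync at the end so cached writes are counted in the elapsed time
fio --name=seqtest --directory=/zpool/zfs --rw=write --bs=1M --size=10g --ioengine=psync --numjobs=1 --end_fsync=1
# sequential read of the file laid out by the job above (same --name, same directory)
fio --name=seqtest --directory=/zpool/zfs --rw=read --bs=1M --size=10g --ioengine=psync --numjobs=1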
 
I will have to check that out for benchmarking.

This array is not encrypted. The other box has two pools that are encrypted with GELI, but I didn't encrypt this one because the processor lacks AES-NI.
 
dd is always a bad choice for benchmarking.

Plus you used /dev/zero, and ZFS compresses data; when I run dd from /dev/zero to my zpool I get several GB/s, which is obviously wildly misleading.

The primary benchmark tools I usually see recommended by ZFS developers are fio and tio.
 
bonnie++ is a good benchmark for throughput.
dd I use as a way of testing that data can be written, not as a speed test.
rsync, under the right conditions, actually makes a fairly good speed test. I've used it to test drives, local and remote shares, and protocols (CIFS, NFS, etc.), and have even found flaky controllers thanks to long writes with rsync. The downside is that you must actually have a mass of data you can move around to test with.
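A typical bonnie++ run for a box like this might look like the following (mountpoint assumed; the test size should be at least twice RAM, so 16g with 8GB installed, to keep ARC caching from skewing the numbers):

Code:
# run as root against the pool's mountpoint
bonnie++ -d /zpool/zfs -s 16g -u root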
 
Well, there are a couple of things here, and I think it's a combination of factors coming together that limits you. The first thing I would try is bypassing the backplane on the 4220 and seeing what happens. Try something like 4 drives in a raidz2 connected directly to the HBA.

Second, you can use dd, just do it with the fsync option (conv=fsync) so that flushing to disk is included in the timing; see the sketch below. I like Bonnie more, though.

Third, is this just one vdev?
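Roughly like this (output file name assumed):

Code:
# write test that calls fsync on the output file before dd exits,
# so data still sitting in RAM is counted in the elapsed time
dd if=/dev/zero of=/zpool/zfs/fsynctest bs=1048576 count=10000 conv=fsync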
 