More vdevs, more problems

idea
Gawd · Joined Jan 24, 2005 · Messages: 615
I'm having an issue where, every time I add another mirror vdev to my zpool, the performance of each individual disk drops further. I calculated roughly a 10% loss per vdev added. That is a HUGE hit: after 4 vdevs the disks are down to about 50%.

Here is the setup: I am running Solaris 11 virtualized under ESXi. I have 8 of these disks connected to an LSI 1068e based HBA flashed with IT firmware: Fujitsu MAY2073RC 73GB 10000 RPM 16MB Cache Serial Attached SCSI (SAS) 2.5" Hard Drive.
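
As a sanity check on the hardware side, something like this should show exactly what the HBA is presenting to Solaris (stock tools only, nothing fancy):
Code:
# show vendor/product/revision for every disk the OS sees
iostat -En
# or just list the available disks and let format exit at the prompt
format </dev/null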

Here is my testing:
Code:
# zpool create testpool mirror c9t7d0 c9t8d0
# mkfile 2g /testpool/testfile = 48MB/s and disks are 90% busy, not TOO bad. Wonder where the other 10% is going?
Code:
# zpool create testpool mirror c9t7d0 c9t8d0 mirror c9t9d0 c9t14d0
# mkfile 2g /testpool/testfile = 75MB/s and disks are 80% busy, WTF? I should be getting 100MB/s after that first test...
Code:
# zpool create testpool mirror c9t7d0 c9t8d0 mirror c9t9d0 c9t14d0 mirror c9t15d0 c9t16d0
# mkfile 2g /testpool/testfile = 106MB/s and disks are 65% busy, OK what the hell is going on here
Code:
# zpool create testpool mirror c9t7d0 c9t8d0 mirror c9t9d0 c9t14d0 mirror c9t15d0 c9t16d0 mirror c9t17d0 c9t18d0
# mkfile 2g /testpool/testfile = 116MB/s and disks are 50% busy, F($*@ DAMN IT
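
For anyone who wants to repeat this, the runs above could be scripted roughly along these lines (same disk names and pool name as above; ptime just reports how long each mkfile takes). A rough sketch, nothing polished:
Code:
#!/bin/sh
# rebuild the test pool one mirror pair at a time and time a 2 GB mkfile on each layout
PAIRS="c9t7d0:c9t8d0 c9t9d0:c9t14d0 c9t15d0:c9t16d0 c9t17d0:c9t18d0"
VDEVS=""
for p in $PAIRS; do
    VDEVS="$VDEVS mirror `echo $p | tr : ' '`"
    zpool create -f testpool $VDEVS
    ptime mkfile 2g /testpool/testfile
    zpool destroy testpool
done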

More testing I have done:
  • I hooked up 6x 15K SAS 3.5" disks and everything looks normal: 300MB/s writes and 600MB/s reads. Crazy fast. To me, this means it's not the HBA or any other hardware; it's these disks.
  • I also created a single RAID-Z vdev out of these 8 disks. Same result: about 50% performance.
  • I have NOT yet ruled out the backplane that these disks are connected to, but I doubt that is the cause.
  • Just to clarify, when I say the disks are only 50% busy, that figure comes from the %b column in "iostat" (exact command below).
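
For reference, the busy numbers come from watching something like this in a second shell while mkfile runs (%b is the per-device busy percentage):
Code:
# extended per-device stats every 5 seconds; %b is busy, actv is commands in flight
iostat -xn 5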
 
I was confused for a moment when you said 'added new vdevs to the zpool'; I thought you meant an existing zpool until I re-read it. Anyway, is there any way to test this without the backplane? If the other SAS disks run fast, you're right that the HBA sounds off the hook, but so do Solaris 11 and the rest of the stack.
 

Yeah... ZFS is confusing to describe at times. Tonight I will try it without the backplane; I think I have exactly 8 SATA power connectors coming off the PSU, so I should be able to. I will also paste the output from the 15K disks to back up my claim that those disks are not affected.
 
Maybe there's not enough I/Os queued to push the drives to their limit.

Given the comparison to the RAID-Z1 pool, it may be that the number of I/Os it queues depends on the number of vdevs....

Maybe it only spawns a single I/O per zpool?

Hmmm.......
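
A crude way to test that, if it really is a queue-depth thing, would be to fire off several writers at once instead of a single mkfile and see whether the combined throughput and %b go up (file names below are just examples):
Code:
# four concurrent 2 GB writers; compare aggregate MB/s against the single-mkfile runs
for i in 1 2 3 4; do
    dd if=/dev/zero of=/testpool/f$i bs=1024k count=2048 &
done
wait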
 

I can't test without the backplane after all. I discovered that the SAS connectors are a bit different from SATA: the direct connections from the PSU don't support SAS drives. So the backplane stays in the picture unless I figure something else out.


What do you mean by "not enough I/Os queued", and how do I test it?
 
Bump, this is holding me back. My next step is to spend $500 on new drives and hope for the best. I know these are old disks, and they weren't top performers for throughput even when they were new, but this still makes no sense to me.
 
I could only suggest the backplane experiment, but you apparently can't do that :( Anyway, as to what the other guy said, I think there may be only one I/O thread per pool.
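
If there is some per-pool serialization like that, one place it might show up is zpool iostat: with -v it breaks operations and bandwidth down per vdev, so you can see whether the writes are really being spread across all the mirrors (testpool being the pool from the earlier tests):
Code:
# per-vdev ops and bandwidth, refreshed every 5 seconds, while a test write is running
zpool iostat -v testpool 5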
 
You need to use a real benchmarking program; a single mkfile process is not enough.

Try to get either bonnie++ or iozone3 going and run your tests again.
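
For example, something along these lines should do it (flags are from memory, so check the man pages; the iozone file name is just an example):
Code:
# bonnie++: sequential write/read plus seeks in /testpool; -s is the file size in MB (8 GB here)
bonnie++ -d /testpool -s 8192 -u root
# iozone: sequential write/rewrite (-i 0) and read/reread (-i 1), 2 GB test file, 128k records
iozone -i 0 -i 1 -s 2g -r 128k -f /testpool/iozone.tmp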
 