ZFS SSD Performance Issue

I'm having issues with ZFS performance on a proof-of-concept ZFS server I set up. I have 14 OCZ Deneva 2 SSDs connected to a Dell H200 HBA (a rebranded LSI SAS2008 card). I ran a benchmark against a 14-disk RAID 10 SSD pool and was disappointed by the results: only about 650 MB/s write and 900 MB/s read. I created a bunch of different pools and the numbers don't add up.

With a single drive configured I got about 350 MB/s write and 600 MB/s read.

With a 14-disk RAID 0 I got the same 650 MB/s write and 900 MB/s read.

About the system:
Dell R910 with four 8-core Xeons
128 GB ECC RAM
Running Solaris 11 Express with napp-it 0.8H
Using the napp-it dd bench tool with default values; a rough manual equivalent is sketched at the end of this post. (I'd rather use Bonnie++, but apparently it doesn't work with Solaris 11.)

Other troubleshooting I've done:
updated to the latest Dell BIOS
updated the H200 to the latest Dell firmware
checked that the OCZ drives had the latest firmware
tried different pool versions (v28 vs. v31)
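
For reference, the napp-it dd bench is basically a sequential dd pass against the pool. I'm not sure of its exact defaults, so treat this as a rough manual equivalent rather than the exact same test:

dd if=/dev/zero of=/ssdpool/dd.tst bs=1024k count=12500     (sequential write, ~12.8 GB)
dd if=/ssdpool/dd.tst of=/dev/null bs=1024k                 (sequential read of the same file)

On a box with 128 GB of RAM the test file should be much larger than this (or the pool exported and re-imported between the two passes), otherwise the read pass is mostly served from ARC.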
 
No, the drives are passed to Solaris as raw drives; I didn't create any RAID 0s in the HBA firmware.
 
Can you give us some more info on how your box is set up? zpool status output would help. Also, you could try iozone -M -e -+u -T -t 32 -r 128k -s 40960 -i 0 -i 1 -i 2 -i 8 -+p 70 -C, which should run since you don't have bonnie++.
 
Here is a zpool status output. Solaris 11 doesn't seem to come with iozone.

pool: ssdpool
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM CAP Product
ssdpool ONLINE 0 0 0
c0t5E83A97010000D29d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D3Cd0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D3Fd0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D47d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D4Fd0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D50d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D52d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D56d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D57d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D66d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D68d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D77d0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024
c0t5E83A97010000D7Cd0 ONLINE 0 0 0 240.06 GB D2CSTK251M11-024

errors: No known data errors
 
OK, please run the following and post the results:
zpool list
zfs list
zpool status -v
zpool history
df -k
 
zpool list:

NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 136G 20.0G 116G 14% 1.00x ONLINE -
ssdpool 2.82T 122K 2.82T 0% 1.00x ONLINE -

zfs list:

NAME USED AVAIL REFER MOUNTPOINT
rpool 53.9G 80.0G 94K /rpool
rpool/ROOT 3.86G 80.0G 31K legacy
rpool/ROOT/napp-it-0.8h_update_06.05 3.85G 80.0G 3.71G /
rpool/ROOT/pre_napp-it-0.8h 49K 80.0G 3.42G /
rpool/ROOT/solaris 11.3M 80.0G 3.69G /
rpool/dump 16.0G 80.0G 16.0G -
rpool/export 832K 80.0G 32K /export
rpool/export/home 800K 80.0G 32K /export/home
rpool/export/home/taylor 768K 80.0G 768K /export/home/user
rpool/swap 34.0G 114G 125M -
ssdpool 122K 2.77T 31K /ssdpool

zpool status:

pool: ssdpool
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM
ssdpool ONLINE 0 0 0
c0t5E83A97010000D29d0 ONLINE 0 0 0
c0t5E83A97010000D3Cd0 ONLINE 0 0 0
c0t5E83A97010000D3Fd0 ONLINE 0 0 0
c0t5E83A97010000D47d0 ONLINE 0 0 0
c0t5E83A97010000D4Fd0 ONLINE 0 0 0
c0t5E83A97010000D50d0 ONLINE 0 0 0
c0t5E83A97010000D52d0 ONLINE 0 0 0
c0t5E83A97010000D56d0 ONLINE 0 0 0
c0t5E83A97010000D57d0 ONLINE 0 0 0
c0t5E83A97010000D66d0 ONLINE 0 0 0
c0t5E83A97010000D68d0 ONLINE 0 0 0
c0t5E83A97010000D77d0 ONLINE 0 0 0
c0t5E83A97010000D7Cd0 ONLINE 0 0 0

errors: No known data errors

zpool history:

History for 'ssdpool':
2012-06-06.15:59:21 zpool create -f ssdpool c0t5E83A97010000D29d0 c0t5E83A97010000D3Cd0 c0t5E83A97010000D3Fd0 c0t5E83A97010000D47d0 c0t5E83A97010000D4Fd0 c0t5E83A97010000D50d0 c0t5E83A97010000D52d0 c0t5E83A97010000D56d0 c0t5E83A97010000D57d0 c0t5E83A97010000D66d0 c0t5E83A97010000D68d0 c0t5E83A97010000D77d0 c0t5E83A97010000D7Cd0
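
(The history above is for the flat-stripe/RAID 0 pool. For the RAID 10 runs the pool has to be built from mirrored pairs instead, something along these lines, with the remaining disks continuing the same pattern:

zpool create ssdpool mirror c0t5E83A97010000D29d0 c0t5E83A97010000D3Cd0 mirror c0t5E83A97010000D3Fd0 c0t5E83A97010000D47d0 ...)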
 
so, you have a controller or backplane problem there. notice every disk is showing up on c0 t5. it looks to me like only one of the ports on that card is being used, which would limit you to 6 Gbit/s.

now, i could be wrong, but you should be seeing c0 with 8 total unique numbers after the tX, like c0t5, c0t6, etc. you have 14 total drives though, so some of the channels will have multiple device IDs.

as an example, from one of my boxes.

c3t10d0 1000.20 GB
c3t11d0 1000.20 GB
c3t4d0 1000.20 GB
c3t5d0 1000.20 GB
c3t6d0 1000.20 GB
c3t7d0 80.03 GB
c3t8d0 80.03 GB
c3t9d0 1000.20 GB
c4t10d0 1000.20 GB
c4t4d0 1000.20 GB
c4t5d0 1000.20 GB
c4t6d0 80.03 GB
c4t7d0 80.03 GB
c4t8d0 1000.20 GB
c4t9d0 1000.20 GB

each drive is directly connected though; there is no backplane or expander in play, meaning each channel from each controller is dedicated.

if i were you i would back down to 8 SSDs and test. if your performance numbers are still the same, then look at the R910 manual and see if the backplane can be set up differently.
 
i just looked, and it appears to be a fairly braindead sas backplane, presuming of course that you have the sas backplane and not a sata backplane.

if it's the sata one, that is likely the problem.
 
drop down to 4 drives, using every 4th slot in the chassis, and see if that comes up with a different target number for each connected drive.

i may be off here, idk, but you should have 8 available target UUIDs with disks downstream from them, and it appears to me that everything is downstream of a single channel on that controller.
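
a quick way to dump what solaris thinks the disk/controller layout is (standard tools, nothing napp-it specific; output format varies a bit by release):

echo | format     (should list every disk as cXtYdZ along with its device path)
iostat -En        (per-device detail: vendor/product plus link and error counters)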
 
I have 3x SAS backplanes hitting a single LSI 9211 and they all show up as C2T5xxxxx (72 drives). I can definitely get more than a single 3 Gbps link (the drives are all 3 Gbps SATA) worth of data across it. I think a single "channel" is (in his case) 4x 6 Gbps, i.e. a single SAS wide port at 24 Gbps - far more than he's seeing.
 
hmm, he has two 4x channels though. idk, like i said i could be wrong, but i would expect to see more than a single target.
 
Any other suggestions? Today I'm going to try flashing the card with LSI firmware, and maybe switching the OS to the full Oracle Solaris 11 and/or OpenIndiana.
 
Flashed the Dell H200 with the newer LSI 9211 firmware and the results are the same. Switching the OS to OpenIndiana now.
 
Can you give us some more info on how your box is set up? zpool status output would help. Also, you could try iozone -M -e -+u -T -t 32 -r 128k -s 40960 -i 0 -i 1 -i 2 -i 8 -+p 70 -C, which should run since you don't have bonnie++.

Here is the output of iozone; not sure how to read it:



Children see throughput for 32 random writers = 1400575.40 KB/sec
Parent sees throughput for 32 random writers = 950491.35 KB/sec
Min throughput per thread = 29922.35 KB/sec
Max throughput per thread = 154532.92 KB/sec
Avg throughput per thread = 43767.98 KB/sec
Min xfer = 21504.00 KB
CPU utilization: Wall time 9.889 CPU time 304.316 CPU utilization 3077.33 %

Child[0] xfer count = 27648.00 KB, Throughput = 38518.02 KB/sec, wall= 9.873, cpu= 9.873, %=100.00
Child[1] xfer count = 26496.00 KB, Throughput = 37073.62 KB/sec, wall= 9.856, cpu= 9.856, %=100.00
Child[2] xfer count = 35712.00 KB, Throughput = 49977.48 KB/sec, wall= 9.111, cpu= 9.111, %=100.00
Child[3] xfer count = 21504.00 KB, Throughput = 29922.35 KB/sec, wall= 9.889, cpu= 9.889, %=100.00
Child[4] xfer count = 27008.00 KB, Throughput = 37692.62 KB/sec, wall= 9.869, cpu= 9.869, %=100.00
Child[5] xfer count = 29696.00 KB, Throughput = 41606.36 KB/sec, wall= 9.828, cpu= 9.828, %=100.00
Child[6] xfer count = 35072.00 KB, Throughput = 49009.46 KB/sec, wall= 9.121, cpu= 9.121, %=100.00
Child[7] xfer count = 26112.00 KB, Throughput = 36434.72 KB/sec, wall= 9.869, cpu= 9.869, %=100.00
Child[8] xfer count = 25600.00 KB, Throughput = 35935.43 KB/sec, wall= 9.829, cpu= 9.829, %=100.00
Child[9] xfer count = 26112.00 KB, Throughput = 36534.89 KB/sec, wall= 9.854, cpu= 9.854, %=100.00
Child[10] xfer count = 40960.00 KB, Throughput = 154532.92 KB/sec, wall= 7.022, cpu= 7.022, %=100.00
Child[11] xfer count = 27008.00 KB, Throughput = 37680.83 KB/sec, wall= 9.869, cpu= 9.869, %=100.00
Child[12] xfer count = 24704.00 KB, Throughput = 34767.34 KB/sec, wall= 9.767, cpu= 9.767, %=100.00
Child[13] xfer count = 29440.00 KB, Throughput = 41262.42 KB/sec, wall= 9.828, cpu= 9.828, %=100.00
Child[14] xfer count = 35840.00 KB, Throughput = 50371.75 KB/sec, wall= 9.048, cpu= 9.048, %=100.00
Child[15] xfer count = 27776.00 KB, Throughput = 38752.75 KB/sec, wall= 9.868, cpu= 9.868, %=100.00
Child[16] xfer count = 24192.00 KB, Throughput = 34099.70 KB/sec, wall= 9.742, cpu= 9.742, %=100.00
Child[17] xfer count = 29056.00 KB, Throughput = 40773.94 KB/sec, wall= 9.808, cpu= 9.808, %=100.00
Child[18] xfer count = 35200.00 KB, Throughput = 49256.40 KB/sec, wall= 9.106, cpu= 9.106, %=100.00
Child[19] xfer count = 26752.00 KB, Throughput = 37331.95 KB/sec, wall= 9.869, cpu= 9.869, %=100.00
Child[20] xfer count = 24320.00 KB, Throughput = 34134.00 KB/sec, wall= 9.828, cpu= 9.828, %=100.00
Child[21] xfer count = 26880.00 KB, Throughput = 37662.57 KB/sec, wall= 9.828, cpu= 9.828, %=100.00
Child[22] xfer count = 21504.00 KB, Throughput = 30001.63 KB/sec, wall= 9.882, cpu= 9.882, %=100.00
Child[23] xfer count = 36992.00 KB, Throughput = 52210.69 KB/sec, wall= 8.286, cpu= 8.286, %=100.00
Child[24] xfer count = 27136.00 KB, Throughput = 37815.04 KB/sec, wall= 9.886, cpu= 9.886, %=100.00
Child[25] xfer count = 25472.00 KB, Throughput = 35688.56 KB/sec, wall= 9.850, cpu= 9.850, %=100.00
Child[26] xfer count = 27136.00 KB, Throughput = 37863.05 KB/sec, wall= 9.870, cpu= 9.870, %=100.00
Child[27] xfer count = 37376.00 KB, Throughput = 52310.77 KB/sec, wall= 8.432, cpu= 8.432, %=100.00
Child[28] xfer count = 26624.00 KB, Throughput = 37524.50 KB/sec, wall= 9.743, cpu= 9.743, %=100.00
Child[29] xfer count = 29696.00 KB, Throughput = 41671.93 KB/sec, wall= 9.687, cpu= 9.687, %=100.00
Child[30] xfer count = 26496.00 KB, Throughput = 36968.51 KB/sec, wall= 9.868, cpu= 9.868, %=100.00
Child[31] xfer count = 39552.00 KB, Throughput = 55189.19 KB/sec, wall= 8.129, cpu= 8.129, %=100.00
 
Can you also do a ./iozone -a -f <path to your ZFS volume>

Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read fwrite frewrite fread freread
64 4 110521 333126 1049372 1421755 842003 363829 942521 407457 1033216 447537 533875 1017549 1255511
64 8 404998 743988 2662899 2923952 1828508 679917 1599680 693980 1828508 778512 942521 1780008 1947927
64 16 429630 760859 2467108 2662899 1780008 727850 1638743 769584 2067979 633392 1163036 1828508 2222043
64 32 818885 1163036 5735102 6421025 3738358 1143223 2892445 1163036 3541098 926260 1933893 2561267 3165299
64 64 282027 390278 942521 1679761 725882 335206 1679761 1017549 2133730 561808 1392258 1492919 2772930
128 4 24082 88158 251014 250428 172305 148478 653282 294853 978253 518312 561694 1306873 1359835
128 8 365764 153094 442668 448961 321709 132110 301307 132077 325611 162206 579270 1229084 1349580
128 16 422464 895074 3468030 3759450 2511022 852438 2166499 907174 2969325 1254941 1584598 2571150 2905056
128 32 196310 300464 826202 848397 677178 277050 646205 268327 2166499 947186 1455701 1997245 2248149
128 64 1000121 1206978 2558895 7476717 1217930 325809 561694 332467 776042 309113 393945 442668 652488
128 128 1102844 1266785 2608629 7476717 5847904 1196221 2238774 1231904 2326073 587511 934004 1000121 1363288
256 4 145536 85763 263636 263378 171941 170792 649647 282801 757326 425951 491458 1327564 1347557
256 8 322851 573946 2228578 2266207 1543163 600928 1897721 658008 1911233 913973 991629 2077650 2187712
256 16 632056 947860 4009406 4117018 2872459 904732 2880164 937924 3087188 1280084 256213 446127 485678
256 32 904732 1112909 5569035 5687020 4264168 1048775 672010 282280 790209 316476 380920 600928 612584
256 64 1080434 1254656 2533571 7120034 5540300 1163562 2223962 1208072 4332998 562815 379842 472222 706964
256 128 373111 415566 792543 1332507 1137672 377041 1907837 1103757 2872459 695966 765970 1071806 1549845
256 256 1084800 1267990 1312954 1639786 1549845 1219045 1260547 1243036 1300235 530031 822913 270272 304020
512 4 64753 134631 1166599 1198502 860781 361115 1100818 405124 1082507 524056 100864 220123 228266
512 8 581216 153339 453074 466255 322599 151116 1273861 533827 1565443 791318 958781 2169601 2207515
512 16 817837 1051757 657969 664073 493242 209821 494720 212879 792486 761841 1062686 2180616 2235086
512 32 239359 930933 3684733 3970896 3104171 926116 3481620 1026126 4262523 1679288 2337255 3604335 3795443
512 64 283353 384384 751707 1219602 1004057 338623 1895721 1077618 4061007 1475116 1528669 1875849 4099771
512 128 1138158 1316016 2584820 6736026 5807059 1240025 2497638 1344020 5019764 772252 787834 1135150 449470
512 256 1264113 1373241 1387437 1766304 1760512 1273861 439445 417326 493242 219830 238826 714207 816594
512 512 379561 429170 453074 610982 1483267 1105920 1264113 1221684 1296159 468391 777002 790153 910797
1024 4 232200 481887 1570483 1597351 1058967 80338 642785 352977 1080275 523533 90015 821247 918354
1024 8 384103 155576 1558516 1703807 1267398 540396 1765437 672369 1943594 887425 162775 363278 609837
1024 16 187989 284683 2668003 2827858 2178189 806293 2852271 953418 3046495 1319577 1605111 486472 487410
1024 32 934337 1145383 5507740 5720477 802526 284759 755649 290498 2604895 1120288 1796451 3210456 3348104
1024 64 1000969 1276817 2666347 7020149 6093831 1200447 2528227 1270397 6244448 367160 395820 520362 834653
1024 128 1244266 1385560 2744728 7826025 6830356 1287921 768769 434106 1311117 215310 492496 1115632 1530742
1024 256 1097669 1304348 1370965 1768345 1771993 1289467 1385560 1381548 537555 222666 289656 792604 926676
1024 512 400171 1279861 1365299 1714690 1726407 1270397 1393200 1347311 676286 225708 239196 714065 824400
1024 1024 1067920 433537 465639 1276817 1501313 1164956 1313122 1287921 1352828 490696 242944 254730 456920
2048 4 108773 485669 263747 326479 705260 379400 368606 301922 1093450 138015 542368 499740 222486
2048 8 587474 770825 457451 669686 1132668 554267 2006147 425164 335193 569216 996534 2199836 2183064
2048 16 197971 850504 3624742 3792790 3029098 949499 3136379 227808 1038584 833181 1436250 2976615 3002627
2048 32 966485 1141549 927958 943761 763020 290457 3061485 1022391 4347560 1771556 2375662 638381 645238
2048 64 1114880 389640 759377 1276781 3200654 1047067 2421196 1258818 5626082 2081497 1708146 527287 742572
2048 128 1297023 1399056 819509 1403628 1264191 407650 2287717 1295458 5704543 735640 790692 386989 478286
2048 256 770480 448237 658949 1645012 1684034 1248028 1415424 1394287 563873 220238 695664 851516 1001063
2048 512 410161 1320349 1427182 1830057 1818820 1164925 479380 710863 1585208 684031 740460 272741 300738
2048 1024 1213300 1376414 1418228 1026055 570501 407650 1302925 1300361 1587551 714351 239255 484381 831487
2048 2048 1213986 1379952 1384846 1116619 544673 405629 1347270 1311879 569971 179226 724472 752064 365905
4096 4 328312 359646 374339 1137754 1009819 168316 394687 259178 932380 191859 346768 1164278 1365779
4096 8 298454 319448 2213956 2536221 1926684 308176 1560415 398754 1296612 813833 227506 2153727 2210537
4096 16 528240 885432 4003879 4007615 3023131 246600 2639481 973650 509700 525461 1588844 488372 599273
4096 32 324461 1044127 5565581 5572803 4623457 342995 3168077 1077661 4602401 303248 1759546 3842678 3879120
4096 64 1117258 393656 2249320 4607338 4380601 1158078 2592481 492770 1044381 869083 1665070 2002827 731157
4096 128 524851 1387957 2739658 7892238 1402804 407968 2469150 1388518 6703539 446186 414484 1222696 623072
4096 256 951909 468821 1379598 1824383 1824577 910635 514709 1351382 1792593 437371 411288 859216 502823
4096 512 1314470 447057 1397554 1823609 1841791 882747 488136 1339894 1767693 395477 452902 835401 402869
4096 1024 531459 737877 1428583 1836082 1333343 409524 1358542 1390879 1366213 280127 720215 290702 770508
4096 2048 866322 508568 1396532 1800484 1262315 411870 1353192 1366322 526896 468540 741348 325309 829312
4096 4096 1312862 445353 1373092 932836 549425 1190334 1379044 481825 1357576 220345 717717 307691 769334
 
random random bkwd record stride
KB reclen write rewrite read reread read write read rewrite read
1024 1024 1067920 433537 465639 1276817 1501313 1164956 1313122 1287921 1352828

I am cutting this row out as an example. What is strange is that you are seeing write rates well in excess of your read rates. All of these values are in KB/s. You are showing a sequential read of 465,639 KB/s (~465 MB/s - not bad for 1-2 drives but pitiful for 12-14). Random reads are much better, which is also strange. You are also showing sequential writes of 1,067,920 KB/s (~1 GB/s), which is better but still not 12-14 drives' worth. The H200 is the likely culprit here; even though it is not the fastest controller around, I would still expect more in R0. I would suggest a better controller, such as an Areca 1882, for more performance, but do you have any other HBA you can test with? Also, is this the integrated H200? God knows how much bandwidth Dell gives it on the PCIe bus. Did the box also come with an H700? And are you running Solaris on bare metal, or is there a hypervisor in between?
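
For scale, rough nominal ceilings for the links involved (assuming the H200 sits in a PCIe 2.0 x8 slot; real-world throughput is lower):

PCIe 2.0 x8 slot: 8 x 500 MB/s = ~4.0 GB/s
One SAS2 x4 wide port: 4 x 600 MB/s = ~2.4 GB/s
14 SSDs at ~350 MB/s each: ~4.9 GB/s aggregate

Even a single x4 port should carry far more than the ~650 MB/s you are seeing.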
 
The H200 was pulled from another system; it is not the integrated version. OpenIndiana is running on bare metal. The only other HBAs I have are the Dell PERC 6/i cards, which are much older. The system did come with an H700; I guess I can test with that, but I would prefer to avoid hardware RAID at all costs.
 
Update:

I connected the 14 SSDs to the internal H700 RAID card and created two 7-disk RAID 0 stripes. In Solaris I created a RAID 0 of the two. The benchmarks were the same as with the H200.
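
(In ZFS terms that's just a plain two-vdev stripe over the two H700 virtual disks, i.e. something like "zpool create -f ssdpool <lun1> <lun2>", where the two device names are placeholders for whatever the H700 volumes show up as in format.)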
 
Have you tried the Solaris 11 live CD? Right now you are using Solaris 11 Express, which is the beta version of Solaris 11. Have you tried the OpenIndiana live CD? Or the Nexenta live CD?
 
Have you tried the Solaris 11 live CD? Right now you are using Solaris 11 Express, which is the beta version of Solaris 11. Have you tried the OpenIndiana live CD? Or the Nexenta live CD?

I initially tried Solaris 11 Express and then switched to OpenIndiana with no noticeable difference. Right now I'm installing ESXi 5 and passing through the HBA cards to see if that makes a difference.
 
I can't think of anything else that might be causing the issues you are having, unless there is some kind of PCIe problem in the box (though that is unlikely and would be causing worse problems). I know you hate hardware R0, but just for the hell of it, try creating a hardware R0 with the drives and test it in Windows with HD Tune or CrystalDiskMark. If you get the same results, it is likely a hardware problem somewhere; if it improves in Windows, it is likely a driver or other OS problem.
 
I think I have figured out the problem. The Dell H200 HBA (rebranded LSI 9211) is simply not strong enough. It only supports about 350 MB/s per port and seems to top out around 600 MB/s total, so it makes sense that when testing individual drives, even the ZeusRAM, we only see 350 MB/s writes.
 
The IBM M1015 is a good and fast card. It is the same card as this, but rebranded:
http://www.servethehome.com/ibm-serveraid-m1015-part-2-performance-lsi-92208i/
On eBay it costs $80 or so. Get two of them?

I suggest you try connecting some of the SSDs directly to the motherboard's SATA connectors and test again.

Money is not really an issue. What are the best-performing HBA cards?

We can't connect the drives to the motherboard easily. There are only two onboard ports, and one is used for the CD-ROM. There are no Molex or SATA power cables in Dell servers, plus there is no place to put a drive; all the drive bays are connected to the backplane.
 
just curious, you said you tried it with the h700 and it was just as bad; how then do you figure it's the h200 causing the problem?
 