
Bad ZFS Performance :(

hotzen

Hello

I just set up the following Solaris 11 Express box:
  • C2D 6600 (2*2.4 GHz; LGA775 on a Gigabyte 965P board)
  • 4*1GB DDR2
  • Intel G2-V SSD as Boot-Drive (SATA-Port 0)
  • 3* Seagate Barracuda Green 5900.3 (5900rpm, 2TB, SATA 6Gbps interface; connected to SATA-Ports 1-3, which are actually SATA II/3Gbps)
  • Intel CT Gigabit Adapter

I am running Solaris 11 Express, installed from the text installer, and I am currently copying media files over Gigabit Ethernet (Solaris CIFS/SMB) to an encrypted ZFS filesystem.

Writes max out at 31.7 MB/s, which is ... not so good.

vmstat 5:
Code:
kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 5 0 0 2072924 513004 0   0  0  0  0  0  0  0 158 161 158 2470 68 5830 0 87 13
 2 0 0 2072888 512968 0   0  0  0  0  0  0  0 209 207 208 2434 95 5757 0 86 14
 6 0 0 2072792 512872 0   0  0  0  0  0  0  0 163 164 162 2442 68 5730 0 87 13
 5 0 0 2072676 512756 0   0  0  0  0  0  0  0 159 158 159 2261 66 4949 0 82 18
 5 0 0 2072612 512692 0   0  0  0  0  0  0  0 104 105 104 2021 69 4277 0 73 27

fsstat zfs 5
Code:
kai@knecht:~$ fsstat zfs 5
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
1.29K   155    30 2.75M 4.04K   470K 4.81K 77.6K  250M 1.28M 40.8G zfs
    0     0     0  9.6K     0      0     0     2    72 4.80K  154M zfs
    1     0     0  9.5K     0     10     0     4 2.95K 4.77K  153M zfs
    0     0     0  9.6K     0      0     0     0     0 4.78K  153M zfs
    1     0     0 13.0K     3     28     0     7  204K 6.51K  208M zfs
    0     0     0 12.4K     0      0     0     0     0 6.20K  198M zfs
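As a rough cross-check, the "write bytes" column can be turned into a rate by dividing each sample by the 5-second interval. This is only a sketch: `parse_size` and `write_rate_mb_s` are made-up helper names, and it assumes the fsstat suffixes are binary multiples (if they are decimal, the results are about 5% lower). The first fsstat line is taken to be the cumulative total and is skipped.

```python
# Sketch: convert fsstat's per-interval "write bytes" values into MB/s.
# Assumption: K/M/G suffixes are binary multiples (K = 1024, M = 1024**2, ...).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Convert a value such as '154M' or '72' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

def write_rate_mb_s(write_bytes, interval_s):
    """Average write rate in decimal MB/s over one sampling interval."""
    return parse_size(write_bytes) / interval_s / 1e6

# Write-bytes column of the five 5-second samples (cumulative first line skipped).
samples = ["154M", "153M", "153M", "208M", "198M"]
rates = [write_rate_mb_s(s, 5) for s in samples]
for s, r in zip(samples, rates):
    print(f"{s:>5} per 5 s = {r:.1f} MB/s")
```

The first three samples land at roughly 31 to 32 MB/s depending on the unit assumption, which lines up with the 31.7 MB/s seen on the Windows side, so ZFS is writing about as fast as data arrives over SMB.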

Hm, the only thing I can read from these tables is that the CPU is working quite hard, but idle=13 should mean there is still some headroom left, shouldn't it?
Secondly, the "sr" Scan-Rate for Memory Pages is zero so the memory seems to be sufficient.
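To put numbers on that reading, here is a small sketch that averages the run queue (`r`) and the cpu columns (`us sy id`) over the five vmstat samples. The sample text is copied from the output above; `NCPU = 2` comes from the hardware list (dual-core C2D). The interpretation in the comments is a rule of thumb, not a diagnosis.

```python
# Sketch: average run queue and CPU split from the vmstat samples above.
VMSTAT = """\
 5 0 0 2072924 513004 0   0  0  0  0  0  0  0 158 161 158 2470 68 5830 0 87 13
 2 0 0 2072888 512968 0   0  0  0  0  0  0  0 209 207 208 2434 95 5757 0 86 14
 6 0 0 2072792 512872 0   0  0  0  0  0  0  0 163 164 162 2442 68 5730 0 87 13
 5 0 0 2072676 512756 0   0  0  0  0  0  0  0 159 158 159 2261 66 4949 0 82 18
 5 0 0 2072612 512692 0   0  0  0  0  0  0  0 104 105 104 2021 69 4277 0 73 27
"""
NCPU = 2  # two cores, per the hardware list

rows = [[int(x) for x in line.split()] for line in VMSTAT.splitlines() if line.strip()]
n = len(rows)

run_queue = sum(r[0] for r in rows) / n   # first column: runnable kernel threads
us = sum(r[-3] for r in rows) / n         # user time, percent
sy = sum(r[-2] for r in rows) / n         # system (kernel) time, percent
idle = sum(r[-1] for r in rows) / n       # idle time, percent

print(f"run queue {run_queue:.1f} on {NCPU} CPUs, us {us:.0f}% sy {sy:.0f}% id {idle:.0f}%")
# Rule of thumb: a run queue consistently above the CPU count means threads
# are waiting for a core, even if the idle column is not zero.
```

On these samples the average is about 83% system time with a run queue of 4.6 on 2 cores. That would fit a box that is CPU-bound in kernel code (ZFS encryption, checksums and the CIFS service all run in the kernel), although vmstat alone cannot say which of those is responsible.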

These are some dd-results:
Code:
/tank1/crypt$ dd if=/dev/zero of=./test.bin bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 11.6997 s, 89.6 MB/s
/tank1/crypt$ dd if=/dev/zero of=./test.bin bs=10M count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 12.9759 s, 80.8 MB/s
/tank1/crypt$ dd if=/dev/zero of=./test.bin bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 12.8888 s, 81.4 MB/s
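For reference, the dd figures can be recomputed from the byte count and elapsed time; dd reports decimal megabytes. A minimal sketch (`throughput_mb_s` is just an illustrative helper name):

```python
# Sketch: recompute dd's MB/s from bytes written and elapsed seconds.
NBYTES = 1_048_576_000  # 1000 x 1 MiB, as reported by dd

def throughput_mb_s(nbytes, seconds):
    """Throughput in decimal MB/s, the unit dd prints."""
    return nbytes / seconds / 1e6

# Elapsed times from the three runs on /tank1/crypt above.
runs = {"bs=1M": 11.6997, "bs=10M": 12.9759, "bs=100M": 12.8888}
results = {label: throughput_mb_s(NBYTES, secs) for label, secs in runs.items()}
for label, mb_s in results.items():
    print(f"{label:>8}: {mb_s:.1f} MB/s")
```

The block size makes little difference (roughly 81 to 90 MB/s in all three runs). Keep in mind that a 1 GB write on a 4 GB box may be partly absorbed by ZFS write buffering, so these numbers are probably an upper bound rather than a sustained rate.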

So... how the hell can I get at least 50 MB/s over SMB/CIFS?
Any advice appreciated
 
What OS are you transferring from? Could this be an SMB1 vs SMB2 difference? Also, what are your over-the-wire speeds without encryption?
 
I am transferring from Win7 x64 Prof SP1, using the Solaris built-in kernel-embedded CIFS/SMB Service.

Setting sync to disabled and restarting smb gets me about 38 MB/s, so it's a slight improvement.

For the sake of completeness, using dd on the unencrypted zfs-pool:
Code:
/tank1$ dd if=/dev/zero of=test.bin bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 6.54678 s, 160 MB/s

Thank you.
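A quick sketch comparing the two bs=1M dd runs (encrypted /tank1/crypt vs. plain /tank1) to estimate what encryption costs locally. The variable names are illustrative; the timings are the ones posted above.

```python
# Sketch: local cost of ZFS encryption, from the two bs=1M dd runs.
NBYTES = 1_048_576_000
encrypted_s = 11.6997   # /tank1/crypt
plain_s = 6.54678       # /tank1

encrypted_mb_s = NBYTES / encrypted_s / 1e6
plain_mb_s = NBYTES / plain_s / 1e6
slowdown = 1 - encrypted_mb_s / plain_mb_s   # fraction of throughput lost

print(f"plain {plain_mb_s:.0f} MB/s, encrypted {encrypted_mb_s:.0f} MB/s, "
      f"slowdown {slowdown:.0%}")
```

Encryption costs around 44% locally, but even the encrypted dataset manages close to 90 MB/s in dd, well above the 31 to 38 MB/s seen over SMB. So the encrypted pool alone does not explain the network write speed.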
 
Could you let us know the performance on an unencrypted share across the network?
 
Will try that in a second. Currently the smb-service is bitching and does not let me access any shares FCUK
Code:
 knecht smbsrv: [ID 421734 kern.notice] NOTICE: [KNECHT\asdf]: ipc$ share not found

I remember from a test some hours ago it wasn't much higher, around 35MB/sec. Will retest that in a moment...

What about this message?
Code:
Apr 17 16:12:18 knecht unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels.
Apr 17 16:12:18 knecht This may result in reduced system performance.
Apr 17 16:36:13 knecht usba: [ID 912658 kern.info] USB 2.0 device (usb46a,21) operating at low speed (USB 1.x) on USB 1.10 root hub: device@2, usb_mid0 at bus address 2
Apr 17 16:36:13 knecht genunix: [ID 936769 kern.info] usb_mid0 is /pci@0,0/pci1458,5004@1d/device@2
Apr 17 16:36:13 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2 (usb_mid0) online
Apr 17 16:36:13 knecht usba: [ID 912658 kern.info] USB 2.0 interface (usbif46a,21.config1.0) operating at low speed (USB 1.x) on USB 1.10 root hub: keyboard@0, hid1 at bus address 2
Apr 17 16:36:13 knecht genunix: [ID 936769 kern.info] hid1 is /pci@0,0/pci1458,5004@1d/device@2/keyboard@0
Apr 17 16:36:13 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/keyboard@0 (hid1) online
Apr 17 16:36:13 knecht usba: [ID 912658 kern.info] USB 2.0 interface (usbif46a,21.config1.1) operating at low speed (USB 1.x) on USB 1.10 root hub: input@1, hid2 at bus address 2
Apr 17 16:36:13 knecht genunix: [ID 936769 kern.info] hid2 is /pci@0,0/pci1458,5004@1d/device@2/input@1
Apr 17 16:36:13 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/input@1 (hid2) online
Apr 17 16:36:14 knecht unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels.
Apr 17 16:36:14 knecht This may result in reduced system performance.
Apr 17 16:36:14 knecht unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels.
Apr 17 16:36:14 knecht This may result in reduced system performance.
Apr 17 16:44:50 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/keyboard@0 (hid1) offline
Apr 17 16:44:50 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/input@1 (hid2) offline
Apr 17 16:44:50 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/keyboard@0 (hid1) offline
Apr 17 16:44:50 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2/input@1 (hid2) offline
Apr 17 16:44:50 knecht genunix: [ID 408114 kern.info] /pci@0,0/pci1458,5004@1d/device@2 (usb_mid0) removed
Apr 17 16:44:50 knecht unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels.
Apr 17 16:44:50 knecht This may result in reduced system performance.

How do I know which devices are sharing interrupts?
 
So far, problems with interrupts 16 and 19 are reported.
I am working on interpreting the following mdb output. Any advice appreciated.

Code:
root@knecht:/proc# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs sata sd ip hook neti arp usba uhci stmf_sbd stmf sockfs fctl md lofs random idm smbsrv nfs sppp crypto ptm fcp cpc fcip nsmb ufs logindmux ]
> ::interrupts
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
9    0x80 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
16   0x83 9   PCI    Lvl Fixed  1   2     0x0/0x10  uhci_intr, 0
18   0x81 9   PCI    Lvl Fixed  1   2     0x0/0x12  uhci_intr, ehci_intr
19   0x85 9   PCI    Lvl Fixed  1   2     0x0/0x13  uhci_intr, ahci_intr
21   0x84 9   PCI    Lvl Fixed  0   1     0x0/0x15  uhci_intr
23   0x82 9   PCI    Lvl Fixed  0   2     0x0/0x17  uhci_intr, ehci_intr
24   0x40 5   PCI    Edg MSI    0   1     -         ahci_intr
25   0x60 6   PCI    Edg MSI    0   1     -         e1000g_intr_pciexpress
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
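One way to read this table: the Share column counts the handlers registered on an IRQ, and the ISR(s) column names them. The sketch below parses the pasted output and lists every IRQ with more than one handler. The `parse` helper is hypothetical, and it relies on the column layout shown here (IPI rows have an empty Bus column).

```python
# Sketch: list shared IRQs from the pasted `::interrupts` output.
INTERRUPTS = """\
9    0x80 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
16   0x83 9   PCI    Lvl Fixed  1   2     0x0/0x10  uhci_intr, 0
18   0x81 9   PCI    Lvl Fixed  1   2     0x0/0x12  uhci_intr, ehci_intr
19   0x85 9   PCI    Lvl Fixed  1   2     0x0/0x13  uhci_intr, ahci_intr
21   0x84 9   PCI    Lvl Fixed  0   1     0x0/0x15  uhci_intr
23   0x82 9   PCI    Lvl Fixed  0   2     0x0/0x17  uhci_intr, ehci_intr
24   0x40 5   PCI    Edg MSI    0   1     -         ahci_intr
25   0x60 6   PCI    Edg MSI    0   1     -         e1000g_intr_pciexpress
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
"""

def parse(line):
    """Split one table row into the fields needed here."""
    f = line.split()
    if f[3] in ("Edg", "Lvl"):   # Bus column is empty on IPI rows
        f.insert(3, "")
    return {"irq": int(f[0]), "type": f[5], "share": int(f[7]),
            "isrs": " ".join(f[9:])}

rows = [parse(line) for line in INTERRUPTS.splitlines() if line.strip()]
shared = [r for r in rows if r["share"] > 1]
nic = next(r for r in rows if "e1000g" in r["isrs"])

for r in shared:
    print(f"IRQ {r['irq']}: {r['isrs']}")
print(f"NIC: IRQ {nic['irq']}, type {nic['type']}, share {nic['share']}")
```

Going by this output, the shared IRQs are 16, 18, 19 and 23, all involving USB controllers; IRQ 19 is shared between a UHCI controller and an AHCI handler. The e1000g NIC sits alone on an MSI interrupt (IRQ 25), so as far as this table shows it is not one of the devices sharing a line.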
 
Not sure about the interrupt issue. However, try this from the command line:

zfs set sync=disabled tank1

and retest.

Don't do this - unless you like putting your pool at risk.

sync=disabled
Synchronous requests are disabled. File system transactions only commit to stable storage on the next DMU transaction group commit which can be many seconds. This option gives the highest performance.

However, it is very dangerous as ZFS is ignoring the synchronous transaction demands of applications such as databases or NFS. Setting sync=disabled on the currently active root or /var file system may result in out-of-spec behavior, application data loss and increased vulnerability to replay attacks.

This option does *NOT* affect ZFS on-disk consistency.

Administrators should only use this when these risks are understood.
 
Since it does not affect the consistency of the zpool, I think the option is worth mentioning, especially on a simple SMB Fileserver. But thanks for your warning.

Any idea what to try next?
Are there any solaris-pros that know how to specifically check what the bottleneck is?
For example, zfs-checksum-thread, zfs-encryption-thread, smb/cifs daemon, hdd-performance, etc?
 
Of course. I also ran a netio benchmark and the results are awful: around 50 MB/s over TCP.
Intel Gigabit CT Adapter Win7 SP1 <=> HP ProCurve 1800 <=> Intel Gigabit CT Adapter Solaris 11 Express
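For scale, here is a back-of-the-envelope sketch of what TCP can carry over Gigabit Ethernet. It assumes a standard 1500-byte MTU (no jumbo frames) and IPv4/TCP headers without options; neither is confirmed for this setup.

```python
# Sketch: theoretical TCP payload ceiling on Gigabit Ethernet.
# Assumptions: MTU 1500, IPv4 + TCP headers without options (20 + 20 bytes).
LINE_RATE_BPS = 1_000_000_000
MTU = 1500
WIRE_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD, Ethernet header, FCS, inter-frame gap
HEADERS = 20 + 20                 # IPv4 + TCP

payload = MTU - HEADERS                        # TCP payload bytes per frame
efficiency = payload / (MTU + WIRE_OVERHEAD)   # payload share of bytes on the wire
ceiling_mb_s = LINE_RATE_BPS / 8 * efficiency / 1e6

measured_mb_s = 50.0  # netio result reported above
print(f"ceiling {ceiling_mb_s:.1f} MB/s, measured {measured_mb_s / ceiling_mb_s:.0%} of that")
```

Under those assumptions the ceiling is just under 119 MB/s, so 50 MB/s of raw TCP is only a bit over 40% of what the link should deliver. That points at the network path (NIC, driver, interrupts, switch, cables) before CIFS or ZFS even come into play.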

The next things to check:
- possible IRQ-interference on the PCIe Intel NIC
- disabling USB due to IRQ interference
- using cheap switch
- new patch cables
- Using the IBM M1015 in the PCIe x16 as soon as the new board arrives to eliminate possible SATA II 3Gbps vs SATA 6 Gbps problems.
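On the last item, a rough sanity check of the SATA II link budget. SATA uses 8b/10b encoding, so a 3 Gbit/s link carries at most 300 MB/s of data per port. The 160 MB/s figure is the unencrypted dd result from earlier in the thread; how that load is spread across the three disks depends on the pool layout, which was not posted.

```python
# Sketch: usable bandwidth of one SATA II port vs. the measured pool throughput.
LINK_RATE_BPS = 3_000_000_000   # SATA II signalling rate
ENCODING = 8 / 10               # 8b/10b: 8 data bits per 10 line bits

port_mb_s = LINK_RATE_BPS * ENCODING / 8 / 1e6
pool_mb_s = 160.0               # unencrypted dd result on /tank1
print(f"per-port limit {port_mb_s:.0f} MB/s, whole pool measured {pool_mb_s:.0f} MB/s")
```

Even if the whole 160 MB/s went through a single port it would stay well below the 300 MB/s limit, so the 3 Gbps ports are unlikely to be what holds writes back.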
 
how are you testing the cifs speed?

you aren't using teracopy or another copy manager, are you?

have you tested raw tcpip throughput with iperf?

what are the nfs and iscsi speeds?

also i note this thread is titled bad zfs performance... it seems more like bad cifs performance.
 
I tested CIFS speed with Windows Explorer. Will try robocopy...

I tested raw TCP/IP with netio and will try iperf too, thanks.
I use neither NFS nor iSCSI, only CIFS/SMB. And yes, the thread title is misleading; I am fighting bad CIFS performance.
 