Performance issue with 10 Gbps network

pclausen

I recently upgraded my home network to include a central switch with a pair of SFP+ ports. The switch I got was the 48 port 500W model seen here:

https://www.ubnt.com/unifi-switching-routing/unifi-switch/

I also got a pair of Intel X520 NICs. One is installed in my FreeNAS server and connected to the switch via an SFP+ twinax cable.

The other X520 is installed in my Windows 10 workstation and connected to the switch via a 50 ft OM3 cable.

iperf running from the workstation shows the following:

Code:
C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k
------------------------------------------------------------
Client connecting to 10.0.1.50, TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  3] local 10.0.1.53 port 57211 connected with 10.0.1.50 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  5.74 GBytes  4.92 Gbits/sec

When I increase to 6 threads, throughput almost doubles:

Code:
C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k -P 6
------------------------------------------------------------
Client connecting to 10.0.1.50, TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  7] local 10.0.1.53 port 63293 connected with 10.0.1.50 port 5001
[  8] local 10.0.1.53 port 63294 connected with 10.0.1.50 port 5001
[  3] local 10.0.1.53 port 63289 connected with 10.0.1.50 port 5001
[  4] local 10.0.1.53 port 63290 connected with 10.0.1.50 port 5001
[  6] local 10.0.1.53 port 63292 connected with 10.0.1.50 port 5001
[  5] local 10.0.1.53 port 63291 connected with 10.0.1.50 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
[  4]  0.0-10.0 sec  1.54 GBytes  1.32 Gbits/sec
[  5]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[  7]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[  8]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
[  6]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[SUM]  0.0-10.0 sec  11.0 GBytes  9.42 Gbits/sec

So my first question is, why does single thread performance only appear to be about 50% of what the link should be capable of?

When I copy from the server to the workstation, I'm only getting about 600 Mbps as seen here:

copytoworkstation.PNG


When I go the other way, I get about 2 Gbps as seen here:

copytofreenas.PNG


The workstation has 4 Samsung 128GB 840 PROs in RAID0, so that should not be the bottleneck. CrystalDiskMark gives me the following:

crystaldiskmark.PNG


Any ideas about what the issue might be?
 
Make a RAM disk to verify it isn't a disk issue.

I always test networks using RAM, not disks.
 
I don't know what the answer is, but I've experienced similar things. For kicks, use an Ubuntu USB boot disk on both machines and try again. I bet it's basically 10 Gbps in iperf without even tweaking anything.
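
Something along these lines from the live session would do it, assuming the FreeNAS box keeps the same address and port as in your tests and that iperf is installable in the live environment:

Code:
# on the FreeNAS side, start a listener
iperf -s -p 5001
# from the Ubuntu live session on the workstation
sudo apt-get install -y iperf
iperf -c 10.0.1.50 -p 5001 -w 512k -P 4 -t 30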
 
Excellent suggestions. Doesn't iperf use RAM to test network speed? That said, I will go ahead and set up a RAM disk, at least on the Windows side, and run another test. I only have 8 GB of RAM there, but I should still be able to send a ~5 GB test file.

I'll do the Ubuntu USB boot disk on the workstation side for sure. I suspect the Windows 10 Gbps driver is probably not as efficient as the FreeNAS one. I tried tweaking some of the settings, but that actually made the result worse.

Good catch on C: being at 100% busy in the second graph. The workstation has a Z87 chipset and I'm running Intel's RST. Do I need a special driver or something to take full advantage of it? CrystalDiskMark seems to show the expected speed, but maybe that doesn't really reflect real-world throughput?

Btw, hardware specs on each machine are as follows:

FreeNAS server
SuperMicro X10SRL-F Motherboard
Xeon E5-1620 3.5GHz CPU
4x Samsung 16GB DDR4 ECC 2133
2x LSI SAS9200-8e HBA controllers, each connected to an external 24-bay backplane via a 24 Gbps SFF-8088 cable
1x LSI SAS9211-8i HBA controller connected to the internal 24-bay SAS2 backplane via a 24 Gbps SFF-8087 cable
4x 10-drive 2TB RAID-Z2 vdevs
1x 10-drive 4TB RAID-Z2 vdev
Intel X520 10 Gbps Dual Port NIC

Workstation
Asus Maximus VI Hero Z87 Motherboard
Intel i7-4770K CPU
2x 4GB G.Skill DDR3-1900
4x Samsung 840 PRO in RAID0
Intel X520 10 Gbps Dual Port NIC
 
It's likely that the NIC driver can't handle the full line rate on a single core, and by running multiple iperf threads in parallel the traffic can be spread across multiple queues at the destination and be processed in parallel. (A single 'flow' can't be processed by multiple CPUs at the destination due to issues around packet reordering.)

You might try looking at the individual CPU utilization when running with more iperf threads. I suspect you'll see more cores getting busy.

To process at line rate at 10G on a single core you generally need to do things like bypass the OS stack entirely with something like Intel's DPDK framework.
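
If you want to see whether the receive side is actually spreading traffic across queues, the RSS state of the X520 can be checked from an elevated PowerShell prompt with something like this (the adapter name is a guess; use whatever Get-NetAdapter reports):

Code:
# list adapters to find the X520's interface name
Get-NetAdapter
# show whether RSS is enabled and how many queues/processors it may use
Get-NetAdapterRss -Name "Ethernet 2"
# turn it on if it happens to be disabled
Enable-NetAdapterRss -Name "Ethernet 2"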
 
That's another thing: with RAID 0 you are likely using a lot of your CPU on IOPS if it's small reads/writes. I remember a Tom's Hardware piece on IRST showing around 70% CPU on a quad-core Xeon because of the amount of CPU needed for IRST RAID.
 
I've had some issues over gigabit that were similar to this.

I was using TeraCopy and could only copy at about 1/3 of line speed from the server. I switched back to the Windows copy utility and the problem went away; I'm able to saturate the link now.
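
If it turns out to be the copy tool rather than the network in your case too, robocopy with multithreaded, unbuffered copies is another thing to try; roughly like this (the share, path and file names are made up):

Code:
REM pull one large file from the FreeNAS share using 8 copy threads and unbuffered I/O
robocopy \\10.0.1.50\share C:\temp bigfile.bin /MT:8 /J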
 
So I did some additional testing. First I booted the workstation into FreeNAS and did a loopback test to validate that the hardware was OK. It was. I'm getting a little over 45 Gbps, which is actually a little better than the server, which gets 41 Gbps on this test.

loopbackworkstation.JPG


I then created a RAM disk using ImDisk and performed the following tests.
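
For anyone wanting to repeat this, an ImDisk RAM disk can be set up with something along these lines (drive letter and size are just examples):

Code:
REM create a 5 GB RAM disk at R: and format it NTFS
imdisk -a -s 5G -m R: -p "/fs:ntfs /q /y"
REM detach it when done
imdisk -d -m R: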

RAM Disk to Raid0:

ramdisktoraid0.PNG


Raid0 to RAM Disk:

raid0toramdisk.PNG


So those numbers all look good.

Next was FreeNAS to RAM disk:

freenastoramdisk.PNG


So terrible performance despite almost no CPU utilization.

RAM disk to FreeNAS:

ramdisktofreenas.PNG


Certainly better performance, but still way short of what it should be. And again CPU utilization is very low.
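
One thing still worth checking, as suggested above, is whether a single core is pegged while the overall number stays low; the built-in typeperf can log per-core load during a copy:

Code:
REM sample per-core CPU utilization once per second while a copy is running
typeperf "\Processor(*)\% Processor Time" -si 1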

I'm beginning to think I'm fighting some issue with Windows 10 and the Intel X520 drivers. This is a clean Windows 10 install from just 4 days ago. I'm on the Fast ring, build 10565.

Btw, I get the same results using FTP as with Windows File Explorer.
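
One way to see what the driver is actually doing is to dump the Windows TCP globals and the adapter's advanced properties (the adapter name below is an assumption; substitute whatever Get-NetAdapter shows for the X520):

Code:
# global TCP settings (autotuning level, RSS, offload state)
netsh int tcp show global
# advanced settings the Intel driver exposes (jumbo packet, interrupt moderation, RSS queues, offloads)
Get-NetAdapterAdvancedProperty -Name "Ethernet 2"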
 
TCP window size can have a dramatic effect on throughput. Investigate by varying it in the applications if possible.
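
For example, a quick sweep like this (reusing the server address and port from earlier in the thread) would show how sensitive the link is to the window size:

Code:
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 64k
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 256k
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 1m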
 
Did you get this resolved? What if you try booting into OmniOS and testing from there? I've heard it's more performant.
 