Performance issue with 10 Gbps network

Discussion in 'SSDs & Data Storage' started by pclausen, Oct 20, 2015.

  1. pclausen

    pclausen Limp Gawd

    Messages:
    458
    Joined:
    Jan 30, 2008
    I recently upgraded my home network to include a central switch with a pair of SFP+ ports. The switch I got was the 48 port 500W model seen here:

    https://www.ubnt.com/unifi-switching-routing/unifi-switch/

    I got a pair of Intel X520 NICs. One installed in my FreeNAS server and connected to the switch via a SFP+ twinax cable.

    The other X520 is installed in my Windows 10 workstation and connected to the switch via a 50 ft OM3 cable.

    iperf running from the workstation shows the following:

    Code:
    C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k
    ------------------------------------------------------------
    Client connecting to 10.0.1.50, TCP port 5001
    TCP window size:  512 KByte
    ------------------------------------------------------------
    [  3] local 10.0.1.53 port 57211 connected with 10.0.1.50 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  5.74 GBytes  4.92 Gbits/sec
    When I increase to 6 threads, throughput almost doubles:

    Code:
    C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k -P 6
    ------------------------------------------------------------
    Client connecting to 10.0.1.50, TCP port 5001
    TCP window size:  512 KByte
    ------------------------------------------------------------
    [  7] local 10.0.1.53 port 63293 connected with 10.0.1.50 port 5001
    [  8] local 10.0.1.53 port 63294 connected with 10.0.1.50 port 5001
    [  3] local 10.0.1.53 port 63289 connected with 10.0.1.50 port 5001
    [  4] local 10.0.1.53 port 63290 connected with 10.0.1.50 port 5001
    [  6] local 10.0.1.53 port 63292 connected with 10.0.1.50 port 5001
    [  5] local 10.0.1.53 port 63291 connected with 10.0.1.50 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
    [  4]  0.0-10.0 sec  1.54 GBytes  1.32 Gbits/sec
    [  5]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
    [  7]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
    [  8]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
    [  6]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
    [SUM]  0.0-10.0 sec  11.0 GBytes  9.42 Gbits/sec
    So my first question is, why does single thread performance only appear to be about 50% of what the link should be capable of?

    When I copy from the server to the workstation, I'm only getting about 600 Mbps as seen here:

[screenshot]

    When I go the other way, I get about 2 Gbps as seen here:

[screenshot]

The workstation has 4 Samsung 128GB 840 PROs in RAID0, so that shouldn't be the bottleneck. CrystalDiskMark gives me the following:

[screenshot]

    Any ideas about what the issue might be?
     
  2. SomeGuy133

    SomeGuy133 2[H]4U

    Messages:
    3,447
    Joined:
    Apr 12, 2015
Make a RAM disk to verify it isn't an HDD issue.

I always test networks using RAM, not disks.
     
  3. tazeat

    tazeat [H]ard|Gawd

    Messages:
    1,253
    Joined:
    Jul 3, 2007
I don't know what the answer is, but I've experienced similar things. For kicks, boot both machines from an Ubuntu USB stick and try again. I bet it's basically 10G in iperf without even tweaking anything.
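If it helps, a minimal test sequence from an Ubuntu live session might look like the sketch below (assuming the FreeNAS box at 10.0.1.50 is already running `iperf -s`; the interface name `enp1s0` is a placeholder, check `ip link` for yours):

```shell
# On the Ubuntu live session (workstation side)
sudo apt-get update && sudo apt-get install -y iperf ethtool

# Confirm the X520 link came up, and at what speed
ip link show
sudo ethtool enp1s0 | grep -i speed    # interface name is a placeholder

# Run the same single-stream test against the FreeNAS box
iperf -c 10.0.1.50 -p 5001 -w 512k -t 10
```

If a stock Linux live session hits ~10G where Windows doesn't, that points squarely at the Windows driver or stack rather than the hardware or cabling.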
     
  4. HammerSandwich

    HammerSandwich [H]ard|Gawd

    Messages:
    1,116
    Joined:
    Nov 18, 2004
The 2nd graph shows C: at 100% busy.
     
  5. pclausen

    pclausen Limp Gawd

    Messages:
    458
    Joined:
    Jan 30, 2008
Excellent suggestions. Doesn't iperf already use RAM to test network speed? That said, I'll go ahead and set up a RAM disk, at least on the Windows side, and run another test. I only have 8 GB of RAM there, but I should still be able to test by sending a ~5 GB file.

I'll definitely try the Ubuntu USB boot disk on the workstation side. I suspect the Windows 10 Gig driver is not as efficient as the FreeNAS one. I tried tweaking some of the settings, but that actually made the results worse.

Good catch on C: being at 100% busy in the 2nd graph. The workstation has a Z87 chipset and I'm running Intel's RST. Do I need a special driver or something to take full advantage of it? CrystalDiskMark reports the expected speeds, but maybe that doesn't reflect real-world throughput?

    Btw, hardware specs on each machine as follows:

    FreeNAS server
    SuperMicro X10SRL-F Motherboard
    Xeon E5-1620 3.5GHz CPU
    4x Samsung 16GB DDR4 ECC 2133
    2x LSI SAS9200-e8 HBA Controller, each connected to external 24 bay backplane via 24 Gbps SFF-8088 cable
    1x LSI SAS9211-i8 HBA Controller connected to internal 24 bay SAS2 backplane via 24 Gbps SFF-8087 cable
4x 10-drive 2TB RAID-Z2 vdevs
1x 10-drive 4TB RAID-Z2 vdev
    Intel X520 10 Gbps Dual Port NIC

    Workstation
Asus Maximus VI Hero Z87 Motherboard
Intel i7-4770K CPU
2x 4GB G.Skill DDR3-1900
    4x Samsung 840 PRO in RAID0
    Intel X520 10 Gbps Dual Port NIC
     
  6. cbf123

    cbf123 n00b

    Messages:
    50
    Joined:
    Dec 12, 2013
    It's likely that the NIC driver can't handle the full line rate on a single core, and by running multiple iperf threads in parallel the traffic can be spread across multiple queues at the destination and be processed in parallel. (A single 'flow' can't be processed by multiple CPUs at the destination due to issues around packet reordering.)

    You might try looking at the individual CPU utilization when running with more iperf threads. I suspect you'll see more cores getting busy.

To process 10G line rate on a single core you generally need to bypass the OS network stack entirely with something like Intel's DPDK framework.
     
  7. SomeGuy133

    SomeGuy133 2[H]4U

    Messages:
    3,447
    Joined:
    Apr 12, 2015
That's another thing: with RAID 0 you are likely burning a lot of CPU on IOPS if it's small reads/writes. I recall a Tom's Hardware review of Intel RST showing around 70% CPU on a quad-core Xeon because of the CPU overhead of RST's software RAID.
     
  8. Tim_H

    Tim_H n00b

    Messages:
    48
    Joined:
    Jul 19, 2011
    I've had some issues over gigabit that were similar to this.

I was using TeraCopy and could only copy at about 1/3 of line speed from the server. I switched back to the Windows copy utility and the problem went away; I'm able to saturate the link now.
     
  9. pclausen

    pclausen Limp Gawd

    Messages:
    458
    Joined:
    Jan 30, 2008
So I did some additional testing. First I booted the workstation into FreeNAS and ran a loopback test to validate that the hardware was OK. It was: I'm getting a little over 45 Gbps, which is actually a little better than the server, which gets 41 Gbps on this test.

[screenshot]

    I then created a RAM disk using ImDisk and performed the following test.
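For reference, creating a RAM disk with ImDisk from an elevated command prompt looks roughly like this (size and drive letter are whatever fits your RAM; syntax is from memory, so run `imdisk` with no arguments to confirm the flags):

```shell
rem Elevated command prompt: allocate a 5 GB RAM disk as R: and format it NTFS
imdisk -a -s 5G -m R: -p "/fs:ntfs /q /y"

rem Detach it again when done testing
imdisk -d -m R:
```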

    RAM Disk to Raid0:

[screenshot]

    Raid0 to RAM Disk:

[screenshot]

    So those numbers all look good.

    Next was FreeNAS to RAM disk:

[screenshot]

    So terrible performance despite almost no CPU utilization.

    RAM disk to FreeNAS:

[screenshot]

    Certainly better performance, but still way short of what it should be. And again CPU utilization is very low.

    I'm beginning to think I'm fighting some issue with Windows 10 and the Intel X520 drivers. This is a clean win10 install from just 4 days ago. I'm on the fast ring, build 10565.

Btw, I get the same results using FTP as with Windows File Explorer.
     
    Last edited: Oct 23, 2015
  10. SomeGuy133

    SomeGuy133 2[H]4U

    Messages:
    3,447
    Joined:
    Apr 12, 2015
That's weird. Sorry, I'm of no help on this one.
     
  11. nomas

    nomas Limp Gawd

    Messages:
    246
    Joined:
    Apr 19, 2015
TCP window size can have a dramatic effect on throughput. Investigate by varying it in the applications if possible.
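A quick way to see the effect is to sweep the window size in iperf itself, e.g. (same server and port as the earlier tests). Rough sanity check: at 10 Gbps with a LAN RTT around 0.2 ms, the bandwidth-delay product is only about 250 KB, so windows well below that would cap a single stream, while 512k and up shouldn't be the limiter:

```shell
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 64k
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 256k
C:\iperf>iperf -c 10.0.1.50 -p 5001 -w 1M
```

If throughput stops scaling past ~256k, the window isn't the bottleneck and it's more likely per-core driver/stack overhead.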
     
  12. lordsegan

    lordsegan Gawd

    Messages:
    624
    Joined:
    Jun 16, 2004
Did you get this resolved? What if you try booting into OmniOS and testing there? I've heard it's more performant.