High latency / low bandwidth with InfiniBand

Hello,

I'm trying to debug a latency/bandwidth issue between two servers (Dell 860) connected over InfiniBand.
The servers use 10Gb cards connected to an MTS2400 switch; here are the results for bandwidth and latency:

Code:
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  3]  0.0-600.0 sec    374 GBytes  5.35 Gbits/sec  0.002 ms 433015/273309755 (0.16%)

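For reference, the first block is the server-side report of a 10-minute iperf UDP test over the IPoIB interfaces; the invocation was roughly along these lines (exact flags approximate):

Code:
# approximate invocation (iperf 2.x)
# server side, on 10.0.0.1:
iperf -s -u
# client side, on the other node: 600-second UDP test at a high target rate
iperf -c 10.0.0.1 -u -b 10000M -t 600
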
Code:
[root@node-002 network-scripts]# ib_write_bw 10.0.0.1
------------------------------------------------------------------
                    RDMA_Write BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Mtu             : 2048B
 Link type       : IB
 Max inline data : 0B
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x05 QPN 0x70407 PSN 0x7dd459 RKey 0x340041 VAddr 0x007fbc2f49a000
 remote address: LID 0x01 QPN 0x60407 PSN 0x7bc639 RKey 0xa00041 VAddr 0x007f281be84000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 65536      5000           935.36             935.34 
------------------------------------------------------------------


Code:
[root@node-002 network-scripts]# ib_send_lat 10.0.0.1
------------------------------------------------------------------
                    Send Latency Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 50
 CQ Moderation   : 50
 Mtu             : 2048B
 Link type       : IB
 Max inline data : 0B
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x05 QPN 0x80407 PSN 000000
 remote address: LID 0x01 QPN 0x70407 PSN 000000
------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
 2       1000          5.52           32.15        5.62   
------------------------------------------------------------------

Code:
[root@node-002 network-scripts]# ping -c 5 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.179 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.247 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.246 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.247 ms
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.250 ms

Code:
[root@node-001 network-scripts]# iblinkinfo
CA: node-002 mthca0:
      0x0002c90200201d21      5    1[  ] ==( 4X           2.5 Gbps Active/  LinkUp)==>       3    2[  ] "MT47396 Infiniscale-III Mellanox Technologies" ( )
Switch: 0x0002c9010c57f7f0 MT47396 Infiniscale-III Mellanox Technologies:
           3    1[  ] ==( 4X           2.5 Gbps Active/  LinkUp)==>       1    1[  ] "node-001 mthca0" ( )
           3    2[  ] ==( 4X           2.5 Gbps Active/  LinkUp)==>       5    1[  ] "node-002 mthca0" ( )
           3    3[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    4[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    5[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    6[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    7[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    8[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    9[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   10[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   11[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   12[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   13[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   14[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   15[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   16[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   17[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   18[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   19[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   20[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   21[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   22[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   23[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   24[  ] ==(                Down/ Polling)==>             [  ] "" ( )
CA: node-001 mthca0:
      0x0002c90200008879      1    1[  ] ==( 4X           2.5 Gbps Active/  LinkUp)==>       3    1[  ] "MT47396 Infiniscale-III Mellanox Technologies" ( )

Are those values 'normal'? I was hoping for more bandwidth and lower latency :/
I'm using CentOS 6.4 with the ib_ipoib driver at stock settings.
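For completeness, the mode and MTU of the IPoIB interfaces can be checked like this (ib0 here is an assumption; substitute the actual interface name):

Code:
# IPoIB transport mode: 'datagram' (the stock default) or 'connected'
cat /sys/class/net/ib0/mode
# current interface MTU (2044 is the usual datagram-mode default)
ip link show ib0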

Thank you,
 
Your latency is partly a result of resistance over copper; InfiniBand still uses copper at its core. Also, unless you are directly connected, your best-case latency is going to be limited by the slowest network interface processor in the path. If you are going through a switch, your connection will only be as fast as the switch can process and forward the traffic.

That said, your latency appears to be very low indeed. I'm not sure how much lower you could expect networking over copper to be.

I run a Cisco 4948-10G, a fantastically low-latency top-of-rack switch, using 10Gb short-reach fiber connections, and my latency is a little lower than yours. That is down to my switch and to the fact that fiber optics have virtually no resistance from the point of transmission to the point of reception, unlike copper, which can suffer from impurities, varying material density, foreign particles, alien crosstalk, EM interference, and a number of other environmental factors both internal and external to the cable.

Of course, I'm running what was a $37,000 48-port switch that I bought used for around $1,200 now that they are end-of-life and replaced by the 4948E series.
 
I would say you shouldn't use ping as a latency indicator for the whole setup.

The ib_send_lat benchmark above shows roughly 0.005 ms (about 5.6 µs), which seems more in line, but still a little high for what you're working with.
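
If you want a latency figure for the IP path itself rather than ICMP, something like qperf (shipped with the OFED tools) is a better indicator than ping; a rough example, assuming qperf is installed on both nodes:

Code:
# on 10.0.0.1, start the qperf server:
qperf
# on the other node, measure TCP/UDP latency and TCP bandwidth over IPoIB:
qperf 10.0.0.1 tcp_lat udp_lat tcp_bw
# and the RDMA (RC) numbers for comparison:
qperf 10.0.0.1 rc_lat rc_bw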
 
I'm just using simple connected-mode 20Gb/s IPoIB between my primary storage box and a Linux KVM server. On startup, one of my init scripts does:

Code:
# put ib1 into IPoIB connected mode
echo connected >`find /sys -name mode | grep ib1`
# raise the MTU to the connected-mode maximum
echo 65520 >`find /sys -name mtu | grep ib1`

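The same thing can be done without the find/grep by hitting the sysfs entries directly; a minimal equivalent, assuming the IPoIB interface is ib1:

Code:
echo connected > /sys/class/net/ib1/mode
ip link set ib1 mtu 65520
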
Code:
root@titan:~# ping ib-storage
PING ib-storage (192.168.14.10) 56(84) bytes of data.
64 bytes from 192.168.14.10: icmp_req=1 ttl=64 time=0.086 ms
64 bytes from 192.168.14.10: icmp_req=2 ttl=64 time=0.080 ms
64 bytes from 192.168.14.10: icmp_req=3 ttl=64 time=0.109 ms
64 bytes from 192.168.14.10: icmp_req=4 ttl=64 time=0.100 ms
64 bytes from 192.168.14.10: icmp_req=5 ttl=64 time=0.085 ms
64 bytes from 192.168.14.10: icmp_req=6 ttl=64 time=0.081 ms
64 bytes from 192.168.14.10: icmp_req=7 ttl=64 time=0.085 ms

Code:
root@titan:~# ibstat
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0002c902002783d8
        System image GUID: 0x0002c902002783d8
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0002c902002783d9
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 1
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510a68
                Port GUID: 0x0002c902002783da
root@titan:~# ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0027:83d9
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            10 Gb/sec (4X)

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0027:83da
        base lid:        0x1
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
 