Degraded performance over time (Bonded Intel gigabit adventures)

I've got two systems, each with four gigabit ports: two from skge/forcedeth and two from an Intel Pro/1000 PT Dual Port Server card, bonded into two separate virtual NICs using round-robin. Both run Linux 2.6.18. The Intel NICs are connected to each other with a crossover cable, so they're directly connected with an MTU of 9000.

When I run iperf for a short time (~5s), everything is peachy: 1.8-1.9 Gbit and looking pretty. When I bump it up to about 15-30 seconds, performance degrades very quickly. It'll drop from 1.8 Gbit to 800 Mbit, even 500/400 Mbit, and sometimes it reports outrageous numbers like 10-50 Mbit. My guess is that something goes wrong intermittently, and the longer the test runs the more likely it is to hit it. I've tried multiple TCP congestion control algorithms (bic, highspeed, reno). I'm not maxing out memory or CPU; CPU stays stable around 30-40% with jumbo frames enabled (80+% without). I'd like to move to 4 Gbit connections by this weekend, so any help/suggestions would be greatly appreciated.
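For reference, the Intel pair on each box is bonded roughly like this. This is just a sketch of the kind of setup I mean; the interface names are placeholders, not my exact scripts:

Code:
# load the bonding driver in round-robin mode, bring up the bond with jumbo frames,
# then enslave the two Intel ports (eth2/eth3 are placeholder names)
modprobe bonding mode=balance-rr miimon=100
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 mtu 9000 up
ifenslave bond0 eth2 eth3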

Here's an example output:
Code:
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3044
[  4]  0.0-10.0 sec  2.22 GBytes  1.90 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3045
[  4]  0.0-10.0 sec  2.15 GBytes  1.85 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 4795
[  4]  0.0-15.0 sec  3.23 GBytes  1.85 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 4796
[  4]  0.0-15.0 sec  3.19 GBytes  1.83 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 4797
[  4]  0.0-15.0 sec  3.32 GBytes  1.90 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3968
[  4]  0.0- 5.0 sec  1.08 GBytes  1.85 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3969
[  4]  0.0- 5.0 sec  1.04 GBytes  1.79 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3970
[  4]  0.0- 5.0 sec  1.03 GBytes  1.77 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3971
[  4]  0.0-20.0 sec    333 MBytes    140 Mbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 2837
[  4]  0.0-20.0 sec    376 MBytes    158 Mbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 2838
[  4]  0.0- 5.0 sec  1.09 GBytes  1.87 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3988
[  4]  0.0- 5.0 sec  1.11 GBytes  1.91 Gbits/sec
[  4] local 192.168.1.10 port 5001 connected with 192.168.1.11 port 3989
[  4]  0.0-30.0 sec  2.46 GBytes    704 Mbits/sec
 
Round-robin isn't exactly kosher, which is why it isn't part of the standard. Here's what the standard requires:

Frame ordering must be maintained for certain sequences of frame exchanges between MAC Clients (known as conversations, see 1.4). The Distributor ensures that all frames of a given conversation are passed to a single port. For any given port, the Collector is required to pass frames to the MAC Client in the order that they are received from that port. The Collector is otherwise free to select frames received from the aggregated ports in any order. Since there are no means for frames to be mis-ordered on a single link, this guarantees that frame ordering is maintained for any conversation.

http://linux-net.osdl.org/index.php/Bonding

This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the striping often results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments.
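If you'd rather stay within what the standard allows, the bonding driver can hash each conversation onto a single slave instead of striping packets. Something like the following (module options only, from memory; check your kernel's bonding documentation). You give up single-stream striping, but frames within a flow can no longer be reordered:

Code:
# keep each TCP conversation on one slave so its frames cannot be reordered
modprobe bonding mode=balance-xor xmit_hash_policy=layer3+4 miimon=100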

Some trace analysis tools might help you see what's going on in greater detail.
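For example, a packet capture on the receiving end plus the kernel's TCP counters would show whether reordering and retransmits are what's eating your throughput (bond0 here is whatever your bond interface is called):

Code:
# capture the test traffic on the bond for later analysis in a trace tool
tcpdump -i bond0 -w bond-test.pcap host 192.168.1.11
# count how often the receiver saw segments out of order or retransmitted
netstat -s | grep -i -E 'reorder|retrans'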

Otherwise, you might have to do what mere mortals do: rely on multiple independent connections ("conversations") to get parallelism.
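With iperf that's just the -P flag on the client side, e.g.:

Code:
# run four parallel streams for 30 seconds; each stream is its own conversation
iperf -c 192.168.1.10 -P 4 -t 30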
 
I actually found out what the problem was: iperf itself. Something in it was bugged; I upgraded iperf and am now getting completely perfect speeds. I was also able to tune about another 100 Mbit out of it by raising the default Linux network buffer settings to use more memory. Out of the 2-3 GB of data transferred, I think I only saw 1 or 2 out-of-order issues.
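The memory tuning was along these lines; the exact values below are illustrative rather than my literal settings:

Code:
# raise the socket buffer ceilings so TCP can grow larger windows
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"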
 
Madwand said:
That's impressive if correct -- and with standard TCP/IP?

With the standard Linux stack, yes. I've got advanced routing and some other things going on, but nothing that isn't available to anybody else or that should affect it. I think the Linux bonding module is just really good at handling those sorts of issues.
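If anyone wants to poke at the same thing, the bond state is visible in /proc and the stack's reordering tolerance is a plain sysctl (bond0 is just what I named mine):

Code:
# show the bonding mode, slaves, and link status for the bond
cat /proc/net/bonding/bond0
# how many out-of-order packets TCP tolerates before treating them as loss
sysctl net.ipv4.tcp_reordering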
 