Testing 25 gbps to 100 gbps networks

ochadd

This year I installed our first network that wasn't 1 Gbps. Have servers connected with 25 Gbps DACs, with several 2x25 Gbps teams. I've used iperf for performance testing for many years, but I've yet to get it over 5 Gbps and change. Wondering if there's something else that's a go-to tool for testing the big stuff?

Edit: 2/6/23. It was a corrupted Hyper-V vswitch. Had to remove the host from the cluster. Delete the vswitch. Recreate the vswitch. Add it back to the cluster. So far it's been running fine for about a week. Only took six months to figure out.
 
I can't get 1Gb even on a 1Gb link sometimes without the -P switch (parallel streams). I've had to use as many as 20 to make my powerline adapters max out in testing, and usually 10 to max out my IPsec tunnels. It also helps to use the -d or -r switches to test in both directions and/or simultaneously.
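For reference, a minimal parallel-stream run (a sketch assuming iperf3 on both ends; the hostname is a placeholder) looks like this, server side then client side:

iperf3.exe -s
iperf3.exe -c <server-hostname> -P 10 -t 30

(-P sets the number of parallel streams, -t the test length in seconds.)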
 
Are you doing a single thread test? If so, you're probably CPU bound.
I've tried 2, 4, 6, and 20 using the -P flag but the most I've got is 4.55 Gbits/sec. 4 gets me about 3.2 gbps and 20 gets me 4.55 gbps. All Qlogic 2x25GE NICs.

edit: Tried the -w 128M flag as well.
The following command got me 5.36 Gbits/sec.

iperf3.exe -c hostname -P 20 -w 128M

 
Hmmm...that's a bit strange. Have you tried the -d and -r switches yet?
 
What's the entire network look like? Need details before really being able to troubleshoot beyond blind guessing.
 
128M is the window size flag? If so, you may need to bump that up to see higher throughput.

The maximum TCP window size that can be achieved with TCP window scaling uses a scaling factor of 2^14 = 16,384, which gives a window size of 16,384 x 65,535 = 1,073,725,440 bytes, or approximately 1 GB. Note that the TCP counters are 32 bits in size, so they can still be used without change, since they can hold a maximum value of 4 GB.
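For context, the window you actually need is set by the bandwidth-delay product rather than by that scaling maximum. As a rough worked example, assuming an RTT of about 0.2 ms on a short DAC/LAN path (an assumed figure, not a measurement):

25 Gbps x 0.0002 s = 5,000,000 bits ≈ 625 KB

so a window in the hundreds of kilobytes per stream should already be enough to fill 25 Gbps on a link like this. If enormous -w values are what it takes to see higher numbers, the bottleneck is usually somewhere else (drivers, offloads, the vswitch) rather than the window.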
 
I can routinely saturate 40Gbps with iperf3 running direct between two servers. When I toss my ICX between them I get a slight hit, down to ~35Gbps. Switch-to-switch (I have two ICXs, one on each floor of the house, the 40Gbps is the "backbone" between them), I can't accurately test but I can max out 10Gbps on two workstations hitting one server simultaneously.

Also noticing OP is using Windows as well. No idea how good their drivers are out of the box there. Or how the OP "teamed" things (and why). Or even what NICs are being used. Need a ton more details before we can really give meaningful input here.
 
Three servers, each with two Qlogic 2x25Gb NICs. Two ports for iSCSI and two ports for LAN traffic. Qlogic 2x25GE QL41262HMCU and Qlogic FastLinQ QL41262-DE NICs in each server. Connected to two Dell 5224F-ON switches using SFP28 DACs; the cables are like 3' long. LAN teaming is done within Windows, and each port is seen as a 25 Gbps link with a 50 Gbps aggregate (Microsoft Network Adapter Multiplexor Driver). Teaming this way follows Microsoft best practices for HA clustering.
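For anyone wanting to sanity-check a team like that, a rough PowerShell sketch (standard LBFO teaming cmdlets; the team name here is a placeholder):

Get-NetLbfoTeam
Get-NetLbfoTeamMember -Team "LAN Team"
Get-NetAdapter | Format-Table Name, LinkSpeed, InterfaceDescription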

Edit: Tried upping the iperf window size to 256M and that got me 6.87 Gbits/sec. 512M got me 12.8 Gbits/sec, but I did see one burst to 42.9 Gbits/sec. Not sure I'm reading this correctly or if the speeds are just this erratic. Keep in mind the highest-performance stuff I've worked on before this is mostly gigabit and a little 2.5 Gbps. The business application performance is fine, but live migrations are crazy variable. It can take 3 seconds to migrate an 8GB-memory VM one time and 6 minutes the next time. I have a managed services group and Microsoft itself looking into it, but no one has been able to figure it out.

The folks I'm paying to help me seem to be OK with performance as is. Everyone involved agrees it should be faster than it is, but no one has figured out why yet. I'm not willing to leave 20 Gbps on the table; I could have spent 1/8 as much if that were acceptable. As of this morning they're pointing at driver problems or OS compatibility problems. This exact hardware combination is installed at other companies, but they're using VMware and I'm running Hyper-V.

C:\iperf\iperf3.exe -c hostname -P 20 -w 512M


C:\iperf\iperf3.exe -c hostname -P 20 -w 512M -d
 

Aggregation of tests. Single client might not be able to push enough to fill the pipe. You may need many.
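One way to do that (a sketch assuming iperf3, with placeholder hostnames and ports) is to run several server instances on different ports and hit them from separate client sessions at the same time, then add the results together:

iperf3.exe -s -p 5201
iperf3.exe -s -p 5202

iperf3.exe -c <server> -p 5201 -P 8 -t 30
iperf3.exe -c <server> -p 5202 -P 8 -t 30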
 
I've not done any Windows network optimization, but I know there are some good options. I can't find a good tutorial, but you may want/need to enable Receive Side Scaling somehow. If your servers are multi-socket, you might need to fiddle with getting the iperf processes bound to the same socket that the NICs are connected to. If your CPU threads are bouncing around, you might be getting inconsistent results; good when there's no cross-CPU PCIe traffic, and bad when there is? Could be similar concerns during VM migration as well?
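A rough sketch of checking and adjusting RSS from PowerShell (the cmdlets are standard, but the adapter name is a placeholder and the processor numbers depend on your CPU/NUMA topology, so verify against your driver docs first):

Get-NetAdapterRss
Enable-NetAdapterRss -Name "SLOT 2 Port 1"
Set-NetAdapterRss -Name "SLOT 2 Port 1" -BaseProcessorNumber 2 -MaxProcessors 8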
 
Reduce the complexity first:
What happens if you take the switch out of the equation?
Have you tried undoing the "teaming" and tested?

Remove drivers (and other system / application / service overhead) from the equation:
Can you try booting a linux liveCD on two servers and running iperf3 on those?
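If you do go the liveCD route, the test itself is simple (a sketch assuming iperf3 is available on the live image and both 25G interfaces already have addresses):

ip addr
iperf3 -s
iperf3 -c <server-A-ip> -P 8 -t 30
iperf3 -c <server-A-ip> -P 8 -t 30 -R

The first command confirms the interface and IP, the second runs on one server, and the last two run on the other (the -R run repeats the test with the direction reversed).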
 
How saturated are the CPU cores when running iperf? What are the server specs? Got enough PCIe lanes for everything installed?
All three are near idle during the test. Xeon Gold 6244 x 2 in each server. This is a production environment so I can't do all the tests I'd like to without potential downtime. I like the Linux LiveCD idea and could possibly try that on one of the servers. Wouldn't be able to have two to test against though as these three machines are all that is connected at 25gbps. Might have to get a 25 gigabit NIC for my machine to try against.

 
Personally, and I recognize this ship has sailed, but I wouldn't have ever put that into production. It's broken.
 
Personally, and I recognize this ship has sailed, but I wouldn't have ever put that into production. It's broken.

I agree. When we did the handoff, the migrations they demonstrated worked fine. I witnessed a VM with 240GB of memory move in a few seconds and back again. The installers told me they had this problem early on but got it resolved. Now the problem exists on all the hosts and I can't Live Migrate off of them. Quick Migrate works fine.

The firewalls are off. I did find a config discrepancy this afternoon in some of the NICs. Virtual Machine Queues and Virtual Switch RSS aren't set the same across the NICs and the teams. Need to reboot the hosts, though, to see if that improves anything.
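A quick way to eyeball that kind of mismatch across the hosts before rebooting (standard cmdlets; adapter and switch names will differ per host):

Get-NetAdapterVmq
Get-NetAdapterRss
Get-VMSwitch | Format-List *

Running those on each host and diffing the output should show which settings don't line up.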
 
Aggregation of tests. Single client might not be able to push enough to fill the pipe. You may need many.
I've not had to do this though with iperf when using the -P switch. You can crank this up high enough to saturate without running parallel sessions. Something else here is wrong.
 
Can you try booting a linux liveCD on two servers and running iperf3 on those?
This is my first thought. My hunch is that it is some sort of windows issue and doing this test would confirm it.
 
All three are near idle during the test. Xeon Gold 6244 x 2 in each server. This is a production environment so I can't do all the tests I'd like to without potential downtime. I like the Linux LiveCD idea and could possibly try that on one of the servers. Wouldn't be able to have two to test against though as these three machines are all that is connected at 25gbps. Might have to get a 25 gigabit NIC for my machine to try against.

Hmmm...might be a bit of an oddball test, but since you technically have 2x different cards and 4x nics, you could run tests on that server to and from different nics on itself. This would almost be a bit of a loopback test and not do anything but validate the switch and dacs, but it might shed some light on something.

Otherwise, I would not try a testing scenario where you put a card in your machine as that's a completely different scenario and may just cause confusion if it works correctly.
 
Hmmm...might be a bit of an oddball test, but since you technically have 2x different cards and 4x nics, you could run tests on that server to and from different nics on itself. This would almost be a bit of a loopback test and not do anything but validate the switch and dacs, but it might shed some light on something.

Otherwise, I would not try a testing scenario where you put a card in your machine as that's a completely different scenario and may just cause confusion if it works correctly.

That iSCSI VLAN is going to be unrouted for two of the NICs, so I don't think he'd be able to do it against itself. Probably break the team and give each NIC its own IP and test them against themselves to see if 25Gbps bandwidth is available across the card though.
 
That iSCSI VLAN is going to be unrouted for two of the NICs, so I don't think he'd be able to do it against itself. Probably break the team and give each NIC its own IP and test them against themselves to see if 25Gbps bandwidth is available across the card though.
Oh, I meant this in the livecd test scenario. No way I would do this under the current setup as I don't think it would give us anything useful.
 
So I forgot my backups box has 25 gig NICs. It has Broadcom NICs with 8 partitions per port, 16 partitions total. Two teams of two partitions each at 25 gbps. Something to do with datacenter bridging. (This is why I hire some stuff out. Can't know everything. Also why I hate farming stuff out)

A different testing scenario showed that all three hosts can send 6-8 Gbps. Two of the hosts can receive at about 3 Gbps. One of the hosts only receives 1.6 megabits per second... Copying a 5 GB iso to it gave an ETA of 21 hours.
"iperf3.exe -c hostname -w 512M -R" returns 2.2 Mbits/sec
"iperf3.exe -c hostname -w 512M -d" returns 7+ Gbits/sec.

I don't know what this tells me but it's repeatable. I thought the -d debug mode did tests in both directions but this has me doubting that. Copying a file from the server I get 800 MBps copy speeds, about 6.4 gbps. Copying to it I get 158 KBps.
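(For reference: in iperf3, -d is just debug output, not a dual test; direction is controlled separately. A sketch with the same placeholder hostname:

iperf3.exe -c hostname -P 8
iperf3.exe -c hostname -P 8 -R
iperf3.exe -c hostname -P 8 --bidir

The first sends client-to-server, the second reverses it so the server sends, and --bidir runs both directions at once on iperf 3.7 or newer. The iperf2-style -d/-r dual tests don't exist in iperf3, which would explain why -d didn't change the direction of the numbers.)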

 
I've not had to do this though with iperf when using the -P switch. You can crank this up high enough to saturate without running parallel sessions. Something else here is wrong.
Actually, no, because then you're at the mercy of the CPU. YMMV of course; you may have a CPU that is capable. Driving at that high of a speed requires muscle.
 
So I forgot my backups box has 25 gig NICs. It has Broadcom NICs with 8 partitions per port, 16 partitions total. Two teams of two partitions each at 25 gbps. Something to do with datacenter bridging. (This is why I hire some stuff out. Can't know everything. Also why I hate farming stuff out)

A different testing scenario showed that all three hosts can send 6-8 Gbps. Two of the hosts can receive at about 3 Gbps. One of the hosts only receives 1.6 megabits per second... Copying a 5 GB iso to it gave an ETA of 21 hours.
"iperf3.exe -c hostname -w 512M -R" returns 2.2 Mbits/sec
"iperf3.exe -c hostname -w 512M -d" returns 7+ Gbits/sec.

I don't know what this tells me but it's repeatable. I thought the -d debug mode did tests in both directions but this has me doubting that. Copying a file from the server I get 800 MBps copy speeds, about 6.4 gbps. Copying to it I get 158 KBps.

If you've got something repeatable, I would try swapping the DAC cables and see if the problem moves with them. If so, bad/cheap cabling.
 
Actually, no, because then you're at the mercy of the CPU. YMMV of course; you may have a CPU that is capable. Driving at that high of a speed requires muscle.
But doing that in actual different sessions uses even more CPU, so it's the same thing. Personally, I've not seen iperf hammer my CPU in tests with as many as 30 threads, but that was only for gigabit, on a Socket 478 P4 2.6GHz.
 
But doing that in actual different sessions uses even more CPU, so it's the same thing. Personally, I've not seen iperf hammer my CPU in tests with as many as 30 threads, but that was only for gigabit, on a Socket 478 P4 2.6GHz.
Gigabit is not the problem.
 
Just to post an update. The problem has been through the installer/integrator company, two Dell senior engineers, and now Microsoft and Qlogic/Intel are involved via Dell. No one has been able to correct the issue yet. I don't feel so bad for not figuring it out. Sounds like we're moving towards a server replacement by Dell. Every related component and config has been analyzed and verified to be correct.

My troubles started by chance: I happened to be doing my testing on the cluster server that has the network performance issue, with 21 Gbps in one direction and 1 Mbps in the other. On the other two cluster servers and some VMs I can get around 21 Gbps via iperf. The following deliver the best results.
iperf3.exe -P 8
iperf3.exe -P 8 -w 512M

 
Will be really interesting to see what the root cause is. I wonder if some fake goods got into the normal supply chain.
 
FMD! Reading this thread makes me so happy I no longer do performance testing of multi vendor firewalls.
 
Gonna necro this to provide the solution. It took months of gentle troubleshooting and three outside companies to figure it out: it was a corrupted Hyper-V vswitch. Dell and Microsoft both exhausted all their tiers of support, eventually telling me to start pulling machines from the cluster, deleting components, and adding each host back to the cluster. Now I get 11-12 gigabit during most tests with an occasional 40-43 gigabit in there. There are a lot of moving parts since it's a production environment, so I'm happy with that.

There were no errors being thrown, every Windows data file integrity check came back clean, and the config was confirmed to be correct every time. The vswitch was just broken and had to be deleted and recreated.
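For anyone who hits the same thing: once a host is drained and out of the cluster, the delete/recreate itself is only a couple of PowerShell lines (a sketch with placeholder switch/team names; VM network adapters need to be reattached to the new switch afterwards):

Remove-VMSwitch -Name "LAN vSwitch"
New-VMSwitch -Name "LAN vSwitch" -NetAdapterName "LAN Team" -AllowManagementOS $true

The slow part is the cluster evacuation and validation around it, not the commands themselves.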

 
Yes, thank you! Good to know that virtual hardware can also 'break' at times, so you have to look at it as a point of failure as well. :)
 