nicholasfarmer · Limp Gawd · Joined Apr 19, 2007 · Messages: 238
I need some extra brains here...
Hardware: ESXi server with a ton of resources... CPU/mem running below 10%.
Intel X540-T2, connected via Cat 7 cable to a Netgear ProSafe XS708E.
Running ESXi 5.5 build 1331820.
One VM on the ESXi host with a local SSD array (trust me, it's not an IOPS or throughput issue from the disks...).
The VM is running Windows Server 2012 with the VMXNET3 NIC driver.
Virtual hardware v10, and VMware Tools is up to date.
Four physical desktops with SSDs as local drives.
Test 1) I can start a file copy from the VM to each local desktop, one at a time, and get full saturation on the physical desktop's NIC (99%, a 113 MB/s flat line). All four desktops can receive the full 1Gb stream, and each desktop has its own port on the Netgear switch.
Issue: When I start multiple flows (copying data to two PCs at once), the bandwidth goes insane. Network throughput will spike to 224 MB/s, then flop down to 5 MB/s. Very random, very poor throughput. It's actually faster to copy one at a time than to attempt multiple flows.
Test 2) I found an updated driver for ESXi 5.5. A VIB list showed that 5.5 shipped with driver version 3.7.x.x, and I updated it to ixgbe 3.15.1.8.
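For anyone following along, this is roughly the ESXi shell procedure I mean; the offline-bundle path below is a placeholder, not the actual filename:
# Check which ixgbe driver VIB the host is running
esxcli software vib list | grep ixgbe
# Update the driver from a downloaded offline bundle (placeholder path), then reboot
esxcli software vib update -d /vmfs/volumes/datastore1/ixgbe-driver-offline-bundle.zip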
New issue: I can still hit saturation doing a single flow to a single physical PC, but now when I attempt multiple flows (one flow per physical PC), it perfectly balances them. 113 MB/s drops to two flows at 56 MB/s each to two PCs; four file transfers drop to 28 MB/s each to four PCs. So the random up-and-down speed went away, but now the NIC/driver is balancing the flows against a 1Gb limit...
No QoS, and I'm using standard vSwitches, so no vDS control issues. I want to keep adding more info to help narrow it down, but if this gets too long you will all ignore it!
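For reference, here is how the "plain standard vSwitch, no shaping" claim can be confirmed from the ESXi shell (standard esxcli in 5.x; vSwitch0 is just the usual default name):
# Show standard vSwitches and their uplinks
esxcli network vswitch standard list
# Confirm traffic shaping is disabled on the vSwitch
esxcli network vswitch standard policy shaping get -v vSwitch0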
Please help! I hope someone has experience with this NIC and can offer a quick config change or command to get me to full bandwidth!
..
..
Update 1: I found driver/VIB versions 3.18.7 and 3.19.1. I upgraded to and tested both drivers: same "balance" issue, but now I can get 38.8 MB/s times four PCs...
MysticRyuujin: I would LOVE to do a 10Gb-to-10Gb test, but I don't have another 10Gb adapter hanging off of this switch. I do have the X540-T2 (two-port) adapter, so maybe I can wire something up to go out one port and in the other for a 10Gb > 10Gb test. I checked the switch firmware and it's running the latest. As you said, it's new, so the firmware options are 1.00.06, 1.00.08, and 1.00.10, and I'm running .10. I've also reviewed the release notes, etc., and don't see anything about this issue.
Update 2: I created another VM on the host and installed Windows 7 (just a quick, different OS to test with). It shows the same characteristics as the Win2012 server.
The VM is connected to the same vSwitch, etc.
This tells me it's not Windows Server jacking with me.
On to the iPerf testing!
Update 3: iPerf! (this shows the balanced flows even on a single PC-to-PC connection)
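The output below is consistent with stock iPerf 2 defaults (TCP, port 5001, 10-second runs); the commands would have been something like this, stepping the flow count up with -P:
# Server side (1.1.1.2)
iperf -s
# Client side (1.1.1.3): 1, 2, 3, then 4 parallel flows
iperf -c 1.1.1.2 -t 10
iperf -c 1.1.1.2 -t 10 -P 2
iperf -c 1.1.1.2 -t 10 -P 3
iperf -c 1.1.1.2 -t 10 -P 4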
[ 4] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55956
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 628 MBytes 526 Mbits/sec
[ 4] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55957
[ 5] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55958
[ 4] 0.0-10.0 sec 535 MBytes 449 Mbits/sec
[ 5] 0.0-10.0 sec 532 MBytes 446 Mbits/sec
[SUM] 0.0-10.0 sec 1.04 GBytes 895 Mbits/sec
[ 4] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55959
[ 5] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55960
[ 6] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55961
[ 4] 0.0-10.0 sec 370 MBytes 309 Mbits/sec
[ 5] 0.0-10.0 sec 384 MBytes 322 Mbits/sec
[ 6] 0.0-10.0 sec 330 MBytes 277 Mbits/sec
[SUM] 0.0-10.0 sec 1.06 GBytes 905 Mbits/sec
[ 4] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55965
[ 5] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55964
[ 6] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55962
[ 7] local 1.1.1.2 port 5001 connected with 1.1.1.3 port 55963
[ 4] 0.0-10.0 sec 261 MBytes 218 Mbits/sec
[ 5] 0.0-10.1 sec 256 MBytes 213 Mbits/sec
[ 6] 0.0-10.0 sec 310 MBytes 259 Mbits/sec
[ 7] 0.0-10.0 sec 279 MBytes 233 Mbits/sec
[SUM] 0.0-10.1 sec 1.08 GBytes 922 Mbits/sec
I found with iPerf that if I force each physical PC to run three flows to the server, I can reach 300+ Mbps.
So the bandwidth is there. I just need to find out why the hypervisor or the X540 NIC is chopping up my bits...
Update 5ish....
I think the issue is with the X540-T2 card. To rule out the Netgear switch, I connected physical PCs to ports 1-6 and moved 50GB files back and forth between the PCs, all concurrently (port 1>2, 3>4, 5>6). I could get full 1Gbps throughput on each transfer. When I move files to/from the PCs through the X540 card, it's limited to 2.5Gbps.
Test 3) I placed the X540-T2 into a PC running Windows 7, installed the updated driver, installed some SSDs, and attempted to move data between that PC and the others, using multiple SSDs as sources so I don't hit a storage limitation.
The card still maxed out at 2.5Gbps.
--
--
--
--
Update 8Jan2014:
I have support tickets open with VMware and Intel at the moment to help identify two things:
1 - (VMware) Why are the flows being balanced under a 1Gbps cap? (1 flow = 1 × 113 MB/s | 2 flows = 2 × 56 MB/s | 3 flows = 3 × 38 MB/s | 4 flows = 4 × 28 MB/s.) Perfect balance!
2 - (Intel) Why can't I get above 2.5-3Gbps on the adapter? (Tested under ESX and in a full Windows build, using RAM drives, etc.)
The second one sounds like the adapter is only using part of the PCIe lanes in the x8 slot, OR it negotiated down to PCIe v1... but that is a guess.
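One quick way to test that guess on a Linux box with the card installed is to compare what the link is capable of against what it actually negotiated; the PCI address below is a placeholder. PCIe v1 carries roughly 250 MB/s per lane, so a link stuck at v1 x1 would top out right around 2Gbps, which is suspiciously close to what I'm seeing:
# Find the adapter's PCI address
lspci | grep -i ethernet
# Compare supported (LnkCap) vs negotiated (LnkSta) speed and width
lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'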
Running Linux-based iPerf tests now... results to follow!
--
--
--
Update 9Jan2014:
I ran a bunch of Linux-based iPerf tests. Using my VM as the server and all six physical desktops as clients, I can "receive" over 6Gbps concurrently on the VM.
But when I made all the desktops iPerf servers on different port numbers and turned the VM into the client, I found I could not push more than 1.5Gbps of "transmit" out of the VM. I ran each client connection as a background process. The first client process hit 1Gbps instantly; the second spiked the total network bandwidth to 1.2Gbps, then it started to crash all over the place. Adding more client instances, changing the window size, the number of parallel processes, etc. did nothing.
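Roughly what that client side looked like; the desktop IPs and ports here are placeholders for my lab addresses:
# Each desktop runs its own iPerf server on a unique port, e.g.:
iperf -s -p 5101   # desktop 1
iperf -s -p 5102   # desktop 2
# From the VM, launch one background client per desktop
iperf -c 1.1.1.11 -p 5101 -t 30 &
iperf -c 1.1.1.12 -p 5102 -t 30 &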
Back to VMware to help research the performance issue, now that I can reproduce it in many different guest OSes.
.
.
.
Update 16Jan2014:
I was able to get my hands on another 10Gb network card and run more tests.
iPerf between the VM and that PC runs at full 10Gb line rate in both directions.
So this is something in the hypervisor, or in the Intel adapter and its driver.
Current issue:
VM to bare metal can run at line rate if the destination has a 10Gb adapter.
If the destination has a 1Gb adapter, bandwidth for the entire ESXi host is limited to 1Gbps (transmit out of the ESXi host).
Receive traffic has always worked at line rate.
Update 24Jan2014:
I think I've found the issue. It appears to be an Intel driver problem.
- If I install ESXi 5.5 from the default ISO and do not update the driver for my Intel X540-T2 adapter, everything works fine. If I update the VIB/driver to anything listed as 5.5-supported for the X540-T2 on the VMware site, it breaks.
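If anyone wants to check which driver their host actually loaded before and after the update, this is the quick check from the ESXi shell (the vmnic number will vary per host):
# Driver name, version, and firmware for the 10Gb uplink
esxcli network nic get -n vmnic2
# And the installed VIB inventory
esxcli software vib list | grep ixgbe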
Intel and VMware are still working on the issue.
.
Thank you for your help and ideas!