VMware - vDistributed Switch - Load Based Policy

nicholasfarmer

While the reach of this post might be small, I know a few important people do read these.

Does anyone use Distributed Switches with port groups configured with the Load Based policy and on vSphere v6?

If you do....I have reason to believe that you should change to a different load balance policy ASAP.


Edit:
I love the physical NIC load based policy. I have it set everywhere, even in my home lab. Until I hit a bug....
This bug does not exist in 5.5.
I can't find the exact KB (if it's even published yet).
Until then, if you are on vSphere 6 (ESXi 6) and running the "Physical NIC load" load balancing policy, please reach out to your support folks or your TAM about the current issue/bug.
Here is a quick KB about the different policies:
http://kb.vmware.com/kb/1004088
I want to share specifics, but I don't know if this is supposed to be a dumb secret where customers have their v6 deployments explode and call support.....
It could also have some specific hardware associated with it, but I don't think it does.
I'll update again if support can provide me with the KB and details.

Edit #2:
I was approved to share!
http://kb.vmware.com/kb/2124725
Important snip from the above URL: "When using load balancing based on physical NIC load on VDS 6.0, if one of the uplinks is disconnected or shut down, failover is not initiated."
-- The issue is fixed in v6.0u1

Let me tell you that using "physical NIC load" will cause you more problems than what's reported in that KB. I've had hosts crash/PSOD, vpxa crash in a chain so the host drops out of vCenter and VMs HA restart, and in the best case VMs drop off the network because a vDS uplink was removed and the VMs did not fail over to another usable link.
Your mileage may vary. I had the enjoyment of replacing 65 corrupted vDistributed Switches, so I figured I would share that entertainment.
-- Again.. this is not a bash against using physical NIC load.... I'm just saying it's broken at the moment, so please patch or use something else until you are patched.
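If you want a quick way to see what you have exposed, something like this pyVmomi sketch should list every dvPortgroup still set to the load-based policy. This is just my own rough example, not from the KB; the vCenter hostname and credentials are placeholders, and I'm going off the standard vSphere API key "loadbalance_loadbased" for this policy.

```python
# Rough read-only audit sketch (pyVmomi). Placeholder vCenter name/creds below.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab shortcut; validate certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
    for pg in view.view:
        teaming = getattr(pg.config.defaultPortConfig, "uplinkTeamingPolicy", None)
        if teaming and teaming.policy and teaming.policy.value == "loadbalance_loadbased":
            # "Route based on physical NIC load" -- the policy hit by the bug
            print("LBT in use: %s (vDS: %s)"
                  % (pg.name, pg.config.distributedVirtualSwitch.name))
            # Until you're on 6.0u1 you could flip these to e.g. "loadbalance_srcid"
            # by building a vim.dvs.DistributedVirtualPortgroup.ConfigSpec and
            # calling pg.ReconfigureDVPortgroup_Task(spec).
finally:
    Disconnect(si)
```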
 
Could you please explain your reasoning?
Please supply documentation / links as to why said change is required.
 
Route based on physical NIC load? AKA load based teaming?

If so, why are you poo-pooing it? It's not the default, but you give no information, which makes it difficult to believe there is a real problem with it.
 
Yea context would be useful here.

I have 13 hosts in my datacenter, all using load based teaming. It's very effective but does have its caveats. If you are running VXLAN or NFS then it's not really ideal, but it works.

Uneven switch port configs can be a nightmare for this type of teaming. Good thing there's a health check function built into the vDS to tell you about that. You can turn it on via the web client.
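If you'd rather script that than click through the web client, a pyVmomi sketch along these lines should turn on both checks (VLAN/MTU and teaming/failover). The switch name, interval, and the existing "si" connection are assumptions on my part, not an official snippet:

```python
# Hedged sketch: enable both vDS health checks via the vSphere API (pyVmomi).
# Assumes "si" is an already-connected ServiceInstance (see SmartConnect).
from pyVmomi import vim

def enable_vds_health_check(si, dvs_name, interval_minutes=1):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.VmwareDistributedVirtualSwitch], True)
    dvs = next(d for d in view.view if d.name == dvs_name)
    config = [
        vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(
            enable=True, interval=interval_minutes),      # VLAN and MTU check
        vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(
            enable=True, interval=interval_minutes),      # teaming/failover check
    ]
    return dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=config)
```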
 

What kind of NIC teaming do you recommend for NFS?
 
LACP is better for NFS. The reason is that you only have a single vmk for NFS access (so long as you don't want to make a huge mess of your storage networking), so you need to load balance on the networking side. Access to each NFS server is more likely to balance across multiple physical NICs with LACP, as it will distribute based on IP/MAC/port. The next best would be to use EtherChannel, but its algorithm for distributing connections is less dynamic.
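To put a picture on that, here's a toy sketch (my own simplification, nothing like the real ESXi or switch hash) of why hashing on IPs/ports lets one NFS vmk spread its sessions across uplinks, while a per-virtual-port policy pins it to a single NIC:

```python
# Toy illustration only: a flow hash picks an uplink per connection, so one
# vmkernel port talking to several NFS targets can land on different NICs.
import zlib

UPLINKS = ["vmnic0", "vmnic1"]

def pick_uplink(src_ip, dst_ip, src_port, dst_port):
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return UPLINKS[zlib.crc32(key) % len(UPLINKS)]

# Same vmk IP, different NFS targets/source ports -> may hash to different uplinks.
print(pick_uplink("10.0.0.5", "10.0.0.50", 1023, 2049))
print(pick_uplink("10.0.0.5", "10.0.0.51", 1022, 2049))

# A "route based on originating virtual port" policy would instead key only on
# the vmk's port ID, so every NFS flow from that vmk would use the same uplink.
```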
 
Nicholas, I saw your edit. Can you elaborate a little more regarding the symptoms? My datacenter is still on 5.5, so I'm very interested in hearing about any bugs on 6. What version/build is it?
 
Thanks Nick. We are on the edge of upgrading to vSphere 6 in our data center since 6u1 came out (VSSP licensing is awesome).
Guess we can wait some more. I hate seeing nasty bugs like this keep popping up.
 
For those that scroll to the bottom... I added my break/fix to the top post.

To be clear, 6.0u1 completely fixes the issue, correct? This sounds more like the ESXi side of it than the vCenter Server side. So... I assume when you say it is fixed in 6.0u1 you mean ESXi? We'll do both, but our hosts will lag by some amount.
 