UCS and Multi-NIC vMotion

How have you guys been setting this up?

One of the advantages of UCS is that, if architected correctly, we can ensure vMotion traffic stays local to the Fabric Interconnects. Prior to vSphere 5, we'd pin vMotion to Fabric A or B only, with one vmnic as active and the other as standby.

However, doing the same thing with vSphere 5.x and multi-NIC vMotion is more problematic.

In an ideal scenario, we'd like to continue to keep all vMotion traffic local to the Fabric Interconnects rather than traversing the client's LAN. In testing, I've found there's simply no way to do this while adhering to VMware's best practice of putting both vMotion vmkernels on a host in the same subnet and VLAN. If we do that, the vmkernel on Fabric A will try to talk to the vmkernel on Fabric B, and the vMotion traffic has to go north into the LAN.

The only way I've found to prevent this is to use two subnets for vMotion: one for Fabric A and one for Fabric B. While this works, it falls outside of VMware best practices. If I use the same subnet but try to prevent the Fabric A vmkernels from connecting to the Fabric B vmkernels, all vMotions fail.
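
To make that concrete, the per-host setup I've been testing looks roughly like this from the ESXi shell; the portgroup names, VLANs, addresses, and vmnic numbering below are just placeholders for whatever your service profile presents:

# One vMotion portgroup pinned to each fabric, each in its own VLAN/subnet
esxcli network vswitch standard portgroup add --portgroup-name=vMotion-A --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set --portgroup-name=vMotion-A --vlan-id=101
esxcli network vswitch standard portgroup policy failover set --portgroup-name=vMotion-A --active-uplinks=vmnic0 --standby-uplinks=vmnic1
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=vMotion-A
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=172.16.1.11 --netmask=255.255.255.0 --type=static
# Repeat for vMotion-B on vmnic1 (e.g. VLAN 102, 172.16.2.11/24), then enable vMotion on both
# vmkernels (vSphere Client, or on 5.1+: esxcli network ip interface tag add -i vmk2 -t VMotion)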

I've heard of some people bucking best practices and using different subnets; others go ahead and use one subnet and don't worry about the vMotion traffic going north of the FIs.

What do you guys prefer in your implementations?
 
Do you really need multi-NIC vMotion when you have 10Gb? It was mainly meant for 1Gb.
 
I have multi-NIC vMotion set up over 10GbE at our datacenter; you wouldn't believe how fast it can evacuate 40 VMs when you throw a host into maintenance mode :) though I wouldn't say it's needed.

I'm not using UCS though, so this isn't a problem for me. I'd just make separate subnets for vMotion-1 and vMotion-2. It's still redundant in the sense that if one group fails you can still vMotion over the other, but in-progress vMotions will time out and have to start over.

I've run into the traffic-isolation issue you're describing in another form: we had vMotion and Management on the same subnet but traversing separate, isolated switches. Even though the source and destination of the vMotion were on the same physical network, ESXi will initiate the conversation for the data move over the first available vmkernel port with a matching subnet, even if vMotion isn't ticked as a feature on it.
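
If you want to see which vmkernel port ESXi will pick for a given subnet, a quick check from the host shell looks something like this (the interface name and address are just examples):

esxcli network ip interface ipv4 get   # which vmk owns which subnet
esxcfg-route -l                        # the VMkernel routing table
vmkping -I vmk1 172.16.1.12            # 5.1 and later, I believe: source a test ping from a specific vmk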
 
No, I wouldn't say it's needed, but it is preferred. It can shave a few minutes off evacuating a host with a couple hundred GB of guest VM RAM.

If there's a way to make it work without going northbound while staying within VMware best practices, I'd like to standardize on that rather than pinning vMotion to only one fabric. Not that it won't work, but it's a lot sexier to show how UCS can keep even multi-NIC vMotion constrained to the FIs. :)
 
I can't think of a way to do it. Basically, vmk0 and vmk1 on host1 will both try to contact vmk0 on host2 for the initial handshake of the vMotion traffic, since it's all within the same subnet.

When vmk0 is talking to vmk1, it's traversing the outside switches. I forget with UCS, but don't you link the two chassis switches together (with a 40GbE QSFP+ cable)?

Just to clarify, are both of the vmkernel ports assigned to the same vSwitch?

vSwitch0
vMotion-1 vmk -> vmnic0 Active, vmnic1 Standby
vMotion-2 vmk -> vmnic1 Active, vmnic0 Standby
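
If it helps, the per-portgroup override for that layout can be set from the host shell on a standard vSwitch; something like this, assuming those portgroup and vmnic names:

esxcli network vswitch standard portgroup policy failover set --portgroup-name=vMotion-1 --active-uplinks=vmnic0 --standby-uplinks=vmnic1
esxcli network vswitch standard portgroup policy failover set --portgroup-name=vMotion-2 --active-uplinks=vmnic1 --standby-uplinks=vmnic0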

Also, try it this way for giggles:

vSwitch0
vMotion-1 vmk -> vmnic0

vSwitch1
vMotion-2 vmk -> vmnic1
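
Roughly, that second layout from the shell would be (again, names assumed):

esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --uplink-name=vmnic1 --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=vMotion-2 --vswitch-name=vSwitch1
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=vMotion-2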



EDIT: ...now that I think about it, you're probably using the distributed switch for this?
 
Do you really need multi-NIC vMotion when you have 10Gb? It was mainly meant for 1Gb.

Hell ya! You don't work a lot of maintenance, do you? =) Personally, I don't like watching a couple thousand desktops/servers vMotion any longer than I have to in order to patch hosts.


I haven't actually used any UCS gear, but I think it's somewhat similar to HP's Virtual Connect. What I do to keep vMotion in the chassis is drop the A-side/B-side thinking and create a network on the Virtual Connect with no uplinks, assigned to both vMotion NICs. That creates a layer 2 network across the interconnects that doesn't have to leave the chassis. Not sure if you can do something similar with UCS.
 
I forget with UCS, but don't you link the two chassis switches together (with a 40GbE QSFP+ cable)?

With UCS, the Fabric Interconnects are not connected via a standard ISL; the links between them are merely a couple of crossover connections for UCS Manager clustering and replication. That's why you don't cross-connect the server-side links. Fabric A handles all traffic from one IOM and Fabric B from the other.


As for CoW's question, I would rarely want my vMotion traffic going northbound, especially since most customers I've seen really don't have an aggregation layer in between their FIs and their core. Usually, we deal with smaller customers that connect SAN directly to the FIs and LAN directly from the FIs to the core.

To add on to your question: if you had an aggregation layer northbound of the FIs, would it then be more acceptable?
 
To add on to your question: if you had an aggregation layer northbound of the FIs, would it then be more acceptable?

We typically connect the FIs to a pair of Nexus 5Ks when we can, each FI cross-connected to the 5K pair via vPC. This means there's at least 40Gb of bandwidth even if the vMotion has to go north. Again, it's not the end of the world if the vMotion does go through the 5Ks; it's just not ideal.

From what I'm seeing, the only option is to buck VMware best practice and use two subnets; only that will guarantee vMotion traffic stays restricted to the FIs. Maybe in a future release of vSphere, multiple-subnet vMotion will be supported.
 