Arista switch stacking or MLAG?

KapsZ28

We recently received Arista 7050S-52 switches, which do not have stacking capability. I plan on using two of them for our ESXi hosts. Each host will have six 10GbE ports used for storage, access, management, and vMotion. Not really being a network guy, I had never even heard of MLAG until reading through the specs of the switch. Based on some brief research, it sounds basically the same as stacking: it allows two physical switches to act as one logical switch for the purposes of L2 protocols such as STP or LACP. This is exactly what I wanted to do with the switches to begin with, but I had no clue it could be accomplished without stacking cables.

So, the main question is, am I right that using MLAG will create a single logical switch just like switches that support stacking?

Second question: would I use ports 49, 50, 51, and 52? Although they are separate from the other 48 ports, they don't appear to be uplinks and seem to just be standard 10Gb ports like the rest.

Third question: is using four ports for 40Gbps of throughput too much? Each switch supports up to 1.04Tbps of throughput, so 40Gbps seems insignificant. Not to mention I have seen 1Gb switches that support up to 80Gbps with full-duplex stacking cables.
 
I would imagine that MLAG is the same as Cisco's vPC technology. It will not create a single logical switch like a stack would; instead, it will APPEAR as one switch for all intents and purposes to the downstream device. You will still have to manage them as two individual switches, though. But as far as LACP / teaming is concerned, it will be one switch.

Basically, your two switches will be connected together with peer links (these should be redundant 10Gb links at a minimum), over which they will exchange information related to their MLAG configuration.

In the Cisco world you'd have peer links and a keep-alive link, but you'll have to research Arista's MLAG configuration to determine what they want. I'd imagine you could use whatever ports you wanted, but again, look at their documentation.

I've never seen any recommendations about required vPC throughput beyond the recommendation of redundant 10Gb links. 40Gb might be overkill; you could always set it up with 20Gb, monitor it, and expand if needed.
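
For a rough idea of what the Arista side of the peering looks like, here's a minimal, hypothetical EOS-style sketch (the VLAN, addresses, ports, and names are all placeholders I made up; check Arista's MLAG documentation for the authoritative steps for your EOS release):

    ! On switch A; mirror on switch B with the local/peer addresses swapped
    vlan 4094
       trunk group MLAGPEER
    no spanning-tree vlan 4094
    !
    interface Ethernet51-52
       channel-group 1000 mode active
    interface Port-Channel1000
       description MLAG peer link
       switchport mode trunk
       switchport trunk group MLAGPEER
    !
    interface Vlan4094
       ip address 172.16.0.1/30
    !
    mlag configuration
       domain-id MLAG1
       local-interface Vlan4094
       peer-address 172.16.0.2
       peer-link Port-Channel1000

The dedicated VLAN carries the peering traffic over the redundant peer link, and the trunk group plus disabled spanning tree keep that VLAN isolated from the rest of the L2 domain.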
 
MysticRyuujin is correct. MLAG (aka MC-LAG) is the more generic term for what Cisco refers to as vPC.

The switches themselves are managed separately, but the connecting devices think they have a LAG connection to a single switch.

For question 2, I *believe* that all ports are the same and not 'reserved' for any special purpose.

3) Don't get in the habit of using LAG for bandwidth upgrades. While it does work in general, you can get into situations where you are pushing only 12Gb of data through two 10Gb links in a LAG and still having dropped-packet issues, because most of that traffic hashed onto the same link. This is due to the switch's hash algorithm and how it determines which physical link within a LAG a given packet goes out. It's not a 'round robin' algorithm, nor does it try to do anything 'smart' like figure out which link still has bandwidth available. It is nothing more than a hashing algorithm.
So, if you are talking about using 4x 10Gb links for a single LAG to two switches (two to switch 1, two to switch 2), you have a theoretical max of 40Gb of throughput; but if one of your switches dies, you are left with a theoretical max of 20Gb, and realistically that 40Gb of potential probably drops to somewhere in the range of 12-18Gbps in a failure scenario. Design your LAG for failure/redundancy, not for bandwidth. If you absolutely need higher bandwidth (for critical applications), use something with 40/100Gb NICs. If you can accept the risk of oversubscription, you'd be fine with the 10Gb links.
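
To make the hashing point concrete, here's a toy sketch in Python (invented flows and rates; a real switch hashes packet header fields in hardware, but the behavior being illustrated is the same):

    # Toy model of LAG egress hashing: the switch hashes each flow's
    # 5-tuple and uses (hash % number_of_links) to pick the member link.
    # It never looks at how busy a link already is, so a handful of big
    # flows can pile onto one member while another sits nearly idle.
    import zlib

    LINKS = 2        # two 10Gb member links in the LAG
    FLOW_RATE = 3.0  # Gb/s per flow; 4 flows = 12Gb/s offered load

    flows = [
        ("10.0.0.1", "10.0.1.1", 49152, 443, "tcp"),
        ("10.0.0.1", "10.0.1.1", 49153, 443, "tcp"),
        ("10.0.0.2", "10.0.1.1", 49154, 443, "tcp"),
        ("10.0.0.3", "10.0.1.1", 49155, 443, "tcp"),
    ]

    load = [0.0] * LINKS
    for flow in flows:
        member = zlib.crc32(repr(flow).encode()) % LINKS  # deterministic pick
        load[member] += FLOW_RATE

    # Depending on how the hashes fall you might see e.g. [9.0, 3.0]:
    # 12Gb/s offered into 20Gb/s of capacity, yet one 10Gb link is
    # carrying three of the four flows.
    print(load)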
 
So yeah, I keep seeing that in the VMware world, LAG is not really the best solution for load balancing.

But as far as connecting together two 10Gb switches that do not have a stacking option, should I still be using a LAG or MLAG, or simply connect the two switches via a single 10Gb port?
 
To connect the switches together, it's just a standard LAG; you want the redundancy. It's not MLAG, because it's one switch to one switch.
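
As a rough, hypothetical EOS-style sketch (made-up ports and channel number; verify the commands against Arista's docs), that back-to-back LAG is just a matching LACP port-channel configured on each switch:

    ! Configure the same on both switches: bundle two ports with LACP
    interface Ethernet51-52
       channel-group 100 mode active
    interface Port-Channel100
       switchport mode trunk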
 
Good stuff here, and accurate all around. To clarify a couple of points, using Arista nomenclature:

- MLAG peering is how Arista EOS "stacks" a pair of switches to appear as a single logical switch. As noted, the switches still maintain their own control plane and configurations, with some layer 2 functions exclusively managed by one switch in the pair (e.g. STP, LACP). Let's call this the MLAG peer pair.

- MLAG port-channels are defined on the MLAG peer pair, connected to upstream (routers/other switches) or downstream (host) devices. Presuming this is a top-of-rack (ToR) pair that'll connect to hosts, you'd set up port-channels to each host on each switch, adding a matching MLAG port-channel ID on both switches in the pair; that ID is what logically tells the pair to treat the port-channel as connecting to one upstream/downstream device (see the sketch after this list). Of course, one could connect another MLAG peer pair upstream/downstream, even doing a full mesh of links in the port-channel for complete redundancy and substantial aggregate bandwidth. As cited though, your max throughput for any single flow through a port-channel is limited to the speed of an individual link. With lots of sessions, though, you can make use of all the links in a port-channel, thanks to the magic of hashing.

- The link between an MLAG peer pair is referred to as the peer link. Under ideal circumstances, it's only used to pass traffic that can't otherwise go north-south towards upstream switches/routers to the north, or MLAG connected hosts to the south. If one loses a link to a host or upstream device, the peer link can act as a backup path, albeit at the latency expense of the extra switched/routed hop.
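
As an illustration of the second bullet (hypothetical IDs and ports; the User's Manual below is the authority), a host-facing MLAG port-channel is configured identically on both members of the peer pair, and the matching "mlag" ID is what ties the two physical port-channels into one logical link:

    ! On both switches in the MLAG peer pair
    interface Ethernet1
       description esxi-host-01
       channel-group 10 mode active
    interface Port-Channel10
       switchport mode trunk
       mlag 10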

The Arista User's Manual has a lengthy chapter on how to configure MLAG, the peer link, and MLAG port-channels, as well as L2 vs. L3 redundancy steps. If you're an Arista customer with a current support contract, feel free to register at https://www.arista.com/en/user-registration to get an account. If your company hasn't already registered other users, you can drop a line to support (at) arista (dot) com (obfuscated!), citing your switch's serial number, and we'll get you set up with customer privileges to download documentation and software as need be, as well as answer any other technical questions. Hope this helps.

Source: I'm an Arista TAC engineer and occasional [H]ardocp lurker.
 

So you work for Arista? If so, how is your new lab coming along in NJ?
 
Probably doesn't apply in your case, but a word of warning about Arista MLAGs...

A large investment firm had a failed supervisor lead to a total double blade chassis outage when MLAG "rebuilding" spiked the CPUs on both chassis, leading to all traffic being dropped. The only way to get traffic flowing again was power cycling both switches.

They eventually confirmed MLAG to be bugged in some manner, but I have no idea if it was fixed in later Arista code releases.
 
So you work for Arista? If so, how is your new lab coming along in NJ?
I can neither confirm nor deny the existence of such a lab - and frankly, we're a large enough company nowadays that I wouldn't necessarily know about such a buildout anyhow. Feel free to drop a line to support (at) arista (dot) com with any specific buildout questions or technical issues, and if you like, ask for Randy to review the issue (that's me - large company or not, there's still just the one Randy in support :) ).
 
A large investment firm had a failed supervisor lead to a total double blade chassis outage when MLAG "rebuilding" spiked the CPUs on both chassis, leading to all traffic being dropped. The only way to get traffic flowing again was power cycling both switches.

They eventually confirmed MLAG to be bugged in some manner, but I have no idea if it was fixed in later Arista code releases.

When did this happen? I'll bring it up to our Arista rep and SE.
 