ESX Network Design - Proposed design, did I miss anything?

RiDDLeRThC

So below is the proposed design for our new ESX Network. The network will have 13 ESX Servers in all.

We currently are not using VLANs, so this design adds them on top of redundancy in our physical network layout.

Our current VMs are running on VMware Server, with each server connected to all the subnets. The block at the top was just to explain things to another coworker, so it can be ignored unless you see something wrong.



 
Good. Do VLANs. I hate to see non-VLAN'd ESX designs. Drives me nuts. Why Active/Standby on the NICs and not Active/Active? What type of switches are you connecting up to? What level of vSphere licensing do you have? Why aren't you splitting out traffic types on the NICs? Looks like you're doing 3 active and 3 passive and just pushing everything on all NICs. Why? Why not split out two for MGMT/vMotion/FT? Set all to Active. Set the proper hashing mechanism depending on your upstream switch.

There are very few cases where doing active/standby NIC configuration makes sense.
 
Also, if you're doing 10 NICs like this just go 10Gb. If you already have the gear bought that's fine..but if not let's talk 10Gb.
 
No 10G right now; the SAN supports it but we didn't put the cards in at this time. There are actually 2 additional ports on each of the controller cards that aren't shown on this diagram just to keep it clean, but they will be connected to the two backend switches.


The switches are PowerConnect 6224s, and the licensing is Enterprise.


The front end "General" network connectivity is where I am a little unsure, I wanted to make sure that if we lost one of the "PROD" switches we would be okay redundancy wise. Also worried about if we were to add any additional subnets in the future how to handle connecting those into the two PROD switches. Figured I would just VLAN tag all the traffic but I don't have much experience with this part. This setup in in a colo so access is limited.

As for the MGMT/vMotion/FT traffic, I planned on putting it on the backend switches within a VLAN.
 
Don't put vMotion/FT on the backend (I assume that means the iSCSI) NICs. You don't want vMotion and FT competing with storage traffic. My suggestion is to split out one NIC on each host and use it for those things. Or two..but I'm assuming they won't be used for normal management. They can be non-routed VLANs for FT and vMotion. 6 NICs is a lot for front-end VM traffic. My guess is that it's way overkill. So pull 2 and use them for vMotion/FT and Management (if you want..I normally do). That leaves 4 in Active/Active for VM traffic. Don't do Active/Passive.

I assume the EMC array is an AX4? If it's an NS or NX look at NFS instead of iSCSI. Can explain why if you have the capability.
 
The array is a CX4-120

So pull 2 NICs out of the front-end general for vMotion/FT and put those on the two front-end switches. Should I separate those onto their own switches? While I don't want vMotion and FT competing with storage traffic, I also don't want it competing with general network traffic. The switches look to be pretty beefy spec-wise, so maybe it's not a concern.

As for the other 4 NICs, are you saying to not tag them for all the VLAN traffic and instead assign one to each subnet? What can I do to prevent a single point of failure?
 
Just do a separate VLAN for vMotion/FT. I wouldn't do separate switches..no need there. Are your switches in a stack? If so you can port-channel multiple NICs and use IP-based hashing for better load balancing.

No. Tag everything. Take all 4 of those. Put them in a single vSwitch. Make a tagged port group for each VLAN. Set all 4 NICs to active. If you can stack the switches, put all 4 in a port-channel and use IP-based hashing on both ends.

Same thing for the 2 NICs for vMotion/FT.
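If it helps, here's a rough sketch of that layout in pyVmomi. The host name, uplink names, port-group names, and VLAN IDs are placeholders (not from your diagram), and this assumes a simple inventory with one host; treat it as a sketch, not the exact config.

Code:
# One vSwitch, a tagged port group per VLAN, all uplinks active, IP-hash teaming.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="esx01.example.local", user="root", pwd="secret",
                  sslContext=ssl._create_unverified_context())
# Assumes a simple inventory: first datacenter, first compute resource, first host.
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
netsys = host.configManager.networkSystem

uplinks = ["vmnic2", "vmnic3", "vmnic4", "vmnic5"]

# One vSwitch backed by all four VM-traffic uplinks, all active, IP-hash teaming
# to match a port-channel on the stacked switches.
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=uplinks),
    policy=vim.host.NetworkPolicy(
        nicTeaming=vim.host.NetworkPolicy.NicTeamingPolicy(
            policy="loadbalance_ip",
            nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(activeNic=uplinks))))
netsys.AddVirtualSwitch(vswitchName="vSwitch1", spec=vss_spec)

# One tagged port group per VLAN (example names and IDs).
for name, vlan in [("PROD-VLAN10", 10), ("PROD-VLAN20", 20), ("DMZ-VLAN30", 30)]:
    netsys.AddPortGroup(portgrp=vim.host.PortGroup.Specification(
        name=name, vlanId=vlan, vswitchName="vSwitch1",
        policy=vim.host.NetworkPolicy()))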
 
Not to butt in here, but are you saying to team the 4 NICs under 1 vSwitch and create port groups w/ VLANs off of that?
By doing that you are still using the full 4 in an Active/Active configuration, with VLANs on the virtual port groups, so if one goes down..no biggie? Am I reading this correctly..still learning..lol..actually going through the Networking section in my studies right now.
 
Yes, we do have stacking modules (48Gbps) for all the switches (both front and back). I did not plan on using them in the backend switches.

Based on everything, here is the revised Visio with your suggested changes. Did I miss anything?



 
Not to butt in here, but are you saying to team the 4 NICs under 1 vSwitch and create port groups w/ VLANs off of that?
By doing that you are still using the full 4 in an Active/Active configuration, with VLANs on the virtual port groups, so if one goes down..no biggie? Am I reading this correctly..still learning..lol..actually going through the Networking section in my studies right now.

Yes. You're basically giving all VLANs over those NICs equal priority. If any single NIC (or multiple, in this case) fails, the others pick up the load. If you need or want to physically isolate traffic you can split off NICs, but he doesn't need or want that.
 
Yes, we do have stacking modules (48Gbps) for all the switches (both front and back). I did not plan on using them in the backend switches.

Based on everything, here is the revised Visio with your suggested changes. Did I miss anything?

Much better and more robust. Let me know if you have any other questions.
 
If anyone is interested, this is the final Visio design that I will be presenting in a meeting tomorrow with our other sysadmin, who has no virtualization background, and our network admin, who has a general understanding. Hopefully it all goes well.



 
I assume the EMC array is an AX4? If it's an NS or NX look at NFS instead of iSCSI. Can explain why if you have the capability.
I am curious about this. Do tell why NFS rather than iSCSI. Is this specific to EMC arrays?
 
What are you using to connect your SAN to the back-end switch, then to your ESX servers? What hardware are you using for ESX? I am planning a similar deployment and I'm wondering what other people have used.
 
What are you using to connect your SAN to the back-end switch, then to your ESX servers? What hardware are you using for ESX? I am planning a similar deployment and I'm wondering what other people have used.

The SAN is connected to the two PowerConnect 6248s via CAT6. The ESX servers are also connected via CAT6 to the back-end and front-end switches. The ESXi boxes are Dell PowerEdge 2950 IIIs booting off internal USB drives. Each server is dual-processor quad-core with 32GB RAM. The SAN is a CX4-120 with 45 FC drives.
 
I was more wondering about the speed of the connections, but you basically told me there.

Thanks for all that information! It helps me out a lot.
 
I was more wondering about the speed of the connections, but you basically told me there.

Thanks for all that information! It helps me out a lot.

Yeah, right now everything is GbE, with the plan to jump to 10GbE on the SAN in a year or two if needed. I don't think we will end up going that route anytime soon with the way our IOPS look so far.
 
I'm curious...I see that you'll be booting off of USB..and I've seen this mentioned on several occasions surrounding ESXi and embedded. I can see that in a lab environment, but wouldn't you want redundancy..or is it that ESXi is so easy to install and set up that it's moot?

What's really the best practice in an enterprise for ESXi installation? I always thought a small mirrored array or boot from SAN?
 
I'm curious...I see that you'll be booting off of USB..and I've seen this mentioned on several occasions surrounding ESXi and embedded. I can see that in a lab environment, but wouldn't you want redundancy..or is it that ESXi is so easy to install and set up that it's moot?

What's really the best practice in an enterprise for ESXi installation? I always thought a small mirrored array or boot from SAN?

Good question. We are looking at the power savings (hopefully) and we do have spare USB drives (2) in our colo for them to swap if needed. ESXi is a very simple install (takes about 5-10 mins tops from CD). We have quite a few ESXi servers, so if one were to fail the redundancy would kick in and it wouldn't affect us too much.

I don't know what the best practices are for an enterprise ESXi installation, but I do remember my Dell rep saying that not too many people buy the servers with USB embedded installs.

I would be interested in hearing from someone else that does deal with ESXi USB installs on a larger scale.
 
We do a lot of ESXi boot via USB in enterprise deployments. Also do boot from SAN..and some people use internal disks. Remember that ESXi is basically disposable. Even a manual install is a quick process..even faster with something like UDA (Ultimate Deployment Appliance) and host profiles.

It just depends what you want to do. SAS disks aren't cheap so people are pulling them whenever possible.
 
Makes perfect sense. In the environment that I was thrown into, the previous admin set up boot from SAN via fibre and gave each ESX host 60GB..I was like..what the heck do they need 60GB for...just a waste, especially when it is FC SAS.

I'm in the process of planning the upgrade from VI 3.5 to 4.1. We have a Dell M1000e blade enclosure. I was talking to my Dell rep yesterday and he was telling me that they have flash cards that we could install ESXi on...I thought that was interesting..but if it's easy to recover..we may forgo boot from SAN to save on storage.
 
Well, my director nixed the 10 NICs per server, so I'm down to 6, but the design did win the initial approval. I adjusted things to reflect just having 6 ports.

The only downfall now is that the vMotion/FT/MGMT traffic will be using the two onboard ports. The subnet traffic and the iSCSI traffic will be put on 2 dual-port network cards for NIC redundancy. Honestly, if the onboard NICs were to fail, the motherboard would most likely be having a problem anyway, so I don't know if it's that big of a deal.



 
NetJunkie - Maybe you can help with this question. We are now debating how best to break up the 45 disks in the storage array.

I know there are some considerations when doing just one large disk group/LUN.

Also, should it be one 45-disk RAID10, two RAID10s, etc.? Our director is sold on it being RAID10 so that won't change. We need it anyway for the IOPS.
 
If it were me I would make several small LUNs. Although this does require more management, it does yield better performance.

First, IOPS does not scale linearly with the # of disks (e.g. a 20-disk RAID10 is not going to be 2x as fast as a 10-disk RAID10). With multiple LUNs you can also spread out the disk contention. So if you have a SQL server you can isolate it to a different LUN, so that when it is getting hit hard it has zero impact on your other VMs. Yeah, you can set priority via disk shares, but with one massive LUN every VM's I/O is going to cause contention with all of the other VMs' I/O requests.

Secondly, if you have multiple LUNs you have more flexibility, because multipathing policies and disk shares are set per LUN. Also, if you are going to be running any guests that use the Microsoft Cluster Service, it requires that each cluster disk resides on its own LUN.

Also, I know you mentioned your boss said that he wanted RAID10, but with multiple LUNs you could have a LUN for your test environment that could be RAID5 or even RAID0 (I am assuming you have a solid backup solution). This would provide you significant space savings for those VMs that are not critical to the business's operation.

Just my .02
 
(I am assuming you have a solid backup solution).

This right here is an interesting question. What are you using for a backup solution for your VMs and how is it implemented/what VLANs and links will it be going over?
 
First, networking. No big deal on dropping a couple NICs. 10 is a lot, but I see that many a lot especially in iSCSI and NFS installs but usually that many aren't required. What I'd do in this case is take the four NICs you have and put them in one vSwitch. Do separate port groups w/ VLAN tagging for VM traffic, vMotion, FT, and Management. Then within the vSwitch you can override the failover policy. So for the VM traffic you can have NICs 0, 1, and 2 be active and 3 be standby. For vMotion, FT, and Management you can have NIC 3 be active and the rest standby. This way you get 3 NICs for VM traffic, 1 NIC for FT/vMotion but you have the ability to fail a NIC over if required. This way you aren't "wasting" a NIC as a second NIC in the FT/vMotion/MGMT vSwitch. I can do screenshots of this if you need it.
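Until I can grab those screenshots, here's a rough pyVmomi sketch of that per-port-group override. The vmnic numbers, port-group names, and VLAN IDs are placeholders, and "netsys" is the host's HostNetworkSystem object as in the earlier sketch; this is a sketch of the idea, not the exact config.

Code:
# Per-port-group failover-order override: VM traffic gets three active uplinks
# and one standby; MGMT/vMotion/FT gets the reverse.
from pyVmomi import vim

def nic_order(active, standby):
    """Build a failover-order override policy for a single port group."""
    return vim.host.NetworkPolicy(
        nicTeaming=vim.host.NetworkPolicy.NicTeamingPolicy(
            nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
                activeNic=active, standbyNic=standby)))

# VM traffic: three active uplinks, one standby.
vm_policy = nic_order(["vmnic0", "vmnic1", "vmnic2"], ["vmnic3"])
# Management/vMotion/FT: the reverse, so they normally get a NIC to themselves.
mgmt_policy = nic_order(["vmnic3"], ["vmnic0", "vmnic1", "vmnic2"])

# "netsys" is host.configManager.networkSystem from the earlier sketch.
for pg_name, vlan, policy in [("VM-VLAN10", 10, vm_policy),
                              ("MGMT-vMotion-FT", 20, mgmt_policy)]:
    netsys.UpdatePortGroup(pgName=pg_name, portgrp=vim.host.PortGroup.Specification(
        name=pg_name, vlanId=vlan, vswitchName="vSwitch0", policy=policy))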

Storage. First, is your CX4-120 up to FLARE30? If not get that scheduled with EMC. They'll send a CS guy out to do it for you. You want that so you get VAAI support. You want VAAI so you can put more VMs on a datastore without excessive LUN locking. Then listen to Nitro. RAID10 is fine...but do you need it for everything? Have you done some perfmon analysis (or whatever for the OS you use) to see your 95th percentile IOPS requirement? Do you have a tiered breakdown of your proposed VMs? Using this and the separated LUN layout that Nitro mentions you can figure out your storage layout and design. Spend some time here. This can make or break your virtualization project.

LUN sizes are up to you. We used to do 500GB and 750GB LUNs (and therefore datastores) to keep customers from putting too many VMs on a single datastore..which causes other performance issues. Now with VAAI that pretty much goes away so I have no problem with the max datastore size (2TB-512 bytes). It comes down to RAID Group and LUN design at that point. Do a good job balancing LUNs across the two storage processors, make sure you balance the paths for the LUNs across the front-end ports. I see that a lot..everything goes through A0 and B0..nothing on A1 and B1.

Once you get moving let me know and I can do a quick health assessment on the array for you. Make sure everything is balanced well. I'll offer this to anyone else using EMC storage as well, BTW.
 
First, networking. No big deal on dropping a couple NICs. 10 is a lot, but I see that many a lot especially in iSCSI and NFS installs but usually that many aren't required. What I'd do in this case is take the four NICs you have and put them in one vSwitch. Do separate port groups w/ VLAN tagging for VM traffic, vMotion, FT, and Management. Then within the vSwitch you can override the failover policy. So for the VM traffic you can have NICs 0, 1, and 2 be active and 3 be standby. For vMotion, FT, and Management you can have NIC 3 be active and the rest standby. This way you get 3 NICs for VM traffic, 1 NIC for FT/vMotion but you have the ability to fail a NIC over if required. This way you aren't "wasting" a NIC as a second NIC in the FT/vMotion/MGMT vSwitch. I can do screenshots of this if you need it.

That screenshot would be really useful. I think I know how you have it set up, but I want to make sure.
 
First, networking. No big deal on dropping a couple NICs. 10 is a lot, but I see that many a lot especially in iSCSI and NFS installs but usually that many aren't required. What I'd do in this case is take the four NICs you have and put them in one vSwitch. Do separate port groups w/ VLAN tagging for VM traffic, vMotion, FT, and Management. Then within the vSwitch you can override the failover policy. So for the VM traffic you can have NICs 0, 1, and 2 be active and 3 be standby. For vMotion, FT, and Management you can have NIC 3 be active and the rest standby. This way you get 3 NICs for VM traffic, 1 NIC for FT/vMotion but you have the ability to fail a NIC over if required. This way you aren't "wasting" a NIC as a second NIC in the FT/vMotion/MGMT vSwitch. I can do screenshots of this if you need it.

This is exactly what I was thinking of doing this morning when driving in. I'm pretty sure I have an understanding of exactly how to do it, but screenshots never hurt if you don't mind. I'm sure others would love to see them also.

Storage. First, is your CX4-120 up to FLARE30? If not get that scheduled with EMC. They'll send a CS guy out to do it for you. You want that so you get VAAI support. You want VAAI so you can put more VMs on a datastore without excessive LUN locking. Then listen to Nitro. RAID10 is fine...but do you need it for everything? Have you done some perfmon analysis (or whatever for the OS you use) to see your 95th percentile IOPS requirement? Do you have a tiered breakdown of your proposed VMs? Using this and the separated LUN layout that Nitro mentions you can figure out your storage layout and design. Spend some time here. This can make or break your virtualization project.

The CX4-120 was purchased last December and is actually still in its box up at our colo. EMC will be out to install it within the next few weeks when I go onsite. Would they do this then if needed? I have reached out to my sales guy and his engineer this morning to figure out more on this.

When it comes to our production setup (the CX4-120 will only be used for production) we don't really need to do a tiered storage setup. In our office we might, but we will save that for another day. We also did do the 95th percentile analysis with EMC to come up with our IOPS, based on our current loads on some very, very active VMs.

Code:
Projected IOPS vs. Actual IOPS for RAID10

Drives       Actual      Projected   Difference   Size
40 x 15K     6111.672    7200        1088.328     7963 GB
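For anyone following along, the projection is just the usual rule-of-thumb math. Here's a rough sketch of it; the ~180 IOPS per 15K spindle figure and the 70/30 read/write mix are assumptions for illustration, not numbers from the EMC sizing exercise.

Code:
# Back-of-the-envelope check on the RAID10 projection above.
spindles = 40
iops_per_15k_spindle = 180            # common planning rule of thumb (assumed)

projected = spindles * iops_per_15k_spindle
print(projected)                      # 7200, matching the "Projected" column

# RAID10 costs two back-end writes per front-end write, so the usable front-end
# figure drops as the write percentage climbs.
read_pct, write_pct = 0.70, 0.30      # example mix, not measured
usable = projected / (read_pct + 2 * write_pct)
print(round(usable))                  # ~5538 front-end IOPS at a 70/30 mix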

LUN sizes are up to you. We used to do 500GB and 750GB LUNs (and therefore datastores) to keep customers from putting too many VMs on a single datastore..which causes other performance issues. Now with VAAI that pretty much goes away so I have no problem with the max datastore size (2TB-512 bytes). It comes down to RAID Group and LUN design at that point. Do a good job balancing LUNs across the two storage processors, make sure you balance the paths for the LUNs across the front-end ports. I see that a lot..everything goes through A0 and B0..nothing on A1 and B1.

I think, for ease of management and seeing as the array will support VAAI, we will do a few 2TB LUNs and go from there.

Once you get moving let me know and I can do a quick health assessment on the array for you. Make sure everything is balanced well. I'll offer this to anyone else using EMC storage as well, BTW.

Thanks, that would be great.
 
First, networking. No big deal on dropping a couple NICs. 10 is a lot, but I see that many a lot especially in iSCSI and NFS installs but usually that many aren't required. What I'd do in this case is take the four NICs you have and put them in one vSwitch. Do separate port groups w/ VLAN tagging for VM traffic, vMotion, FT, and Management. Then within the vSwitch you can override the failover policy. So for the VM traffic you can have NICs 0, 1, and 2 be active and 3 be standby. For vMotion, FT, and Management you can have NIC 3 be active and the rest standby. This way you get 3 NICs for VM traffic, 1 NIC for FT/vMotion but you have the ability to fail a NIC over if required. This way you aren't "wasting" a NIC as a second NIC in the FT/vMotion/MGMT vSwitch. I can do screenshots of this if you need it.

Won't the standby adapters interfere with using IP hash load balancing? I could be completely wrong, I don't have tons of experience with failure situations while using port groups like that. :) Assuming I'm right, the OP should update his notes unless he picked up on this on his own.
 
Won't the standby adapters interfere with using IP hash load balancing? I could be completely wrong, I don't have tons of experience with failure situations while using port groups like that. :) Assuming I'm right, the OP should update his notes unless he picked up on this on his own.

Good question. I haven't seen it interfere, and I know we've done this. I'll test it in my home lab when I return. Easy enough.
 
I'll tell you what..I was going to forgo our FLARE 30 update from FLARE 28..as Dell told us that it would support all features for vSphere 4.1...looks like they were wrong. I should've known; it isn't the first time. Now we'll go FLARE 30...this post has been very informative indeed!

Does the FLARE update require updates to PowerPath, do you know off-hand?
 
I'll tell you what..I was going to forgo our FLARE 30 update from FLARE 28..as Dell told us that it would support all features for vSphere 4.1...looks like they were wrong. I should've known; it isn't the first time. Now we'll go FLARE 30...this post has been very informative indeed!

Does the FLARE update require updates to PowerPath, do you know off-hand?

No. But...when you go to FLARE30, change the Failover Mode to 4 on the ESX host initiators. That enables ALUA. You want that (with or without PowerPath).
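If you want to sanity-check it from the ESX side afterwards, a rough pyVmomi sketch like this lists which SATP ends up claiming each LUN (same placeholder connection as the earlier sketches; the property names assume the vSphere 4.x+ API):

Code:
# Rough sketch: list each LUN with the SATP and path selection policy claiming it.
# "host" is the HostSystem object from the earlier connection sketch.
storage = host.configManager.storageSystem.storageDeviceInfo
for lun in storage.multipathInfo.lun:
    satp = lun.storageArrayTypePolicy.policy if lun.storageArrayTypePolicy else "n/a"
    psp = lun.policy.policy if lun.policy else "n/a"
    # After the failover mode change, expect the CLARiiON LUNs to show an ALUA SATP here.
    print(lun.id, satp, psp)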
 
Do you know if the AX4-5i also supports this? Our Office SAN is an AX4 and will be the one I start setting up first.
 
Do you know if the AX4-5i also supports this? Our Office SAN is an AX4 and will be the one I start setting up first.

AX4 storage processors use 32-bit CPUs. FLARE30 requires 64-bit. So you can't do FLARE30 on an AX4.
 