ESXi to SAN

TType85

In our datacenter we have the following hardware:
  • 1x Supermicro 1U for pfSense (i3-3220T, 4GB RAM)
  • 2x HP DL160 G6 servers for ESXi (dual 56xx, 72GB RAM each, Intel 2-port NIC)
  • 1x Enhance Technology ES3160P4 iSCSI SAN (dual controller, dual 4x 1Gb NIC ports, 10x 600GB 15K SAS drives in RAID 10)
  • 1x Cisco 3750G(?) switch (not 100% sure of the model, but it is a 24-port gigabit one)
  • 1x HP 8-port switch (used to connect our line in to the pfSense box. They installed an optical line instead of a copper one, and for the life of me I couldn't get the Cisco switch to work with it)

I am trying to figure out the best way to hook this up to get the best performance out of the SAN. The SAN can do MPIO, LACP, etc.

Currently all 8 ports from the SAN go to the Cisco switch, and the SAN is configured as 2 LACP links. All 8 ports from the ESXi hosts also go to the switch. The only other connections on the switch are the cross-connect to our DR server, the management NICs for the SAN, and the LAN side of the pfSense box.

Would it be best to replace the 2-port NICs in the servers with 4-port ones and connect the SAN directly to them (no switch)? Then no iSCSI data would go over the switch, which, if I recall correctly, doesn't handle iSCSI well. Should I use LACP in that case?

Something like this?
SAN Controller 1
PORT1 --->ESXi-1
PORT2 --->ESXi-1
PORT3 --->ESXi-2
PORT4 --->ESXi-2

SAN Controller 2
PORT1 --->ESXi-1
PORT2 --->ESXi-1
PORT3 --->ESXi-2
PORT4 --->ESXi-2

This is production for my work. The 2 hosts are just there to give some redundancy/fault tolerance; we could host everything on one server if something goes down. Running 2 connections from each controller to each host should keep both hosts connected if a controller goes down, and the SAN itself has redundant controllers.
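One way to sanity-check the proposed cabling is to model it as a list of links and count what each host keeps when one controller fails. This is a minimal sketch: the names ctrl1/ctrl2/esxi1/esxi2 are placeholders, and it only models cable-level redundancy, not how the array fails LUN ownership over between its controllers.

```python
# Hypothetical model of the proposed direct-connect cabling:
# each tuple is (SAN controller, controller port, ESXi host). No switch in between.
LINKS = [
    ("ctrl1", 1, "esxi1"), ("ctrl1", 2, "esxi1"),
    ("ctrl1", 3, "esxi2"), ("ctrl1", 4, "esxi2"),
    ("ctrl2", 1, "esxi1"), ("ctrl2", 2, "esxi1"),
    ("ctrl2", 3, "esxi2"), ("ctrl2", 4, "esxi2"),
]

def surviving_links(links, failed_controller):
    """Count the links each host still has after one controller fails."""
    counts = {}
    for ctrl, _port, host in links:
        counts.setdefault(host, 0)  # list every host, even one left with zero links
        if ctrl != failed_controller:
            counts[host] += 1
    return counts

for ctrl in sorted({c for c, _, _ in LINKS}):
    print(ctrl, "down ->", surviving_links(LINKS, ctrl))
```

With this layout each host keeps 2 of its 4 links whichever controller drops out, which is the redundancy described above. Note that it assumes 4 SAN-facing ports per host, i.e. the quad-port NICs.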
 

Blue Fox

You shouldn't be using LACP for iSCSI and just let MPIO do its thing instead.
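The reason LACP buys little here can be shown with a toy model: a link aggregate picks a member link by hashing header fields, so one initiator/target pair always lands on the same link, while MPIO round robin spreads I/O across paths. This is a simplified sketch: `zlib.crc32` is an arbitrary stand-in for a switch's real hash (which is vendor-specific and configurable), the IP addresses are hypothetical, and the toy round robin alternates on every I/O, whereas ESXi's Round Robin policy switches paths after a configurable number of I/Os.

```python
import zlib

def lacp_member(src_ip, dst_ip, n_links):
    """Simplified src/dst-IP hash: the same pair always maps to the same member link."""
    return zlib.crc32(f"{src_ip}-{dst_ip}".encode()) % n_links

def round_robin(n_ios, n_paths):
    """MPIO-style round robin: count how many I/Os land on each path."""
    per_path = [0] * n_paths
    for i in range(n_ios):
        per_path[i % n_paths] += 1
    return per_path

# One ESXi vmk port talking to one SAN portal: every frame hashes to the same link.
links_used = {lacp_member("10.0.0.11", "10.0.0.50", 4) for _ in range(1000)}
print("LACP links used:", len(links_used))   # 1
print("Round robin:", round_robin(1000, 4))  # [250, 250, 250, 250]
```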
 

Nate7311

Yup, LACP and iSCSI can produce oddball effects. Given a stronger switch, your best bet would be to run everything into the switch, isolated on its own VLAN, and set up your ESXi hosts to use MPIO as Blue Fox mentioned. Performance may be an issue, depending on the specific model of the Cisco switch; I don't think the 3750 series has the fabric capacity to keep up with all ports at once. But I'm not a Cisco guru, so I'll defer to those who are.
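Whether a switch "keeps up with all ports" comes down to simple arithmetic against its datasheet. This sketch computes what a non-blocking 24-port gigabit switch would need; the thread does not give the 3750G's actual fabric or forwarding figures, so those have to come from the datasheet for the exact model.

```python
def nonblocking_fabric_gbps(ports, port_gbps):
    """Switching capacity needed for every port to send and receive at line rate (full duplex)."""
    return ports * port_gbps * 2

def line_rate_mpps(ports, port_gbps, frame_bytes=64):
    """Forwarding rate for all ports at line rate; each frame also costs
    20 bytes of preamble and inter-frame gap on the wire."""
    return ports * port_gbps * 1e9 / ((frame_bytes + 20) * 8) / 1e6

def is_nonblocking(ports, port_gbps, fabric_gbps):
    return fabric_gbps >= nonblocking_fabric_gbps(ports, port_gbps)

# A 24-port gigabit switch, as described in the thread.
print(nonblocking_fabric_gbps(24, 1))   # 48
print(round(line_rate_mpps(24, 1), 1))  # 35.7
```

If the datasheet quotes a fabric figure below 48 Gb/s (or a forwarding rate below about 35.7 Mpps at 64-byte frames), the switch cannot run every port at line rate simultaneously.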
 

peanuthead

Do you want max performance or some built-in reliability?

Max performance - put all storage traffic on its own switch.

Some reliability built in - put one ESXi server on one switch and the other on another switch (I know you would need to buy a second switch). Then place 2x 1Gb connections to each of those switches and let MPIO do its thing. The ESXi traffic and the storage traffic would each be in their own VLAN. That should help keep you from having any single point of failure other than the physical array itself. Just some initial thoughts.
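A two-switch layout can be checked by failing one component at a time and seeing which hosts can still reach a SAN controller. This sketch reads the suggestion literally (each host cabled to one switch, each controller cabled to both switches); all names are hypothetical, and it models cabling only, not VLANs or restarting VMs on the surviving host.

```python
from collections import deque

# Hypothetical two-switch layout: each ESXi host on its own switch,
# each SAN controller cabled to both switches.
EDGES = [
    ("esxi1", "swA"), ("esxi2", "swB"),
    ("ctrl1", "swA"), ("ctrl1", "swB"),
    ("ctrl2", "swA"), ("ctrl2", "swB"),
]
HOSTS = ["esxi1", "esxi2"]
SWITCHES = {"swA", "swB"}
CONTROLLERS = {"ctrl1", "ctrl2"}

def reachable(start, failed):
    """Breadth-first search over the cabling, ignoring links to the failed component."""
    adj = {}
    for a, b in EDGES:
        if failed in (a, b):
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node not in SWITCHES:
            continue  # only switches forward traffic; hosts and controllers are endpoints
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def hosts_with_storage(failed):
    return [h for h in HOSTS if h != failed and reachable(h, failed) & CONTROLLERS]

for part in ["swA", "swB", "ctrl1", "ctrl2"]:
    print(part, "down -> hosts with storage:", hosts_with_storage(part))
```

In this model a controller failure leaves both hosts connected, while a switch failure cuts one host off its storage, so the cluster as a whole survives only if the other host can take over that host's VMs.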
 

TType85

I have a 3Com 4500G switch I can use if it is better than the Cisco for iSCSI traffic.

Right now there are 2 datastores on the SAN, and I am seeing 8 targets, 2 devices and 16 paths. Can I assume MPIO is working?
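The counts above work out to 16 / 2 = 8 paths per device. More than one active path per device is the precondition for MPIO; whether I/O is actually spread across those paths depends on each device's path selection policy (for example Round Robin versus Fixed), which `esxcli storage nmp device list` reports. This sketch only does the arithmetic; the per-device state mapping is a hypothetical input you would fill in by hand.

```python
def paths_per_device(total_paths, devices):
    """Average paths per device from the totals shown for the iSCSI adapter."""
    if total_paths % devices:
        raise ValueError("paths are not spread evenly across devices")
    return total_paths // devices

def multipathed(path_states):
    """path_states is a hypothetical {device: [path state, ...]} mapping.
    True only if every device has at least two active paths."""
    return all(sum(state == "active" for state in states) >= 2
               for states in path_states.values())

print(paths_per_device(16, 2))  # 8
print(multipathed({"datastore1": ["active"] * 8,
                   "datastore2": ["active"] + ["dead"] * 7}))  # False
```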
 

peanuthead

Create some iSCSI traffic, run esxtop (or resxtop from vMA), press the 'n' key (network view) and check that traffic shows up. You should see similar numbers on each of the vmk port rows.
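"Similar numbers" can be made a little more concrete with a tolerance check on the per-vmk throughput you read off that view. This is a sketch: the vmk names and figures are illustrative rather than measurements, and the 20% tolerance is an arbitrary choice.

```python
def balanced(throughput_mbps, tolerance=0.2):
    """True if every vmk port is within `tolerance` (a fraction) of the mean throughput."""
    values = list(throughput_mbps.values())
    mean = sum(values) / len(values)
    if mean == 0:
        return False  # no traffic at all, nothing to compare
    return all(abs(v - mean) / mean <= tolerance for v in values)

# Figures entered by hand from the network view (illustrative only).
print(balanced({"vmk1": 410.0, "vmk2": 395.0}))  # True
print(balanced({"vmk1": 800.0, "vmk2": 3.0}))    # False
```

A lopsided result usually means I/O is sticking to one path, for example because the device uses a Fixed or Most Recently Used policy rather than Round Robin.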
 

schizrade

Yes, MPIO is working. Make sure you enable jumbo frames.
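Jumbo frames only help if every hop (vmk port, vSwitch, physical switch and SAN ports) uses the larger MTU. A common end-to-end check is `vmkping` with the don't-fragment bit set (`-d`) and the largest payload that fits (`-s`); that payload is the MTU minus the IPv4 and ICMP headers, as this small sketch computes.

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes

def max_unfragmented_ping(mtu):
    """Largest ICMP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_unfragmented_ping(9000))  # 8972
print(max_unfragmented_ping(1500))  # 1472
```

So for a 9000-byte MTU the test would be along the lines of `vmkping -d -s 8972 <SAN portal IP>`; if that fails while a 1472-byte ping succeeds, some hop is still at the standard MTU.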

As for a switch, the 3750G is garbage in the role of a SAN switch; I had to use a few for over a year until new units could be purchased. You need line-rate ports, a fast CPU, and large buffers. I use Cat 4948E switches. Brocade has some great switches for this as well: on paper the Brocade switches perform better, but in practice I found the 4948E ran faster in our application.

Your HP has a 4MB buffer, so not the best, but better than the 3750G. The 4500G is a distribution/floor switch, not a datacenter/san switch. Again, not *ideal*, but better than the 3750G.
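Buffer size matters because several ports (for example both SAN controllers) can burst toward one host port at once, and the switch has to queue the excess. This idealized sketch estimates how long a buffer lasts before dropping frames; it assumes the whole buffer is available to a single congested port, which shared-buffer switches do not guarantee, and reads "4MB" as 4 MiB.

```python
def buffer_fill_ms(buffer_bytes, ingress_gbps, egress_gbps):
    """Milliseconds until the buffer overflows when ingress exceeds egress
    (idealized: constant rates, no other traffic competing for the buffer)."""
    excess_bps = (ingress_gbps - egress_gbps) * 1e9
    if excess_bps <= 0:
        return float("inf")  # egress keeps up, so the buffer never fills
    return buffer_bytes * 8 / excess_bps * 1000

# Two 1 Gb/s senders bursting into one 1 Gb/s port, 4 MiB buffer.
print(round(buffer_fill_ms(4 * 1024 * 1024, 2, 1), 1))  # 33.6
```

Tens of milliseconds is the best case; once the buffer is split across ports, the usable headroom per port is a fraction of that.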
 

TType85

Inverted Pyramid of Death = 1 SAN.

Unfortunately it's all we have, and I'm working on a tiny budget.

At least the SAN has dual controllers and dual power supplies (one on each PDU), and I am running RAID 10 with a hot spare. I also have all the VMs backing up nightly to our DR server in another DC, so if the SAN fails I can bring the important stuff back up.

We only need around 1.5-2TB of space for all of our VMs (the SQL server is about half of that). With some sort of deduplication we could probably have an even smaller footprint.
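As a rough capacity check, RAID 10 mirrors every drive, so usable space is half the raw total. This sketch assumes the 10x 600GB drives listed at the top of the thread all sit in the RAID 10 set with the hot spare outside it, and ignores filesystem and controller overhead.

```python
def raid10_usable_gb(drives, drive_gb):
    """Usable capacity of a RAID 10 set: half the drives hold mirror copies."""
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drives // 2 * drive_gb

usable_gb = raid10_usable_gb(10, 600)  # decimal GB, as drives are marketed
usable_tib = usable_gb * 1e9 / 2**40   # the same capacity in binary TiB
headroom_tb = usable_gb / 1000 - 2.0   # against the upper 2TB estimate
print(usable_gb, round(usable_tib, 2), round(headroom_tb, 2))  # 3000 2.73 1.0
```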
 

Haitch

Given it's two hosts, and I assume that's not likely to increase (?), put in the quad-port NICs and bypass the switches for iSCSI altogether: use direct connects the way you indicated, with each link on its own subnet.
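A per-link subnet plan for the direct-connect layout can be generated with Python's `ipaddress` module. This is a sketch: the 10.10.0.0/16 range, the /24 per link, and the port and host names are all hypothetical. It is also worth checking VMware's iSCSI port-binding guidance for this layout, since port binding generally expects the bound vmk ports to share one subnet.

```python
import ipaddress

# Hypothetical direct-connect cabling: (SAN controller port, ESXi host).
LINKS = [
    ("ctrl1-p1", "esxi1"), ("ctrl1-p2", "esxi1"),
    ("ctrl1-p3", "esxi2"), ("ctrl1-p4", "esxi2"),
    ("ctrl2-p1", "esxi1"), ("ctrl2-p2", "esxi1"),
    ("ctrl2-p3", "esxi2"), ("ctrl2-p4", "esxi2"),
]

def plan(links, supernet="10.10.0.0/16", prefix=24):
    """Give every point-to-point link its own subnet: first host address
    for the SAN port, second for the ESXi vmk port."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=prefix)
    rows = []
    for (san_port, host), net in zip(links, subnets):
        addresses = net.hosts()
        san_ip, host_ip = next(addresses), next(addresses)
        rows.append((str(net), san_port, str(san_ip), host, str(host_ip)))
    return rows

for row in plan(LINKS):
    print(*row)
```

The first line printed is `10.10.0.0/24 ctrl1-p1 10.10.0.1 esxi1 10.10.0.2`, and each of the 8 links gets its own non-overlapping /24.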
 

SGalbincea

Quote: No need to use jumbo frames anymore.

Can you explain how you came to this conclusion? I completely disagree. In our own extensive internal testing with EqualLogic, Nimble, and even VSAN, we saw improvements in latency and throughput of between 9% and 25% with jumbo frames enabled. It was only when we completely saturated the 10Gb links to 100% that the differences became negligible.
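Header arithmetic alone explains only part of such a gain. This sketch compares the fraction of wire time that carries TCP payload at MTU 1500 versus 9000, assuming full-size frames and IPv4/TCP headers without options; the larger share of a measured improvement plausibly comes from handling roughly six times fewer frames (less per-packet CPU and interrupt work), which is not modeled here.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap (bytes on the wire)
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP, no options

def payload_efficiency(mtu):
    """Fraction of wire time carrying TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)

def frames_per_second(mtu, link_gbps=1):
    """Full-size frames per second at line rate."""
    return link_gbps * 1e9 / ((mtu + ETH_OVERHEAD) * 8)

std, jumbo = payload_efficiency(1500), payload_efficiency(9000)
print(f"{std:.3f} {jumbo:.3f} {jumbo / std - 1:.1%}")  # 0.949 0.991 4.4%
print(round(frames_per_second(1500) / frames_per_second(9000), 1))  # 5.9
```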
 