Cisco 2960 + iSCSI Problems

Liquidkristal

Right then

At work we have a small cloud infrastructure built around a bunch of 2960 switches with a load of VMs connected: a mixture of terminal servers, application servers, all the regular stuff that companies in our market require, all in separate VLANs. Pretty basic stuff, and it has been working well as the company has grown.

We have always had a bit of iSCSI going on, mainly for backups, large file storage, and the odd database.

Recently we started moving a lot of VMs onto new SAN infrastructure that we have built (a StarWind SAN), and we have started experiencing slowdowns and machine lock-ups on the VMs that are running from the SAN (we are moving that way so we can start to use vMotion and other bits and pieces). It was fine with 16 VMs running over the SAN; 24 damn near killed it.

Using Cacti and looking at the interface / CPU history on the switches, it looks like the interfaces are struggling: lots of output drops, and the CPU is spiking to 70%.

The iSCSI side of things is just dual GigE (MPIO) across the switches to the storage server. The server is barely breaking a sweat: low CPU / memory usage, and bandwidth around 20Mbit sustained.

We don't have jumbos enabled yet (that's on my list of possible fixes), but the question is: are the 2960s up to the job of handling a lot of iSCSI? I see differing opinions on Google, good and bad, for these switches, and from what I can see we are really beating them down; it's starting to show that they might need to be moved off iSCSI duty and something bigger moved in.

Thoughts....
 
No, your 2960 is definitely not meant for the kind of traffic you're pushing through it. It sounds like you are blowing through the ports' buffering capability and dropping packets, causing frequent re-transmits, which will freeze your VMs.
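
If you want to confirm that from the switch side rather than just the Cacti graphs, a few show commands will tell you quickly. A rough sketch (the interface name is just an example, and the exact output wording varies a little between IOS versions):

! total output drops on a suspect iSCSI-facing port
show interfaces GigabitEthernet0/1 | include output drops
! per-queue enqueue/drop counters on 2960/3750-class switches
! (layout differs slightly depending on whether mls qos is enabled)
show mls qos interface GigabitEthernet0/1 statistics
! CPU graphs for the last 60 seconds / 60 minutes / 72 hours
show processes cpu history

If the output drop counter keeps climbing while the link is nowhere near line rate, it is the port buffers giving up rather than raw bandwidth.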

You'll want a SAN/storage-grade switch. If the switch is meant to share other traffic, use a 3560/3750 at a minimum, 4948 preferred. Don't mess with jumbo frames until you've done a full array of tests with the right switch. Jumbo in my experience is hit or miss: some hardware combinations will see gains, some will see a loss, some will see nothing at all. Jumbo with iSCSI isn't just "enable this and away we go".
 
Yep. We won't use 2960 switches for iSCSI for this reason. Under light/medium load they are fine, but hit them with something like a VDI boot storm and they tank pretty fast. We do 3750s now at that size.

Another good Cisco option is the Nexus 3K. That's why you see the 3Ks used in a lot of 1Gb iSCSI reference architectures, and they are cheaper than 4948s.
 
OK, that's exactly what we are seeing: the current infrastructure is starting to show weakness, and I had a hunch that it was the switches that were being battered.

I guess it is time to think about upgrades, as the VMs that we run take a pounding; they are mostly terminal servers, so not exactly quiet machines. Logon time is a cross-your-fingers-and-hope-it-holds affair (as is the final hour of the day, when everyone gets their work printed off and mailed out).
 
You really need to properly capacity plan these solutions before putting them into production.
 
NEVER EVER use a 2960 or a 3750 for iSCSI traffic! Period.

Use a 4948 or other 4900-series (and above) switch. Here is why...

<-- B U F F E R S -->

[Screenshot: buffer statistics from a Cisco 1921 with an 8-port Gigabit EHWIC vs. a Cisco 4948 (both on my rack at home)]

[Screenshot: buffer statistics from a Cisco 3750E, 24-port Gigabit with 10Gig ports]


See the difference in buffers? It's massive. iSCSI generates heavy burst traffic from time to time, and you need the buffering capacity to handle those bursts properly.
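
If you want to compare your own boxes, commands along these lines show the buffer pools and whether you are actually dropping in hardware. Just a sketch; the port-asic command only exists on the 2960/3750-style platforms, and pool names and counts vary by model and IOS:

! system buffer pools (small/middle/big/huge); compare the totals across platforms
show buffers
! ASIC-level drop counters on 2960/3750-class switches
show platform port-asic stats drop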

As others have said, do not mess with jumbo.
Edit: There is another factor important to iSCSI switching, and that is latency. Latency in a switch serving servers is like Kryptonite to Superman. The 4900 series absolutely blazes in the latency arena.

Let me add this too...

http://www.networkworld.com/reviews/2005/090505-cisco-test.html
 
remove the switch completely for production

that's what the majority of ip san vendors, including starwind, recommend

if you absolutely need to keep it

cisco is questionable, mixed results

force10 is a no-no

netgear, being cheap, is surprisingly good

 
The 3750s we've used have held up just fine. Most people aren't putting a ton of servers w/ iSCSI on these switches. If you're going to put a lot, yeah, go to a 4948 or Nexus 3K but doing that for a few servers blows the budget real fast.
 
3750 buffers will keel over just like the 2960s'. I've watched 3750-Es and -Xs struggle with backups over iSCSI plus normal traffic and cause lots of disconnects and dropped frames. If you have a 3750 in your datacenter, you had better be a SOHO/medium business and looking to replace that switch as soon as you start to ramp up traffic. 3750s are access-layer switches, not datacenter switches. The buffer size is too small to handle large loads continuously.

Also OP, make sure to follow your storage vendor's and VM hypervisor's best practices for iSCSI to the letter. I know with multiple storage vendors in a VMware environment, you will see issues with iSCSI if the network portion isn't built to the vendor's specifications.
 
The 3750s will handle it without locking up the VMs the way the 2960s do. I've run VDI implementations with 300+ VMs on a stack of 3750s on the backend for all traffic and would occasionally see buffer drops, but not enough that the VMs would lock up. The 3750s were quicker at processing the packets. We even ran high-end traffic on 3560s without issue. A properly designed stack, though, should use switching designed for this type of traffic; access-grade switches (i.e. 3560/3750) weren't meant for this.

I can't stress following the storage vendor's best practices enough, having been there and done that. VMware won't care and won't recommend anything on the network config beyond maybe a high level; we drilled VMware for days when designing our VDI stack, to the point where they deferred to our storage vendor for a recommendation. Our storage vendor had a CCIE on staff who then made the appropriate recommendations.
 
Thanks for the input guys. The more I look at it, the more obvious it becomes that we have outgrown the original setup that got the company started; it's just growing pains.

I needed a steer in the right direction on this, so I could be 100% confident in recommending we splash out a lot of cash on upgrading our core infrastructure.

Hell, my Cisco 300SB runs iSCSI better than the 2960s do; it's only 2 VMs there, but no packet loss at all.
 
I have a 3650; will it have the same iSCSI problems? I have all kinds of weird issues with backups.

FYI, I have 3 Netgear 10Gb 712 switches. I love them; they are fast. NFS and iSCSI fly on them.
 
few force10 switches we've used with dell eql and sds did the same

work fine, but from time to time hiccup long enough to make vms bsod

netgear is not the best performer, but being a fraction of the cost of both force10 and cisco it's definitely a huge winner in terms of price-performance

I am curious why you are recommending netgear but not force10?
 
I run quite a few Force10 switches with no issues in SAN and Core/Distribution roles.
 
Wow, this person recommends Netgear but says not to trust Cisco? There is a 99.9% chance that the very packets that make up the text you are reading on the [H] forums passed through 75-80% Cisco equipment on their way to your web browser.

On the other hand, Juniper is another damn good brand to look at, or Brocade.

But the important thing to ask about any switch is: what is it designed to do?

Access-layer switches are NOT going to have the buffering or fabric to handle large traffic bursts. Access switches are for printers, PCs, telephones, etc., not servers.

Distribution switches will have the buffering available and are made for servers and other access switches. The 4948 is a distribution switch but can be used as a basic access switch as well. It is a fixed config, which means you don't have modules etc. to deal with like on a chassis-based switch.
For instance, a 4948 or Nexus 3 series is designed to have a 2960 plugged into each gig port, each bringing its own 24-48 ports' worth of traffic. So imagine 48 2960s plugged into your 4948, each 2960 maxed out with PCs etc.: that is what the buffering is for. The 10Gig links on the 4948 can be used either to connect a server or to distribute those 48 ports' worth of 2960 uplinks/trunks into a larger 6500+ or Nexus chassis core switch. So yes, they can handle your iSCSI traffic 100% fine. And the best part is they are going for about $1000 on eBay for a 48-port switch.
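
To make that concrete, a 4948 facing one of those 2960s plus the SAN might be configured something like this. Just a sketch: the VLAN numbers, port numbers, and descriptions are made up, so adjust to taste:

! trunk down to an access-layer 2960 (VLANs are examples)
interface GigabitEthernet1/1
 description trunk-to-access-2960-01
 ! (the encapsulation line only applies on platforms that also support ISL)
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,50
!
! one of the two X2 10Gig ports facing the iSCSI SAN
interface TenGigabitEthernet1/49
 description starwind-san
 switchport mode access
 switchport access vlan 50
 spanning-tree portfast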

Core switches are overkill; do not look at these for your iSCSI solution at all unless cost of ownership / TAC agreements / support costs / money is not a factor. If that's the case, I would tell you to find an older 6500 chassis and let her scream.
 
You could also get an HP switch. I have used the 1910G, 2510G, and 2910al-G in production networks and never had an issue.

I have a QNAP doing iSCSI through two 1910Gs; I run like 10 VMs off one host with 4 ports and it's fast.

I have another cluster running two XS712Ts. They work great.
I run management traffic and VSA traffic through them just fine. I have about 30 VMs running through these and have absolutely no problems.

My own personal lab has an XS712T handling management traffic; vMotion is stupid fast.
I can't afford a 6k switch atm. I wish I had a nicer switch, but for 1.6k that's not too shabby.
 
10 VMs won't kill a switch. Go boot a couple hundred VDI clients.

Yeah, well, VDI environments are altogether a different beast. I am just talking from an SMB perspective. Most sub-100-user companies can't justify a VDI environment, nor the hardware to go with it. I wish I could play with that stuff.
 
What others have said: we had 2960s and 3750s for our iSCSI network and both crapped out. We didn't stop dropping frames till we moved up to the 4948-model switches.
 
2960s are definitely not what you want. I wouldn't scoff at 3750s, however. Not for a second.

Right now I'm running 600+ users in VMware View and another 80+ VM guest servers using a single VNX 5300 SAN for storage (~100TB), with a mixed bag of older HP DL380 G6s and newer Cisco UCS B200s. It's entirely iSCSI using stacked 3750s.

I also have a single monster RHE server running on an entirely different EqualLogic SAN across 5 EQ arrays, all connected via a separate dedicated stack of 3750s.

I'm at nearly 6,000 IOPS on the RHE environment constantly. I'm at nearly 8,000 IOPS on the View and VMware back-office stuff.

I'm running jumbo frames throughout.

The 3750s are great switches if you know how to manage iSCSI environments from the front-end server, the network, and the back-end storage.

I only anticipate growing out of my iSCSI switch config for the View and back-office servers in the next 6-12 months. I'll be around 1,100 users in View at that time.

I've had parts of this implementation for the better part of 6 years.

I've seen more people 'over-buy' for their environments in recent years than ever before. I've seen sales guys selling SANs based on peak I/O that occurs only during backups. The challenge is identifying products at each layer of your environment that can be optimized for the other pieces (storage, network, server).

In your case, it's likely the 3750s are more than capable when run as at least a pair. Really, though, you need to research both your SAN's capabilities and everything else that connects to it.

24 VMs... I have that many MS SQL VMs alone.
 
Yep. We won't use 2960 switches for iSCSI for this reason. Under light/medium load they are fine, but hit them with something like a VDI boot storm and they tank pretty fast. We do 3750s now at that size.

Another good Cisco option is the Nexus 3K. That's why you see the 3Ks used in a lot of 1Gb iSCSI reference architectures, and they are cheaper than 4948s.

Is this true? I'm in the market for a top-end Cisco gigabit iSCSI switch and I don't see the 4948 switch being cheaper than a Nexus 3K; which model(s) are you referring to?
 
Damnit.. where was NetJunkie when we needed him in your thread.
 
Nexus 3Ks are NOT cheaper. They are newer and therefore cost more, even off of fleabay.

The 3750 has a buffer size that is equivalent to a 2960's; look at the screenshots earlier in the thread. The 4948 has an ass-ton more buffering, as will a Nexus 3K or even a Juniper EX4000+ series and above.

While some users have better results with 3750s than 2960s, keep in mind the two main reasons why:

1. The particular architecture they are running may be friendlier to the characteristics of a 3750.
2. The 3750 series has more RAM and a faster CPU than the 2960G. This in itself means faster frame forwarding on the layer 2 side and faster packet forwarding on the layer 3 side than the 2960G. It also has a faster internal switching fabric by default, the 3750 actually being an Ethernet router. By all means, if they work for you and your iSCSI environment there is no reason to replace them, but if you are building your network from scratch or doing a new implementation of iSCSI, why would you put a 29/3700-series switch/router at your core when you have the chance to do it with "proper" engineering the first time around?

I am in no way knocking the sheer and quite beautiful performance of the 3750 series, because they are wonderful little pieces of hardware, but there is a reason that Cisco has them listed as access switches and has the 4948s listed as server or distribution top-of-rack switches.

I have one client that is running lots of VMs, but I can't for the life of me get him to understand that he is wasting his money upgrading his already dual-socket 2011, 6-core servers when the problem is that his network fabric is not built for the traffic. It's like trying to speak French in Russian, the way only the most interesting man in the world can when he drinks Dos Equis.
 
Pretty much sums up my response.
 
Our current situation with the 2960s is that we grew from not much and used them as a starting block. Now we want to start using iSCSI a lot (using StarWind / NetApp arrays), and we have a latency issue due to packets being dropped, which needs fixing in a structured manner.

If the 37xx switches are just more powerful 29xx switches, then they are probably not where we want to be, as the solution we are working towards has to be good for a few years.

Would NICs factor into this issue? We are using Intel T350 NICs across the board for our iSCSI subsystem.
 
Step up to 10Gb for SAN. The throughput alone will better allow you to utilize any decent array's real disk read/write throughput.
 
With a 4948-10GE link.... http://www.ebay.com/itm/CISCO-WS-C4...79?pt=US_Network_Switches&hash=item4ac947d9fb

It isn't pictured correctly in the listing: it has 2 10Gb X2 ports. You can hook your NAS/SAN iSCSI blah blah into those two ports and then dole out all the data you want at true line rate and mega-low latency to all your dependent hosts at gig speeds. This switch will absolutely destroy a 2900/3700-series switch in iSCSI performance.

As a switch it is blissfully fast at passing frames, and as a router you will get NO slowdown passing packets.
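
If you go that route, the host-facing gig ports don't need anything exotic. A dedicated iSCSI port usually ends up looking something like this; again just a sketch, and the VLAN, port numbers, and flow-control setting are things to check against your NIC/array vendor's guidance rather than copy blindly:

! dedicated iSCSI port to a host NIC (names and numbers are examples)
interface GigabitEthernet1/10
 description esxi01-iscsi-vmnic2
 switchport mode access
 switchport access vlan 50
 spanning-tree portfast
 ! many array vendors suggest receive flow control on iSCSI ports
 flowcontrol receive on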
 
Just based on your first post, OP, I would recommend having an architect, or at the very least a few of your vendors, come in and make some recommendations.

That setup is by no means a "cloud", and I would be curious what you are moving from to this StarWind setup. StarWind is not a bad solution at all in the right situation, but in many cases it gets thrown into environments that are not very well thought out, since it is cheap and a Swiss Army knife compared to implementing a major dedicated hardware SAN solution.
 
In fact, I'm disconnecting my 3750E and hooking up my 4948, getting ready to sell the 3750E. I don't need it anymore.
 
what do you do to pick the right san vendor? iops? vm density? capacity? budget?

 
A mix of all of those.... Not saying StarWind is bad by any means, just saying it sounds like the OP could use some other input/opinions from system integrators/VARs/etc. to get to the right solution.
 
i don't care about any particular vendor in this case

i'm interested in the whole process

how you decide where to go

hardware san vs. virtual san

?

 
yes exactly

if it works why pay 10x more for the name only

we're not talking abstract data we're talking about ip san workload strictly

 
I thought you were a troll, though you appear to really be that dense. People don't choose Cisco because of the name. They choose it because of its reputation for being the best in the enterprise.
 
Well, I was telling a client that they needed to redesign their data center because the cluster was effed up. No offsite backups, no backups of any sort besides Barracuda file backup. I kept repeating myself like a fucking broken record for a year. I bought some cheap USB drives, which partially survived with some images. I want to beat the fucktard that designed it. They spent 250k and nothing on backup systems.

I found out that the 3560 switch was one of the culprits, but the dumbfuck janitor in charge of HVAC shut off the power to the server room AC, causing it to heat up to 35 C. Three of my servers are fucked; backups were also cooked. They will lose 1-2 months of work. It will be close to 40 hours of work for me. I said I'd had enough: you buy this shit or I quit.
 
I've had great luck with a cheap HP ProCurve 1410-24, ESXi, jumbo frames, Broadcom 5709C NICs, and iSCSI, but of course I'm just one guy running 14 VMs for training purposes in MS DFS configs. There's tons of traffic, but it's pretty much just sequential transfers. It might be something to try:

http://www.amazon.com/HP-Procurve-1...e=UTF8&qid=1383152865&sr=8-3&keywords=hp+1410

Both my SAN and my local network operate off the same switch, but under different IP subnets (10.2.2.x for my main network and 10.5.5.x, unrouted, for my SAN). I don't know how well it would scale up for you. It works well for me for setting up DFS and pushing 2TB of data transfers between VMs.
 
Did you change the MTU size on the switch? For iSCSI traffic it is recommended to change the MTU.

Check it with "show system mtu".
On the 2K/3K you need to set it for the whole system.
Change it to the biggest possible value: system mtu jumbo xxxx
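
To spell that out for the 2960/3750 family (a sketch; the exact upper limit and reload behaviour vary by model and IOS train, and it only helps if the hosts and the SAN are set to match):

! check what the switch is currently set to
show system mtu
! jumbo MTU on these platforms is global, not per interface
configure terminal
 system mtu jumbo 9000
 end
! the new jumbo MTU only takes effect after a reload
copy running-config startup-config
reload
! afterwards, verify end to end from a host with a do-not-fragment ping
! at roughly MTU minus headers (about 8972 bytes for a 9000-byte MTU)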
 