SAN network bottleneck

DermicSavage

[H]ard|Gawd
Joined
Jun 8, 2004
Messages
1,107
So I've hit an impasse with my boss regarding our cheaply built datacenter.

Currently the entire place is run on a 1GbE network all based on a Cisco 6500 core switch. We went this route because the hardware was stupid cheap, but we are now feeling the performance hit on our storage network because of it.

We have a Nimble SAN with four 1GbE links, and five hosts, each also with four 1GbE links.
The main issue we are hitting now is that we have a large number of SQL VMs in this cluster, and the storage load is really hurting performance and latency. It is bottlenecking our disk throughput from an expected 100MB/s down to 10-15MB/s read and write.

I've been begging for the investment to put the whole system on a 10gig infrastructure, but have largely been shot down due to cost. With this recent issue cropping up, they are considering allowing me to install 10gig on the SAN and have that service the hosts, which will remain 1gig.

Will upgrading the SAN from 4x1Gb to 2x10Gb help reduce much of our issue? The SAN is clearly oversubscribed right now, and I'm hoping a bandwidth increase alone will alleviate the issue enough to get us through the year until we have the budget to overhaul the network.

Does anyone have any insights on the storage performance here? I'm no storage engineer and am trying to piece it all together.

As a note, the SAN is a Nimble with 680GB of flash cache, the protocol used is all iSCSI, and the storage network is all connected via layer 2 on the 6500 chassis.

Note 2: Any recommendations on switches that offer low latency, primarily 1000BASE-T ports, and 4x 10GbE SFP+ ports? I am hoping to find something for less than $7-10k per switch. The Cisco 3850 provides the port counts, but I'm not sure about its latency. I'm open to any comments.
 
If you're running tons of SQL VMs I'd check that it's not all the reads/writes to the databases causing the latency versus the network hardware.

Disclaimer: I'm not a system admin/engineer though so that's just my best guess.
 
How much RAM is dedicated to each SQL VM?

I take it you are using NIC teaming?

Have you tried adjusting the buffers on the NICs?

What does the load on the switch look like?

Have you considered moving to SSDs on the SAN? All the network upgrades in the world are not going to help if you are being limited by the super low IOPS of spinners even though you do have flash cache.

I am not familiar with the Nimble SANs... Is there an option to change the cache mode? Write-Back is much faster than Write-Through.
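On the NIC buffer question: here's a minimal sketch of checking (and, if it helps, raising) the ring buffer sizes on the storage NICs from the CentOS hosts with ethtool. The interface names are placeholders, and whether bigger rings help at all depends on the driver and workload.

import subprocess

ISCSI_NICS = ["eth2", "eth3"]  # placeholder names for the dedicated iSCSI interfaces

for nic in ISCSI_NICS:
    # 'ethtool -g' prints the pre-set maximums and the current RX/TX ring sizes
    subprocess.call(["ethtool", "-g", nic])
    # To raise the rings toward the hardware maximum, something like:
    # subprocess.call(["ethtool", "-G", nic, "rx", "4096", "tx", "4096"])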
 
That Cisco should support SNMP; are you graphing the port utilization with anything to see if this is an actual network issue vs. something else?
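If nothing is graphing it yet, a quick-and-dirty way to sample one port is to poll its 64-bit input counter twice over SNMP and convert the delta to Mbit/s. The switch IP, community string, and ifIndex below are placeholders, and it assumes net-snmp's snmpget is installed.

import subprocess, time

SWITCH = "10.0.0.1"                 # placeholder management IP
COMMUNITY = "public"                # placeholder read-only community
OID = "IF-MIB::ifHCInOctets.10"     # 64-bit inbound octet counter for ifIndex 10 (placeholder)

def octets():
    # -Oqv prints just the counter value
    out = subprocess.check_output(["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", SWITCH, OID])
    return int(out.decode().strip())

a = octets()
time.sleep(10)
b = octets()
print("avg inbound over 10s: %.1f Mbit/s" % ((b - a) * 8 / 10.0 / 1e6))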
 
Are you running your iSCSI traffic and production network traffic over the same NICs?
 
I would look at RAM too.... the BIGGEST issue we ever had with our SQL database (back when we HAD one on-site) was my Server 2008 box. It only had 4GB of RAM and couldn't keep up, because the DB was about 4GB and couldn't be cached properly. Moved to 8GB and all was well. Instant load times... all the difference in the world.

Now... I am NOT saying this is the problem, but there is a chance...
 
If you're running tons of SQL VMs I'd check that it's not all the reads/writes to the databases causing the latency versus the network hardware.

Disclaimer: I'm not a system admin/engineer though so that's just my best guess.

I don't even know how to check that. I've been using IOMeter and can see that running 4K tests is insanely slow when compared to a 64K test.

How much RAM is dedicated to each SQL VM?

I take it you are using NIC teaming?

Have you tried adjusting the buffers on the NICs?

What does the load on the switch look like?

Have you considered moving to SSDs on the SAN? All the network upgrades in the world are not going to help if you are being limited by the super low IOPS of spinners even though you do have flash cache.

I am not familiar with the Nimble SANs... Is there an option to change the cache mode? Write-Back is much faster than Write-Through.

The SQL servers all have 32GB of RAM each.
iSCSI is using MPIO across the four NICs, with a round-robin policy (recommended by Nimble).
Not sure how to adjust the buffers. The hypervisors are all running CentOS 6 with Xen 4.3 as the hypervisor.
I'm actually collecting traffic counters from the switch now. Thus far, total throughput doesn't seem to be that high, but when I run a 4K test with IOMeter it murders performance for the other VMs on that hypervisor while actual throughput stays rather low.

As far as SSDs on the SAN go, it's all write-back and all of the flash is used for cache. Nimble is kind of like Meraki; they take away a lot of customization to make it easy and just run a lot of best practices under the hood. My management seems to like that idea, but it's more the marketing line 'NetApp levels of performance without the NetApp cost' that I think drew that decision. I'm sure it's not quite as fast as NetApp can get, but it's mostly a plug-and-play SAN and does work very quickly.

Are you running your iSCSI traffic and production network traffic over the same NICs?

iSCSI traffic is on dedicated NICs

I would look at RAM too.... the BIGGEST issue we ever had with our SQL database (back when we HAD one on-site) was my Server 2008 box. It only had 4GB of RAM and couldn't keep up, because the DB was about 4GB and couldn't be cached properly. Moved to 8GB and all was well. Instant load times... all the difference in the world.

Now... I am NOT saying this is the problem, but there is a chance...

Yea, I commonly hear that upping RAM is important, and thus we have 16-32GB of RAM on each SQL server.



I've been collecting the network speeds from the switch and hypervisors while running an IOMeter test, and it looks like I'm not breaking more than 100Mbps on any single port. I'm pretty much at a loss as to the root cause, but there's definitely something causing a bottleneck there.
Is there some other storage networking metric I can measure aside from just network throughput?
Anyone have recommendations for tests to run with IOMeter, and baseline numbers to compare against so I know what I should expect to see?
 
You said iSCSI traffic is on dedicated NICs, but is it connected to dedicated iSCSI-only switches (as in, no SMB traffic traversing the switch)?

We pulled our fileserver SAN off of the datacenter's Cisco 3500s onto a pair of dedicated Dell 6200-series switches. The difference was fairly significant. We measure 450MB/s max, with minimums coming in around 80MB/s.

When we were using the datacenter's (non-dedicated) switches we saw an average throughput of 30MB/s.

Also, switches that primarily offer 1000BASE-T plus 4x 10GbE SFP+ ports usually cannot handle 10GbE iSCSI traffic well, as the buffers are too small.

Switches used for iSCSI traffic typically have a recommended configuration in the manual for the iSCSI use case. I'd only consider using one switch for both if (a) it was a beast of a switch, and (b) the traffic was segregated into VLANs by port.

Since you are using SQL, there's a ton of performance-related data you need to collect and measure before jumping to conclusions.

What version of SQL are you using? 2012 is way more efficient when fed large quantities of RAM. If you have multiple installs per host, I'd consider licensing the box for SQL 2012 (or R2) and upgrading the box to 128GB of RAM.

You need to isolate your performance metrics to see what is bottlenecked at the OS, versus the box, versus the network, versus the SAN.
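For the SQL side of that isolation, one quick check is what SQL Server itself reports for per-file IO latency; if the average stalls are huge, the problem is below SQL, and the per-database breakdown shows which workloads are generating it. This is a sketch only: pyodbc and the DSN are assumptions on my part, but sys.dm_io_virtual_file_stats is a standard DMV (SQL 2005 and later).

import pyodbc

# Placeholder DSN/credentials -- point this at one of the SQL VMs
conn = pyodbc.connect("DSN=sql-vm-01;Trusted_Connection=yes")

rows = conn.cursor().execute("""
    SELECT DB_NAME(vfs.database_id) AS db,
           vfs.num_of_reads, vfs.num_of_writes,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY avg_read_ms DESC
""").fetchall()

for r in rows:
    print(r)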

Just to give you another example:
We just rebuilt our network of 500 VMs here. You need to do your homework and determine, based on all the equipment you currently have, what the best and most cost-effective options would be.

I pulled off our upgrade for $80k. The initial suggested upgrade was $140k. Knowing your system and what is available on the market is key.
 
If you're seeing heavy VM latency on only one host when you generate load with IOMeter, then I would look at two things.

Latency on the storage NICs.
Maybe look at your link modes; route based on physical NIC load might work better than round robin, depending on your storage capabilities.
 
I'm not breaking more than 100Mbps on any single port. I'm pretty much at a loss as to the root cause, but there's definitely something causing a bottleneck there.
Is there some other storage networking metric I can measure aside from just network throughput?
Anyone have recommendations for tests to run with IOMeter, and baseline numbers to compare against so I know what I should expect to see?

You're failing to understand the relationship between IO size and throughput. The bottleneck here is the performance of your SAN. How many disks does it have and what is the layout?

Your SAN can only do so many IO operations per second for any given workload; it's limited physically by the disks and the pattern of IO going in or out of them. For instance, if you were to remove all other load from the SAN and use IOMeter to do a single sequential read using a large 1MB block size, it would probably deliver at least 100MB/s of throughput, since even a single spinning disk can do this. In that case you'd be nearly using an entire GigE link while only doing 100 IOPS (100 1MB operations per second, totaling 100MB/s). If you did the same test with tiny 4KB blocks, the disks are going to have a much harder time doing lots of little requests. Say your disks can only do 20 IOPS in this condition; since they are only processing 20 4KB IOs every second, your throughput is roughly 0.08MB/s, but your disk is totally slammed!
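To put rough numbers on that relationship (these are just the figures from the example above, not measurements from this array):

def mb_per_s(iops, io_size_kb):
    # throughput is just IOPS multiplied by the IO size
    return iops * io_size_kb / 1024.0

print(mb_per_s(100, 1024))   # 100.0  -- 100 x 1MB sequential reads nearly fill a GigE link
print(mb_per_s(20, 4))       # ~0.08  -- 20 x 4KB random IOs barely register on a port graph
print(mb_per_s(3000, 4))     # ~11.7  -- it takes ~3000 random 4K IOPS just to reach the 10-15MB/s range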

What's happening is that your workload is much more like the second example: lots of small random IOs. SQL is going to do a lot of smaller reads/writes when it's busy (depending on your DB workloads). Since you're running multiple SQL servers against the same SAN, the SAN is seeing pretty much entirely random IO. And since it sounds like you're using spinning disks, they are getting slammed, constantly seeking around only to return or save small chunks of data. Spinning disks aren't great at that, so they aren't going to be able to do a lot of IOPS, and since your IOs are very small, your throughput is going to be very low.

You can probably get a rough idea of what your array is capable of with IOMeter if you're able to stop all of the other VMs while you benchmark. Create a benchmark with several workers doing fully random IO, vary the IO size between 4K and 64K, and use a mix of reads and writes. The total IOPS and throughput across all these workers will probably be similar to what you're seeing now.

If you want to increase performance in your situation, you can do several things: just add more disks to the existing RAID group; add more disks, create separate groups, and split your busy SQL servers across them; or add SSDs to your SAN if it can handle them. SSDs will deliver much better performance than spinning disks, especially at small block sizes, and they are largely unaffected by random workloads.

There are ways to make iSCSI faster, but it sounds like your issue is entirely IO based. As you can see, it would take a massive number of IOPS to saturate a gig link with the kind of workload you have.
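If you want host-side numbers rather than switch counters, per-device latency and queue depth are the metrics to watch, not Mbit/s. A minimal sketch, assuming sysstat's iostat is installed on the CentOS dom0s and the Nimble LUNs show up as regular block devices:

import subprocess

# -x: extended stats, -d: devices only; 2-second interval, 5 reports.
# Watch r/s + w/s (IOPS), await (avg ms per IO including queue time),
# avgqu-sz (outstanding IOs) and %util. A LUN near 100% util with high await
# but tiny rsec/s or rkB/s (column name depends on sysstat version) is
# IOPS-bound, not bandwidth-bound.
subprocess.call(["iostat", "-xd", "2", "5"])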
 
You're failing to understand the relationship between IO size and throughput. The bottleneck here is the performance of your SAN. How many disks does it have and what is the layout?

I don't think we can say for sure if it's the SAN. He stated:

Thus far, total throughput doesn't seem to be that high, but when I run a 4K test with IOMeter it murders performance for the other VMs on that hypervisor while actual throughput stays rather low.

If the SAN LUN got hammered, it would reflect not just on the single host but on any VM on that LUN, regardless of host. To me it sounds like something is going on at the host level with the round-robin iSCSI traffic.
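To rule that out quickly, it's worth confirming every path is actually logged in and active on each host, since round robin over a degraded path set looks a lot like a slow SAN. A rough sketch, assuming open-iscsi and dm-multipath on the CentOS 6 dom0s (the expected count of four is just this environment's NIC count):

import subprocess

EXPECTED_SESSIONS = 4  # one iSCSI session per dedicated storage NIC in this setup

# 'iscsiadm -m session' prints one line per logged-in iSCSI session
sessions = subprocess.check_output(["iscsiadm", "-m", "session"]).decode().splitlines()
print("iSCSI sessions: %d (expected %d)" % (len(sessions), EXPECTED_SESSIONS))

# 'multipath -ll' shows whether dm-multipath sees every path as active/ready
# and which path selector (round-robin) is actually being used
subprocess.call(["multipath", "-ll"])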
 
Besides a more capable controller, the best performance improvement we saw when redesigning our VM environment was going from a 22-drive RAID 5 with 2 global hot spares to a 4x5-drive RAID 50 with 4 global hot spares.

With RAID 5 on the old controller we had approx. 7,400 IOPS read, 3,800 IOPS write.

With RAID 50 on the new controller we measured 13,600 IOPS read, 9,400 IOPS write.

It all comes back to knowing your equipment and where the bottlenecks are.

FYI...the new controller, according to the manufacturer, is 20% faster than the older model.
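For anyone wanting to sanity-check spindle counts before blaming the network, the usual back-of-envelope is raw IOPS for the read portion plus raw IOPS divided by the RAID write penalty for the write portion. The per-drive IOPS figure below is an assumption for 10K spindles, not a number from either array:

# Classic front-end IOPS estimate: reads are "free", writes pay the RAID penalty
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid50": 4, "raid6": 6}

def usable_iops(drives, per_drive_iops, level, read_pct):
    raw = drives * per_drive_iops
    writes = raw * (1 - read_pct) / RAID_WRITE_PENALTY[level]
    return raw * read_pct + writes

# e.g. 20 x 10K spindles (~125 IOPS each) at a 70/30 read/write mix:
print(usable_iops(20, 125, "raid5", 0.70))   # ~1937
print(usable_iops(20, 125, "raid10", 0.70))  # ~2125

Controller cache changes this a lot (which is presumably how the measured numbers above beat raw spindle math), but it's a useful floor when deciding whether the disks or the network are the limit.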
 
What else is the SAN doing?

I don't know masses about Nimble but if you're hitting a ton of stuff that isn't in the SSD cache then you're basically hitting a 7.2K SATA (maybe NL-SAS?) RAID6.
 