Recommended LUN Size for VMware / NetApp

This is a NetApp using iSCSI with 10 Gb networking. We have a single aggregate that is 30 TB of tier 1 storage. Is there any reason to carve it up into a cluster of smaller LUNs, like 2-4 TB? Since they are all going to the same controller and there is 20 Gbps of bandwidth dedicated to storage per controller, I don't think we will gain anything in performance from having multiple LUNs. I would rather have, say, a 12 TB LUN for a couple of reasons. One, getting more out of deduplication. And two, some of our VMs are provisioned with 6 TB of storage. It isn't all used, but it obviously exceeds the size of a smaller LUN, and I would rather not separate the VMDKs onto different datastores when we have more than enough capacity.
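
To put rough numbers on the dedup point, here is a toy Python sketch. The block counts are made up purely for illustration; the assumption is that dedup on the NetApp runs per volume, so blocks shared between VMs only get collapsed if the VMs live in the same volume (and therefore the same large LUN/datastore).

```python
# Toy illustration of why one big volume can dedupe better than several
# small ones: duplicate blocks are only collapsed *within* a volume.
# Block IDs below are invented for the example.

vm_blocks = {
    "vm1": {"os1", "os2", "app1", "data1"},
    "vm2": {"os1", "os2", "app1", "data2"},   # shares OS/app blocks with vm1
}

# Both VMs in one volume: the shared blocks are stored once.
one_volume = len(vm_blocks["vm1"] | vm_blocks["vm2"])        # 5 blocks

# One VM per volume: each volume keeps its own copy of the shared blocks.
two_volumes = len(vm_blocks["vm1"]) + len(vm_blocks["vm2"])  # 8 blocks

print(one_volume, two_volumes)  # 5 8
```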

Just kind of wondering what the recommendations are these days now that the limits have increased so much.
 
Contention can be an issue with ATS locks on large LUNs, as well as heartbeat regions and the hostd buffer. It all depends on the workloads, to be honest.
 
I would prefer to have the spindles separated into multiple LUNs to spread the workload across the physical disks. That's just how I've always done it, and maybe it's not the best way. In our environment with 30 VMs, I would think it would be better to have the physical disks split up across multiple LUNs than to have all VMs accessing the same physical drives all the time.
 
I wouldn't do larger than 2 TB LUNs. Remember, a 2 TB LUN has the exact same SCSI queuing characteristics as a 4 TB LUN or a 512 GB LUN.

With that said, if you have one host making 10 requests to a single 4 TB LUN, you might not notice much of a problem. If you have 30 hosts making 10 requests each to a single 4 TB LUN, you will have queueing issues.
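
To put rough numbers on that, here is a quick Python sketch. The per-LUN queue depth of 32-64 is just a typical default I'm assuming for illustration; check your HBA and array documentation for the real figures.

```python
# Back-of-the-envelope illustration of why more LUNs spread the queueing.
# All figures are assumptions for illustration, not tuning advice.

def in_flight_per_lun(hosts, outstanding_per_host, luns):
    """Average I/Os queued against each LUN if the load spreads evenly."""
    total = hosts * outstanding_per_host
    return total / luns

# 1 host with 10 outstanding I/Os on 1 LUN: 10 in flight, no problem.
print(in_flight_per_lun(hosts=1, outstanding_per_host=10, luns=1))   # 10.0

# 30 hosts with 10 outstanding I/Os each, still 1 LUN: 300 I/Os all
# funnelled into a single LUN queue.
print(in_flight_per_lun(hosts=30, outstanding_per_host=10, luns=1))  # 300.0

# Same load spread over 8 LUNs: ~38 per LUN, back in the ballpark of a
# typical 32-64 per-LUN queue depth.
print(in_flight_per_lun(hosts=30, outstanding_per_host=10, luns=8))  # 37.5
```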

If you spread the load wide across more LUNs, you'll have better overall long-term performance.

A lot depends on the environment as well. Sometimes you'll paint yourself into a corner where you have to use 4 TB and larger LUNs. It happens, but as long as you are aware of the risks and they don't impact what you are doing, you might get away with it without anyone noticing.
 
So what if you have several large SQL databases? If they are about 1 TB each and you have many of them, would it be best to put them on separate LUNs?
 
SQL benefits heavily from parallelism. Generally, this comes from multiple files in filegroups. You also achieve some benefit by splitting the files across disks. It comes at the price of management overhead.

The TempDB portion of the database generally receives the most I/O. A potential solution would be to use local SSD for the TempDB volume and then separate VMDKs for C (OS), D (SQL install), E (SQL data), F (SQL log), and G (TempDB on local SSD).

Of course, if you are doing WSFC it's a different story, with the use of RDMs only. A good read on this: http://www.amazon.com/gp/product/B00M1SJ3OS

Edit: Sorry, I didn't answer your question. For very large databases, you could split across VMFS, which comes with some issues (storage-based snapshots, management, etc.). Personally, I would use an RDM for the SQL volumes on extra-large databases. You can keep them on a single LUN and expand as necessary. VMDK files over 2 TB can't be expanded online (ESX 5.1 and below) but RDMs can, so keep that in mind.
 
The TempDB portion of the database generally receives the most I/O. A potential solution would be to use local SSD for the TempDB volume and then separate VMDKs for C (OS), D (SQL install), E (SQL data), F (SQL log), and G (TempDB on local SSD).

This is almost exactly how we have one environment set up. The only difference is that TempDB is not separated, but I have recently made that recommendation, including using SSD. The other difference is that the SQL data and SQL log are currently using an iSCSI connection, but that is also something I am looking to change. We have half a disk shelf on our NetApp filled with SSDs used for Flash Pool. It would be awesome if I could convince someone to buy at least six more SSDs for an all-flash LUN.
 
I wouldn't run SQL behind VMware, but sometimes that's what you have to do. The more LUNs you can span it across, and if you can put TempDB on a higher tier of disk, the better.

NFS is highly dependent on the backend network. Running all your user traffic on top of your storage traffic is a serious recipe for disaster. But if that's what you've got, then that's what you use.
 
What? SQL runs great on VMware. Tons of VERY large companies are doing just that.

NFS is no more or less dependent on the network than iSCSI is. On a properly segregated network, it can easily keep up with any other alternative transport protocol (assuming more recent clients/servers).
 
I am currently reading "Virtualizing SQL Server with VMware, Doing IT Right". So hopefully I will learn a lot more when I am done. Written by some pretty awesome guys.
 
Also check with the array vendor - most of them have quite a few good guides as well (I know we do).
 
How is NFS more dependent on the "backend network" than iSCSI?

It's not. I should have been more clear. The biggest pitfall folks hit is running NFS or iSCSI on the same network as all their user traffic. Unless you have a really beefy network, it's not ideal. You'll want to run your NFS/iSCSI traffic on a dedicated storage network to minimize latency and overall slowdowns.
 
What? SQL runs great on VMware. Tons of VERY large companies are doing just that.

Depends on the workload. I wouldn't do OLTP behind VMware. Data warehouse... maybe. Rule of thumb for me in general: don't do it. That opinion could easily change in two or three years.


NFS is no more or less dependent on the network than iSCSI is. On a properly segregated network, it can easily keep up with any other alternative transport protocol (assuming more recent clients/servers).

100% Agree.
 
Depends on the workload. I wouldn't do OLTP behind VMware. Data warehouse... maybe. Rule of thumb for me in general: don't do it. That opinion could easily change in two or three years.

100% Agree.

Shens at the first part. I've helped design and implement quite a few large SQL and Oracle OLTP installs for ~very~ large customers, as well as real-time transaction processing for financials, and both run just fine on VMware if you put a bit of effort into it. It's not something you can drop in with everything else on the same LUN/volume/host/etc., but depending on how extreme your needs are, it's all doable with some tuning, and it will perform just as well as physical (within 1-2% on the extreme real-time transaction side, equal on the more normal OLTP side).
 
For us, I think NFS may be a better solution. We have VMs where the guest OS has many 1-2 TB iSCSI LUNs connected for either MS SQL or Oracle, and as we have been moving away from mounting them in the OS with iSCSI to using VMDKs, it is getting much harder to manage the datastores.

For example, two VMs with more than 6 TB of allocated space. Not all of it is in use, but quite a bit is. Some say not to use more than a 2 TB LUN for the datastore because of queue depth issues. Right now I am using 4 TB, but I am still having space issues unless I separate the VMDK files onto different LUNs.

Any recommendations in this case?
 
NFS neatly avoids most of the queuing issues that SCSI has for shared utilization volumes, but doesn't make that much of a difference for 1 volume / 1 workload devices.

Size is one issue; the number of workloads is another, along with workload activity, total number of volumes, total changes per second, clone rates... it's not an easy question to answer anymore.

I have no hesitation using a 16 TB VMFS datastore for a guest that needs 16 TB of data, and that's the ~one~ thing on that volume. I'd run away if it was 16 TB for 150 VMs, or a massive VDI environment or the like. I have no problem using a large NFS volume, though, assuming that the backing disks/system are fine with it, as it doesn't have the same limitations that SCSI can have.
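
If you want a really rough way to think about it, here is a Python sketch; the queue depth and per-VM outstanding I/O figures are assumptions for illustration, not a vendor formula.

```python
# Rough planning sketch: roughly how many VMFS datastores before the
# per-LUN queue stops being the obvious choke point? All figures here
# are illustrative assumptions.
import math

def datastores_needed(vms, avg_outstanding_io_per_vm, per_lun_queue_depth=32):
    total_outstanding = vms * avg_outstanding_io_per_vm
    return max(1, math.ceil(total_outstanding / per_lun_queue_depth))

# One 16 TB guest averaging ~20 outstanding I/Os: one datastore is fine.
print(datastores_needed(vms=1, avg_outstanding_io_per_vm=20))   # 1

# 150 VMs averaging even 2 outstanding I/Os each: around 10 datastores.
print(datastores_needed(vms=150, avg_outstanding_io_per_vm=2))  # 10
```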

Hope that makes some sense :p
 
Thanks, that is great information. I guess for now I will separate our big VMs from the rest onto their own dedicated LUNs. I asked for the NFS license when we bought it, but they didn't listen. If we had NFS, I am pretty sure it could handle it. It is a FAS8040 with hybrid storage (Flash Pool) and a 10 Gb network. Total of a 40 Gb LAG from the switches to the storage.
 
A FAS8k has a lot of growth room :) You'll definitely hit volume limits before IOPS most of the time.
 
Depends on many things - relationship with VAR, relationship with NetApp, etc.
We had the premium bundle, so it must have been included. OP, just remember there is no MPIO with NFS. If you want >10 Gb per host, you have to use multiple LIFs and exports.
 
Well, I wanted to switch VARs too, but that is a complicated story. Anyway, good to know. I am aware of the lack of MPIO. Most of our VMs are running on a FAS3240 with NFS and multiple LIFs using only 1 Gb. So this would still be a huge upgrade. :)
 
We had the premium bundle, so it must have been included. OP, just remember there is no MPIO with NFS. If you want >10 Gb per host, you have to use multiple LIFs and exports.

Fixed in NFS 4.1 with vSphere 6, but more than that, throughput is almost never the limitation ;)
 
pNFS? I must have missed that, better hit up Socialcast.

It was supposedly on the list as of when I left, and when I last talked to my folks over there. There were more questions on how the new locking was going to work, as it may re-introduce some limits that NFS eliminated from SCSI in the first place :p
 
Because two of our older NetApps were set up in 7-mode and they wanted to use SnapMirror from the old to the new NetApp, so they stuck with 7-mode.
 
I bet some of the world's best VCDXs would disagree with you. And how about an all-flash array using VMware and OLTP? :D

http://blogs.vmware.com/vsphere/2014/08/ms_sql_2014_oltp_all_flash.html

I said rule of thumb for me, and it could change in 2 to 3 years. In my environment, today, I will not do it.

All-flash arrays are awesome, so don't get me wrong. In fact, I have a 5 TB brick of XtremIO I'm about to set up. But if you now need an all-flash array to power your OLTP environment because you put it behind VMware? That's a problem.
 
Shens at the first part. I've helped design and implement quite a few large SQL and Oracle OLTP installs for ~very~ large customers, as well as real-time transaction processing for financials, and both run just fine on VMware if you put a bit of effort into it. It's not something you can drop in with everything else on the same LUN/volume/host/etc., but depending on how extreme your needs are, it's all doable with some tuning, and it will perform just as well as physical (within 1-2% on the extreme real-time transaction side, equal on the more normal OLTP side).

I'm not crapping on this at all. If it worked for your customers and it was a win/win all around, awesome. Glad it worked in that environment. In my environment, today, it's not an option I'm willing to implement.
 
For us, I think NFS may be a better solution. We have VMs where the guest OS has many 1-2 TB iSCSI LUNs connected for either MS SQL or Oracle, and as we have been moving away from mounting them in the OS with iSCSI to using VMDKs, it is getting much harder to manage the datastores.

For example, two VMs with more than 6 TB of allocated space. Not all of it is in use, but quite a bit is. Some say not to use more than a 2 TB LUN for the datastore because of queue depth issues. Right now I am using 4 TB, but I am still having space issues unless I separate the VMDK files onto different LUNs.

Any recommendations in this case?

Pull it out from behind VMware and put it back on physical boxes. If it's MSSQL, then use two physical boxes and do cluster services. If it's Oracle, do Oracle RAC.

I don't care what VMware or the VMware experts say: not everything should be put behind it. Just because you can doesn't mean you should. I've found that the more hoops you jump through in VMware to make something work, the more of a pain in the ass it is to keep online, update, and migrate down the road. More so when a host has 10+ TB of allotted space.

If the above is not an option, then do 6 TB LUNs to keep you rolling and plan to move it to physical in the future.
 
Best practice from NetApp:

1 LUN per volume
Do not put your volumes on aggr0; create a new aggregate and put your data-serving volumes on that.

It's up to you whether you want to use iSCSI or NFS, whatever is best for your environment. As said above, you can use LUNs greater than 2 TB; it shouldn't be an issue.

Oh, and to your point about it being 7-mode, not Cluster-Mode: you can always convert at a later date if the business requires it.

SQL and Oracle will work fine with VMware; very few people have issues with it, and many large companies run like this. My only word of warning is to make sure you use appropriate backup tools. There are companies out there that try to use VMware-consistent snapshots through a tool like VSC to back up consistent copies of a database such as SQL or Oracle, and that can cause issues. See http://kb.vmware.com/selfservice/mi..._1_1&dialogID=156074555&stateId=1 0 156084694

In terms of backups, please also ensure that VM mount points are on the same LUN as the VMDK, as it can do weird things with backups. If you have a VM on datastore A and mount an ISO to the CD-ROM drive from datastore B, when a backup runs it will snapshot both volumes (the volume with datastore A and the volume with datastore B).
 
Best practice from NetApp:
Do not put your volumes on aggr0; create a new aggregate and put your data-serving volumes on that.

Really? Unfortunately it is a little late for that. I am guessing this requires having a certain number of disks dedicated to the root_vol on aggr0 and then using the rest of the disks to create aggr1?
 