SAN with SSD, reliability?

Running a Scale cluster with a number of spinner drives. Veeam backup snapshot removal is causing so much IO that vSphere sometimes shows the VM timing out for a split second.

I'm wondering what the pros and cons of an SSD-filled SAN are. I think it's the way to go, but my colleagues are not sold on the idea of SSDs.
 
Veeam backup snapshot removal is causing so much IO that vSphere sometimes shows the VM timing out for a split second.

That's unusual, to say the least. I'd check for configuration issues first (are all multipaths/channels working?). Have you been running any monitoring agents? What's performance like without the backups running? Veeam defaults to a fairly low queue depth; generally speaking, you won't even notice it running.
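If you want to sanity-check path health across the cluster first, a rough pyVmomi sketch like this will list active paths per LUN (the vCenter hostname and credentials are made up; adjust for your environment):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (hypothetical host/credentials; lab-grade SSL handling).
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="readonly@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Walk every ESXi host and count active paths per LUN.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for lun in host.config.storageDevice.multipathInfo.lun:
        states = [p.pathState for p in lun.path]
        print("%s %s: %d/%d paths active"
              % (host.name, lun.id, states.count("active"), len(states)))
Disconnect(si)

Any LUN with fewer active paths than its siblings is worth a closer look before spending money.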

There is a place for SSD SANs, but you should really make sure your setup is working right before looking into a multimillion-dollar 'solution'.
 
That's unusual, to say the least. I'd check for configuration issues first (are all multipaths/channels working?).

It actually is pretty common on high-IO VMs or anything transaction-related like Informix. I used to see a lot of disconnects due to snapshot removal on really high-IO machines.
 
We have an SSD SAN. It was too fast for one of our applications and caused an issue. However, it runs great with little to no downtime.
 
It actually is pretty common on high-IO VMs or anything transaction-related like Informix. I used to see a lot of disconnects due to snapshot removal on really high-IO machines.

I agree SSD SANs have very real value; I just prefer looking for the zero-dollar solution first, which is likely why the rest of his team is skeptical. There are a lot of knobs that can be tuned in Veeam, and certainly in their applications. A good starting point is to figure out their actual usage first: determine whether it is just a small subset of data that needs high IOPS, whether it can be resolved with more cache, etc. Many SANs come with multiple disk pools; maybe you just need one smaller pool with higher-performing drives/SSDs rather than replacing the entire infrastructure.
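A rough back-of-the-envelope sketch of the sizing math I mean, with numbers invented purely for illustration:

# All numbers are made-up examples; substitute measured values.
total_iops = 12000           # measured peak IOPS across the cluster
hot_io_share = 0.9           # assume ~90% of IO hits a small hot data set
hdd_iops = 180               # rough IOPS for one 10k SAS spindle
ssd_iops = 50000             # rough IOPS for one enterprise SSD

hot_iops = total_iops * hot_io_share
print("Spindles needed for the hot set: %.0f" % (hot_iops / hdd_iops))  # ~60
print("SSDs needed for the hot set:     %.1f" % (hot_iops / ssd_iops))  # ~0.2

# Even with RAID overhead, a handful of SSDs in one small pool can stand in
# for dozens of spindles serving the hot fraction of the data.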
 
If there are too many delta changes between when the snapshot is taken and when the backup completes, I think the IO strain at snapshot removal is too much for too long. There are also multiple backups being run, spaced an hour apart, but I think if your Veeam repository is set to keep backups for 30 days, after x days it consolidates some of them and takes longer. I know an SSD SAN is the way to go, as the slowness is costing them money, and it would pay for itself in about a month with improved responsiveness and latency.
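To put rough numbers on that (invented for illustration only):

# Made-up numbers showing why a long-open snapshot hurts at removal time.
change_rate_mb_s = 5        # write rate landing in the snapshot delta
backup_window_h = 4         # hours the snapshot stays open during backup
consolidate_mb_s = 30       # rate the array can recommit while serving live IO

delta_gb = change_rate_mb_s * 3600 * backup_window_h / 1024.0
removal_min = delta_gb * 1024 / consolidate_mb_s / 60
print("Delta accumulated: %.0f GB" % delta_gb)            # ~70 GB
print("Removal takes about %.0f minutes" % removal_min)   # ~40 minutes

# All that recommit IO lands on top of production load, which is exactly
# when the stuns show up.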

I just need a way to phrase it so they can be sold on it.
 
Reliability of an all-SSD SAN is just fine so long as the array was designed for flash storage from the get-go. If it's simply a legacy disk array with SSDs slapped in, using legacy RAID and treating the SSDs like they're disks, then I wouldn't expect too much longevity out of it. That, coincidentally, is why most manufacturers increase their maintenance costs as time goes by and want to force you into buying another new array every 3-5 years.
 
Scale definitely has I/O performance issues.

Their software-defined storage stack is pants ;(

Just upgrading HDDs -> SSDs won't help much.

They don't have all-flash configs for a good reason:

https://www.scalecomputing.com/products/hardware-platforms/

We have a customer who wiped their KVM fork and installed Hyper-V 2016.

tl;dr: don't do what your vendor doesn't want you to do, or you'll void your support.

Running a Scale cluster with a number of spinner drives. Veeam backup snapshot removal is causing so much IO that vSphere sometimes shows the VM timing out for a split second.

I'm wondering what the pros and cons of an SSD-filled SAN are. I think it's the way to go, but my colleagues are not sold on the idea of SSDs.
 
We have several dozen PB of all-flash storage deployed. No real issues to speak of... with that many drives out there, of course we see failures, but not a noticeable amount more than with spinning rust.
 
Running a Scale cluster with a number of spinner drives. Veeam backup snapshot removal is causing so much IO that vSphere sometimes shows the VM timing out for a split second.

I'm wondering what the pros and cons of an SSD-filled SAN are. I think it's the way to go, but my colleagues are not sold on the idea of SSDs.
Rather than buying a bunch of expensive SSDs, just use Backup from Storage Snapshots. Depending on your SAN, it may be a feature you have to license, but it shouldn't be that expensive. Then just check the box in Veeam under the advanced configuration for guest VM processing. But you do need a storage-attached proxy server (FC or iSCSI) zoned to the SAN as well.

https://www.veeam.com/backup-from-storage-snapshots.html

It would mean your VMware snapshots are open for about 1-2 seconds rather than the hours they possibly are now. The recommitting of hours of changed disk data is what causes the problem.
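If you want to see how long your snapshots actually stay open today, a quick pyVmomi sketch like this lists the age of every open snapshot (hostname and credentials are made up):

import ssl
from datetime import datetime, timezone
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-grade SSL handling
si = SmartConnect(host="vcenter.example.com", user="readonly@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def walk(snapshots, vm_name):
    # Recursively print every snapshot in the tree with its age.
    for snap in snapshots:
        age = datetime.now(timezone.utc) - snap.createTime
        print("%s: '%s' open for %s" % (vm_name, snap.name, age))
        walk(snap.childSnapshotList, vm_name)

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
for vm in view.view:
    if vm.snapshot:
        walk(vm.snapshot.rootSnapshotList, vm.name)
Disconnect(si)

Anything that shows up here for hours during the backup window is a candidate for the storage-snapshot approach.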
 
If your SAN supports it, look into VVols for servers that tend to be problematic with the snapshot removal process. We had a number of VMs that would take almost an hour to remove their snapshots, and it was killing our SAN IO.
 
If your SAN supports it, look into VVols for servers that tend to be problematic with the snapshot removal process. We had a number of VMs that would take almost an hour to remove their snapshots, and it was killing our SAN IO.

You're better off seeing if your SAN is supported by Veeam for storage snapshots. Snapshot the VM, take a storage snapshot, close the VM snapshot, take the backup from the storage snapshot. We do it with TBs of data, and VM snapshots are open for less than a minute.
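The sequence is simple enough that you can sketch it outside Veeam, too. Here's a hedged pyVmomi version of the same flow; take_array_snapshot() and the VM name are made-up placeholders for your SAN vendor's API and your own inventory:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def take_array_snapshot(lun_name):
    # Placeholder: call your SAN vendor's snapshot API here.
    print("array snapshot of %s taken" % lun_name)

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="backup@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "prod-db-01")  # hypothetical VM

# 1. Open a VM snapshot so the array snapshot is application-consistent.
WaitForTask(vm.CreateSnapshot_Task(name="backup-temp",
                                   description="pre array snapshot",
                                   memory=False, quiesce=True))
# 2. Take the storage-side snapshot while the VM snapshot is open.
take_array_snapshot("datastore-lun-07")
# 3. Close the VM snapshot right away; it was only open for seconds.
WaitForTask(vm.snapshot.currentSnapshot.RemoveSnapshot_Task(removeChildren=False))
# 4. The backup then reads from the array snapshot, not the live VM.
Disconnect(si)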
 
Many SSD drives put into an array are actually less prone to failure, for a reason:

Load is split between all the drives in the array => each single drive gets less work to do => fewer burnt cells to replace.
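Rough arithmetic with invented numbers shows the effect:

# Made-up example: spreading writes across an array stretches SSD endurance.
daily_writes_tb = 4.0     # total writes the array absorbs per day
drive_size_tb = 1.92      # capacity of each SSD
drives = 24               # SSDs in the array
rated_dwpd = 1.0          # drive writes per day the vendor rates

per_drive_dwpd = daily_writes_tb / drives / drive_size_tb
print("Actual DWPD per drive: %.3f (rated %.1f)" % (per_drive_dwpd, rated_dwpd))
# ~0.087 DWPD against a 1.0 rating: each drive sees a fraction of its rated
# wear, so cells burn out far slower than the spec sheet assumes.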

We have several dozen PB of all-flash storage deployed. No real issues to speak of... with that many drives out there, of course we see failures, but not a noticeable amount more than with spinning rust.
 
You're better off seeing if your SAN is supported by Veeam for storage snapshots. Snapshot the VM, take a storage snapshot, close the VM snapshot, take the backup from the storage snapshot. We do it with TBs of data, and VM snapshots are open for less than a minute.
Our SANs are supported (Nimble), but purchasing 120+ sockets of Veeam Enterprise Plus is a tough sell.
 
VVols are fun to play with... but I'm still leery about stability on the ESXi side.

Vendors are dropping prices lately, which is good for us consumers. Instead of only the most expensive enterprise flash drives, hardware vendors are now letting you buy middle-of-the-pack drives. Tie this into more software-defined storage coming around, and the marketplace is becoming quite interesting.

If your coworkers aren't sold on flash, then they are stuck in the dark ages. Spinning mechanical drives are going the way of tapes: used for specific purposes but falling out of the mainstream.
 
I don't think you will be able to avoid the VM stun without it.

We're still on vRanger, but once I moved the VMs over to VVols, I got almost instant snapshot removal after the backups complete.
 
We are running VNX5400s with Flash, SSD, and SAS drives in a tiered storage pool setup. It's still new for us but seems OK so far.
 
NetApp 8080 AFF (All Flash FAS) here; it absolutely decimates all workloads. We love it. If you're still buying SAS, you are already behind. SATA for cold storage/static workloads is still OK, but probably only for another two years. SSD capacity, reliability, and price point are about to remove mechanical drives from existence permanently at the enterprise level.
 