SAS vs MLC SSD for mission critical data.

ochadd

I've got an SAP OLTP machine with a 600GB database that's crushing an 8 disk RAID 10 of 10k rpm SAS drives.
Queue depth up to 780
Peak IOPS of 3,600, sustained ~1,500
127 ms read latency
257 MB/sec throughput
21 KB average IO size
98% reads, 2% writes

It's just begging for SSDs. Instead of adding another cabinet of disks, I could go with 4 Crucial M4 512 GB drives in RAID 5, which would be 1.5 TB. Add a hot spare and I'd have more performance than we could use in the foreseeable future. Set the partition to 1 TB and I'd have a year or two of data growth and roughly 30% under-provisioning, plus whatever Crucial hides under the hood. The array also gets to idle for 1-2 hours a day, when garbage collection could run.
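For rough numbers, here's a quick back-of-the-envelope sketch of that layout (a 4-drive RAID 5 with the partition held to 1 TB). The figures are only approximate and ignore GB/GiB rounding and the drives' hidden spare area.

```python
# Back-of-the-envelope sizing for the proposed 4x 512 GB RAID 5 volume.
DRIVE_GB = 512
DRIVES = 4                                   # RAID 5 keeps (n - 1) drives of capacity
usable_gb = (DRIVES - 1) * DRIVE_GB          # ~1536 GB (~1.5 TB)
partition_gb = 1024                          # partition held to 1 TB
underprovision = 1 - partition_gb / usable_gb

print(f"usable ~{usable_gb} GB, partition {partition_gb} GB, "
      f"~{underprovision:.0%} left unwritten for wear leveling/GC")
```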

SSD: $3,800 for 170k IOP/s and 1.5 TB
-5x Crucial M4 512

HD: $10,000 for 4000 IOP/s and 1.5 TB
-21x 146GB 2.5" Dell 15k SAS in MD1220

Would anyone else try it?
 
Performance of MLC SSDs is much, much better than 10k disks, but reliability is worse.
I have three servers with SSD-only storage (ZFS SAN server, ESXi datastore) with 120 GB
SandForce-based SSDs. The failure rate is estimated at about 6-8% per year.
The failure rate of the same disks in desktops looks much better - the bad part is, I suppose only ZFS is
able to discover all the problems.

I would not use them in a RAID 5 scenario with important data.
I use multiple pools of three-way ZFS mirrors with enough hot spares to sleep well.

(I'm currently moving to Intel; reliability seems better so far.)
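For anyone picturing the layout Gea describes, here's a minimal sketch of the geometry under assumed device names and counts (purely illustrative): several three-way mirrors striped into one pool, plus hot spares.

```python
# Hypothetical pool geometry: three 3-way mirrors plus hot spares.
DISK_GB = 120
mirrors = [["ssd0", "ssd1", "ssd2"],
           ["ssd3", "ssd4", "ssd5"],
           ["ssd6", "ssd7", "ssd8"]]
spares = ["ssd9", "ssd10"]

usable_gb = len(mirrors) * DISK_GB   # each mirror contributes one disk of capacity
print(f"{len(mirrors)} x 3-way mirrors: ~{usable_gb} GB usable; "
      f"each vdev tolerates 2 drive failures; {len(spares)} hot spares standing by")
```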
 
Gea,
What's the resilver/rebuild time on those when replacing the failed disk?
 
I would expect that SSDs which burn in fine should be reliable to a predictable life cycle as opposed to mechanical drives whose failures can be far more "random" even if less likely.

The SSDs use less power, generate less heat, no noise and have better performance.
You could add even more redundancy and hot swaps and still be cheaper.
Personally, I don't understand why more enterprise usage hasn't gone SSD already.
 
Reliability. Enterprise SAS drives are insanely solid in my experience. SSDs are not. It will come.
 
Enterprise SAS spindles at this point will give you more longevity than the SSD route (IMO, given the number of drives I have changed out recently). I would suggest RAID 6 over RAID 5 + hot spare for the small write penalty you'll pay with today's cards.
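As a rule of thumb, each write to parity RAID costs several back-end IOs (commonly counted as 2 for RAID 10, 4 for RAID 5, 6 for RAID 6), so with a 98%-read workload the extra RAID 6 penalty barely registers. A rough sketch using the ~4,000 back-end IOPS quoted for the spindle option:

```python
# Rough front-end IOPS estimate: reads cost 1 back-end IO, writes cost `penalty`.
def effective_iops(backend_iops, read_fraction, write_penalty):
    writes = 1 - read_fraction
    return backend_iops / (read_fraction + writes * write_penalty)

BACKEND_IOPS = 4000    # the ~4,000 IOPS quoted for the 21-spindle option
READ_FRACTION = 0.98   # OP's workload: 98% reads / 2% writes

for name, penalty in [("RAID 10", 2), ("RAID 5", 4), ("RAID 6", 6)]:
    print(f"{name}: ~{effective_iops(BACKEND_IOPS, READ_FRACTION, penalty):.0f} front-end IOPS")
```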
 
Reliability. Enterprise SAS drives are insanely solid in my experience. SSDs are not. It will come.

The performance, running conditions, and reliability of SSDs in this case seem more than appropriate as a replacement for enterprise SAS drives.

Your drives will not fail from moving parts, heat, or vibration... and when they do fail, the array can be rebuilt much faster (decreasing the chance of a double failure).

SSDs that pass burn in have a much more predictable lifetime than spinners.
 
The controllers still fail at random quite frequently. Hopefully that gets worked out soon because the performance vs even the fastest 15k drives is off the scale. I say as long as you have a system for dealing with random total failure (hot spares) they are a viable option now, if you're after IOPS per $.
 
SSD: $3,800 for 170k IOP/s and 1.5 TB
-5x Crucial M4 512

HD: $10,000 for 4000 IOP/s and 1.5 TB
-21x 146GB 2.5" Dell 15k SAS in MD1220

That's not really a fair comparison is it? What would a supported SSD solution from the server manufacturer (Dell?) cost?
 
Is there a chance you could use SSDs only for read cache? I know some controllers support SSD caching, and ZFS also supports additional cache devices (for reads and for writes).

Matej
 
What is your current mechanical disk array running in? Do you have a controller limiting array performance? Your comparison says nothing about what you will run the custom-built SSD array in vs. an off-the-shelf storage array from an OEM. The MD1220 is no big performer.

Are you only concerned with performance? What about support, warranty, etc.? I can only guess all of this has been considered, but perhaps not. Your price comparison is not very accurate without everything factored in.

That being said, you obviously won't touch the performance of the SSDs, but it's not strictly about performance in the enterprise world. I wouldn't touch the SSD solution for that kind of important data (it sounds important!) without dual parity and a hot spare, if not multiple hot spares. I'd feel more comfortable with real-time replication of the data in place, although with only 2% writes it sounds like changes are infrequent, so your transactional changes should be low and data loss would be minimal with a scheduled backup.
 
How long until you will outgrow that 1.5TB and need to replace disks again? If it is less than a year or two, the reliability of the SAS drives may be a non-issue.

Personally I'd rather use RAID 10, but with that large a read/write discrepancy you probably wouldn't gain any performance and you'd lose a drive's worth of space.

But if you push for SSD's and the unthinkable happens and it breaks, you have put yourself out there to be blamed. If you stick with the supported SAS drives, it's easier to place blame on the vendor and not look like the idiot who put the company's data at risk with consumer grade drives.
 
Aren't there some enterprise SSD solutions out there already? I remember seeing them somewhere...

Anyway, how about RAID 6 of SAS drives with ZFS and a big SSD read cache? If the workload is mostly reads, data will be served from the SSD cache and you will still have all the data safely on the SAS drives. In case one read-cache device isn't enough, you can add two or more and put them in RAID 1 (I'm not sure if that is possible in ZFS). That way you would have all your data safe on the SAS drives and only cache data on the SSDs.

Matej
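For the ZFS route, here is a small sketch of what adding SSD cache devices could look like (the pool and device names are placeholders). L2ARC read-cache devices only hold copies of data already on the pool, so ZFS simply stripes them and there is no need to mirror them; a separate log device for synchronous writes, on the other hand, is worth mirroring.

```python
import subprocess

POOL = "sapdata"  # placeholder pool name; adjust pool and devices to your system
commands = [
    # Striped L2ARC read cache: safe to lose, it only caches what's on the pool.
    ["zpool", "add", POOL, "cache", "ssd0", "ssd1"],
    # Mirrored log device (ZIL/SLOG) to accelerate synchronous writes.
    ["zpool", "add", POOL, "log", "mirror", "ssd2", "ssd3"],
]

for cmd in commands:
    print("would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually apply
```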
 
That's not really a fair comparison is it? What would a supported SSD solution from the server manufacturer (Dell?) cost?

Around $85,000 for an MD1220 with 1TB solid state from the factory. I can get into a full on EMC disk based SAN for around $35k and move away from DAS for the whole infrastructure. This is actually what I have RFQs out on now.

What is your current mechanical disk array running in? Do you have a controller limiting array performance? Your comparison says nothing about what you will run the custom-built SSD array in vs. an off-the-shelf storage array from an OEM. The MD1220 is no big performer.

Are you only concerned with performance? What about support, warranty, etc.? I can only guess all of this has been considered, but perhaps not. Your price comparison is not very accurate without everything factored in.

That being said, you obviously won't touch the performance of the SSDs, but it's not strictly about performance in the enterprise world. I wouldn't touch the SSD solution for that kind of important data (it sounds important!) without dual parity and a hot spare, if not multiple hot spares. I'd feel more comfortable with real-time replication of the data in place, although with only 2% writes it sounds like changes are infrequent, so your transactional changes should be low and data loss would be minimal with a scheduled backup.

It's in a Dell MD1220 attached to a Dell R710 with an H800, if I replaced the existing drives straight up. If I went the SSD route I'd probably go with internal storage on a PERC 6/i. I use DoubleTake for replication.
I'm all kinds of concerned about warranty and support. That's the core reason I'm rolling the idea around instead of putting it in place. Having the 1-800-oh-shit-help-me number makes me sleep better.

I'm figuring 1.5 TB will last me a couple of years for sure. There are 300 users who would be very impressed with the performance.
 
If you are going to use SSDs, then you need to use enterprise-rated SSDs, not a consumer type like the M4 512GB. Yes, it's bigger, but it's not designed for your environment's requirements.

Now, if you used the same M4 512GB combined with something like an LSI MegaRAID CacheCade-capable SAS controller card, it would probably not have the endurance/reliability issues of using the consumer SSD directly in a high-usage RAID setup, since the controller card would be doing the majority of the work. Coupled with the controller card's cache, the SSD cache might be a good speed improvement.
 
SSDs that pass burn in have a much more predictable lifetime than spinners.

But not predictable by us. Well, we can predict that they should all fail at about the same time, which is not really something to be happy about.
 
If you are going to use SSDs, then you need to use enterprise-rated SSDs, not a consumer type like the M4 512GB. Yes, it's bigger, but it's not designed for your environment's requirements.

Now, if you used the same M4 512GB combined with something like an LSI MegaRAID CacheCade-capable SAS controller card, it would probably not have the endurance/reliability issues of using the consumer SSD directly in a high-usage RAID setup, since the controller card would be doing the majority of the work. Coupled with the controller card's cache, the SSD cache might be a good speed improvement.

It's the same discussion as SATA vs. SAS:
if you use desktop disks, you must handle a higher failure rate.
For my own setup I decided to use 'cheap' desktop MLC disks because enterprise SSDs are out of my price range.
Even with my three-way mirrors it was much, much cheaper than a pool of two-way mirrors built from enterprise SSDs.

But I am prepared to throw them away after the warranty period (3 years).
By then I expect much better and much cheaper SSDs - and I would never, ever go back to spindles for my ESXi datastores;
the difference is huge.
 
Around $85,000 for an MD1220 with 1TB solid state from the factory. I can get into a full on EMC disk based SAN for around $35k and move away from DAS for the whole infrastructure. This is actually what I have RFQs out on now.

Well, if I were you, I'd move in this direction. You can do it for a lot less than $35k. A fully redundant NetApp FAS2020A with 2.25TB of usable storage (12x 450GB FC drives, with two aggregates running RAID-DP + hot spare) is right at $20k with support. You can get more usable storage by configuring it differently. The new EMC VNX family is in this range as well. I'm a big NetApp fan, but will be checking out the VNX the next time a need for a SAN in this price range comes up. I also like the HP P4000 (former LeftHand) stuff and its grid storage model. Also something else to look at, in the same ballpark of price at the $20k range for 2-3TB usable.

The downside to moving to a SAN is that your network is now a potential bottleneck (iSCSI, NFS), plus the additional costs to properly get a SAN connected via iSCSI/NFS. Or you can go FC, but that's almost always more money than the iSCSI/NFS route. So something else you should look at is the HP P2000 G3 MSA in a directly attached configuration (SAS). This is kind of a hybrid between DAS and SAN: the unit itself is like a SAN, but you can directly attach it to a parent server. $15k for a 7.2TB SAS-connected model (24x 300GB 10k SAS disks), or about $18k for 3.5TB using 24x 146GB 15k SAS disks. Add in $400 for a dual SAS controller in your host computer and you are golden. Oh yeah, warranty is about $1,500. SAS connectivity means no concerns about the network for talking to your storage. You can actually connect this unit to up to 4 servers with dual pathing (2 controllers per server) or 8 servers with single pathing (1 controller per server). If you know your growth pattern won't require more servers connected to your storage array, these are some sweet units.

I use DoubleTake for replication.

Good software. If you have that going as real-time bit-by-bit replication to a trusted secondary source, then you're probably pretty comfortable about protecting that database. So now the big question is: how long an outage would you have (and could you tolerate) should a home-built SSD array go down and need to be rebuilt?
 
I have to agree with some above that a pure RAIDed set of SSDs is not really the way to go. If you could somehow migrate your existing array to a new controller that supports SSD caching, I think that would be ideal since you are so read-heavy. Then you could stick in just two SSDs as a mirrored (read-only) cache pair and be done. I'd probably be fine with consumer drives in this situation, but you could consider a pair of enterprise drives like the Intel 710s here since you'd only need two.
 
Looks like some H800s can do Cachecade SSD caching. Will see what Dell can hook me up with. Thanks for all the replies.
 
How much RAM have you got? As you are so read-heavy, I would say the best bang for the buck would be to get as much of the database into RAM as possible, so that the load on storage is lower. If you could boost your budget, a 1TB RAM box (about $65k list from Dell for an R910) will perform much better on read queries than any SSD storage.

I think SLC or over-provisioned MLC is best for CacheCade, as it will be written to more often than your DB writes would suggest, since the cache is optimised for your hot data. You can have up to 32 SSDs in the CacheCade pool, so you may be better off with more, smaller SSDs to get higher IOPS. The Intel 311 is probably the cheapest way to get lots of IOPS on SLC.

Another thing to look at is a SAN that can do in-line block-level dedupe, assuming there are a lot of duplicate blocks in your data. You could try Starwind (the free edition has dedupe), set up a 2TB volume, copy your database files there, and see how much space the deduped data takes up. If it's significantly smaller, then dedupe will help, provided the SAN has a powerful CPU.
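If you want a rough feel for how dedupe-friendly the database files are before standing up a trial SAN, something like the sketch below works: hash the file in fixed-size blocks and count how many are unique. It's only a crude offline check; block size, alignment, and the SAN's own algorithm will change real results.

```python
import hashlib
import sys

BLOCK = 4096  # assumed block size; real dedupe engines may use a different granularity

def dedupe_estimate(path, block=BLOCK):
    """Return (total_blocks, unique_blocks) for the file at `path`."""
    seen, total = set(), 0
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += 1
            seen.add(hashlib.sha1(chunk).digest())
    return total, len(seen)

if __name__ == "__main__":
    total, unique = dedupe_estimate(sys.argv[1])
    print(f"{total} blocks, {unique} unique -> {1 - unique / total:.1%} duplicate")
```

Run it against a copy of a data file (e.g. `python dedupe_estimate.py /path/to/datafile`); if the duplicate percentage is tiny, in-line dedupe probably won't buy much.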
 
Another thing to look at is a SAN that can do in-line block-level dedupe, assuming there are a lot of duplicate blocks in your data. You could try Starwind (the free edition has dedupe), set up a 2TB volume, copy your database files there, and see how much space the deduped data takes up. If it's significantly smaller, then dedupe will help, provided the SAN has a powerful CPU.

I don't know what kind of RDBMS is being used, or the platform it is on... but for databases that are not read-only, deduplication is generally a bad idea. In a fully normalized database there should be minimal duplicate data, and the database files stored on disk are likely constantly changing. With that being said, a ZFS system with an L2ARC composed of SSDs and a redundant ZIL for writes might meet the OP's needs. If you're using an RDBMS that can make use of a lot of RAM and cache everything, that would be the best option for read/reporting queries.
 
How much RAM have you got? As you are so read-heavy, I would say the best bang for the buck would be to get as much of the database into RAM as possible, so that the load on storage is lower. If you could boost your budget, a 1TB RAM box (about $65k list from Dell for an R910) will perform much better on read queries than any SSD storage.

I think SLC or over-provisioned MLC is best for CacheCade, as it will be written to more often than your DB writes would suggest, since the cache is optimised for your hot data. You can have up to 32 SSDs in the CacheCade pool, so you may be better off with more, smaller SSDs to get higher IOPS. The Intel 311 is probably the cheapest way to get lots of IOPS on SLC.

Another thing to look at is a SAN that can do in-line block-level dedupe, assuming there are a lot of duplicate blocks in your data. You could try Starwind (the free edition has dedupe), set up a 2TB volume, copy your database files there, and see how much space the deduped data takes up. If it's significantly smaller, then dedupe will help, provided the SAN has a powerful CPU.

18GB of memory. I've got the SAP Basis crew working on optimizing everything. I'm no DBA or SAP guru. The last two attempts at tweaking SQL and SAP memory usage haven't made a significant difference.
 
If you've got 18GB of RAM, and a 600GB database, and many queries have to operate on more than 18GB of data, then no wonder your storage is being hammered...

You don't say what the RDBMS is (SAP can run on Oracle, DB2, and SQL Server, IIRC). If it is SQL Server, there are performance counters in Windows which will tell you how effectively RAM is being used, e.g. Buffer Cache Hit Ratio - the % of page requests fulfilled from RAM as opposed to disk. You want that to be as close to 100% as possible. I expect the other databases have similar counters.
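If it is SQL Server, one quick way to pull that number is the Buffer Manager counters exposed through sys.dm_os_performance_counters; the ratio counter only means anything divided by its matching 'base' counter. A minimal sketch, assuming pyodbc and a reachable instance (the server name and driver string below are placeholders):

```python
import pyodbc  # assumes an ODBC driver for SQL Server is installed

QUERY = """
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Buffer cache hit ratio', 'Buffer cache hit ratio base');
"""

# Placeholder connection details; point this at the SAP database server.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sapdb;Trusted_Connection=yes"
)
rows = {name.strip(): value for name, value in conn.execute(QUERY).fetchall()}

# The 'ratio' counter must be divided by its 'base' counter to get a percentage.
hit_ratio = 100.0 * rows["Buffer cache hit ratio"] / rows["Buffer cache hit ratio base"]
print(f"Buffer cache hit ratio: {hit_ratio:.1f}% "
      f"(closer to 100% means reads are being served from RAM, not disk)")
```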

Last time I looked at the Dell MD1220 with the Pliant SSDs, I think the SSDs were about 50% of the cost per GB of RAM, i.e. very expensive, and given that your problem is more about query execution speed, I think spending on RAM would be much more effective than spending on storage. Once you've sorted out read query speed by chucking RAM at it, you may notice a bottleneck with writes, and then it will be time to look at storage again.
 
If you're running Oracle, can't get more memory, and just need to improve reads with an SSD, have your DBAs check out the db_flash_cache parameters.
 
I wanted to post an update to this thread with the end of this journey.

So I ended up going with the Dell PS4100X. After moving the main SQL load over, it became apparent that we had grossly undershot the mark. The box was maxing out around 2,200-3,000 IOPS with our single SAP production instance on it: 25-30 ms latency and 100+ queue depths. I still had 15 VMs to move over... There were a lot more IOPS hiding in that 780 queue depth. After having myself and Dell do the original performance analysis, my boss decided we needed to call in a consulting firm for a third opinion. After a week of monitoring they suggested a complete forklift overhaul: $153,000 worth of equipment and software based around a NetApp SAN and VMware. That got vetoed in a hurry.

More than slightly pissed off, I then went back to my Dell storage reps. They claimed they hadn't seen anything like that kind of result, where the queue depth was masking 75% of the IOPS. After another performance review we all concluded 8k read IOPS is where I needed to be. They proposed a Dell PS6100X consisting of 7x 400GB SSD + 17x 600GB 10k SAS, with a very significant discount.

Been up and running now for a couple of months on the 6100X and I can't say a bad thing about it. It manages which data is stored on the SSDs at the block level in a "RAID 6 accelerated" set. I don't know what black magic they use, but it's lightning fast and right on the money. Queue depth went from 100+ to around 20. Latency went from 25-30 ms down to 2-3 ms. I'm seeing peaks of 7,000 read IOPS and it chews through them without users ever noticing. It's rated for 18k read IOPS, so there's plenty of room to grow. Hopefully someone looking at Dell PS SANs can make some use of my experience.
 