Need new drives to upgrade our 10k SAS 6Gbps 2.5 hotplugs; recommendations?

RavinDJ

2[H]4U
Joined
Apr 9, 2002
Messages
4,026
We have a Dell PowerEdge R720 and we currently have three (3) 900GB 10K RPM SAS 6Gbps 2.5in Hot-plug Hard Drives in RAID 5 for a total of 1.8TB of storage. The server has 32GB of RAM and an Intel Xeon E5-2620 CPU @ 2.00GHz.

We would like to upgrade our drives for more storage and, hopefully, more speed. We're looking for a total of 2.5-3.0TB.

We have the 2.5" Chassis with up to 8 Hard Drives and a Dell Perc H710 RAID controller.

I would like to add new drives at a reasonable price (not looking to go under $400 but not looking to go over $4,000).

Any recommendations on which drives, what number of those drives, and in what RAID configuration to get?

Any help will be greatly appreciated.

Thanks!!!
 
Joined
Dec 1, 2004
Messages
966
Move to consumer SSD drives. If you're looking for ~3TB of space, then I would add 4x 1TB Samsung 860 EVO SSDs in RAID 5. That'll give you 3 TB (2.73 TB really) capacity, and compared to any spinning disk will seem *lightning* fast. The 1TB drives are currently ~$150/each, so that puts your upgrade in the $600 range. You also might have to buy one drive sled, because a lot of Dells come with 'dummy' trays in the slots that aren't populated.

Obviously if you need more space, then you can just buy more drives - if you spend $1200 you can get 8x 1TB SSDs and run them in RAID6 for extra redundancy.
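If it helps, the usable-capacity arithmetic behind those numbers can be sketched in a few lines of Python (the function names are purely illustrative, not from any tool):

```python
def usable_tb(n_drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity in decimal TB for common RAID levels."""
    if level == "raid5":
        return (n_drives - 1) * drive_tb      # one drive's worth of parity
    if level == "raid6":
        return (n_drives - 2) * drive_tb      # two drives' worth of parity
    if level == "raid10":
        return n_drives * drive_tb / 2        # everything is mirrored
    raise ValueError(f"unknown level: {level}")

def as_tib(tb: float) -> float:
    """Drives are sold in decimal TB; the OS reports binary TiB."""
    return tb * 10**12 / 2**40

# 4x 1TB in RAID5 -> 3TB raw, ~2.73TiB as the OS shows it
print(usable_tb(4, 1, "raid5"), round(as_tib(usable_tb(4, 1, "raid5")), 2))
```

The "3 TB (2.73 TB really)" figure above is exactly this decimal-TB vs. binary-TiB gap.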

My company has put a ton of 850 and 860 EVO drives in R610/620/710/720 servers and they're all still chugging along to this day.

If you have any questions, I'd be happy to answer.
 

RavinDJ

2[H]4U
Joined
Apr 9, 2002
Messages
4,026
Thanks soooo much, Sinister!! I was doing some research online and many posts said RAID 5 is bad and that I should go with RAID10. Was that only the case with spinning disks, or should I still spend a little more and go RAID10?

I definitely need tray caddies because we have the 'dummy' trays in the 5 empty slots. But what do you mean "one drive sled"? You mean the caddies?

In the photo, it's the R720 on top.
[Attached photo: IMG-1539.JPG]
 
Joined
Dec 1, 2004
Messages
966
That's soooo much, Sinister!! I was doing some research online and many posts said RAID 5 is bad. I should go with RAID10. Was that the old case with spinning disks or should I still spend a little more and go RAID10?
RAID5 is 'bad' because as capacity increases you run the risk of suffering a second drive failure during a rebuild operation.
RAID10 is faster than RAID5 or 6 because the math is simpler for the RAID controller, since it doesn't involve calculating parity. It also suffers no write-amplification penalty as a result, which affects RAID5 and doubly affects RAID6.

With all that said, at the capacity levels you're discussing and the fact you're considering SSDs instead of mechanical drives, RAID5 would still be fine.
If you move to 8 drives, I would use RAID6. RAID6 fully mitigates RAID5's potential for failure during a rebuild while only sacrificing one additional disk's capacity to parity data. It'll be slower than RAID10 (while still being way faster than what you have now), but you'll have more capacity than RAID10, and I personally prefer the more predictable fault tolerance of RAID6: it always tolerates two drive failures, whereas with RAID10 an extremely unlucky second failure can nuke your array.

An 8-drive RAID10 array typically consists of four RAID1 mirror pairs striped together. In that arrangement you could survive up to four simultaneous drive failures, as long as each failure lands on a different mirror pair; but if you got super unlucky and lost both drives of any one pair, the entire array would be lost.
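Those "unlucky second failure" odds are easy to enumerate. Here's a quick sketch (pure illustration, assuming an n-drive RAID10 built from adjacent two-way mirror pairs):

```python
from itertools import combinations

def raid10_fatal_fraction(n_drives: int) -> float:
    """Fraction of all two-drive failure combinations that destroy an
    n-drive RAID10 built from n/2 mirror pairs (drives 0+1, 2+3, ...)."""
    mirror_pairs = {(i, i + 1) for i in range(0, n_drives, 2)}
    combos = list(combinations(range(n_drives), 2))
    fatal = sum(1 for pair in combos if pair in mirror_pairs)
    return fatal / len(combos)

# 8 drives: only 4 of the 28 possible two-drive failures hit one mirror pair
print(raid10_fatal_fraction(8))   # 4/28, about 0.143; RAID6 survives all 28
```

So a random second failure kills an 8-drive RAID10 about 1 time in 7, while RAID6 always survives any two failures.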


I definitely need tray caddies because we have the 'dummy' trays in the 5 empty slots. But what do you mean "one drive sled"? You mean the caddies?
Sorry, was using tray / caddy / sled interchangeably. I meant you might have to buy one if you put in 4x SSDs to replace your 3x existing drives, because you currently only have 3 trays/caddies/sleds. Obviously in reality you need to buy as many trays as you want to add drives :)
 

Ready4Dis

Gawd
Joined
Nov 4, 2015
Messages
733
RAID 6 does not fully mitigate the risk; it can fail as well, there's just less chance of losing two drives than one. It also must read all the other drives to rebuild the array, and on very large drives that takes some time. With RAID 10 you *could* lose two drives and be down, but it would have to be 2 specific drives; lose any other second drive and the array can still function. It also only needs to rebuild from one mirror rather than hammer all the disks, which means less to read during a rebuild. I run RAID 10 in my home server (Dell R710) and get around 1GB/s sustained sequential read speeds from my 5.4k RPM drives. I have six 500GB drives; if I lose one, I need to read/rebuild 500GB. If I put them into RAID 5, I'd need to read 2.5TB to rebuild 500GB of data. My array is small enough that RAID 5 or 6 would have been OK, but it was mostly for playing and learning.
That said, depending on how critical your data is, 4x 2TB SSDs in RAID 10 will give you 4TB with read/write speeds near 2GB/s and great random I/O as well. You would have room to upgrade if needed. I don't think you're likely to have too many issues with such small drive counts, but RAID 6 with 4 drives doesn't net you anything over RAID 10.
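The rebuild-read arithmetic above can be sketched like this (a toy model; real controllers add overhead on top):

```python
def rebuild_read_tb(n_drives: int, drive_tb: float, level: str) -> float:
    """Data the surviving drives must supply to rebuild one failed drive."""
    if level in ("raid5", "raid6"):
        return (n_drives - 1) * drive_tb   # every surviving member is read
    if level == "raid10":
        return drive_tb                    # only the mirror partner is copied
    raise ValueError(f"unknown level: {level}")

# six 500GB drives: a RAID10 rebuild copies 0.5TB, RAID5 must read 2.5TB
print(rebuild_read_tb(6, 0.5, "raid10"), rebuild_read_tb(6, 0.5, "raid5"))
```

That 5x difference in data read is why RAID10 rebuilds finish so much faster on the same hardware.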
 
Joined
Dec 1, 2004
Messages
966
RAID 6 does not fully mitigate
On a smart RAID controller, it 100% does.

The reason is this: RAID 5 rebuilds can fail because the mathematical chance of an unrecoverable read error (URE) starts getting pretty high as the capacity of an array goes up. All drives have a rated URE rate, and if you take the mathematical chance of any particular read failing combined with the need to read all the data from an array of big disks, your odds of running into a URE get uncomfortable. With RAID 5, if a URE is encountered during a rebuild then the rebuild will fail, because without that data the missing drive can't be reconstructed - in effect you've lost a tiny bit of a second drive, thus violating RAID 5's 1-drive fault tolerance.

Given an intelligent RAID controller in an array missing a single disk, RAID 6 fully mitigates this. Here's why.

If you're rebuilding a RAID 6 array that's missing one disk, you can encounter a URE and keep on trucking. The reason is that the URE can be 'worked around': the controller can recalculate the data the URE missed by polling the other drives and using the second parity block.

Now then, there are two kinds of RAID controllers here: smart and dumb. A dumb controller, upon encountering a URE, will drop the disk from the array because it encountered a failure. In the context of RAID 5 this sort-of made sense, because there's nowhere else to recover the bit of data that the URE messed up. In RAID 6, though, this doesn't make sense: a URE on a single bit of data is not an indicator that anything is wrong with the other 99.999% of the data on the drive, so dropping the drive is ridiculous.

A smart controller, upon encountering a URE, will use the parity data in the array to correct it. So if you're rebuilding RAID 6 on a smart controller and encounter a URE, it simply corrects for it by calculating from parity. For a single-drive RAID 6 rebuild to fail, you would need to get *two* UREs (mathematically unlikely) and both would need to affect the same bit of data being read - in other words, the second URE would need to happen *while attempting to correct for the first URE*. That scenario is functionally a 0% chance of happening.

RAID 6 with two fully dead drives is just as vulnerable during a rebuild as RAID 5 with one dead drive. RAID 6 with one dead drive, given a smart controller, should have essentially a 100% chance of a successful rebuild.
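For anyone who wants to put numbers on "mathematically unlikely", the standard back-of-envelope model treats bit errors as independent at the drive's rated URE rate (a common consumer-drive spec is 1 error per 1e14 bits read; real errors aren't truly independent, so treat this as illustrative):

```python
import math

def p_at_least_one_ure(data_read_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one URE while reading data_read_tb
    terabytes, assuming independent bit errors at the rated rate."""
    bits = data_read_tb * 10**12 * 8
    # 1 - (1 - p)^bits, computed stably via log1p/expm1
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# Reading 2.5TB during a rebuild at a 1e-14 rate: roughly an 18% chance
print(round(p_at_least_one_ure(2.5), 3))
```

That ~18% is the per-rebuild chance of one URE; squaring odds like that is why the two-URE-on-the-same-read scenario above is effectively zero.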
 

Ready4Dis

Gawd
Joined
Nov 4, 2015
Messages
733
On a smart RAID controller, it 100% does.
It greatly reduces the chance of complete failure, but doesn't eliminate it (just as RAID 0 or 10 doesn't). RAID 10 is still considered safer, and unless you have a lot of drives you don't lose much capacity vs RAID 6 while gaining speed. With many drives, RAID 6 will give you more space at the expense of speed, and it's more likely to hit issues while rebuilding. A 12-drive RAID 6 array of 500GB drives would need to read about 5.5TB worth of data to rebuild itself, which is slow and leaves a good window in which a second failure could (however unlikely) occur. RAID 10 only needs to read 500GB, so it rebuilds with a tenth of the reads, meaning a faster rebuild and less chance of a second drive failure during it.

Again, we're talking highly unlikely events, but these things can come into play depending on the needs. I've seen things like RAID 50 and RAID 60 for similar reasons: two failures only kill a RAID 50 if both land in the same RAID 5 group, and a RAID 60 group would have to lose 3 drives, which is even less likely. Large companies will even try to use drives from different batches, because two drives from the same batch with the same mechanical wear are more likely to die at the same time.

Anyways, this really didn't seem like much of a concern for the OP, so I'll leave it at this unless you want to discuss further. Sorry if I sidetracked this post too much.
 

RavinDJ

2[H]4U
Joined
Apr 9, 2002
Messages
4,026
I cannot thank you guys enough!! I was confused about what to get... and Dell is charging some obscene $800+ for a single 1.2TB spinning disk... I mean, WTF?!?!?

After reading your posts (twice, actually), I decided to go with four 2TB Samsung 860 EVO drives in a RAID10 configuration. This lets me leave the original three 900GB 10k drives in place, so I can back up and restore the data and make sure I don't screw anything up.

I'll keep you posted on the results and I'll post the BEFORE and AFTER results from ATTO :D
 
Joined
Dec 1, 2004
Messages
966
Sounds great! You'll like the performance for sure! And still significantly under your original maximum budget lol. Dell's prices for storage are *ridiculous* full stop.
 