best 1TB 7200rpm SATA desktop drives for RAID5/6?

xXaNaXx

Gawd
Joined
May 15, 2003
Messages
935
ok, so i'm almost ready to purchase some hard drives for a large (5TB+) RAID5 or RAID6 multimedia storage array, and i want to know what desktop drives are best for running in RAID? from what i hear, the newest Western Digital drives have updated firmwares that will no longer allow you to use the TLER utility to enable their use in a RAID array, and i really don't want to buy the RE3 versions of their drives for twice the price of the standard desktop drives. it's absolutely fucking LUDICROUS to have to spend twice as much for basically the same drive, just with TLER enabled, and i'm gonna stop buying WD drives altogether because of them doing that.....they won't get another penny from me as long as they keep up this bullshit.

although, if anyone knows a way around the TLER issue with their latest drives, i'd love to hear it, and may consider buying WD drives if there is a sure-fire way to do TLER on them for RAID.

anyone have any suggestions/experience with other desktop drives in a RAID environment? i'd prefer to shop at newegg, so whatever they have available there will take precedence, unless there is some other drive they don't carry that will be far better and would be worth going elsewhere to buy it. i'd prefer to stay at or below ~$100/drive, but a few bucks +/- is ok.
 

sub.mesa

2[H]4U
Joined
Feb 16, 2010
Messages
2,508
You only really need TLER if you're on Windows or if it concerns mission-critical production servers. If neither applies, you likely neither need nor want TLER.

So unless you're open to suggestions beyond Windows, your alternative might be Samsung drives with CCTL.
 

Dangman

Ninja Editor SuperMod
Joined
Dec 15, 2005
Messages
46,062
AFAIK, Hitachi and Samsung desktop drives work pretty well in RAID arrays.
 

xXaNaXx

Gawd
Joined
May 15, 2003
Messages
935
sub.mesa:

where did you come up with that? TLER is completely OS-independent; it doesn't care what operating system is installed, because TLER is enabled/disabled within the hard drive's own firmware. it's not a driver or something that's installed in the host OS.

ERC (Error Recovery Control) is the Seagate equivalent to WD's TLER, and CCTL (Command Completion Time Limit) is used by Samsung & Hitachi.
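For drives whose firmware still accepts it, the ERC/TLER/CCTL timer can usually be queried and set from the OS with smartmontools' `smartctl -l scterc,READ,WRITE`, where both values are in units of 100 ms (so 70 means 7 seconds). The Python sketch below only builds that command line. It assumes smartctl is installed; the helper name `scterc_command` and the optional `-d` device-type pass-through (e.g. for disks behind a 3ware or Areca card) are illustrative. Whether a particular desktop drive accepts the setting at all, and whether it survives a power cycle, depends on that drive's firmware.

```python
import shlex
from typing import List, Optional


def scterc_command(device: str, read_s: float, write_s: float,
                   device_type: Optional[str] = None) -> List[str]:
    """Build a smartctl argv that sets the SCT Error Recovery Control timers.

    smartctl takes the read/write limits in units of 100 ms, so 7.0 s -> 70.
    device_type, if given, is passed through as `-d <type>` (assumption: you
    already know the right type string for your controller).
    """
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    if read_ds < 0 or write_ds < 0:
        raise ValueError("ERC timers must be non-negative")
    argv = ["smartctl"]
    if device_type:
        argv += ["-d", device_type]
    argv += ["-l", f"scterc,{read_ds},{write_ds}", device]
    return argv


if __name__ == "__main__":
    # Show the command for a 7-second limit (the usual TLER default);
    # pass the list to subprocess.run(...) as root to actually apply it.
    print(shlex.join(scterc_command("/dev/sdb", 7, 7)))
```

Running the script just prints `smartctl -l scterc,70,70 /dev/sdb`; nothing is sent to the drive until you execute that command yourself.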

when a hard drive encounters a bad sector, it either will or will not attempt to correct the problem on its own indefinitely, based on whether TLER is enabled or not. with TLER disabled, it attempts to correct the problem on its own for an indefinite amount of time (thus, never allowing the RAID card to correct it via parity calculations). with TLER enabled, it only tries to correct on its own for (by default) 7 seconds before stopping and allowing the RAID card to correct it via parity calculations.

so if TLER is disabled and you have the drives connected to a RAID controller, the drive with the problem becomes unresponsive to the OS (regardless of what OS you're using) for as long as it is trying to correct itself. depending on the quality/configuration of the RAID card, this can cause the card to drop the drive out of the array as "failed". once it drops out, depending on the size of the array, it can take a VERY long time to restore the array after the problem has been corrected and the drive becomes responsive again. if TLER is enabled, however, the drive gives up trying to correct the problem on its own after (by default) 7 seconds, letting the RAID controller correct it via parity calculations. this helps keep the drive from dropping out of the array, so you don't have to restore the RAID array. that matters because once you get as far as the restoration process, the array is either running at vastly reduced speed (if you have a good RAID controller), or may not be accessible at all during the restore (if your controller sucks).
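The drop-out scenario above comes down to comparing two timers: how long the drive stays silent, and how long the controller waits before declaring it dead. A minimal Python model, assuming a drive that answers within the controller's timeout is never dropped. Only the 7 s default comes from the post; the 60 s retry time and the 10 s controller timeout are made-up illustration values.

```python
from typing import Optional


def unresponsive_seconds(recovery_needed_s: float,
                         erc_limit_s: Optional[float]) -> float:
    """Time the drive stays silent while retrying a bad sector.

    erc_limit_s=None models TLER/ERC/CCTL disabled: the drive keeps retrying
    until it is done. Otherwise it gives up at the limit and reports an error.
    """
    if erc_limit_s is None:
        return recovery_needed_s
    return min(recovery_needed_s, erc_limit_s)


def controller_drops_drive(recovery_needed_s: float,
                           erc_limit_s: Optional[float],
                           controller_timeout_s: float) -> bool:
    """True if the drive stays silent longer than the controller will wait."""
    return unresponsive_seconds(recovery_needed_s, erc_limit_s) > controller_timeout_s


if __name__ == "__main__":
    # Hypothetical: a weak sector needing 60 s of retries, and a controller
    # that gives up on a silent drive after 10 s.
    print(controller_drops_drive(60, None, 10))  # TLER off: True (dropped)
    print(controller_drops_drive(60, 7, 10))     # TLER at 7 s: False (kept)
```

The model deliberately ignores what the controller does with the error afterwards; it only shows why a capped recovery time keeps the drive under the controller's patience limit.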

i plan on buying a high-quality RAID card (most likely Areca or 3Ware), and intend to run the server on linux. given the high quality of these RAID controllers, having TLER disabled may or may not cause the drive with sector problems to drop out of the array...but i'd MUCH rather be 100% sure that TLER is turned on to reduce that chance to almost nothing. i just don't want to pay TWICE the amount for the same storage capacity in order to do it. :rolleyes:

i can understand why WD offers their RE drives, and i fully support them doing so in cases where they are installed in a mission-critical application, and when the user needs professional support to minimize downtime....but it's unforgivable to me that they would completely disable any possibility of enabling TLER on standard desktop drives for the general public who are not installing them in mission-critical applications, just so they can then turn around and charge them 2x the price of their desktop drives. i would be perfectly content to not have WD support for RAID'ed desktop drives, as long as they still continue to allow me to turn TLER on and use it at my own risk.

but whatever, i've pretty much decided on Hitachi 1TB desktop drives instead.....from the reviews i've seen on Newegg, a lot of other people ran into that same problem with the WD drives, which was corrected when they sent the WD drives back and got the Hitachi ones instead....and the Hitachi 1TB drives are only $84.99, vs. $99.99 for the WD Black drives....so i'll be getting drives that work in RAID, and will be spending $15 less per drive to do it. of course, the Hitachi ones are 3yr warranty vs. 5yr for the WD, but i'm not too worried about that....the desktop drives are cheap enough (and will only get cheaper as time goes on) that if it fails after 3 years, i won't have too much trouble replacing it. not only that, but the vast majority of hard drive failures occur well within that 3yr limit, anyway, so they would be covered, regardless. i've only had maybe 1 or 2 drives fail between the 3-year and 5-year time period, and they were small, cheap drives anyway.

and if anyone at WD is reading this: SCREW YOU!! for taking away our choices.
 

sub.mesa

2[H]4U
Joined
Feb 16, 2010
Messages
2,508
TLER is completely OS-independent
TLER is, but the reaction to an I/O timeout varies per operating system. With hardware RAID, it depends on how the RAID firmware is designed to cope with drive failures and timeouts. With onboard RAID, it depends on how the Windows-only drivers and the Windows kernel react to long timeouts.

Judging from the many array splits reported by Windows users, this problem is pretty common, and i've had it happen to me as well with Areca HW RAID5 + Windows.

The problem is: how do you distinguish a failed/disconnected drive from a drive that is performing repairs? In the latter case, from the RAID controller's point of view, it is a non-functional device that does not respond until it has repaired the damage. After 10-30 seconds the controller may think "hmm, this disk appears to have failed", disconnect the drive, degrade the array and update the metadata on the remaining disks to record that it is running degraded, which prevents a simple reboot from fixing it.

So you could say that I/O timeouts are not handled very gracefully. On Linux and FreeBSD with software RAID, however, the OS kernel drivers determine how to react to a timeout. In FreeBSD this is 90 seconds for SATA devices on the AHCI driver, which is about the maximum time a recovery will take before the drive returns an I/O error.
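Linux has a comparable knob: for SCSI/SATA disks the kernel exposes the per-command timeout, in seconds, at `/sys/block/<dev>/device/timeout` (commonly 30 by default, and writable by root). The Python sketch below reads that value and checks whether an assumed worst-case drive recovery time fits inside a single command timeout. The helper names and the `sysfs_root` parameter (there only so it can be pointed at a test directory) are hypothetical, and the kernel's own retries and error handling are ignored.

```python
from pathlib import Path


def scsi_timeout_seconds(dev: str, sysfs_root: str = "/sys") -> int:
    """Read the Linux SCSI command timeout (seconds) for a disk like 'sda'."""
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    return int(path.read_text().strip())


def recovery_fits(dev: str, worst_case_recovery_s: float,
                  sysfs_root: str = "/sys") -> bool:
    """True if one command timeout outlasts the assumed drive recovery time."""
    return worst_case_recovery_s <= scsi_timeout_seconds(dev, sysfs_root)
```

On a live Linux box, `scsi_timeout_seconds("sda")` returns the current value, and `recovery_fits("sda", 90)` asks whether a 90 s recovery (the FreeBSD figure above, used as an assumed worst case for a drive without TLER) would finish before the first timeout fires.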

So if you're using disks on Linux/BSD and you encounter a bad sector, all that happens is the system may freeze for up to 60 seconds (usually less, more like 23 seconds) and then continue as before. A "TIMEOUT - RETRYING" message may appear in the kernel logs, but there should be no I/O failures, i.e. no application ever notices that something was wrong.

So that's why you do not need TLER on Linux/BSD, unless you are an enterprise user whose production box cannot afford freezes of up to a minute; then you would absolutely need TLER regardless of OS. Personally i think you'd better go with SSDs in that scenario, but that's beside the point.

So if you're on Windows and using onboard RAID or hardware RAID, you basically cannot have any long I/O timeouts, or disks will be dropped from the array. Capping the HDD's recovery time so it never spends more than 5 seconds on recovery makes it return an I/O error instead of timing out. After an I/O error, the RAID engine can decide what to do:
- pass the I/O error through to the application (fail)
- disconnect the failing disk and continue degraded (fail)
- use redundant data to reconstruct the bad sector, pass data to application (good)
- use redundant data to reconstruct the bad sector; then write to that bad sector (excellent)
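The four outcomes above can be written down as a decision function. This is only a hypothetical policy in Python to make the list concrete; the flags `has_redundancy`, `drops_on_error` and `can_rewrite` are invented for illustration and do not describe any particular onboard driver or hardware RAID firmware.

```python
from enum import Enum


class ErrorAction(Enum):
    PASS_ERROR_UP = "pass the I/O error through to the application"    # fail
    DROP_DISK = "disconnect the failing disk, continue degraded"        # fail
    RECONSTRUCT = "rebuild the sector from redundancy, return the data"  # good
    RECONSTRUCT_AND_REWRITE = "rebuild the sector, then write it back"   # excellent


def choose_action(has_redundancy: bool, drops_on_error: bool,
                  can_rewrite: bool) -> ErrorAction:
    """Hypothetical policy for a RAID engine that just received an I/O error."""
    if not has_redundancy:
        # Bare disk, JBOD or RAID0: there is nothing to rebuild from.
        return ErrorAction.PASS_ERROR_UP
    if drops_on_error:
        # A crude engine that treats any error as a dead disk.
        return ErrorAction.DROP_DISK
    if can_rewrite:
        # Writing the rebuilt data back lets the drive remap the bad sector.
        return ErrorAction.RECONSTRUCT_AND_REWRITE
    return ErrorAction.RECONSTRUCT
```

The ordering is a design choice: redundancy is checked first because without it the other options do not exist, and "drops the disk" is modelled as a property of the engine rather than of the error.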

The real question is: how does onboard RAID driver X handle these cases, or how does Hardware RAID controller Y work. Well that i do not know, but i do know i had a broken array with Areca before.

With software RAID on Linux/BSD, no matter how many bad disks i had (5 or 6 i think), i never had a broken or split array, even though i've had my share of drive failures, timeouts, bad sectors and completely failed drives. As long as you use software RAID on Linux/BSD, you should be fine. The OS handles timeouts by simply waiting patiently, and you would never have a broken or split array because of a bad sector.

So to conclude: TLER is not absolutely required on Linux/BSD; home users can live fine without it. TLER is pretty much mandatory on Hardware RAID and Windows-based Onboard RAIDs; without it you may experience broken or split arrays.

If you'll allow me to criticize TLER even more: i believe TLER effectively raises the number of uncorrectable errors on your disks. If a 90-second recovery time gets you a BER (bit error rate) of 10^-14, then a 5-second recovery time might increase this to 10^-12, for example; you'll recover fewer weak sectors and thus get more unrecoverable bit errors.
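To put those two rates in perspective, the expected number of unreadable bits when reading a whole disk is simply bits read times BER. A small Python calculation for one full read of a 1 TB (10^12 bytes) drive at the two example rates from the post. It treats errors as independent, which is a simplifying assumption, and the 10^-12 figure is the post's illustrative guess, not a measured value.

```python
import math


def expected_errors(bytes_read: float, ber: float) -> float:
    """Expected unrecoverable bit errors when reading bytes_read bytes."""
    return bytes_read * 8 * ber


def p_at_least_one_error(bytes_read: float, ber: float) -> float:
    """P(>= 1 error) = 1 - (1 - ber)**bits, computed without precision loss."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    one_tb = 1e12
    for ber in (1e-14, 1e-12):
        print(f"BER {ber:g}: expected {expected_errors(one_tb, ber):.2f} errors, "
              f"P(at least one) {p_at_least_one_error(one_tb, ber):.1%}")
```

That works out to about 0.08 expected errors (roughly a 7.7% chance of hitting one) at 10^-14, versus about 8 expected errors (practically certain to hit one) at 10^-12, which is why a hundredfold change in BER matters so much.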

Even worse, TLER drives can cause data loss in some cases. On non-redundant setups like bare disks and JBOD/RAID0, the extra bit errors that are no longer recovered are simply lost, so TLER makes no sense on drives that are not in a redundant configuration.

Also, if you have a RAID5 or RAID6 and one disk fails, you lose all or part of the protection parity gives you. Because you used TLER, an error that might otherwise have been recovered becomes unrecoverable, and that translates into headaches: for example, the RAID6 becomes double-degraded or the RAID5 becomes FAILED. The chance of encountering a bit error on 2TB drives in a redundant configuration is pretty high, so if you are rebuilding, BER can nail your array. Disabling TLER in this case might make sense, as you'd rather wait longer and have the damage fixed than risk failing your redundant array. To be clear: i'm talking about a 100% failed disk PLUS 1 or more bad sectors (BER) on the remaining drives.
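The rebuild risk can be estimated the same way: with one disk dead, a RAID5 rebuild has to read every surviving disk end to end, and any unrecoverable bit error along the way lands on a stripe with no redundancy left. A rough Python sketch for a hypothetical 6-drive RAID5 (about 5 TB usable from 1 TB drives). It assumes independent errors at exactly the spec-sheet BER and full reads of all surviving disks, so treat the result as an order-of-magnitude estimate.

```python
import math


def rebuild_error_probability(n_drives: int, drive_bytes: float, ber: float) -> float:
    """P(at least one unrecoverable bit error while reading the n-1 survivors)."""
    if n_drives < 2:
        raise ValueError("need at least two drives")
    bits_read = (n_drives - 1) * drive_bytes * 8
    return -math.expm1(bits_read * math.log1p(-ber))


if __name__ == "__main__":
    for size_tb in (1, 2):
        p = rebuild_error_probability(6, size_tb * 1e12, 1e-14)
        print(f"6 x {size_tb} TB RAID5 rebuild at BER 1e-14: {p:.0%} chance of an error")
```

Under these assumptions the estimate is roughly 33% for 1 TB drives versus 55% for 2 TB drives: doubling drive size doubles the bits that must be read during the rebuild.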

So TLER can even do some harm in some cases. That said, if you have a hardware RAID controller that can fix BER quickly by writing data to the bad sector, then TLER makes sense: it makes this process much quicker, and bad sectors basically don't need to be recovered since you have enough redundancy anyway. It still increases the BER, though; you might amplify the dangerous elements of BER to the point where they fail your array, i.e. exceed the redundancy level. Bit errors on multiple drives affecting the same stripe (block) are highly unlikely, however, but i can only imagine what a hardware RAID controller or Windows driver would do in that case.

when a hard drive encounters a bad sector, it either will or will not attempt to correct the problem on its own indefinitely, based on whether TLER is enabled or not. with TLER disabled, it attempts to correct the problem on its own for an indefinite amount of time (thus, never allowing the RAID card to correct it via parity calculations).
You make it sound like any normal drive that encounters a bad sector will be bricked because it never does anything else again. That is of course not true; normal recovery takes 60-90 seconds. The problem is that all/most Windows RAIDs react very poorly to those long timeouts: they can't really tell the difference between a failed disk and a bad sector on a single HDD, so they kick out the disk, and that's where you get a broken array.

So i'm not as convinced that TLER is so useful or necessary. I think it's more of a necessary evil, since the ATA standard does not let a disk that is performing recovery tell the controller it just needs a bit longer. If it could, the controller could choose whether or not to write data to the bad sector, which would fix it instantly with under 2 seconds of 'recovery' time. If the controller doesn't fix it (for example in a non-redundant array, or with no intelligent controller/driver), the recovery just takes longer. But you should never get broken arrays; that is soooo annoying!
 

xXaNaXx

Gawd
Joined
May 15, 2003
Messages
935
sorry, i should have been clearer about why i was looking to make sure i have TLER

....TLER is pretty much mandatory on Hardware RAID and Windows-based Onboard RAIDs; without it you may experience broken or split arrays.....
this is exactly why i'm trying to find less-expensive desktop disks where TLER (or its equivalent) can be (or already is) enabled, because i will be using a true hardware-based RAID solution (likely 12 or 16 ports) rather than a software-based one. this is also why, for me, TLER is going to be pretty much OS-independent....i guess i should have made that clearer from the start.....i'm speaking from the viewpoint of what i'm personally trying to accomplish.

even though i'll be using a high-end (i.e., expensive) hardware RAID card, it's a single, one-time purchase up front, so i don't mind spending more for quality hardware in that case. however on the hard drives, there will be purchases of multiple pieces of hardware, which will likely be growing more as needed, so even $20 - $30 more per drive can quickly add up to a lot of extra money spent.

on newegg, the 1TB WD RE3 drives are $160, vs. only $80 for the WD Blue drives, $85 for the Hitachi drives, and $100 for the WD Black drives. so assuming i only purchase 5 drives to start off with, that's a savings of $400, $375 & $300, respectively. that's a LOT of money for extra features that i don't even need (the few features besides TLER that the RE3 drives add).

say for instance i buy the Areca ARC-1261ML 16-port RAID card for the current price of $800...once i buy enough drives to fill up the card's entire capacity, using (for instance) the $100 WD Black drives listed above, i will have saved enough money over the cost of the RE3 drives to buy a SECOND Areca 16-port card and a spare hard drive for hot-swapping in the event of a drive failure, plus a little left over for cables or drive bays/cages, etc. and the savings only goes up if i were to buy the other less-expensive desktop drives. if i were to get the $80 WD Blue drives, i would save $1280 over the cost of the same number of RE3 drives. that's a LOT of money for a home budget.
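The savings arithmetic in the last two paragraphs can be checked with a few lines of Python. Only the Newegg prices quoted in the post are used ($160 RE3, $100 Black, $85 Hitachi, $80 Blue, $800 for the ARC-1261ML); the constant and function names are just for illustration.

```python
RE3_PRICE = 160
DESKTOP_PRICES = {"WD Blue": 80, "Hitachi": 85, "WD Black": 100}
CARD_PRICE = 800  # Areca ARC-1261ML, 16 ports


def savings_vs_re3(n_drives: int, price: int) -> int:
    """Money saved by buying n desktop drives instead of n RE3 drives."""
    return n_drives * (RE3_PRICE - price)


if __name__ == "__main__":
    for name, price in DESKTOP_PRICES.items():
        print(f"5 x {name}: ${savings_vs_re3(5, price)} saved")
    black_16 = savings_vs_re3(16, DESKTOP_PRICES["WD Black"])
    # Subtract a second card and one spare WD Black drive.
    left_over = black_16 - CARD_PRICE - DESKTOP_PRICES["WD Black"]
    print(f"16 x WD Black: ${black_16} saved, ${left_over} left after card + spare")
    print(f"16 x WD Blue: ${savings_vs_re3(16, DESKTOP_PRICES['WD Blue'])} saved")
```

This reproduces the figures above: $400/$375/$300 for five drives, $960 for sixteen WD Blacks (enough for a second $800 card plus a $100 spare, with $60 left over), and $1280 for sixteen WD Blues.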

i don't care about vibration issues or any of the other crap that they include on the RE3 drives, all i care about is whether or not the drives i buy are going to continually drop out of the array because those assholes at WD are disallowing the ability to turn TLER on with their desktop-class drives.

and honestly, i couldn't care less about the warranty on the drives....if they would just keep allowing TLER to be enabled on their desktop-class drives, but put a warning/disclaimer stating that doing so is 100% unsupported, and maybe even void the warranty on the drive, i'd be perfectly happy. and honestly, you'd think they would WANT to do this...after all, the desktop drives supposedly have less longevity in that situation than the RE3 drives do....so if they allowed people to enable TLER while voiding the warranty, without having to spend time supporting those drives, that's a cash cow right there....when a drive fails, the user has to purchase an entirely new drive instead of doing an RMA on it. how could that possibly be a bad thing for them? as long as they make it BLATANTLY clear to anyone doing this that the warranty would be void and that there will be no support offered for it, it's all gravy.

Also, if you have a RAID5 or RAID6 and one disk fails, you lose all or part of the protection parity gives you. Because you used TLER, an error that might otherwise have been recovered becomes unrecoverable, and that translates into headaches: for example, the RAID6 becomes double-degraded or the RAID5 becomes FAILED. The chance of encountering a bit error on 2TB drives in a redundant configuration is pretty high, so if you are rebuilding, BER can nail your array. Disabling TLER in this case might make sense, as you'd rather wait longer and have the damage fixed than risk failing your redundant array. To be clear: i'm talking about a 100% failed disk PLUS 1 or more bad sectors (BER) on the remaining drives.
yeah, i will be sticking with 1TB drives anyway....so that will lessen this problem, as well as being a LOT less expensive than the 2TB drives. as for losing the data, i'm not overly concerned with that, i will have backups on CD/DVD for pretty much everything that's on this array, anyway....so while it may be a pain in the ass to have to copy it all back over, i can do it if need be. the RAID5/6 array will be more for the convenience of recovering stuff a little quicker, as long as there's only one problem at a time.


You make it sound like any normal drive that encounters a bad sector will be bricked because it never does anything else again. That is of course not true; normal recovery takes 60-90 seconds. The problem is that all/most Windows RAIDs react very poorly to those long timeouts: they can't really tell the difference between a failed disk and a bad sector on a single HDD, so they kick out the disk, and that's where you get a broken array.
yeah, that's pretty much what i was trying to say, i just may not have worded it correctly or whatever...i know it's more fault tolerant than a tiny little issue bringing down the entire thing....but the fact remains that more people have problems with drives dropping out of an array when using a hardware RAID controller when TLER is disabled than when it is enabled, and i want to do all i can to avoid that as much as possible, and as CHEAPLY as possible.
 