Western Digital Time-Limited Error Recovery (TLER) with RAID and SE, SE16, GP Models

Other than buying an enterprise drive...

What about Hitachi or others?

If we can't get around the error correction timeout, then what hardware/software is recommended that accommodates this?
 
You want an enterprise solution, pay an enterprise price.

We don't have enterprise budgets for enterprise solutions to use at home, so we want home versions of the enterprise solutions at home prices.
 
The home versions are the RE2/3/4 drives. Real enterprise drives would be SAS and cost a lot more.
 
You'll notice that we've been talking about the 1.5TB drives.

At least at my local store they don't come in that flavour.

1TB Green is $82
1.5TB Green is $108
2TB Green is $245

1TB RE3 is $189 (+$107 or 130% more expensive)
2TB RE4 is $357 (+$112 or 45% more expensive)

Clearly what we're talking about is wanting to get a RAID up at prices similar to the Green drives, especially since 1.5TB is the best bang for your buck right now.
 
Oh, I know. I use 5 of the 1.5TB drives in a RAID 5 setup. Sucks that you can't use them anymore.
 
Well, that's why I'm asking. Some hardware allows you to set the timeout value for the command response as high as 120 seconds, as does some software. Most don't. So I'm wondering what people have found that works with the drives as we have them now.
 
I use the LSI SAS3442E-R hooked to a 16-bay SAS enclosure via one external cable. I set the timeout in the LSI card to 120s and use Linux software RAID. I got the card for $60 shipped on eBay! I've expanded the RAID6 array twice so far and have 2 more drives coming in.
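If your controller or software doesn't expose a timeout setting, you can get a similar effect on the Linux side by raising the kernel's per-device command timeout (the default is 30 seconds). A rough sketch - the device names are just examples, list your actual array members, and note the values reset at reboot (put it in rc.local or a udev rule to make it stick):

Code:
# Raise the SCSI-layer command timeout to 120s so md doesn't kick
# a drive that is deep in error recovery (sdb..sdf are examples)
for d in sdb sdc sdd sde sdf; do
    echo 120 > /sys/block/$d/device/timeout
done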
 
WD deliberately changed TLER support on cheaper models. Now we are forced to buy enterprise-class drives (RE2/3/4, yes danman...) for our reliable home RAID storage.
 
Well, I wouldn't say the WD RE drives are 'enterprise class' HDs. They are just way cheaper than real enterprise HDs.
 
Hi.
I have 4 WD15EADS drives. 3 of them were manufactured in August and are TLER capable, but the 4th was manufactured in October and TLER can no longer be enabled on it. So my question is: can I use both the TLER-enabled drives and the disabled one in my RAID configuration? I'm thinking of RAID 10, or alternatively maybe RAID 5 (3 + 1 hot spare, with the newer drive as the hot spare).
Do you think it's wise to combine such disks?
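(In case it's useful, this is how I've been checking which drives still accept the setting - a sketch using a recent smartmontools build, since WD's TLER is just the standard SCT Error Recovery Control feature; device names are examples, and drives with TLER locked out should simply report SCT ERC as unsupported:)

Code:
# Query SCT Error Recovery Control (= TLER) on each drive
for d in /dev/sd[a-d]; do
    echo "== $d =="
    smartctl -l scterc $d
done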
Thanks.
 
Enterprise HD's are a scam.

There is no such thing as percent duty cycle impacting the reliability of any hard drive with appropriate cooling. There is no way a mechanical engineer could justify the design of one HD vs. another to perform better in a high-duty-cycle environment (except that one, perhaps, may last a longer total number of hours).

Perhaps enterprise HD's are manufactured with better tolerances, perhaps not.

Not being able to use (for example) WD15EADS drives in RAID defeats the original purpose: using a redundant array of inexpensive disks to increase performance and reliability.

Up yours, too, WD.
 
The ability to adjust TLER on cheap drives was the only (yet major) reason for me to prefer WD drives. I guess I'll go back to getting Samsungs.
 
It's too late for me to change from WD to another manufacturer. I'm now "stuck" with 3 TLER-enabled and 1 TLER-disabled drive. Now I wonder whether I can relatively safely combine the enabled and disabled drives in my RAID setup...
 
I just bought 4 WD15EADS drives, and 3 of them support TLER (I guess I got quite lucky). I'm using them in RAID5 on a 3ware 9650 controller. Before I enabled TLER, I had a drive drop out of the array due to a timeout. I noticed those drives tend to stop and restart from time to time. Since I enabled TLER on the 3 drives, I haven't had any problems (well, only 2 days, but using the array 24/7 for testing). I think that even if one drive gets dropped out of the array (due to timeout, not error) while another is already dropped, the risk of data loss is not too big, since the array will function correctly again when the drive comes back.

I will keep testing for a while; anyway, I have a backup of all the data on the array for safety.
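While I keep testing, I'm also polling the controller so a dropped drive gets noticed quickly - a sketch; /c0 and /u0 are examples, run "tw_cli show" first to find your controller and unit numbers:

Code:
# Controller summary, then detail for unit 0; look for anything not "OK"
tw_cli /c0 show
tw_cli /c0/u0 show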

I also noticed those drives change speed; I saw them running at 3600, 4500 and 5400rpm, but it seems not to cause any harm to the array (besides performance, I guess). I couldn't find any documentation about that variable speed - why and when it happens, and how to prevent it. Anyone know more about that?
 
How did you see at what speed the drives are spinning?
You sure you weren't just noticing them spinning up or down?
 
Some drives support spinning at lower speeds rather than spinning all of the way down, and some RAID controllers let them do so for power savings.
 
The WD GP drives and the Adaptec 5-series cards that I know of, and apparently the newer 3ware cards as well. Most drives other than the green drives are on/off only, I believe.
 
For some reason I was thinking you were saying the drive varies its speed.
 
Some drives support spinning at lower speeds rather than spinning all of the way down, and some RAID controllers let them do so for power savings.
Um, no. Maybe if you try teaching the drive some Scientology. Could work, you never know. Then again, it might start mimicking Tom Cruise, jumping on the couch and screaming...
 
For some reason I was thinking you were saying the drive varies its speed.

Here's a nice quote from Adaptec's website:
With Intelligent Power Management, users can minimize power consumption by alternating between 3 modes:
1) Normal operation - full power, full RPM (revolutions per minute)
2) Standby - low power mode spins disks at lower RPM
3) Power-off - disks not spinning

Not all drives support the lowered-RPM mode; some only have the on/off mode.

I have this working with RE3 drives: my array spins down after an hour (all the way off - these drives are the on/off-only type) while I'm sleeping or at work. When I access the drives again it takes a few seconds to spin them all up, but it works great, with no errors or dropped drives in the last year of doing this.
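If your card doesn't have that power management, the drive's own standby timer can do something similar, assuming the controller passes ATA commands through - a sketch; /dev/sdX is a placeholder, and -S 242 is hdparm's encoding for 2 x 30 minutes:

Code:
# Spin the drive down after 1 hour idle (values 241-251 = 30-minute units)
hdparm -S 242 /dev/sdX
# Check the current power state (active/idle vs. standby)
hdparm -C /dev/sdX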
 
I bought a bunch of WD20EADS - WD's Caviar Green 2TB drives at 5400rpm, like a week ago.
I have Adaptec 51645 and 52445 RAID controllers, updated with the latest firmware (Ver. 5.2.0 Build 17544). I will just refer to one of them, because both are behaving very similarly.

After I created a RAID5 array with 4 drives, one of them dropped out in about 1 hour. I replaced the dropped drive with another one and rebuilt the array. After 2 replacements, I got a stable build that doesn't drop drives anymore.
I also built a RAID6 array with 8 drives on my other Adaptec, and again I had one dropped drive; I had to replace it with one of my other (unused) drives, and eventually I got the array into a stable state.
However, I have pretty bad performance: my RAID6 with 8 drives gets 4-5MB/sec write speed, and my RAID5 with 4 drives gets 10-15MB/sec write speed. I was expecting better, something like 50MB/sec or so.

Now my questions are:
- will enabling TLER improve my write performance? My guess is no, but it will prevent drives from dropping out of the RAID array, as far as I understand.
- how can I improve my write performance? Flashing the drives to the latest firmware? Or is this the controller's fault?

If anyone can give any advice, it'd be greatly appreciated.

thanks,
 
You have the cache set to write-back enabled, correct? You should be seeing speeds of 300MB/s on the R5 array and 500-600MB/s on the R6 array.
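On the Adaptec cards you can check and change this from the OS with arcconf, if you have it installed - a sketch; controller 1 and logical drive 0 are examples:

Code:
# Show the logical-drive settings, then force write-back cache on
arcconf GETCONFIG 1 LD
arcconf SETCACHE 1 LOGICALDRIVE 0 WB noprompt   # use WBB for write-back only with battery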
 
Supposedly there is THIS for Seagate... but I can't find anyone who can confirm that it works! :(

That tool is useless - changes to TLER/ERC values do NOT survive a power cycle. Maybe in the future this will get sorted out by the author, but for now I don't know what the point is.

For example, if I change my July-2009-built WD20EADS with the WD tool and enable TLER at 7 seconds, it obviously survives a power cycle. If I try to do the same with HDAT2, it appears to save the change, but it does not survive the cycle - the value gets reset.
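If anyone wants to reproduce this, a recent smartmontools can set and read ERC directly, which makes the before/after check easy - a sketch; the device name is an example and the values are in tenths of a second:

Code:
# Set read/write ERC to 7.0 seconds
smartctl -l scterc,70,70 /dev/sdb
# ...power cycle, then see whether the 7.0s setting survived:
smartctl -l scterc /dev/sdb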
 
The first thing you should be worrying about before all else is whether your drives can have their TLER enabled; otherwise the arrays are just a ticking time bomb. As for your performance, it sounds like there are some other problems - a 52445 running 4 or 8 drives in RAID6 should be giving you read/write in the hundreds of MB/s, like 350+ MEGABYTES per second. As someone already mentioned, make sure write cache is enabled - the default may be "write cache only when protected by battery", but you don't want that.

What are the build dates of your drives? What's the third character in the model number after the dash? (e.g. WD20EADS-00SXXX) If your drive was built before September 30, 2009, and/or that character is an "S" rather than a "P", "R" or anything else, then TLER should enable. Lastly, you aren't going to be flashing any drives to any newer firmware unless you work at Western Digital. It's pretty much impossible to get firmware files out of them.
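If you have a pile of drives to go through, you can pull the model strings in one pass and pick out that character - a quick bash sketch; the device range and the "Device Model" line format are assumptions, adjust for your system:

Code:
# Print each drive's model and the third character after the dash
for d in /dev/sd[b-i]; do
    model=$(smartctl -i $d | awk -F': *' '/Device Model/ {print $2}')
    suffix=${model#*-}    # e.g. "00R6B0" from "WDC WD20EADS-00R6B0"
    echo "$d  $model  key char: ${suffix:2:1}"
done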
 
First of all, thanks a lot for your feedback, guys!
I had write cache disabled on both controllers (51645 and 52445), and as soon as I enabled it, write speed improved to about 100-120MB/sec for my RAID5 (4 drives) and 50-60MB/sec for my RAID6 (8 drives) - that is real-life performance, not benchmarks. I attached the benchmark results as well.
This is the kind of write speed I was expecting, to be honest, because I've read that write speed is somehow limited by the speed of a single drive...

Read speed for the Raid5 array (4 drives) - sitting on Adaptec 51645:
Code:
Skeleton linux # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   17182 MB in  2.00 seconds = 8599.49 MB/sec
 Timing buffered disk reads:  624 MB in  3.01 seconds = 207.59 MB/sec

Skeleton linux # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   19272 MB in  2.00 seconds = 9646.85 MB/sec
 Timing buffered disk reads:  660 MB in  3.01 seconds = 219.30 MB/sec

Skeleton linux # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   18928 MB in  2.00 seconds = 9474.68 MB/sec
 Timing buffered disk reads:  684 MB in  3.02 seconds = 226.69 MB/sec

Skeleton linux # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   18474 MB in  2.00 seconds = 9247.24 MB/sec
 Timing buffered disk reads:  646 MB in  3.01 seconds = 214.58 MB/sec

Read speed for my raid6 array (8 drives) - located on another machine, with Adaptec 52445.
Code:
Marcus ~ # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   12426 MB in  2.00 seconds = 6217.79 MB/sec
 Timing buffered disk reads:  632 MB in  3.01 seconds = 210.12 MB/sec

Marcus ~ # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   12544 MB in  2.00 seconds = 6276.77 MB/sec
 Timing buffered disk reads:  628 MB in  3.00 seconds = 209.09 MB/sec

Marcus ~ # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   12664 MB in  2.00 seconds = 6336.68 MB/sec
 Timing buffered disk reads:  630 MB in  3.00 seconds = 209.78 MB/sec

Marcus ~ # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   13322 MB in  2.00 seconds = 6667.06 MB/sec
 Timing buffered disk reads:  606 MB in  3.00 seconds = 201.69 MB/sec

The write test for Raid6 (8 drives) array:
Code:
Marcus ~ # time sh -c "dd if=/dev/zero of=/mnt/local/Storage/zerofile bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 84.0251 s, 97.5 MB/s
real    1m24.429s
user    0m0.148s
sys     0m9.417s

Marcus ~ # time sh -c "dd if=/dev/zero of=/mnt/local/Storage/zerofile bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 99.7235 s, 82.1 MB/s
real    1m41.415s
user    0m0.156s
sys     0m10.717s

Marcus ~ # time sh -c "dd if=/dev/zero of=/mnt/local/Storage/zerofile bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 95.512 s, 85.8 MB/s
real    1m37.352s
user    0m0.104s
sys     0m10.593s

The write test for Raid5 (4 drives) array:
Code:
Skeleton linux # time sh -c "dd if=/dev/zero of=/mnt/local/TEMP/zerofile bs=8k count=100000 && sync"
100000+0 records in
100000+0 records out
819200000 bytes (819 MB) copied, 6.01289 s, 136 MB/s
real    0m6.532s
user    0m0.008s
sys     0m0.992s

Skeleton linux # time sh -c "dd if=/dev/zero of=/mnt/local/TEMP/zerofile bs=8k count=100000 && sync"
100000+0 records in
100000+0 records out
819200000 bytes (819 MB) copied, 5.94442 s, 138 MB/s
real    0m6.428s
user    0m0.004s
sys     0m0.992s

Skeleton linux # time sh -c "dd if=/dev/zero of=/mnt/local/TEMP/zerofile bs=8k count=100000 && sync"
100000+0 records in
100000+0 records out
819200000 bytes (819 MB) copied, 6.50114 s, 126 MB/s
real    0m7.192s
user    0m0.008s
sys     0m1.020s

The read speed is very similar for both arrays (~210MB/sec each); I was expecting the 8-drive RAID6 (6 data spindles) to be almost twice as fast as the 4-drive RAID5 (3 data spindles)...
For the write speed, it seems pretty acceptable in both cases, considering these are 5400rpm drives.

I also tested a RAID5 array with 6x 1.5TB Seagate 7200.11 drives (on the Adaptec 52445). This array has been rock solid since day 1 (created like 6 months ago), with no dropped drives or anything else. Plus, 3 of the drives have different firmware than the other 3 :)
Here are the results:

Code:
Marcus ~ # time sh -c "dd if=/dev/zero of=/mnt/local/R5/zerofile bs=8k count=1000000 && sync"
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB) copied, 41.1173 s, 199 MB/s
real    0m41.645s
user    0m0.104s
sys     0m16.901s

Marcus ~ # hdparm -tT /dev/sdb
/dev/sdb:
 Timing cached reads:   14168 MB in  2.00 seconds = 7091.14 MB/sec
 Timing buffered disk reads:  624 MB in  3.00 seconds = 207.95 MB/sec


What do you think, guys - are these results OK? Or should I expect better from these RAID controllers?
 
Write cache was disabled - I remember I set it that way myself. I didn't think it would matter that much, lol :) Now it's set to "Enable Always" (even without battery). My 51645 has a battery, but the 52445 doesn't. I might switch the battery over to the 52445, depending on my needs.

My 4 drives in Raid5 are WD20EADS-00R6B0
The drives in my Raid6 array are mixed, 00R6B0 and 11R6B1.

From what you're saying, these drives should not be able to enable TLER. However, I was able to enable TLER (I set it to 3 seconds) on all 4 drives in my RAID5 array (00R6B0).
I haven't tried yet on any of the 8 drives in my RAID6 array; I will try when I reboot that computer (in a few days, probably) - so I don't know if the 11R6B1 drives can enable TLER or not.
The manufacturing date on most drives is October 2009 - not all the same date, though.

It's pretty much impossible to get firmware files out of them.
Damn, I need to get a job at WD, then :)
 
If you're testing with files that are large enough to need to write to multiple drives in the array (depends on your stripe size and, to some extent, the file system's block size), you should be seeing read/write performance much higher than that. The only time I see numbers that low is when I'm writing out files smaller than my stripe size, which is set to 1MB (my array holds mostly video files and Linux/Windows/game disc ISOs, so 99% of it is files over 1GB each). I can also copy out of ISOs on the array to folders on the array at 200+MB/s (basically, file duplication) using a 5-disk RAID5.
 
From benchmarks, my 6x 1.5TB Seagate RAID5 is performing almost double in write performance - 200MB/sec compared to about 100MB/sec for the other two arrays that are using the WD 2TB 5400rpm drives. So I guess the difference comes from the rpm speed here?
Also, I am using a very low stripe size for my arrays (I think 16KB or so). Do you think this matters that much for write performance? I will have small files as well on these arrays, but not so many, so I might consider migrating to a larger stripe if the performance is much better (like double or so).
 
That would explain things, then, with that stripe size. You're taxing the controller with a lot of parity calculations, plus the drives are writing things out in 16KB chunks so their speed will be lower (look at drive benchmarks for 16KB writes vs. 1MB or larger writes), so in this case you're being affected by spindle speed and possibly maxing out the controller's processor.
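An easy way to see what the small chunks are costing is to repeat the dd test at a few block sizes and compare - a sketch writing 1GiB per pass; the path is just an example, and conv=fdatasync makes dd wait for the data to actually reach the disks:

Code:
# Sequential write throughput at increasing block sizes (1GiB each pass)
for bs in 16 64 256 1024; do
    count=$(( 1024 * 1024 / bs ))
    echo "bs=${bs}k"
    dd if=/dev/zero of=/mnt/local/TEMP/ddtest bs=${bs}k count=$count conv=fdatasync 2>&1 | tail -1
done
rm /mnt/local/TEMP/ddtest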
 