Backblaze: Hard Drive Stats for Q2 2018

Megalith

Backblaze has returned with new quarterly and lifetime statistics for its 100,254 data center drives. Enterprise and consumer drives are again compared to deduce whether the former is worth the price premium (they do show a lower annualized failure rate), while 14 TB Toshiba drives have been introduced for later analysis.

The combined AFR for all of the larger drives (8-, 10- and 12 TB) is only 1.02%. Many of these drives were deployed in the last year, so there is some volatility in the data, but we would expect this overall rate to decrease slightly over the next couple of years. The overall failure rate for all hard drives in service is 1.80%. This is the lowest we have ever achieved, besting the previous low of 1.84% from Q1 2018.
 
If the gap is only about 0.1% AFR while the price gap is 33-50% on any given drive? That's a damned good reason to stick with consumer drives with a good warranty period.
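Back-of-the-envelope, with made-up prices and illustrative AFRs, that tradeoff looks something like this:

```python
# Rough sketch: purchase price plus expected replacement spend over a
# service life. Prices and AFRs below are assumptions, not quotes.
YEARS = 5

def expected_cost(price, afr, years=YEARS):
    # For small AFRs, expected replacements over the period ~= afr * years.
    return price + price * afr * years

consumer   = expected_cost(price=250.0, afr=0.011)  # 1.1% AFR, hypothetical $250
enterprise = expected_cost(price=375.0, afr=0.010)  # 1.0% AFR, hypothetical $375

print(f"consumer:   ${consumer:.2f}")    # ~$263.75
print(f"enterprise: ${enterprise:.2f}")  # ~$393.75
```

Even crediting the enterprise drive its full 0.1% AFR advantage, the expected replacement savings amount to a few dollars against a premium of over a hundred.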
 
That's much better than years ago, when Seagate was #1 in drive failure. HGST may not be the king of reliability any more.
 
Glad to see those 10TB Seagate Enterprise drives holding up. I have 12 of those in my NAS server.

Smooth sailing thus far.
 
Interesting to see how HGST's reliability is evolving toward that of WD now that the acquisition has simmered a bit.
 
That's much better than years ago, when Seagate was #1 in drive failure. HGST may not be the king of reliability any more.
I started moving to Toshiba as soon as news hit about the WD acquisition. I don't know what I'll do if Toshiba starts flaking out on us. Thanks to past experience (looking at you, pile of failed Barracuda ES.2 1TB drives somewhere in my basement) I won't buy another Seagate drive willingly, and I don't put WD drives much ahead of Seagate.
 
Not surprised there's not much of a gap between the enterprise and consumer drives. It doesn't make economic sense to produce two separate drives, and I think drive manufacturers know they can use a good/better/best model that simply extends the warranty or relies strictly on placebo/marketing.

It's common in electronics. Sometimes there's binning involved (e.g. certain tests failed, but the part still "works," so it falls to a lower-tier product); other times firmware limitations are artificially added; but a lot of the time, it's just marketing.
 
Not surprised there's not much of a gap between the enterprise and consumer drives. It doesn't make economic sense to produce two separate drives, and I think drive manufacturers know they can use a good/better/best model that simply extends the warranty or relies strictly on placebo/marketing.

It's common in electronics. Sometimes there's binning involved (e.g. certain tests failed, but the part still "works," so it falls to a lower-tier product); other times firmware limitations are artificially added; but a lot of the time, it's just marketing.

A lot of the time it is also just features like SAS instead of SATA. While you shouldn't have to pay as much as they charge, it is a reason why they are more expensive. If you have a big array, you want 12Gb/s SAS drives with dual ports so a redundant controller can take over and you aren't wasting bandwidth on STP.

For us that's really what determines enterprise vs consumer. Regular computer with SATA controllers? Consumer SATA drive. NetApp with dual controllers? Enterprise SAS drives.
 
Seagate? Never, ever again. A company that allowed a fundamentally unreliable design to be sold for years, and years, and years.

This wasn't a mistake with a single product or series; it went on for far too long for me to ever trust that company again.
 
Seagate? Never, ever again. A company that allowed a fundamentally unreliable design to be sold for years, and years, and years.

This wasn't a mistake with a single product or series; it went on for far too long for me to ever trust that company again.

What design was that?
 
No wonder WD is shutting down a factory; demand for their not-so-reliable 5400 RPM drives, priced higher than HGST's highly reliable 7200 RPM drives, can't be high.

Given that Seagate is mostly crap and HGST drives are mostly gone my only hope is Toshiba.
 
What design was that?

Everything from 1TB to 3TB from 2012-2016

[Chart: drive failure rates by manufacturer, June 2015]
 
Did you even look at the data?

Yes, I did. Open your eyes. You obviously didn't.

HGST 15k drives, only 10 failures.

Seagate 27k drives, 134 failures.

Even if HGST had 30k drives and 20 failures, it is still way way below what Seagate has for failures.
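For what it's worth, even normalizing those raw counts to a simple per-drive failure fraction (which still ignores time in service; see the drive-days discussion further down), the gap holds up:

```python
# Per-drive failure fraction from the rough counts quoted above.
# (No time normalization here; see the AFR discussion below.)
hgst_failures, hgst_drives = 10, 15_000
seagate_failures, seagate_drives = 134, 27_000

print(f"HGST:    {hgst_failures / hgst_drives:.3%}")        # 0.067%
print(f"Seagate: {seagate_failures / seagate_drives:.3%}")  # 0.496%
```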

[Image: Backblaze Q2 2018 drive stats table]
 
The biggest cost factor between enterprise and commercial drives isn't the drive, it's the warranty. Hard drive manufacturers are not under obligation to replace a commercial drive with an identical commercial drive. They can opt to repair it, replace it with a refurb, replace it with a newer drive, or replace it with something dissimilar but of equal value.

Until very recently, enterprise drives needed to be replaced by identical or near identical drives. Drive manufacturers (or their OEM partners) who provided warranties for major server installations needed to keep a large stock of drives for years after the EOL of a particular model. I remember that Dell had to replace about 20 IBM drives in one of our hubs because the failure rate finally wiped out Dell's spares for those drives.
 
It seems many who don't know how to look at that graph need to learn. Seagate is BAD. Even if you bump the drive count on all those to what Seagate had, it is still 10x lower than Seagate's count. LOL.

Another thread with Seagate fans that don't have a clue.

https://hardforum.com/threads/what-...ems-these-days.1941065/page-3#post-1043745317


I'm neither a fan nor a hater of any drive manufacturer. I used WD last time around (12x 4TB reds, three of which died in 4 years). I was pretty happy with them. I'm just looking at the annualized percent failure rates. Seagate is at about 1% or lower with all of the models in this chart. 1% is - from as far as I can tell - about where the industry average falls.

HGST fares a bit better on most of their drives, but they also have one that is pretty miserably bad at 4.68%. The sample size on that one is pretty low though, but even so that is surprisingly high.
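To put a number on that "sample size is pretty low" caveat, here's a rough sketch of the uncertainty. The failure and drive-year counts below are assumed for illustration (picked so the point estimate lands near 4.68%), not Backblaze's actual figures:

```python
from scipy.stats import chi2

def afr_ci(failures, drive_years, conf=0.95):
    """Exact (Garwood) Poisson confidence interval for an
    annualized failure rate, given failures and drive-years."""
    a = 1.0 - conf
    lo = chi2.ppf(a / 2, 2 * failures) / 2 if failures else 0.0
    hi = chi2.ppf(1 - a / 2, 2 * failures + 2) / 2
    return lo / drive_years, hi / drive_years

# Assumed: 2 failures over ~42.7 drive-years -> ~4.68% point estimate.
lo, hi = afr_ci(failures=2, drive_years=42.7)
print(f"95% CI: {lo:.2%} to {hi:.2%}")  # roughly 0.57% to 16.9%
```

With that little exposure, a 4.68% headline rate is statistically compatible with anything from under 1% to well into the teens.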

Seagate had TERRIBLE problems in the 1-3TB era, but I look at these charts and they tell me they have mostly recovered. (The 4TB's on that list are still a bit high)

Both WD and HGST have drives on the list that do both better and worse than they do.

I was willing to take a chance on them this time around. In the last couple of years of data they have been steadily improving. And I have both redundancy and backups so I am not concerned in the slightest. If I wind up being wrong, it will just be a slight inconvenience as I RMA and replace them and resilver the pool.

Looking at the ST10000NM0086's on that list, it looks like I made a good call. Over 1,200 drives and not a single failure. That is pretty impressive, actually. My 12x ST10000NM0016 are essentially the same drive but with encryption enabled. I'm keeping my fingers crossed.

Let's not forget: HGST used to be IBM's storage division, responsible for the Deskstar click of death (aka "Deathstar"). Now they are considered the best. Just because you have a bad experience with a brand once doesn't mean they suck forever.

I knew I was taking a chance going with Seagate this time around, and that it could go either way, but thus far I am a very happy camper for having done so. These 10TB helium drives are awesome.

It's good to see that Seagate is now a viable alternative again. Competition can only help the consumer.

I'm actually really thrilled by these numbers. Then again, maybe the key to happiness truly is low expectations.
 
I'm neither a fan nor a hater of any drive manufacturer. I'm just looking at the percent failure rates. Seagate is at about 1% or lower with all of the models in this chart. 1% is - from as far as I can tell - about where the industry average falls.

You aren't reading it correctly. You can't just look at the %. The higher the count of drives, the lower and less changeable the % is going to be vs the others with fewer drives and more % swing. You need to look at the count of drives vs how many failed.
 
You aren't reading it correctly. You can't just look at the %. The higher the count of drives, the lower and less changeable the % is going to be vs the others with fewer drives and more % swing. You need to look at the count of drives vs how many failed.


Nope. The annualized percent failure rate is the ONLY measure you can rely on in that chart, as it takes into account how long each drive was in use. Some drives get added towards the beginning of the period, some towards the end; the percentage takes this into account by using drive days and converting this into years.

By just using fixed counts you are oversimplifying the data.
 
Nope. The annualized percent failure rate is the ONLY measure you can rely on in that chart, as it takes into account how long each drive was in use. Some drives get added towards the beginning of the period, some towards the end; the percentage takes this into account by using drive days and converting this into years.

By just using fixed counts you are oversimplifying the data.

You have no clue what you are talking about.
 
I like how big sets of data make our personal experience look flawed.

I've had crazy good luck with all Seagate drives I've ever come across, save for three. One of them was a 1.5TB drive, which I saw listed on the graph posted above. Explains a lot.

The other two were a pair of 2TB drives bought from the same vendor; they came in the same package. I suppose the mailman dropped it or something, because both went terribly bad. One of the same kind I bought shortly afterwards is still going to this day.

On the other hand, mostly all WDs I come across are DOGS. They are either failing - sometimes in odd ways, like an enterprise-class 6-drive array in which 3 were saying goodbye, and one of them was really loud about it. The drives were short of being three years old. Hell, even the Seagates that the mailman dropped surpassed the 3-year mark! - or too god damn SLOW. I am doing some SSD upgrades for a client and I have tons of 250GB WD drives that turn any machine into a turtle.

HGST/Toshiba I did not have much experience with until recently, but the HGST I got from a fellow forum member for my notebook is pretty fast.
 
I like how big sets of data make our personal experience look flawed.

Yep, that's how things work, and why personal experiences don't mean squat. All products fail on occasion, and if you are the unfortunate recipient of one of them, and you only bought one, that's a 100% failure rate for you on your tiny sample set, even when the true population failure rate is only a percent or so.
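A quick sketch of the math, assuming a true 1% AFR (an illustrative number) and a small home sample:

```python
# Chance of seeing at least one failure in a small personal sample,
# assuming a true 1% annualized failure rate (illustrative).
afr, drives, years = 0.01, 4, 5

p_any_failure = 1 - (1 - afr) ** (drives * years)
print(f"{p_any_failure:.1%}")  # ~18.2%
```

Nearly one in five people running four perfectly average drives will see a failure within five years, and plenty of them will conclude the brand is junk.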
 
You have no clue what you are talking about.

Let me replicate the calculations they are using to get the annualized failure rates (maybe then you'll understand).

To use the first line as the example: they had 4,773 drives. Not all of them were there the entire period, or were used the entire time, so instead of using the raw drive count, they use the total amount of time the drives were in use. This is found by totaling up the number of hours each drive was in use. (Presumably they are pulling this from SMART data.)

Their number is 441,707 drive days. Divide by 365, and we have 1,210.16 drive years.

Since we want the annualized failure rate, we take the 5 failures they had, and divide by 1210.16, to get an average of 0.41% failures per drive per year.

Make sense, yes? This normalizes the data. Otherwise you are comparing apples to oranges, as you don't know how much each drive was running. It doesn't make sense to compare a drive with only 1 day of use, installed right before the end of the period, to a drive that was there for the full 365 days. Balancing by usage is something that is done in reliability studies all the time.

It isn't a perfect measure, mind you, as it completely disregards the reliability bathtub curve, but it is a hell of a lot better than just looking at raw drive count and raw failure count and completely disregarding how much each drive was used.

Make sense now?
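For anyone who wants to check the arithmetic, here's a minimal sketch of that same calculation in Python, with the numbers copied from the post above:

```python
# Annualized failure rate from drive-days, replicating the
# first-line example above.
drive_days = 441_707
failures = 5

drive_years = drive_days / 365  # ~1210.16 drive-years
afr = failures / drive_years    # failures per drive-year
print(f"{drive_years:.2f} drive-years, AFR = {afr:.2%}")  # AFR = 0.41%
```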
 


I actually mostly agree with that.

There was a period there where just about every company out there was going to do "six sigma", whether it made sense for their business or not. (Chances are unless you are in some form of medium to high volume manufacturing, it doesn't)

Just like how the Long Island Iced Tea company renamed themselves something with blockchain, it was really a buzzword exercise aimed at getting an easy stock price bump due to demand from unsophisticated investors.

Problem was, they didn't have a clue what they were doing. They had corporate six sigma events, appointed six sigma VPs, assigned "six sigma champions," and held six sigma training programs and meetings where they were like "look at us, six sigmaing all over the place in here," but since they were only paying it lip service anyway, it fell by the wayside.

Some took to heart the DMAIC lifecycle (Define, Measure, Analyze, Improve, Control) and that's all they did, which is cool I guess, but the real benefit from six sigma comes from the application of statistical methods in the manufacturing process, to prove out that a process is capable of doing what you are asking of it, so it doesn't just produce junk.

The thing is, any engineer worth their salt should be doing this already, both in design and manufacturing. It's basic, fundamental, evidence-based design and manufacturing. Without applying statistics you don't know shit.

You tested it? That's great and all, but it's not enough. Tell me how many samples you used. How did you determine that sample size? What statistical methods did you use to analyze the data in order to prove that your design or process is adequate? Did you apply statistical methods to your measurement system? How do you know that your gage, tools, or fixture is actually measuring what you think it is measuring? The good old "I'm an engineer and I know from experience" bullshit doesn't cut it anymore. It never should have. Prove it to me, and you'd better do so using inferential statistics!
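To give one concrete example of what that justification can look like: the classic success-run formula ties a zero-failure sample size to a demonstrated reliability at a given confidence. (A sketch of one standard method, not a prescription for any particular test.)

```python
import math

def success_run_n(reliability, confidence):
    # Zero-failure sample size: how many units must all pass a test to
    # demonstrate `reliability` at `confidence` (success-run theorem).
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

# e.g., demonstrating 95% reliability at 90% confidence:
print(success_run_n(0.95, 0.90))  # 45 units, all passing
```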

But sadly most out there still don't apply statistics to "engineering" work.

I'd argue that without statistics what you are doing is barely engineering at all. It's opinion.

I took my Six Sigma black belt because it was offered to me through work, but IMHO there is another qualification that is in the same vein, that is probably better, and that is the ASQ's Certified Quality Engineer or CQE program. This stuff shouldn't only be for people with "quality" in their title, but for all engineers. It is that important.
 