Serious thread: Best way to store 40 petabytes

lordsegan

I am working on a project with a goal of storing 40 petabytes in the near term. The project has already been mostly specced out, but I want to see if anyone has any innovative or radically different ideas. I should add that I am not a technical project leader, and this project was just introduced to me, so I don't have many of the details. I just want to get a feel for what is out there.

Note: the project leader is very cost-sensitive. Real-time access is not necessary for 99.99% of the data at any given time. In theory they do eventually want up to 2 PB of data on HDDs. The TTL of the data must be at least 100 years, with 1000 years preferred.

Please recommend solutions. I can google a lot and will be working with a technical team. However, please specify at least the following details:

1. Primary archival medium and vendor: (e.g. tape and Oracle)

2. Real-time cache medium: (e.g. HDD and Western Digital in SuperMicro bays)

3. File system: (e.g. SAM-FS, Hadoop, ZFS)

4. Security considerations: (e.g. xxxxxx)

5. Other details:
 
For cost-efficient long-term archival of massive amounts of data, tape will be pretty difficult to beat when only limited real-time access is required.

How much space do you need for the data that will be active at any one point in time?
 
No matter how careful you are, the tape may still produce no data, missing data, corrupt data, etc. when you try to perform a restore down the road. If you don't care about ever actually restoring that data and just need to prove that you took the backups and sent them offsite, then fine, go for it, but you're going to need a massive-scale deployment to do this. The highest-capacity single-rack tape libraries top out at about 5PB per rack, and you'd need quite a few data movers, temp data storage, etc. to facilitate that kind of tape backup. If it must be tape, you're looking at something like the StorageTek SL8500 to get this job done in a timely manner; it scales well to the level you're looking at (and well beyond), but there are a lot of other factors to consider with tape. You may be able to get what you're after out of the SL3000, but I don't know if it'll be quick enough as you get close to its max native capacity (roughly 50-51PB native).

Remember, the highest-capacity LTO tapes are LTO-6 Ultrium, which hold 2.5TB per tape native and 6.25TB compressed (assuming compression is even possible within backup windows). The step up from that is the StorageTek T10000 tapes, which give you 8.5TB native. Keep in mind you would need roughly 4,700 T10000 tapes to hold 40PB, and that's assuming you could fill each tape to capacity; the cost of the tapes alone would be around $775,000 just to get you started for a single backup, not to mention all the extra tapes you'll need for successive backups.
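A quick back-of-the-envelope in Python to show where those numbers come from (the ~$165 per cartridge is an assumption on my part, not a quote):

Code:
# Back-of-the-envelope tape math for the 40PB figure above.
# Assumptions (not vendor quotes): T10000D native capacity of 8.5TB,
# roughly $165 per cartridge, and every tape filled to capacity.

TOTAL_TB = 40_000            # 40 PB expressed in TB
T10000_NATIVE_TB = 8.5       # StorageTek T10000D native capacity
PRICE_PER_TAPE = 165         # assumed street price per cartridge, USD

tapes_needed = -(-TOTAL_TB // T10000_NATIVE_TB)   # ceiling division
media_cost = tapes_needed * PRICE_PER_TAPE

print(f"Tapes for one full copy: {tapes_needed:,.0f}")
print(f"Media cost for one copy: ${media_cost:,.0f}")
# -> roughly 4,700 tapes and ~$775,000 in media alone, before drives,
#    the library itself, successive backup generations, or offsite copies.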

The type of data makes a difference as well: some dedupes well, some doesn't; some compresses well, others not so much.

If budget is a big concern, I'd seriously look into cloning yourself some Backblaze-type rigs with the 4TB drive configs. You'd save a fortune in the near and long run versus your other options: enough to stand up full-scale replica targets at multiple DR destinations of your choice, bump up your site-to-site bandwidth, and still come in well under the tape solution, and the data can be migrated to other platforms down the road as needed. But how much of an issue can budget really be if you're looking at a 100-1000 year TTL for 40PB of data? The kind of data that's worth half a damn in 100 years is pretty minimal; at 1000 you're just going after bragging rights with your CEO buddies.

Tape is expensive
Tape at this scale is REALLY expensive
Tape has ongoing costs; it's not a one-and-done thing
Tape doesn't mean the data will be there when you try to restore (any backup engineer with more than a few years of experience can tell you this)
Tape at this scale has painful operational costs (get an Iron Mountain quote for 100-1000 years of storage time on 40PB).

ZFS has its issues, as does every massive-scale filesystem you've listed. SAM-FS, HDFS, ZFS, GPFS, etc. all have ups and downs, strengths and weaknesses. There will be license costs to consider; some you can get on the cheap if you incorporate them into your plan early enough, but the good ones tend to be costly to license. ZFS will be your cheapest option but has issues when adding more storage: growing a ZFS pool gracefully can be a painful process (you add whole vdevs, and you can't expand an existing raidz vdev), in case you didn't know.

Anyhow, what you're asking is for someone else to do the work for you when it comes to scoping out a multi-million-dollar project that will take months, if not a year, to fully flesh out to ensure a delivery that meets the project goals and the needs of the organization.

So if this is a theoretical exercise in how it would be done, I hope what I said helps.

If this is serious and something that is really going to be built, you're either in way over your head or you really don't trust the people you have working on this. Either way, PM me if you have questions and I'll try to help as best I can. I don't normally surf this sub-section of [H] though, so I'm not sure if I'll see any response.

For what it's worth, I have 16 years of IT experience and currently work for a SaaS cloud operation.
 
For cost-efficient long-term archival of massive amounts of data, tape will be pretty difficult to beat when only limited real-time access is required.

How much space do you need for the data that will be active at any one point in time?

Using really round numbers, the actual amount of real-time data would never be more than, say, 2 petabytes (this is a long-term goal; initially it would be in the high-terabyte range), and the approximate budget goal is, say, a "few million" a year.
 
Mtnduey,

Thanks for that post, it was helpful just to get my head around it.

I am essentially a risk manager for my organization, working for a senior executive oversight team. I know just enough about the technical stuff to ask the right(ish) questions. A different team is going to do the actual system design and build.

We literally do want to keep the data for 1000 years. Some (not all) of it is of priceless cultural and religious importance. It's extremely unusual data, and not all of it will belong to us.

We are going to have a single DR site in the current plan. Our bandwidth is virtually unlimited, essentially beyond any top-tier data center. That is one plus we have.

I will share the Backblaze idea with the technical lead. It looks very interesting.
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
 
You'll have to replace both tapes and HDDs on a regular basis; HDDs maybe even more so, depending on the usage patterns. I somehow doubt a Backblaze setup can be significantly cheaper than tape (or even at the same level) for this amount of space, even given the need for a ridiculously expensive tape library. Feel free to set up an estimate to prove me wrong though, this is 100x more data than I have at home ;)
 
Wow. With the Backblaze 3.0 setup and 4TB drives, we could store 40 PB x 2 (including a DR site) for about $5 million in initial hardware cost. But the power requirements are probably.... much higher than tape. I wonder how the economics of that work out.
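For anyone checking my number, the rough arithmetic looks like this (the per-drive and per-chassis prices are my assumptions, not quotes):

Code:
# Rough arithmetic behind the ~$5M figure. Per-drive and per-chassis
# prices are assumptions for illustration, not quotes.

TOTAL_PB = 40 * 2                 # primary site plus a full DR replica
DRIVE_TB = 4
POD_DRIVES = 45                   # Backblaze Storage Pod 3.0 holds 45 drives
DRIVE_PRICE = 180                 # assumed USD per 4TB drive
POD_CHASSIS_PRICE = 2_000         # assumed USD per pod, excluding drives

drives = (TOTAL_PB * 1_000) // DRIVE_TB           # 20,000 drives
pods = -(-drives // POD_DRIVES)                   # ceiling -> 445 pods
hardware_cost = drives * DRIVE_PRICE + pods * POD_CHASSIS_PRICE

print(f"Drives: {drives:,}  Pods: {pods:,}")
print(f"Raw hardware estimate: ${hardware_cost:,}")
# -> ~20,000 drives in ~445 pods, roughly $4.5M before racks, networking,
#    power, or any redundancy overhead beyond the 1:1 DR copy.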

This really isn't real time data.
 
You'll need several hundred Backblaze pods and several tens of thousands of HDDs (depending on the level of redundancy). I hope you'll let us know what you end up doing.

Is this for film storage? I've been remotely involved in some discussions regarding film restoration and archival...
 
You'll need several hundred Backblaze pods and several tens of thousands of HDDs (depending on the level of redundancy). I hope you'll let us know what you end up doing.

Is this for film storage? I've been remotely involved in some discussions regarding film restoration and archival...

Yes -- almost all of the data will initially be video or audio. Later we may increasingly have some other data types as well.
 
I notice the Backblaze config doesn't call out ECC RAM, but the mobo and CPU are both familiar to me and do have ECC capability. Is anyone else familiar with this config and why they made that choice?

Never mind -- I looked up the Supermicro memory compatibility list and the Samsung modules are all ECC.
 
It looks to me like the electricity consumption will be in the area of 100 kW. (Based on a rough calculation from the number of HDDs; I may be off by quite a bit.)
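Here's the rough math, in case anyone wants to poke holes in it (the wattage figures are guesses, not measured numbers):

Code:
# Rough power estimate for one 40PB all-spinning-disk site.
# Wattage figures are guesses for illustration, not measured numbers.

DRIVES = 10_000                # 40 PB raw at 4TB per drive
WATTS_PER_DRIVE = 6            # assumed average for a 3.5" drive, mostly idle
PODS = 223                     # 45-drive pods
WATTS_PER_POD_OVERHEAD = 150   # assumed CPU/board/fans/PSU losses per pod

total_watts = DRIVES * WATTS_PER_DRIVE + PODS * WATTS_PER_POD_OVERHEAD
print(f"Estimated draw: {total_watts / 1_000:.0f} kW per site")
# -> just over 90 kW per site with every drive spinning, so ~100 kW is a
#    fair ballpark; powering down cold pods would cut this dramatically.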
 
I'm not personally familiar with Backblaze, but I can't imagine there's a requirement to power everything all the time. Why not spin up a few PB at a time (depending on what data you want to work with at any given moment)? The rest can sit powered off.
 
Storing digital data for a millennium is almost inconceivable; there are so many things that can go wrong. Unmaintained hard drives will demagnetize and degrade, and recorded media will rot: tape degrades, and with recordable DVD/Blu-ray the organic dye you record to decays and the disc can become unreadable after 3-5 years. There are supposedly 300-year discs now that are claimed to last that long, but making a disc that can survive time's savage beatdown is one thing; in 1000 years, would we even have a working Blu-ray player left? Likely not. The people of the future would have to build a device to translate the old format into whatever the newer way of storing data is.

So what would be needed is an organization that can last for over 1000 years. The archive will need to be upgraded and monitored for the full 1000 years: as new tech comes out, the data will need to be migrated onto new media with each advancement, and its integrity will need to be monitored throughout.
 
Backblaze stores data as cheaply as possible, and they have surely done a lot of research, so I think their solution is one of the cheapest.

One problem that Backblaze has, though, is data corruption. When you have large amounts of data there will always be some corruption, and with 40PB you are bound to hit it. If you don't care about data corruption, then mimic the Backblaze design directly; maybe they will even be willing to sell pods directly to you? If you do care about data corruption (which you should, because the longer you store data, the more corruption will accumulate), then I would modify the Backblaze design to use ZFS. ZFS runs on Linux now (ZFS on Linux). With ZFS you can check the data periodically for corruption (called a scrub). Scrub every week or month and you will catch silent corruption before it spreads. I would use raidz3 and scrub every month or so, using a script.
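A minimal sketch of what that scrub script might look like, assuming a monthly cron entry on each storage node; the pool names are placeholders and you would want real alerting rather than stdout:

Code:
#!/usr/bin/env python3
"""Minimal monthly-scrub sketch for ZFS pools on a storage node."""
import subprocess

POOLS = ["tank0", "tank1"]   # placeholder pool names

def is_healthy(pool: str) -> bool:
    # "zpool status -x <pool>" prints "... is healthy" when there are no errors.
    out = subprocess.run(["zpool", "status", "-x", pool],
                         capture_output=True, text=True, check=True)
    return "is healthy" in out.stdout

def start_scrub(pool: str) -> None:
    # zpool returns immediately; the scrub itself runs in the background.
    subprocess.run(["zpool", "scrub", pool], check=True)

if __name__ == "__main__":
    for pool in POOLS:
        # Errors found by previous scrubs show up in status, so check first.
        if not is_healthy(pool):
            print(f"WARNING: {pool} reports errors; see 'zpool status {pool}'")
        start_scrub(pool)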


If you need high performance, you can use Lustre + ZFS. For instance, the new IBM supercomputer Sequoia stores 55PB and has 1TB/sec of bandwidth using this combination.
 
Backblaze is a cheap option for HDD-based servers, but tape as a raw medium is currently cheaper than HDDs. You'll have to spec out several options in quite a bit of detail to make a good decision, including how to deal with bit rot, backups, and other factors that can influence the final cost and usability.
 
I'd do a combo of Backblaze-style (or equivalent) storage and maybe a commercial tape library (which is very expensive, but can get you to this capacity in a couple of 42U racks).
 
OpenStack Swift.

We are storing 1.5PB raw in a cluster of 10 servers.

It costs far less than tape and is very viable, with vendor support available.

40PB is just fine for setting this up, and they do offer consulting services.
 
The tech part is the least of your problems.
For inspiration I'd like to suggest watching this:
http://en.wikipedia.org/wiki/Into_Eternity_(film)

Absolutely. The retention beyond a few years is the sticky part, given changing formats and potential data degradation, not the capacity requirement. That being said, what is the random-access requirement for the data to be stored? Is it all needed on quick demand, or CAN it be archived to tape for automated return to online status? Like hotcrandel said, if the large chunk of the data does not need to be ready and online, an enterprise-grade tape library and standby archive software will probably be the most economical complement to a smaller online disk pool. Along with Quantum, check out Spectralogic's T950 or T-Finity stuff, or hit me up in PM.
 
@OP

Have you considered Amazon Glacier as an option? I'm not saying Amazon Web Services will be around for 1,000 years (they won't) - or even 100 years - but for long-term non-real-time data storage it could be a viable part of a solution.
 
Absolutely. The retention beyond a few years is the sticky part, given changing formats and potential data degradation, not the capacity requirement. That being said, what is the random-access requirement for the data to be stored? Is it all needed on quick demand, or CAN it be archived to tape for automated return to online status? Like hotcrandel said, if the large chunk of the data does not need to be ready and online, an enterprise-grade tape library and standby archive software will probably be the most economical complement to a smaller online disk pool. Along with Quantum, check out Spectralogic's T950 or T-Finity stuff, or hit me up in PM.

All but a petabyte or two can be offline.
 
@OP

Have you considered Amazon Glacier as an option? I'm not saying Amazon Web Services will be around for 1,000 years (they won't) - or even 100 years - but for long-term non-real-time data storage it could be a viable part of a solution.

They won't be around long enough. My organization is already orders of magnitude older.
 
@OP

Have you considered Amazon Glacier as an option? I'm not saying Amazon Web Services will be around for 1,000 years (they won't) - or even 100 years - but for long-term non-real-time data storage it could be a viable part of a solution.

This is going to be ridiculously slow and expensive, I'd imagine.

If my math is correct, it's $0.01/GB/month. At that rate it's $400,000/month for 40PB of cold storage, and that is NOT counting transfer and retrieval fees back and forth. I would imagine at that volume you are going to get a damn good deal, but even at 25% of the list price it's still $1.2M/year.
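The arithmetic, for anyone who wants to sanity-check it (the $0.01/GB/month figure is Glacier's published rate; the 25%-of-list price is purely hypothetical):

Code:
# Sanity check on the Glacier storage-only cost.
# $0.01/GB/month is the published rate; the 25%-of-list discount is
# purely hypothetical.

TOTAL_GB = 40 * 1_000 * 1_000     # 40 PB in GB (decimal units)
RATE_PER_GB_MONTH = 0.01

monthly = TOTAL_GB * RATE_PER_GB_MONTH
yearly = monthly * 12

print(f"List price: ${monthly:,.0f}/month, ${yearly:,.0f}/year")
print(f"At 25% of list: ${yearly * 0.25:,.0f}/year")
# -> $400,000/month, $4.8M/year at list; still $1.2M/year at a quarter of
#    list price, before any retrieval or transfer charges.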
 
I'd be surprised if Amazon had 40PB spare to house them at this point, not to mention future retention. Can you imagine the transfer of 40PB to their environment?

You're not with the NSA perhaps, are you? :D
 
I'd be surprised if Amazon had 40PB spare to house them at this point, not to mention future retention. Can you imagine the transfer of 40PB to their environment?

If you are planning to do this over the course of months or years, a long-term partnership with Amazon would be the expected route. I can honestly see Amazon and the OP's company sitting down and working out how to transfer all of this information securely and rapidly over time, with a GREAT deal on pricing.

Something of this magnitude catches Amazon's eye, and they would want to be prepared for a multimillion-dollar deal, as it adds right to their bottom line.
 
I would create a new department in your company just to handle this, with a forward-looking plan for data checking and storage-medium replacement.
I wouldn't trust ANY storage medium for 10 years, let alone 100 or 1000 years!
I also wouldn't trust any company to do this; if they go bust, your data might go with them.
You will need a lot of redundancy, with duplicated data stored in many places (geographically), all of which will have to be replaced periodically.
 
In all honesty and seriousness, if you are really hoping to store this for 1000 years, you should ditch the computer idea and store it on microfilm.

Microfilm will easily last 100 years just sitting on a shelf, and with proper temperature and humidity controls, easily 500.

That's your best bet for large amounts of video and audio.
 
They won't be around long enough. My organization is already orders of magnitude older.

Orders of magnitude, eh? Are you the Knights Templar or the Masons, or something just as old?

If you are planning to do this over the course of months or years, a long-term partnership with Amazon would be the expected route. I can honestly see Amazon and the OP's company sitting down and working out how to transfer all of this information securely and rapidly over time, with a GREAT deal on pricing.

Something of this magnitude catches Amazon's eye, and they would want to be prepared for a multimillion-dollar deal, as it adds right to their bottom line.

This is what I was thinking. To be honest, NO organisation is going to be around long enough (except maybe IBM, lol), and no storage medium is going to be around long enough either.


@OP

I think the most important part of your plan is zero vendor lock-in. Any cloud-based solution will mean using the cloud provider's tools. Additionally, it will mean using someone else's data centre. In turn, this may well mean that you will have to build the solution yourself from scratch (and house it, as in build your own data centre), likely replicated in at least three different locations around the world. If your organisation is capable of making money, it may be worth looking to buy a small-scale data centre or two so that you own them 100%, then build your solution within them.
 
I'd be surprised if Amazon had 40PB spare to house them at this point, not to mention future retention. Can you imagine the transfer of 40PB to their environment?
Don't underestimate the bandwidth of a truck fully loaded with 5TB tapes. I think that should be a very fast and safe way to transport 40PB. ;)
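Just for fun, the effective bandwidth works out like this (the tape count and transit time are made-up assumptions):

Code:
# Effective bandwidth of a truck full of tapes, just for fun.
# The tape count and transit time are made-up assumptions.

TAPES = 8_000                 # assume one truck carries ~8,000 cartridges
TB_PER_TAPE = 5
TRANSIT_HOURS = 24            # assume a one-day drive to the other site

payload_tb = TAPES * TB_PER_TAPE                         # 40,000 TB = 40 PB
effective_gbps = payload_tb * 1_000 * 8 / (TRANSIT_HOURS * 3600)

print(f"Payload: {payload_tb / 1_000:.0f} PB")
print(f"Effective bandwidth: ~{effective_gbps:,.0f} Gbit/s")
# -> the whole 40 PB in one trip at an effective rate of several thousand
#    gigabits per second, far beyond any practical WAN link.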
 
Absolutely, physical media transport would be the only solution for offsite storage of that magnitude.

40PB+ with 100+ year retention is bigger than just how we logically store the data; the problem is as much procedural as technical. Media types, formats, and capacities change constantly, so a complete rewrite of all data onto the newest media type will be required periodically in order to preserve data quality as well as availability. At present the most cost-effective medium for offline archival storage is tape (be it the StorageTek T10000 series or vendor-agnostic LTO-6). The media life expectancy of tape? Good question; it depends on the storage environment, media quality, backup software, etc. Smarter people than me can determine that better. Assuming the LTO-6 route and regular upgrades to increase data density and capacity, and given the LTO spec of read compatibility with two previous generations (i.e. an LTO-6 drive can READ LTO-4 media), figure a complete archive refresh every 5-10 years or so as standards change.

That being said, that will get you capacity for a single location. If you are concerned about longevity of data for 100+ years, you'll definitely want 3+ copies in geographically diverse locations around the world and a mechanism to keep all copies in sync as the data grows. And this is just scratching the surface of the risk management for a cache of data of this magnitude.

...Or we're writing someone's research paper for them. :D
 
We literally do want to keep the data for 1000 years. Some (not all) of it is of priceless cultural and religious importance. It's extremely unusual data, and not all of it will belong to us.

This on its own is a ridiculous idea. In 1000 years they'll store all 40PB on a single "hard drive", or whatever hard drives become in the future. You only have to worry about 50 years tops, and then transfer it from there.
 
Ironically, you just tried to demonstrate that it's not ridiculous, lol.
 
This on its own is a ridiculous idea. In 1000 years they'll store all 40PB on a single "hard drive", or whatever hard drives become in the future. You only have to worry about 50 years tops, and then transfer it from there.

I think you have the right general idea. I'd guess we're no more than 20 years away from multi-petabyte or exabyte hard drives, and I'd also suggest planning for a 50-year time frame with the idea of transitioning to a better, more affordable medium as technology improves.
 
A clustered system environment with multiply redundant enterprise SSDs :) but at that cost you could buy a country.
 