TB data archiving

Suppose you have a project coming up that deals with several TB of data. Some portions of the data may need to be retained for 75+ years. These are video archives: there are 20 cameras, each with a 40GB hard drive. When a drive is full, its contents will be dumped onto a server with several TB of disk space. These are really just estimated guesses about how long it will take one of the 40GB camera disks to fill up, but for the sake of this post we will say 2 months to fill up the 40GB drives.

So each year all 20 cameras will produce roughly 4.8TB of video (estimated).
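For reference, the math behind that estimate (all the figures here are the thread's own guesses, not measured numbers):

```python
# Back-of-envelope check of the yearly volume estimate.
# Assumptions from the post: 20 cameras, a 40GB drive in each,
# and each drive fills roughly every 2 months.
cameras = 20
drive_gb = 40
fills_per_year = 12 / 2  # one full drive every ~2 months

yearly_gb = cameras * drive_gb * fills_per_year
print(yearly_gb / 1000)  # ~4.8 TB per year
```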

This would be a Windows environment, and upfront cost isn't a limiting factor; long-term costs, however, are.

Given the above info, what would be an optimal solution for data archiving?

Some ideas being thrown around:

Using 1TB USB drives to archive data for long term storage.

Having two servers, one onsite and the other offsite, and having them replicate.

Using tape for this instance may not work due to the amount of data in question.

Any ideas on this very rough outline? It's important to me to get as much input as possible from other organizations that deal with large amounts of data. We have never had this much data to deal with before, so I'm reaching out on several fronts to see what is in use out there.

EDIT: I just spoke with one of the persons in charge, and it seems that only certain videos will be retained for 75+ years. All other videos will be retained for only 1 year.
 
First off, you're not thinking far enough ahead.

You're saying 4.8TB of data a year, but only planning for 8TB?

I would configure two SANs at remote facilities (you'll need a large pipe to support them); you can either replicate the SANs or use a backup/archiving program of your choice.

Bottom line: your SANs need to be easily expandable beyond 8TB. You don't want to go in next year and say "hey, we're out of space," because that will not go over well.
 
You are right, k1pp3r. My fault; I didn't update my notes before I posted. That was a previous number, from before we actually did some in-the-field testing with the cameras.

The 8TB of space was the most disk space I could get with a Dell PE at the time. I have edited the post to reflect that. Thanks for the input.
 
I see some sort of replication needed with tape archiving of the replicated data. I'm not sure what you mean when you say that tape might not be an acceptable solution.
 
For off-site retention I'd go with a solution that used a dedicated provider to move tapes around for you and transport them to an off-site storage facility. Recurring costs would be fixed and it's better than tagging this on as an extra sysadmin duty. My previous and current sites both used managed backup/tape services.

75+ year retention is a little unreasonable. At some point the medium on which the data is stored will be unreadable because you can't find any operating drives left (feel free to ask NASA about those Apollo master recordings). That's not to mention the medium itself will decay at some point.

If you need data integrity, I'd go with WORM cartridges on something like the latest-generation LTO tapes.
 
75+ year retention is a little unreasonable.

Not really. I have a customer we do some data storage for that implemented a 99-year retention period. You can't even delete a single item out of your email due to retention. I helped them design part of the Exchange 2007 environment they are working on.

It's a pretty badass setup.

OP, I would replicate two SANs at remote facilities, with a tape vault for archiving data on a monthly/yearly schedule. That way you have data in two locations, plus on tape if you have a major meltdown.

Don't go with storage built into Dell PE servers; you can't expand them at the rate you need. A SAN will cost you upfront, but it's made for storage.

Edit:

I wanted to add that instead of dumping the cameras when they are full, you should create a dump schedule and download the video logs according to that schedule, not when the camera is full.

If you wait until the cameras are full to dump them, I bet you a billion dollars they all fill up at once, and then you get backlogged with your replication and backups, which isn't good.
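To make that concrete, here's a rough sketch (purely illustrative; the camera names and the 30-day cycle are assumptions, not part of the actual plan) of spreading dump days across the month so the cameras never all come due at once:

```python
# Sketch of a staggered dump schedule so all 20 cameras don't fill
# up (and all need dumping) on the same day.
def stagger_schedule(cameras, cycle_days=30):
    """Assign each camera a dump day spread evenly over the cycle."""
    step = cycle_days / len(cameras)
    return {cam: int(i * step) % cycle_days for i, cam in enumerate(cameras)}

# Hypothetical camera names for illustration.
cams = [f"camera-{n:02d}" for n in range(1, 21)]
schedule = stagger_schedule(cams)
# With 20 cameras over 30 days, each camera lands on its own dump day.
```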
 
Not really. I have a customer we do some data storage for that implemented a 99-year retention period.

I was referring to the fact that anyone can say "we want to save data for a million years" but the logistics of migrating the data from one obsolete hardware platform to another as time passes must be taken into account.

"Back in the day" I worked at a place that used large magneto-optical disks for storage. Good luck finding drives that can still read them now. Even if you can still fine the right drive to read them you'll need to jump through some hoops to interface it with a modern computer. Not a lot of 8-bit SCSI left out there, etc, etc...
 
I was referring to the fact that anyone can say "we want to save data for a million years" but the logistics of migrating the data from one obsolete hardware platform to another as time passes must be taken into account.

"Back in the day" I worked at a place that used large magneto-optical disks for storage. Good luck finding drives that can still read them now. Even if you can still fine the right drive to read them you'll need to jump through some hoops to interface it with a modern computer. Not a lot of 8-bit SCSI left out there, etc, etc...

You raise some very good points.

So basically, if data needs to be retained for 75+ years, we will eventually have to dump all the data onto newer media as time progresses. Just don't wait so long that the media the data is stored on has already degraded.
 
If you want 20, 30, 40+ year retention, you will need to work a migration every 10-15 years into your plan.

Decide on the hardware you want now; then you will need to re-evaluate your storage medium at least every decade. Back in the year 2000, Zip drives were the hot thing. Remember those? Imagine how we will look back at today's technology in another decade.

Plan to migrate your entire infrastructure every 10 years (minimum) to make sure that 75+ years is obtainable. Just like another poster mentioned, ask NASA what happens when you store something and then forget about it for 40 years. It's so obsolete it costs more to try to recover it than it would to migrate it to newer technology every few decades.
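As a rough planning figure, that refresh cadence over a 75-year window works out like this (a quick ceiling-division sketch; the 10-15 year cadence is the suggestion above, not a hard requirement):

```python
# How many full media migrations a given retention period implies,
# assuming the data is moved to fresh media every `refresh_every` years.
def migrations_needed(retention_years, refresh_every=10):
    return -(-retention_years // refresh_every)  # ceiling division

print(migrations_needed(75, 10))  # 8 migrations at a 10-year cadence
print(migrations_needed(75, 15))  # 5 at a 15-year cadence
```

Either way, each migration is a budgeted project, not an afterthought.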
 
One thing I haven't seen answered yet is this. How often will the retained data be accessed? Will you be accessing the data on a regular basis, or is this data going to be archived and collect dust 99% of the time?
 
Interested in seeing any suggestions for offsite storage/sync of large amounts of data. I have a client now with a NAS where the server backups total about 3TB. They're image backups from Paragon Server Backup, which breaks the nightly full backup of each server into a bunch of spanned 1GB files: a smaller server's nightly backup may produce 6-8 of these 1GB files, and a larger server's up to, say, 20. I'd like to copy these files offsite once a week. Right now I manually copy them to external WD Passport drives, encrypted via TrueCrypt, to carry off site.
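As an interim step before a managed solution, that weekly copy could at least be scripted with an integrity check. A minimal Python sketch, assuming a flat directory of span files; the paths and file pattern are hypothetical, and this doesn't replace the TrueCrypt encryption step:

```python
# Copy backup span files to an offsite-bound drive and verify each copy
# by comparing SHA-256 hashes of source and destination.
import hashlib
import shutil
from pathlib import Path

def sha256(path, chunk=1 << 20):
    """Hash a file in 1 MB chunks so multi-GB spans don't load into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def copy_and_verify(src_dir, dst_dir, pattern="*"):
    """Copy each file matching `pattern` and confirm the copy hashes identically."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob(pattern)):
        if src.is_file():
            dst = dst_dir / src.name
            shutil.copy2(src, dst)
            if sha256(src) != sha256(dst):
                raise IOError(f"verification failed for {src.name}")
```

A silent bad copy of one 1GB span makes the whole image unrestorable, which is why the hash check is worth the extra read pass.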
 
I was referring to the fact that anyone can say "we want to save data for a million years" but the logistics of migrating the data from one obsolete hardware platform to another as time passes must be taken into account.

"Back in the day" I worked at a place that used large magneto-optical disks for storage. Good luck finding drives that can still read them now. Even if you can still fine the right drive to read them you'll need to jump through some hoops to interface it with a modern computer. Not a lot of 8-bit SCSI left out there, etc, etc...

Seems to me that if a company needs such a long retention policy, they should buy a few extra machines capable of reading the media and put those in a vault or something...
 
Seems to me that if a company needs such a long retention policy, they should buy a few extra machines capable of reading the media and put those in a vault or something...

The machines would also degrade over time.

The bottom line is that archiving on a single medium for ~75 years is unreasonable, and should be avoided.

There was a thread regarding this that I posted in a while back--I'll see if I can find the results from that thread. I haven't thought about this in some time, so the "best" solution isn't immediately coming to me.

EDIT: If you have access to GenMay then you can check out this thread. If not, I'm sure someone else can post a synopsis--kinda' swamped now. :)
 
The machines would also degrade over time.

The bottom line is that archiving on a single medium for ~75 years is unreasonable, and should be avoided.

There was a thread regarding this that I posted in a while back--I'll see if I can find the results from that thread. I haven't thought about this in some time, so the "best" solution isn't immediately coming to me.

EDIT: If you have access to GenMay then you can check out this thread. If not, I'm sure someone else can post a synopsis--kinda' swamped now. :)


Yes, they would; however, it would make it easier to move the data to another medium if you have a working machine that can access it. If they do it every 10-15 years, I think the machines might last at least that long if they are server grade.
The only issue might be interfacing with a new medium, but there's always Ethernet, FTP, etc. I don't think Ethernet is on its way out yet.
 
Last year we deployed EMC Centera as the back-end storage for e-mail archiving (Symantec Enterprise Vault). We currently have 30TB of storage available and are replicating it to a second data center on the east coast.
 
After reading all the suggestions and comments, I've decided to go the following route:

--1 Dell PE R510 server with 8TB of disk space (4.8TB annually estimated, but this will give us plenty of breathing room in case some folks fill up their 40GB sooner than others).

--1 Dell PV 124T LTO3 for archiving.

--CA Arcserve Backup software (we use this in all our locations currently)

We will have a monthly schedule for the data to be dumped onto the server. If anyone fills up the 40GB drive before the scheduled dump day comes along they will simply have to perform the data dump early and also on the scheduled day.

After all the data has been dumped onto the server a backup will be performed on schedule and the tapes taken offsite to our other location we use for tape storage.

Any video that is considered worthy of the 75+ year retention will be archived onto a yet-to-be-determined medium, probably tape. Any video on the server that reaches 1 year in age will be removed.
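That 1-year pruning pass could be sketched roughly like this (the `.keep` flag-file convention and the `.avi` extension are illustrative assumptions, not part of the actual plan):

```python
# Remove any video older than a year unless it has been flagged for
# long-term (75+ year) retention via a sidecar ".keep" file.
import time
from pathlib import Path

ONE_YEAR = 365 * 24 * 3600  # seconds

def prune(video_dir, now=None):
    """Delete unflagged videos older than a year; return what was removed."""
    now = now or time.time()
    removed = []
    for video in Path(video_dir).glob("*.avi"):
        flagged = video.with_suffix(".keep").exists()
        if not flagged and now - video.stat().st_mtime > ONE_YEAR:
            video.unlink()
            removed.append(video.name)
    return removed
```

The important design point is that the retention flag is set by whoever makes the keep/discard call, not by the script.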

In theory the data will not need to be accessed at all, except when we are told to retrieve a video, maybe once every three or four months.

Comments?
 
A good start; however, I feel it's a mistake to skip out on storage that is rapidly expandable, like a SAN. You're just limiting yourself to what the server will hold, which according to you is two years' worth of data.

That's not planning far enough ahead.
 
A good start; however, I feel it's a mistake to skip out on storage that is rapidly expandable, like a SAN. You're just limiting yourself to what the server will hold, which according to you is two years' worth of data.

That's not planning far enough ahead.

Please keep in mind that any data not considered pertinent (this decision will not be made by the IT staff) will be removed from the server after 1 year.

I know that my original post didn't convey that very well, however that is the reality of the project.

I wish I could go with a SAN solution; however, a SAN hasn't been an easy sell at this location.
 
If this is non-regulated data (i.e. no HIPAA or PCI requirements), which it sounds like it is, I would suggest looking at cloud storage such as Amazon S3: effectively infinitely scalable, and you pay for only what you use. Keep an additional copy on site, but if you are looking into the realm of replicating SANs, you are looking at huge amounts of money. Depending on your performance needs, I would put high-use data on local storage, then have a fat pipe to cloud storage for all of your archives.
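For budgeting purposes, a back-of-envelope monthly cost at a hypothetical per-GB rate (the $0.15/GB-month figure is an assumption for illustration only; check current S3 pricing):

```python
# Rough monthly cloud-storage cost for the archive tier.
# The per-GB price is a placeholder, not a quoted rate.
def monthly_cloud_cost(tb_stored, price_per_gb=0.15):
    return tb_stored * 1000 * price_per_gb

# ~4.8 TB after year one:
print(monthly_cloud_cost(4.8))  # 720.0 dollars/month at the assumed rate
```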

I also think your calculations will not hold up over a 20-year time frame; as technology continues to develop, there will be more data per the same length of content. You should budget for each year needing more storage space than the year before, due to technology improvements such as shooting in 1080 vs 720.
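A quick compound-growth sketch shows how much this matters (the 15% annual growth rate is an illustrative assumption, not a measured figure):

```python
# Total storage needed over `years` if each year's intake grows by
# `growth_rate` (e.g. from rising camera bitrates).
def projected_tb(base_tb_per_year, growth_rate, years):
    total, yearly = 0.0, base_tb_per_year
    for _ in range(years):
        total += yearly
        yearly *= 1 + growth_rate
    return total

# At 15% yearly growth, ten years of intake needs roughly double the
# flat 48 TB (10 x 4.8 TB) you'd plan for with no growth.
print(projected_tb(4.8, 0.15, 10))
```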
 
Besides media changes over 75 years, data format should also be considered, especially if it is at all proprietary to the system. Be sure to include this as part of the migration planning, or someone 60 years from now will be cursing you and this 3ivx/whatever thing that they can't get to play.
 