How do you back up Terabytes of data?

Pocatello

I'm curious how this can be done on a budget: How do you back up Terabytes of data?

I've got about 5 TB at home now, and I plan to build a RAID 6 array using an Areca controller. That is my plan so far. I realize that running RAID 5 or 6 is no substitute for backing up my data. But I have no way to back up that much data. Right now my data is spread over different hard drives connected to different computers. There is some redundancy there, but there is not much organization. That is why I want a large array to put all of my data in one place.

Any ideas?

The only one I can come up with is to build a second RAID 5 array at some point in time to mirror the first RAID 6 array. Mirror might not be the right word, but what I mean is to have a second multi-TB array and have everything copied there. This redundancy would allow me to have a backup... and would reduce the chance of failure from a motherboard or controller card failure.

I look forward to your response.



 
Tape becomes economical for enterprise-style deployments; for home users, the $/GB still favors backing up to hard drives, even multiple drives. For me, a set of cold hard drives in a drawer, offsite, holding a backup of all my data (1.6 TB) is economical and a decent backup plan.
 
Not to go off on a tangent too much, but how the hell do you have five TB of data to back up at home? DVR'ing, I guess, but seriously? I am a bit pissed that my 120 gig HD is down to 13 gigs... but five terabytes? Wow!
 
No kidding, 5TB is a ton to have to back up, but I guess it's doable if you record a lot of stuff.

I have just under 250gb of stuff that is important and needs to be backed up.
 
I'm guessing video and pictures. 5TB is a huge amount though.

Based on the criticality of the data stored, you can decide what you want. In this case, your data is probably not critical, meaning that you would not lose money if it were lost, or at least nowhere near enough money to justify the best solution. Tape storage is one option, but you'd be cycling at least $2000 worth of tapes plus the drive itself, so it's probably not what you want. An external 6-drive RAID 5 array is another option, though still not ideal unless you store it offsite. The third option is online storage. Currently Upline (HP's service) offers unlimited storage for less than $10 per month. Yes, backup and restore times are horrendous, but at least it's somewhere offsite. Broadband speeds are only getting faster, and this is a very viable option with the newest Comcast 50/5 Mb or FiOS.

My 2 cents.
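
To put a number on "horrendous", here is a rough back-of-the-envelope sketch in Python. It assumes "50/5" means 50 Mbit/s down and 5 Mbit/s up, that the link runs flat out the whole time, and that protocol overhead is zero, so real transfers would take longer. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte, in bytes


def transfer_days(data_bytes, link_bits_per_second):
    """Days needed to move data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed link speeds: 5 Mbit/s upstream, 50 Mbit/s downstream.
    print(f"initial upload of 5 TB: {transfer_days(5 * TB, 5e6):.1f} days")
    print(f"full restore of 5 TB:   {transfer_days(5 * TB, 50e6):.1f} days")
```

Under those assumptions the first full upload of 5 TB is on the order of three months, and a full restore is more than a week, which is why online backup fits the small, important part of the data much better than the whole pile.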
 
My recommendation is to make a printout of all your "primary" file structures. Then rank them 1 to 5:

1 - no big deal... don't care if it ever comes back
2 - no big deal, can easily recover from another form of backup (DVDs, etc.)
3 - can recover... but it will take a lot of work (edited videos, MP3 collections, etc.)
4 - cannot recover... damn, I wish I had a backup (family photos/videos, etc.)
5 - it is lost, I'm screwed... no clue what to do now (financial data)

Then total up the space for each type. Please be honest with this. If you put DVR recordings @ anything above 2, IMO, you really need to re-evaluate what is important in life.

What I've found when I do this for my family and neighbors is it falls like this

1 - 20% (OS/programs/etc)
2 - 75% Downloaded videos and non-legit MP3's/Videos
3 - 10%
4 - 5% (family photos)
5 - <1% (finance/tax records)
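
If a printout sounds like too much work, the "total up the space for each type" step can be sketched in a few lines of Python. This is only an illustration: the folder paths and the ranks assigned to them below are hypothetical placeholders, and you would fill in your own.

```python
import os
from collections import defaultdict

# Hypothetical folder -> rank (1 to 5) assignments; replace with your own.
RANKS = {
    "/data/programs": 1,
    "/data/videos": 2,
    "/data/edited_videos": 3,
    "/data/family_photos": 4,
    "/data/finance": 5,
}


def dir_size(path):
    """Total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def totals_by_rank(ranks, size_of=dir_size):
    """Sum the size of every folder that shares the same rank."""
    totals = defaultdict(int)
    for path, rank in ranks.items():
        totals[rank] += size_of(path)
    return dict(totals)


if __name__ == "__main__":
    for rank, size in sorted(totals_by_rank(RANKS).items()):
        print(f"rank {rank}: {size / 10**9:.1f} GB")
```

The totals for ranks 4 and 5 are the number that matters when sizing a backup drive.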
 
^That's the best advice yet.
I'm in the same pickle with my 8x1TB RAID 5 array, and the way I see it, backing up all the HD movies I have is pointless. Although tedious to replace, that is not unrecoverable data.
I plan on using a 1TB cold drive for all the other stuff: MP3s, photos, financial and business data...
I'll add another one (or a bigger one) as my needs grow.
 

I like your way of thinking.

I would say that I agree with this:

2 - 75% Downloaded videos and non-legit MP3's/Videos
--- and maybe as much as 85% of my content falls here.

Except that this is not so true:

2 - no big deal, can easily recover from another form of backup (DVD's, etc)

because I have no way to recover from another form of backup. Because it was dl'd in the first place. Or other wicked ways. ;)
 
Buy a set of extra hard drives and copy everything over to them, then take said hard drives and put them in static bags, then put the static bags in ziploc bags with a silica packet or two, then put said ziploc bags into a fireproof lock box and put it somewhere safe.

This is how my old company would back up files; the firebox lived in my boss's house inside his gun safe.

90% of the files were totally irreplaceable AutoCAD stuff drawn over 20+ years; these drawings and documents are more accurate than the state/county records of land ownership and infrastructure.
 

QFMFT!

Hell man, this advice almost warrants being a sticky itself, especially the "life advice" about what's important. Absolutely brilliant...
 
Our business data is under 20GB backed up. I don't think many people have that much data.

Copy to hard drive is the cheapest way for most people.

5TB is under $1000 for 10 500GB drives. Are your DVDs worth that much?
 
Except that this is not so true:

because I have no way to recover from another form of backup. Because it was dl'd in the first place. Or other wicked ways. ;)

Look at it this way: downloading them all again is almost the same as ripping them all over again.
You can even look at the Internet as your backup for movies. All the movies will be re-upped at some point. If you want, you can always back up some of the really hard-to-find ones from your collection locally.
So unless you edited them yourself by adding your own touch to the encodes, you can never really lose them. Just be sure to back up a list of your films :D
If the worst happens you will have a list to sift through, and you can discard the ones you don't want anymore.
That's the route I plan on going. I see no point in investing 2x the cash for another array just to back up movies. It's not like you need all of them on a daily basis. ;)
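
For the "back up a list of your films" part, a minimal Python sketch could look like the following. The function name and the tab-separated output format are just one possible choice, not anything standard; the listing file itself is tiny and can live on a USB stick or in your email.

```python
import os


def write_listing(root, out_path):
    """Write one 'size<TAB>relative path' line per file under root.

    Returns the number of files listed.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # walk subfolders in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(f"{os.path.getsize(full)}\t{rel}\n")
                count += 1
    return count
```

Usage would be something like `write_listing("/path/to/movies", "/path/to/listing.txt")`, with both paths being whatever applies on your machine. Keep the listing somewhere other than the array it describes.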
 
+1 for the hard drive backup solution. If you go with Areca you can write a little script and make one slot on the card your spot for swapping. When a new drive is inserted in that position, the script can run and assign it a drive letter, and then your backup software can hit it, so you can swap drives without having to mess around with it.
 
Yeah, I'd say back up only what you need. If you already have redundancy in the RAID array for the not-so-important files, I don't see the need for an entirely different server. You can always use an old POS computer, load it up with older non-RAIDed IDE drives, and dump your "backup media... cough... movies" on them. The main problem with storing backups safely offsite is how feasible it will be: you'll have at least one hard drive hooked up to something until it gets full anyway. You can redundantly back up your backups till you're blue in the face, but how far are you willing to go, and how much money are the files really worth?
 
Buy a set of extra hard drives and copy everything over to them, then take said hard drives and put them in static bags, then put the static bags in ziploc bags with a silica packet or two, then put said ziploc bags into a fireproof lock box and put it somewhere safe.

This is how my old company would backup files, the firebox lived in my boss's house inside of his gun safe.

90% of the files were totally un-replaceable autocad stuff drawn over 20+ years, these drawings and documents are a more accurate then the state/county records of land ownership and infrastructure.


Be careful with this method though. Hard drives like to seize up, and I have had almost more failures with hard drives sitting in storage than with active drives.

Make sure to run the drives every now and then and verify the data.
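
One way to "verify the data" when a stored drive is spun up is to keep a checksum manifest and re-check it. The Python sketch below is a minimal illustration using SHA-256 from the standard library; the function names are made up, and where you store the manifest (for example as JSON next to your file listing) is up to you.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_sha256(full) != expected:
            bad.append(rel)
    return bad
```

Build the manifest right after copying to the drive, then run `verify` against the same drive every few months. An empty list means every file still reads back as it was written. Reading a full drive this way takes hours, so it is a job for an evening, not a quick check.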
 
Oh wow...just wow. So you are saying that a drive that has only a few operational hours has a much better chance of seizing than one that has thousands of hours?

I know what you are getting at and I can see how your logic got you there....but sorry dude, not gonna happen.
 
Great post!

If you are running RAID 5, or switching to 6 in the future, for your 5TB of storage, then it makes sense to rank the importance. The chance of losing data on the RAID 5 or 6 is going to be very low. Ranking the data and only backing up the most important stuff probably isn't going to amount to too much space. Once you figure that size out, it might be better to pose the question then: what backup now? I think if you're running RAID 5 or 6 you would only need to go to this backup in a catastrophic event: fire, flood, etc.

You might look at the option of online backup too. Get your RAID 5 or 6 set up. Subscribe to something like carbonite.com or mozy.com and set the software to back up certain folders. Then if the contents change you won't need to keep track of the last time you did a second backup.
 
If you have a lot of data that doesn't change much (i.e. videos, etc), then the bare drive + storage somewhere would be my choice.
Google "ESD zip lock bag" and you'll find you can get 100 anti-static resealable bags for <$10 + shipping.
Then get yourself one of these: http://www.newegg.com/Product/Product.aspx?Item=N82E16817990001 if your motherboard/controller supports hotswapping and then just add as many drives as you need. Store them at your office, and periodically test a sample to be sure they're working.
It's not as easy as a complete backup array, but it's more cost effective if you have lots of data that doesn't change frequently - just keep a decent index of what's backed up where and you're set. Also allows for you to mix and match drive capacities as prices allow.
 
I do a similar thing as Trepidati0n, I have about 3tb usable capacity of stuff on my server that has 4.25tb worth of drives.

I use Windows Home Server and have about 1tb worth of stuff duplicated (folder duplication turned on) for category 3 stuff. Things like categories 4 and 5 I backup onto an external weekly.
 
Oh wow...just wow. So you are saying that a drive that has only a few operational hours has a much better chance of seizing than one that has thousands of hours?

I know what you are getting at and I can see how your logic got you there....but sorry dude, not gonna happen.

I am speaking from experience, not the theoretical side ;)

I have drives of the same age, and usually the ones that are active, or at least accessed every now and then, outlast those used for archive-type use ;)
 
To the op..

I was in a similar boat, I have around 5TB and was looking into alternatives,

Online backup - costly and will take a lifetime to upload and download
Another server - *2 the cost
Tape - costly, may as well just use hard drives
DVDs - forget it
Blu-ray - too costly, may as well just use hard drives
Hard drives stored offsite - again expensive + difficult to back up regularly.
Don't worry about it - it took me absolutely ages to re-rip DVDs and game ISOs.
Rely on the current RAID5 - pretty solid, but if there is some corruption, virus, power surge, fire I'm screwed.
Span the content over other pc's - not really an option when it comes to that amount of data, though I can do this for everything except video due to size.

What I eventually did was find a friend and told him that if he put up the cash for a server I'd make it and he could have a whole lot of content for free, he went for it, I made basically a replica of mine and it works out perfectly for me, whenever I go round there now I take a USB2 drive and copy some new stuff across.

Unless they're techy they will probably stop you mid sentence when the word "server" comes up though :)
 
The only practical option is to build another server and load it with drives. Hopefully your backup software will also do some compression so you can save a bit on space, but oh my, it's gonna take a while if you try to do all 5TB over Ethernet. At this stage of the game I would be buying a switch that supports port bonding and putting dual gigabit NICs in both storage servers.

Of course determining the importance of your data and only backing up criticals would also be smart. If it seems overwhelming there is a free tool that can help you visualize your disk usage which is called WinDirStat

GL! :cool:
 
ha!

you got someone else to pony up the money for your backup server! Brilliant! :)

I'd rather not allow others access to my files, but that is just me.

Thanks for the tips in this thread.

I still don't see a way to do this cheaply. I need to delete some of my stuff! :D




 
5 TB !!! :eek:
that's a lot of pr0... ergh... i mean "important data" ;)
you beat Kuyaglen !!! :D

Seriously now, at work we use tape drives, which can get pricey for a home user. Your best bet would be to buy 1TB hard disks and use them to back up your "important data" ;)
 
You should consider doing something like Windows Home Server, which uses deduplication to automatically get rid of redundancy and can replicate only the important things without replicating/generating parity for the entire amount of data you have stored like a RAID group would (also no reliance on a specific RAID controller). The fix for the corruption bug is right around the corner (it's included in the power pack):

http://blogs.technet.com/homeserver...ows-home-server-power-pack-1-public-beta.aspx
 
[LYL]Homer;1032593171 said:
Yeah, giving away a copy of your categorized pr0n might be awkward.
Maybe they have the same kinky taste?
Probably didn't share the porn.
Movies, TV, music (the stuff that makes 90% of my data) is probably what he meant.

Personally, I have the 100MB important stuff mirrored and a directory listing of the rest for "re-ripping" (lol), should my R5 array fail for whatever reason.

Also: Double LOL at the first replier. Get a TiB or get off the subforum. You are playing with the big boys around here. (well, bigger boys, corps are the real big boys)
 
what is a TiB?
 
Me - here at my shop I picked up a used IBM 9 Tape Autoloader on eBay. It has a single LTO2 drive installed. I also buy used tapes on eBay and have never once had a bad one. Total cost, including a copy of Backup Exec 9 with the autoloader option and an Adaptec 29160 SCSI card, was under a grand for everything. This was about a year ago and I'm now looking at upgrading the drive in the loader to LTO3.
 
I like to use TiB and MiB when I want to avoid confusion between bits and bytes. Like avoid explaining megabits per second or megabytes per second.
 
I like to use TiB and MiB when I want to avoid confusion between bits and bytes. Like avoid explaining megabits per second or megabytes per second.

http://web.njit.edu/~walsh/powers/newstd.html

It's the proposed new name for the binary terabyte. I thought it was commonplace: tebibyte.

:D

wow. never heard of TiBs before. thanks for the info.
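
For anyone else who has not run into them: the IEC binary prefixes (KiB, MiB, GiB, TiB) count in powers of 1024, while the SI prefixes (kB, MB, GB, TB) count in powers of 1000. Bits versus bytes is a separate distinction (lowercase b versus uppercase B). A quick Python sketch of the conversion, with a made-up helper name:

```python
TB = 10**12   # terabyte: SI (decimal) prefix, in bytes
TiB = 2**40   # tebibyte: IEC (binary) prefix, in bytes


def tb_to_tib(terabytes):
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TiB


if __name__ == "__main__":
    print(f"5 TB is about {tb_to_tib(5):.2f} TiB")
```

This gap of roughly 9% is why a drive sold as "1 TB" shows up as noticeably less in operating systems that report sizes in binary units.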

pr0n? not me. not even a single .jpg or .avi or .mpg or .mp4. Really. :)

I have lots of TV shows and a few movies. I have lots of Monty Python.
 
pr0n? not me. not even a single .jpg or .avi or .mpg or .mp4. Really..

:rolleyes:


:p
 
I recently went down a similar route - got a tape autoloader from Ebay.

I don't have quite so much data. More like 1.4 TB sitting on RAID 5 (file server), and 600 GB total across various RAID 1 arrays (OS installs, apps, databases, emails, home dirs...).

I agree that videos, isos, music etc are of less importance as far as backups are concerned (so long as you keep a file listing!)... whether the stuff is legitimate or not, it can probably be recovered by some other means, plus it's no big deal if you're without the data for a while. However I consider the OS and application installs to be more important... I put a lot of time and effort into setting up most of this stuff, and doing so again from scratch at a later date would not be a matter of sticking in the install CD and spending a day customizing some settings. It would be more like an entire year's worth of evenings down the drain. So although it wouldn't directly cost me anything, I do value my own time and sanity (since I would be stuck with a semi-functional network for quite some time).

For a while I was happy with backing up that 600 GB of data to disk on one of my other machines (using an array of 6 old IDE drives... they were the disks from a previous incarnation of the fileserver, before I ran out of space and decided to upgrade it to a new array).

But that doesn't provide off-site backups. Which is why I went for a tape drive off Ebay. Originally I only intended to get a standalone LTO-3 drive to handle just the 600 GB of data, not worrying about the stuff on the fileserver. But somehow I got an awesome deal on an LTO-4 autoloader, which gives me more than enough space to back up the 1.4 TB fileserver as well. So yeah, I've now got everything on 2 LTO-4 tapes. And the added bonus is that LTO-4 does AES-256 encryption in hardware, so securing the off-site tapes themselves is no big deal either.

Also: the idea of copying data to a 1TB hard drive and taking that off-site is good for a one-off backup (eg if the data doesn't change). But it would be more of a pain if you want to do regular backups.

So for me at least it's not so much a question of how much of the data is irreplaceable, or how much money I would lose if I lost the data, but more a question of "how much do I value my own time". I now have an automated backup system. The only manual intervention it requires is to occasionally take some tapes offsite. The autoloader, SAS card, cable, tapes and labels came to about £800. And it will last for years to come, considering it can take 24 tapes with 800/1600 GB each. FYI: although tape drives do hardware compression, if you're backing up mostly AVIs the compression will be pretty much useless, so you should go by the native, uncompressed tape capacity when estimating how many tapes you would need.
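
As a rough illustration of why the native figure matters, here is a small Python sketch of the tape-count estimate. The 800 GB default is the LTO-4 native capacity quoted above; the 5000 GB data size is just the original poster's 5 TB used as an example, and the function name is made up.

```python
import math


def tapes_needed(data_gb, capacity_gb_per_tape=800):
    """Tapes required for data_gb, rounding up to whole tapes."""
    return math.ceil(data_gb / capacity_gb_per_tape)


if __name__ == "__main__":
    print("planning on native capacity (800 GB):", tapes_needed(5000))
    print("trusting 2:1 compression (1600 GB): ", tapes_needed(5000, 1600))
```

For 5 TB of already-compressed video that is 7 tapes at native capacity versus 4 if you believed the compressed figure, so budgeting on the marketing number would leave you about three tapes short.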
 
I'm running Linux with 8 500GB drives in (software) RAID6 for a 3TB array.

I use it as one large array with no partitioning but I do pay attention to keep each folder to a reasonable size to manage backups with.... like I have two movie folders rather than one huge one..

/data/array/movies.dvd
and
/data/array/movies.hdrips

This keeps dvd under 500GB and hdrips under 750GB to make it easy to use two drives to backup. A single 1.1TB folder would have been more difficult to backup. Eventually I'll have to deal with this but I'll think about that when needed.
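
The folder-to-drive juggling can be sketched as a simple "largest folder first, first drive it fits on" assignment. This Python example is only an illustration and not my actual setup: the folder sizes and usable drive capacities below are hypothetical round numbers, and real drives hold somewhat less than their label says.

```python
def assign_folders(folder_sizes_gb, drive_capacities_gb):
    """Place each folder on the first drive with enough free space.

    Folders are handled largest first. Raises ValueError if one does not fit.
    """
    free = dict(drive_capacities_gb)        # drive label -> remaining GB
    plan = {label: [] for label in free}    # drive label -> folders assigned
    by_size = sorted(folder_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for folder, size in by_size:
        for label in free:
            if size <= free[label]:
                plan[label].append(folder)
                free[label] -= size
                break
        else:
            raise ValueError(f"{folder} ({size} GB) does not fit on any drive")
    return plan


if __name__ == "__main__":
    # Hypothetical sizes in GB, chosen only to show the idea.
    folders = {"movies.hdrips": 700, "movies.dvd": 450, "music": 200, "iso": 40}
    drives = {"backup1": 750, "backup2": 500, "backup3": 250}
    for drive, assigned in assign_folders(folders, drives).items():
        print(drive, assigned)
```

A single folder bigger than the largest drive raises an error here, which is exactly the situation that splitting movies into two folders avoids.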

At the moment, here's my backup method:

I have 4 750GB, 1 500GB, and 2 250GB drives marked 'backup1' through 'backup7' sitting in the plastic clamshells they came in.

Whenever I feel like it I take a backup drive, plug it into a SATA->USB caddy (the thing coolmaster makes), Ubuntu auto-mounts it to /data/backupXX depending on drive label, and then I run a backup script I made.

The backup script first checks which backup drives are mounted. For each backup drive there is a list of directories that will be rsynced to it.

Eg if I plug in backup01 drive the directories assigned to it are:
/data/array/music
/data/array/iso

Script will rsync these two directories to /data/backup01/ updating files as needed (based on time stamp-- doing crc checks takes way too long over usb).

Basically I just plug it in, run a script, and then a few hours later I unplug the drive and put it back in the plastic clamshell. I only keep these drives on my desk, so there's no protection against fire, but whatever. Maybe one day I'll get one of those fireproof boxes to put them in.
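
For anyone who wants to try the same approach, here is a rough Python sketch of the idea. It is not my actual script. The plan dictionary, function names, and the choice of only the `-a` flag are assumptions for illustration. It relies on rsync's default quick check, which skips files whose size and modification time already match; adding `--checksum` would compare file contents instead, which is the slow option over USB.

```python
import os
import subprocess

# Hypothetical plan: backup drive mount point -> directories synced to it.
BACKUP_PLAN = {
    "/data/backup01": ["/data/array/music", "/data/array/iso"],
}


def rsync_command(source_dir, mount_point):
    """Build the rsync call; -a recurses and preserves times and permissions."""
    # No trailing slash on the source, so it lands in <mount_point>/<dirname>/.
    return ["rsync", "-a", source_dir, mount_point + "/"]


def run_backups(plan, is_mounted=os.path.ismount, run=subprocess.run):
    """Sync every directory whose backup drive is currently mounted."""
    done = []
    for mount_point, sources in plan.items():
        if not is_mounted(mount_point):
            continue  # drive not plugged in, skip its directories
        for source_dir in sources:
            run(rsync_command(source_dir, mount_point), check=True)
            done.append((source_dir, mount_point))
    return done


if __name__ == "__main__":
    for source_dir, mount_point in run_backups(BACKUP_PLAN):
        print(f"synced {source_dir} -> {mount_point}")
```

The mount check matters: without it, rsync would happily fill the empty mount-point directory on the system disk when the backup drive is not plugged in.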
 