Need inexpensive 10-terabyte offsite biweekly backup.

southpaw

I have a small client that does video production; they have 10 terabytes of video that they work with. Their video is stored on a server with a 15-terabyte RAID 1 (yes, 30 terabytes of raw drive space). Over the last year I have changed out two or three drives when there was a problem. But recently the owner called and said he wants offsite backup of all data. His college friend's studio burned to the ground and he lost 36 months of work on a movie. He wants a tape solution, but that is outside my knowledge base; I have never looked into backing up 10-15 terabytes to tape.

He wants at least a biweekly full backup with 10 tapes to cycle through. He will keep them at home. (:eek: I know, he thought 10TB could fit on one tape.)

The IBM TS2230 tape drive only fits about 400GB per tape when backing up video. That's 25 tapes, and it will take something like 134 hours to do the backup, with someone changing tapes. Plus the cost of the drive and tapes is in the $3,000 range.
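For reference, a quick sanity check on that math (a sketch in Python; the drive speed is an assumption based on the TS2230's rated LTO-3 native throughput, and real-world rates with tape swaps are lower):

```python
# Back-of-the-envelope math for a 10TB full backup to LTO-3 (figures are assumptions).
DATA_TB = 10
TAPE_CAPACITY_GB = 400        # LTO-3 native capacity; video won't compress
DRIVE_RATE_MB_S = 60          # assumed native drive speed, best case

tapes = DATA_TB * 1000 / TAPE_CAPACITY_GB
hours = DATA_TB * 1e6 / DRIVE_RATE_MB_S / 3600
print(f"{tapes:.0f} tapes, {hours:.0f} hours minimum")
# -> 25 tapes, ~46 hours at rated speed; far longer once tape changes and a
#    source that can't feed the drive continuously are factored in.
```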

Please help; I'm looking for a workable real-world solution. What do video studios usually do?
 
You need a proper backup program. I'm pretty sure Backup Exec will be fine for the one server, though the adult version is NetBackup. You could also use IBM's TSM, but it's kind of hard to manage if you're not used to it.

These will do one major backup, then incrementals over a set period, though a full backup is required every so often (generally weekly) or restores can get really painful. If it's a Windows file share, you can do backups while the server is running, though it will hurt performance.

Oh, I recall Backup Exec can only stream to one drive at a time, so NetBackup would make sense.
You need a semi-decent server spec: a single/dual Xeon server, 4GB of RAM, a NetBackup server license, and NetBackup client licenses.
A decent tape library: two drives, I'd guess LTO-4, and a library with 40 slots or more.
The TS3200 is OK, as it's expandable if you need extra slots/drives, and you can offsite tapes nightly.
What you really need to know is the amount of data change.
 
Remember, it's not really the total amount of data but the data change rate that's really important. He could just hire rack space in another data center and do constant IP replication; depending on his rate of change, that might work out better, but 15TB is a lot of data to move over IP.
What storage subsystem is he using for the data, and how is it configured? Multiple 2TB LUNs?
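To put rough numbers on the replication idea (a sketch; the link speeds are assumptions):

```python
# Hours to replicate a given change set over an IP link (link speeds assumed).
def replication_hours(change_tb: float, link_mbps: float) -> float:
    bits = change_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600

for link_mbps in (100, 1000):
    print(f"{link_mbps} Mbps: {replication_hours(1.0, link_mbps):.1f} h per TB changed")
# -> 100 Mbps: ~22 h per TB; 1000 Mbps: ~2.2 h per TB. Whether this is viable
#    depends entirely on the daily change rate, hence the question above.
```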
 
There are very few files across that 10TB, around 900: very large project video files plus small source audio and video files. Most of the big project videos change every day, so four project videos of 1.5TB each and around another 1TB of source videos change daily.

The file server is a custom-built Linux box with a consumer motherboard, an Adaptec SAS RAID card, and 20 Seagate 1.5TB SAS 10k drives.

The workstations are Macs and Windows boxes attached with 10GbE network cards.

It's the same setup I use for the MRI imaging offices, but those files are small enough to burn to a DVD-DL or, in some cases, a Blu-ray.

My main problem here is that the files are bigger than a tape.
 
You need TSM with block-level differentials. That'll allow you to back up only the data that actually changed. Not changed files - changed data. That will shrink your backup sizes significantly right there. From there, add a small autoloader, and you're done.
 
You need TSM with block-level differentials. That'll allow you to back up only the data that actually changed. Not changed files - changed data. That will shrink your backup sizes significantly right there. From there, add a small autoloader, and you're done.
Thanks, yes, that will reduce the incremental backups, but you still need a full backup to start, and you need to schedule full backups periodically. I am looking into tape cartridges, but spanning files across tapes is not recommended, and the largest tapes I have found are 800GB.

I am thinking there might be a way to run fiber to a nearby building and set up a colocation backup agreement with someone.
 
southpaw, where are you located? Do you have a specific set budget for this?
 
I doubt there is a cheap option, in all honesty; it's a specialized situation with such large files.
 
A high-speed network link to an off-site server is what most do. OC-3 or something, I think.
 
If the raid is split between enclosures...
Get another enclosure or two, then break the RAID 1 once in a while. Haul the mirrored enclosure away, and plug in the third enclosure. Rebuild array.

By the time the array rebuilds, it will be time to swap it out again.
 
If the raid is split between enclosures...
Get another enclosure or two, then break the RAID 1 once in a while. Haul the mirrored enclosure away, and plug in the third enclosure. Rebuild array.

By the time the array rebuilds, it will be time to swap it out again.

This is really bad advice. You never want to break an array as a backup solution; it's quite dangerous for your data and may end up worse than a disaster hitting your facility.

A better solution would be to have two separate arrays on separate systems, and then transport the second system offsite. You could make a fairly portable SFF box, if you will, using 2TB drives... but even this idea is tedious. A real, proper solution will not be cheap.
 
This is really bad advice. You never want to break an array as a backup solution; it's quite dangerous for your data and may end up worse than a disaster hitting your facility.

Yeah, but the thought of hauling an enclosure home with him every two weeks and keeping several at home for rotation might get the client to either adjust his expectations or loosen his wallet.
 
Thanks, yes, that will reduce the incremental backups, but you still need a full backup to start, and you need to schedule full backups periodically. I am looking into tape cartridges, but spanning files across tapes is not recommended, and the largest tapes I have found are 800GB.

I am thinking there might be a way to run fiber to a nearby building and set up a colocation backup agreement with someone.

Wow.

This is all TOTALLY WRONG. Firstly, TSM doesn't work that way. Please don't presume how something works, because I really do know backups far better than almost everyone else here. This crap about files "not spanning tapes" is complete garbage and doesn't even remotely apply here.

With TSM doing block-level, you take a full backup once per year, if that. TSM is differential-forever after the initial backup, unless you specify file expirations of, say, 365 days. That's a pretty typical number, and it takes you down to one full backup per year. At all other times, you would only be backing up individual changed disk blocks. That means if Joe changes 2MB of a 940GB file, you back up 2MB - the part that's changed. TSM is the only product on the market in the low-to-mid segment that's capable of this. (Everything else that can costs hundreds of thousands to license.)
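To illustrate the block-level idea in the abstract (a toy sketch only; TSM's actual mechanism is proprietary, and this is not its API):

```python
import hashlib

BLOCK = 4 * 1024 * 1024  # 4MB block size, an arbitrary choice for illustration

def block_hashes(path):
    """Hash a file in fixed-size blocks."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            hashes.append(hashlib.sha256(chunk).digest())
    return hashes

def changed_blocks(path, old_hashes):
    """Indices of blocks that differ from the previous backup's hashes."""
    new = block_hashes(path)
    return [i for i, h in enumerate(new)
            if i >= len(old_hashes) or h != old_hashes[i]]

# If Joe changes 2MB of a 940GB file, only the block(s) covering that 2MB
# hash differently, so only those blocks need to go to the backup server.
```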

So let's say we've got, what, 30TB? That's NOT a lot of data, and it's also enough data to REQUIRE a proper backup solution. Not this ridiculous jury-rigged crap that is going to get you fired. You need a tape library, and you need proper backup software. That's just how it works. The only reasonable solution from both a technical and licensing standpoint is TSM.

I'm going to run the numbers with 32TB of data, because you need to back up the important servers too, not just the videos. Don't like it? Tough. Backups are worthless if you have nowhere to put 'em. It's only another 2TB estimated. Feel free to provide more information so I can give you better estimates.
NetBackup: that's going to chew up well over 30TB of tape per week. Presuming a low compression ratio of 1.25TB per LTO-4 because of the file type, that's over 25 tapes for a full backup, every week. Your backup window will need to be well over 14 hours, because you have so much data and will be passing it over GigE. Then there's the fact that you'll need to license 224 "Virtual Terabytes" at ~$5K per VT: $1.1M in licensing right there.

We won't waste time on Backup Exec, because it's just going to chew up tapes like nobody's business, to no avail. Backup Exec is also the one place where tape spanning really is fatal. So scratch Backup Exec. (Though the average LTO-4 capacity for MPEG-4 video, IIRC, is ~1.3TB.)

TSM is going to need an initial backup of 32TB, 'cause we're just going to get everything. We'll call it 34 LTO-4s for the initial backup, which is just fine. Because of the way TSM works, you'll need additional disk for the disk pools, which are required; these also make backups and restores much, much faster. With proper tuning, we'll say you need about 15-20TB of cheap SATA disk. You can do it with an LSI 8840 and EXP3000s to get an IBM-supported solution, as an example.

What about tape, though? Your initial backup is murder, but you don't size the tape library by that; you size it by your typical daily backup. Because we're doing block-level differentials, we'll see a slightly better compression ratio overall. An IBM TS3100 library (one of the smallest and cheapest on the market) with two LTO-4 drives has you covered. The block-level differentials also reduce your backup windows dramatically, which is even more important. This gets multiplied by the included de-dup at no charge.

Licensing is by PVU, but that's it. So you need 70 PVUs per core on dual-core-and-up Xeons, or about $1,100 per 4-core server, then 20% of that per year for 24x7x365 support until the end of time. You never have to re-buy TSM, just keep paying maintenance. You can check my math at ibm.com and price out your servers.
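The licensing arithmetic above, as a sketch (all inputs are the figures quoted in this post, treated as estimates):

```python
# TSM PVU licensing math using the figures above (estimates, not a quote).
PVU_PER_CORE = 70            # dual-core-and-up Xeons
CORES = 4
PRICE_PER_SERVER = 1100      # ~$ for one 4-core server (280 PVUs)
MAINT_RATE = 0.20            # 20% of license per year for 24x7 support

def tsm_cost(servers, years):
    license_cost = servers * PRICE_PER_SERVER
    return license_cost * (1 + MAINT_RATE * years)

print(f"1 server, 3 years: ${tsm_cost(1, 3):,.0f}")  # -> $1,760
# Contrast with per-capacity licensing at ~$5K per "Virtual Terabyte":
print(f"224 VTs: ${224 * 5000:,}")                   # -> $1,120,000
```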

Here's the thing: everyone just goes "I have X data and it doesn't fit on tape."
WRONG.
You have X terabytes that TAKE Y HOURS TO BACK UP. Backup windows are a cold, harsh reality, period. If you think you're going to do full backups to ANY system nightly, you are DEAD WRONG. I don't care what you THINK you can do; I've DONE 30TB+. With anything other than TSM or much, much more expensive products, you're backing up those ENTIRE files, EVERY night. You can't do that. Period. 30TB over a very fast GigE network which carries only backup traffic, with the network as the bottleneck (which it almost never is), in your best case you're talking over 50 hours per backup run. You can't back up files people are working on, so that means you can't perform backups that way. There's too much data. You have to get the daily backup down to under 12 hours, presuming people work on it 8 hours a day. (If they work two or three shifts, you need to hire a backup consultant and get out your checkbook for Advanced Data Protection stuff.)
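The window math is easy to verify (a sketch; wire-speed GigE is the absolute best case, and real throughput is usually lower):

```python
# Best-case backup window over gigabit Ethernet (wire speed assumed).
def window_hours(data_tb, rate_mb_s=125.0):
    return data_tb * 1e6 / rate_mb_s / 3600

print(f"30TB full backup: {window_hours(30):.0f} h")            # -> ~67 h
print(f"7TB of daily changed files: {window_hours(7):.0f} h")   # -> ~16 h
# Even the daily changed *files* alone (4 x 1.5TB projects + ~1TB of source,
# per the OP) blow past a 12-hour window; hence changed *blocks*, not files.
```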

So ultimately the question comes down to this: how much is your data worth to your business? If you can afford to start over from scratch on all your projects, then sure, half-assed weekly copies are fine. But if starting from scratch starts costing your company thousands of dollars the second the data is determined to be corrupted? You need a real backup solution. And no, TSM may not be the absolute best out there. But from my experience with various backup systems and software, TSM is the best balance between licensing costs and necessary features for this particular scenario (ESPECIALLY for low-server-count environments).
Is there better? There might be, but I haven't worked with anything that fills the requirements of this environment without starting your licensing costs at $150,000, plus 40-50% per year for maintenance and support.

EDIT:
You asked what video studios do? TSM is what video studios do. TSM and SAS or huge FC libraries, with Hierarchical Storage Management (HSM). Because video is sequential data, it works wonderfully with HSM: a tape drive can stream it fast enough for playback. The greater risk is shoe-shining the drives; LTO-4 drives can go at 100-120MB/s easily. Studios maintain a fairly large disk environment with proper arrays - not iSCSI crap, and definitely not home-built junk - stuff like IBM DS3k's, DS4k's, and DS8k's for large environments, or HDS AMS2x00's or USP's. But the vast majority of their completed video work, files that are done but need to be kept, sits on tapes in midsize to large libraries. Effective data management for video environments is a pretty expensive prospect any way you slice it, and you do not want to learn the hard way what cutting corners there will do to you.
A couple of relevant case studies from IBM ('cause they're the easiest for me to find):
http://www-01.ibm.com/software/success/cssdb.nsf/CS/MCAG-6VQEF4?OpenDocument&Site=tivoli&cty=en_us
http://www-01.ibm.com/software/success/cssdb.nsf/CS/DNSD-6MFEAP?OpenDocument&Site=tivoli&cty=en_us
 
Thanks, AreEss, I appreciate that post. I spoke with my customer tonight and he was stunned that a backup solution would run into the thousands of dollars. He was under the impression it would be in the $600-$800 range. :eek:

He said offsite backup needs to be done and he will see what he can budget. This is a small video production company; the entire computer setup only cost $98,000 for hardware and $54,000 for software. He just contracted with Iron Mountain to store his source-material backups.

I will discuss backup cost vs. the cost of redoing the work with the customer on Monday.

Most customers don't value a properly engineered computing environment. They want something that will work as long as it doesn't cost too much. They have a hard time understanding why a 1.5TB SAS drive is $600 but a consumer SATA drive is only $140. It is a daily battle between providing a proper solution and having the customer leave for the local virus-cleanup shop for servers.
 
Thanks, AreEss, I appreciate that post. I spoke with my customer tonight and he was stunned that a backup solution would run into the thousands of dollars. He was under the impression it would be in the $600-$800 range. :eek:

Damn, I swear. Just remind him that most businesses that suffer a catastrophic failure usually aren't around the following year to second-guess that costly backup they should have taken care of.

"The American Red Cross, among others, estimates that as many as 40% of SMBs simply never reopen after a disaster such as a flood, tornado, or earthquake. In many of those cases, of course, insurance covers replacement of physical assets. But if companies haven't protected their digital assets, such as critical financial and customer information, they may be out of luck -- and out of business."

http://www.inc.com/articles/2004/05/datadisasters.html
 
southpaw ---

You should be able to reconfigure your computers so that you can do a full backup to disk in less than a night. Backups could cost under $1,000 per backup copy you wish to keep.

But the best way is to prevent a catastrophic failure.
 
southpaw ---

You should be able to reconfigure your computers so that you can do a full backup to disk in less than a night. Backups could cost under $1,000 per backup copy you wish to keep.

But the best way is to prevent a catastrophic failure.

This is completely wrong and false.

If we presume a maximum throughput rate of even 125MB/s, it is impossible to copy 30TB in anything under 40 hours, period.
It is technically impossible to make a copy of 30TB in less than 45 hours without failing the backup and offsite requirements, period. No solution you claim to have can or does meet these requirements, no matter how much you want to protest otherwise. This is immutable technical fact.
We won't mention the fact that your method here will cost the shop FAR more than any other solution - by several hundred thousand dollars. Keeping only a 14-day rotation, even if this could work, at $750 per day, is $11,000.00 before Iron Mountain costs (hard drives cost extra), additional hardware costs (probably around $4,000-5,000), and replacing 3-4 disks per month at their own expense (based on someone I know who is using hard disks properly, shipping 32 offsite daily on a 7-day rotation).
And that statement? Could you PLEASE be more utterly ridiculous? "Prevent a catastrophic failure." Do you even understand what catastrophic failure is?! Jeezuz. Excuse us while we go ask them to relocate somewhere that never has any weather at all, and ask the building very nicely not to catch on fire, have any water pipes burst, or ever have the air conditioning break down; the list just goes on for another fifty pages. There is a reason people state very clearly, repeatedly, that failures are a "when, not if" thing. Because THE WORLD HAPPENS, and there isn't jack you can do to prevent the building next door from having a natural gas leak and exploding tomorrow.

In addition to this, you're completely wrong. Completely and utterly wrong. Backing up the systems is pointless and a waste. The systems themselves are good dedup targets, because nobody gives a crap about the OS and apps; there is very little that would not be dedup'd. The important data is the work files, which are very, very large and need to be backed up daily. (And in some shops, they use advanced data protection to back up every single change, as it's made.) They also need to be taken offsite daily, which is part of why it's hard-limited to an upper bound of 125MB/s. No RAID, so it's either network or single drive. A single drive won't get near 125MB/s, and the network won't get there either. So you're still talking somewhere around 72+ hours for one copy.

There is a reason there are such things as "backup administrators" and consultants who specialize in backups and disaster recovery.
 
I'd add that offsite backups are meaningless in the event of a regional catastrophe such as Katrina or 9-11. What the hell are you gonna do when everywhere for 100 miles is leveled? Thus depending on how critical the data is, you might want a backup solution a few states away.
 
I'd add that offsite backups are meaningless in the event of a regional catastrophe such as Katrina or 9-11. What the hell are you gonna do when everywhere for 100 miles is leveled? Thus depending on how critical the data is, you might want a backup solution a few states away.

Actually, this is a common problem I hear about. The fact that they're using Iron Mountain is very beneficial; Iron Mountain evacuates data from facilities in advance, and will deliver it pretty much anywhere in the U.S. you like. No backup solution can be done "a few states away," period. There just is no such thing as a solution better than doing daily offsite moves.

So what do you do when everything for a hundred miles has no power? You call Iron Mountain and tell 'em to deliver it to your B site that's up and over a state, then wait 10-18 hours for them to arrive with ALL your tapes.
 
just hire ockie to do offsite backup...... :D

I do house calls, lol. not.


There is really no cheap solution for this problem. I know management, how they work, and that they do not understand this; but even if you cheap out, you will need to fork over some dough.

My suggestion would be to think about your backup time spans. Since you are more worried about disasters such as fire, have you thought about the risk/cost of, say, having your backups done monthly?

A solution which might work best for you (let me know if you want a quote):

Keep your primary server in house.
Establish a very cheap but high-capacity backup server in house; perhaps keep this system in a separate room or on another floor of your facility.
Purchase a storage hosting account, such as co-location or dedicated services (you can also opt to have this storage archived in another facility via tape library).
Use this storage account to do monthly backups and full archiving of your backup server's data; this will not impede the performance of the primary production system, and you can spread the transfer out over the period of a month.

This way you maintain two physically separate locations locally, with redundancy should one room burn or whatever... and you establish a remote location for backups in the event you experience total loss of the entire facility.

A 10TB package should not cost you much per month and you would be able to span that transfer over long periods of time.
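For scale, trickling a 10TB monthly archive offsite needs surprisingly little sustained bandwidth (a sketch; a 30-day month and zero protocol overhead are assumed):

```python
# Sustained bandwidth to move a 10TB archive offsite over one month (rough).
DATA_TB = 10
DAYS = 30

mbps = DATA_TB * 1e12 * 8 / (DAYS * 24 * 3600) / 1e6
print(f"~{mbps:.0f} Mbps sustained, around the clock")  # -> ~31 Mbps
# A fraction of a 100Mbit uplink; the trade-off is that the newest offsite
# copy can be up to a month stale.
```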


It's not the ultimate solution, but it's a decent solution that gives you 3 layers of redundancy.



Alternatively, there is Iron Mountain, and there are also fireproof server vaults you can purchase... but don't think they're cheap... they're very expensive.
 
This thread brings a smile to my face x100 :). Two years ago our backup infrastructure was super terrible: around a 3TB total environment (which has since doubled), with various local tape drives running at least five different versions of Backup Exec, one ARCserve, and a blade setup going to BackupExpress plus a tape library (2 arms, 25 tapes). Needless to say, the backup window was not measured in hours... more like a week to get a single full backup done for the whole environment. $200K later (and wow, what a battle getting that money), we got CommVault plus a Data Domain target, and the backup window dropped from 8 days to 2 hours. Had a hell of a time explaining that differential backups were the way to go versus full backups. AKA backup administrator hell. Props to all you backup admins :cool:
 
AreEss, you are right on point with almost everything you are suggesting. There is one thing I am going to have to disagree with you on, though.


With anything other than TSM or much, much more expensive products, you're backing up those ENTIRE files, EVERY night. You can't do that. Period. 30TB over a very fast GigE network which carries only backup traffic, with the network as the bottleneck (which it almost never is), in your best case you're talking over 50 hours per backup run.


I will have to disagree with you that it will take over 50 hours to back up; I mean, it is only, what, 32TB? I have a customer whose backups to LTO-3 drives run at 2.5TB an hour. The software they are running, which the company I work for sells, goes for less than CommVault's normal yearly maintenance at the companies we have talked with. I am not saying that in this case the user looking for the solution could achieve that type of speed, because there is a LOT of hardware backing that speed in the form of tape drives in silos, but it is possible.
 
AreEss, you are right on point with almost everything you are suggesting. There is one thing I am going to have to disagree with you on, though.

I will have to disagree with you that it will take over 50 hours to back up; I mean, it is only, what, 32TB? I have a customer whose backups to LTO-3 drives run at 2.5TB an hour. The software they are running, which the company I work for sells, goes for less than CommVault's normal yearly maintenance at the companies we have talked with. I am not saying that in this case the user looking for the solution could achieve that type of speed, because there is a LOT of hardware backing that speed in the form of tape drives in silos, but it is possible.

I think what he was estimating was based on a more realistic budget. There are backup solutions that can back up that type of data in less than an hour, but you have to realize that they're high-priced and carry a lot of hardware, like you said.

You can easily spend up to a million dollars on a high-end backup solution. The poster is trying to do it with 800 dollars.
 
I think what he was estimating was based on a more realistic budget. There are backup solutions that can back up that type of data in less than an hour, but you have to realize that they're high-priced and carry a lot of hardware, like you said.

You can easily spend up to a million dollars on a high-end backup solution. The poster is trying to do it with 800 dollars.


Gotcha, and yes, that makes perfect sense.
 
I think what he was estimating was based on a more realistic budget. There are backup solutions that can back up that type of data in less than an hour, but you have to realize that they're high-priced and carry a lot of hardware, like you said.

You can easily spend up to a million dollars on a high-end backup solution. The poster is trying to do it with 800 dollars.

Dingdingding.

Actually, the point here is that the OP isn't going to be doing that kind of speed, period. His absolute max throughput is 125MB/s. Whyyyy?

One, we have a single server. Two, we have an environment where cost-cutting measure #1 will be to remove the second LTO-4 drive. (Which is just fine. Doing it with one drive is not a huge deal, and with TSM it doesn't affect throughput anyway.) Three, he's not going to be doing this over FC; he's going to be doing it over Ethernet.
So there you have it. Remember, just because you have to spend a LOT more to do this right doesn't mean you have to spend a ridiculous amount more switching the entire environment to FC. Especially when, past the initial backup and presuming TSM, a small SAS library gets the job done just fine and at a much lower price (~$10K with 24x7x4 3-year support included, versus $26-32K).
 
A Data Domain appliance would be perfect for what he needs. It depends on what compression/dedupe he can get, since that varies with the data, but a DDR510 or DDR530 should be more than enough. Get a pair of them: one onsite, which you back up to, and one offsite, and they replicate. They only replicate bit-level changes, so even if you throw 10TB at it, if most of the data is the same, only a few gigs may go over the pipe.

edit: crap, dead thread, won't let me delete it, sorry!
 
This is completely wrong and false.

If we presume a maximum throughput rate of even 125MB/s, it is impossible to copy 30TB in anything under 40 hours, period.
It is technically impossible to make a copy of 30TB in less than 45 hours without failing the backup and offsite requirements, period. No solution you claim to have can or does meet these requirements, no matter how much you want to protest otherwise. This is immutable technical fact.

...

Your presumption is wrong. Since a large number of institutions have offsite backup requirements much larger than those stated and have found ways to handle the situation in the price range I suggested, I guess that you are just wrong.

You might note that the data is 10TB, not 30TB. 30TB is the RAID array: 15TB of space with only 10 used, plus 15TB of mirrored duplicate data. The backup speed is determined by how fast the largest file, 1.5TB, can be copied: about 4 hours and change.

I can back up the system every night in under 5 hours. Do it for $1000 per copy. And have the data stored offsite.

---

I was once watching some fellows move data across a satellite link. I pointed out that putting the data on an airplane would get it moved faster.
 
srsly George you are just now reading this 7 months later? :rolleyes:
 
Give the customer what he wants. But tell him the tradeoffs.

The idea that it would cost $600-800 for 30TB of data is ludicrous. You can't even buy 30TB of commodity hard drives for that; my numbers put it at $2,250 with no parity. I guess that if he really wants that, set him up with an rsync server at home. At the rate it would sync, running 24 hours a day, it would be weeks between updates. And locked files will cause issues (if it works at all).
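To put rough numbers on the rsync-at-home idea (a sketch; the uplink speeds are assumptions, and re-rendered video files tend to differ throughout, so rsync's delta transfer is assumed to save little here):

```python
# How long one rsync update cycle would take (uplink speeds are assumptions).
def days_per_update(change_tb, uplink_mbps):
    return change_tb * 1e12 * 8 / (uplink_mbps * 1e6) / 86400

# ~7TB of project files change per day, per the OP:
print(f"{days_per_update(7, 10):.0f} days at 10 Mbps")    # -> ~65 days
print(f"{days_per_update(7, 100):.1f} days at 100 Mbps")  # -> ~6.5 days
# Either way the copy at home is always days-to-months behind; "weeks
# between updates" is, if anything, optimistic.
```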

Something that I haven't seen mentioned here is that maybe it makes business sense to have a ghetto backup. The time-value of a couple of weeks at his employees' rates may be far less than the cost of a "proper" backup.

What did the guy eventually do?
 