Archiving 20TB of data a year on the cheap ... help

rdnkjdi

I'm the IT guy for a fledgling IT company that scans images for government. We generate 15TB-20TB a year of material we want to keep (high-quality JPG scans). In the beginning, we archived on DVDs. About three years ago, we moved to high-capacity hard drives, making 2 copies on 2TB drives before deleting the data off the server. We are now running into hard drive failures. We've been using WD Green drives. We have to pull these old drives whenever we need to run another process on the data: the first time around, we may microfilm the images; four years later, we may need to print books from them. We just had a scare where two hard drives holding the same data both failed. We got away with it because the images also existed somewhere else, but I'm trying to figure out if there is a better solution for our problem.

Ideas?

Only thing I can come up with is something like MSA600.
 
Microfilm seems appropriate for your application

* Creates an unalterable copy - good for government records
* Can be indexed and loaded into a database for tracking and referencing
* low cost for long term storage (Guessing these need to be kept for at least 5-7 years?)
* properly stored microfilm has storage life of several hundred years
* microfilm can be transferred to another scanned image type in future if needed
 
1. What kind of budget do you have?
2. Does the data need to be live at all times?
3. What kind of performance do you need?
 
Chrisroman - We burn a lot of microfilm. It isn't practical here because we need to keep the original digital data of the color or greyscale scans.

spankit - No, it doesn't always have to be online. Right now we keep our hard drives unplugged and use a Rosewill adapter to access the green drives.

I tried unRAID across a bunch of Green hard drives two or three years ago, and it was so slow I could hardly get a TB copied. As long as it's fast enough to copy data to and from, it's fine.

Budget ... right now not much. Long term solution, I'd say $5,000.
 
You could build a ZFS-based server fairly cheaply that would suit your needs. When the pool gets full, pull the drives and put them in storage. Add another set of drives, create a new pool, rinse, repeat. This won't really cover having backups of the data though. As has been mentioned in this forum countless times, RAID is not a backup.
 
You said Data Archival. Go to tape and don't look back. LTO6 is 2.5TB native capacity.
 
2 copies of 20TB costs about $2000/year (plus labor) on hard drives.

You only want to spend $5000 total (or is it per year?). If it is total, you are out of money in 2 years. If per year, make 5 copies a year.

The hard drive failure rate should be a couple of percent. Five copies bring the probability of losing data so far down that no one should be worried.
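
To put rough numbers on that, here is a minimal sketch. It assumes every copy fails independently with the same probability; the 2% figure is only a stand-in for "a couple percent", and the independence assumption is optimistic for drives bought from the same batch.

```python
def loss_probability(per_copy_failure: float, copies: int) -> float:
    """Chance that every copy fails, assuming failures are independent."""
    return per_copy_failure ** copies


# 0.02 is an assumed per-drive failure probability, not a measured one.
for n in (1, 2, 5):
    print(f"{n} copies: {loss_probability(0.02, n):.2e}")
```

Correlated failures (same model, same production run, same shelf) make the real risk higher than this simple product suggests.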

---

$5000 should be a petty cash expense in an operation like yours.
 
I would go tape, or if your Internet connection is very fast (at least 10 Mbps on the upstream), look into Amazon S3 or Amazon Glacier.
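
A quick sanity check on the upload time is worth doing before going the cloud route. This is a rough sketch that assumes decimal terabytes and a fully saturated upstream link with no protocol overhead, so real transfers would take longer.

```python
def upload_days(data_tb: float, upstream_mbps: float) -> float:
    """Days needed to upload data_tb terabytes at a sustained upstream rate."""
    bits = data_tb * 1e12 * 8               # decimal terabytes to bits
    seconds = bits / (upstream_mbps * 1e6)  # megabits per second to bits per second
    return seconds / 86_400                 # seconds per day


print(f"{upload_days(20, 10):.0f} days for 20 TB at 10 Mbps")
```

At 10 Mbps a full year of scans takes roughly half a year to upload, so a faster link makes a big difference.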
 
First question you should be asking yourself (as someone who gets to deal with HIPAA etc.) is what the penalties are for not having the images (fines, repercussions against your company if copies are lost, etc.). If you lose a batch of images, how utterly insignificant will $5k be? In the medical field, if patient data is lost due to a failed backup process or hardware failure... yeah, $5k is nothing next to the consequences. So I guess since your customer is the government, $5k might be nothing.

So depending on budget, since it seems pretty low, I would look at ZFS for the onsite data combined with some method of offsite backup (Amazon Glacier, tapes that go to Iron Mountain, etc.). It sounds like your data is very static once it's created, so an offsite tape backup would work pretty well. For the server, create VDEVs and fill them up, then keep adding more as your data grows.
 
First off, you are being put in a really hard position, as you don't have a lot of money to work with. Any solution here will be fairly patched together.

Do they actually have offsite storage for these tapes/drives?

Anywho, drives can fail, especially if they have come off the same production line (like your dual failure). For the long term, tape is nice, or something like the Glacier service that Philmatic mentioned.
 
JPEG isn't the right format to store these documents. TIFF would be better.
 
@OP

I agree on using tape for archival purposes; seriously, you can't go wrong. An online/off-site backup scheme, such as Amazon S3 or Glacier with some decent storage-level or file-level encryption, would be a good addition as well.

I also agree that you shouldn't be using lossy compression on your images. If TIFF is too big, you could either use a batch process to compress the TIFF files with BZip2, or use a lossless compressed image format such as PNG.
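
The batch bzip2 idea could look something like the sketch below. It uses only Python's standard `bz2` module; the function name `compress_tiffs` and the folder layout are made up for illustration, and the originals are left in place so you can verify the compressed copies before deleting anything.

```python
import bz2
import shutil
from pathlib import Path


def compress_tiffs(folder: Path) -> list[Path]:
    """Write a .bz2 copy next to every .tif/.tiff file under folder."""
    written = []
    # sorted() materialises the file list before any new .bz2 files appear.
    for src in sorted(folder.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in {".tif", ".tiff"}:
            continue
        dst = src.with_name(src.name + ".bz2")
        with src.open("rb") as fin, bz2.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large scans are not loaded into RAM
        written.append(dst)
    return written
```

Calling `compress_tiffs(Path("scans/2012"))` (a hypothetical path) would leave `page001.tif.bz2` beside `page001.tif`. How much space this saves depends heavily on the scan content.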
 
Yep, go LTO-5 tape: one drive at $1,500 to $2k depending, and tapes are cheap at $50 a tape.

Worst case, 27 tapes per year at $1,300 a year, and if you want to make double the copies, $2,600 a year.

This $5k has to be a per-year budget at least, as he has to buy new disks every year too.
He's spending about $2k a year on disks without any redundancy, so tape will be cheaper than disks, but not by much.
 
Why are you even sweating this? Government is throwing lots of money around for data archiving and e-files are now essentially "the law".

Raise your rates!
 
Ask HP, Oracle, IBM and Dell. You will get some free expertise in the answers. I'm worried for you right now.
 
Tapes are still the best for cheap long term archival. Here's a cost estimate for 20 TB a year:

LTO-5
First year: $2,200-3,000 ($2,000 for the tape drive, $1,200 if used, 2 copies of 14 tapes at $35 for 1.5 TB each)
Next years: $1,000 for the tapes, set aside an extra $670 for a new tape drive every 3 years.

LTO-6
First year: $5,300-6,300 ($3,500-4,500 for a Tandberg or HP tape drive, 2 copies of 8 tapes at $110 for 2.5 TB each)
Next years: $1,800 for the tapes, set aside an extra $1,200 for a new tape drive every 3 years.
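
The tape counts and yearly tape costs above can be reproduced with a few lines. This is just a sketch of the arithmetic, using the native capacities and per-tape prices quoted in this post and assuming 20 TB a year with no compression gain.

```python
import math


def tape_plan(data_tb, tape_tb, tape_price, copies=2):
    """Return (tapes per copy, yearly tape cost) for storing data_tb per year."""
    tapes_per_copy = math.ceil(data_tb / tape_tb)  # a partly filled tape still costs a tape
    return tapes_per_copy, tapes_per_copy * copies * tape_price


for name, tape_tb, price in (("LTO-5", 1.5, 35), ("LTO-6", 2.5, 110)):
    per_copy, cost = tape_plan(20, tape_tb, price)
    print(f"{name}: {per_copy} tapes per copy, ${cost} a year in tapes")
```

That gives 14 tapes per copy and $980 for LTO-5, and 8 tapes per copy and $1,760 for LTO-6, which round to the $1,000 and $1,800 figures.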

LTO-6 is pretty recent, so it's quite expensive and no used tape drives are available yet. This is just an approximate estimate; you can probably find cheaper prices.
JPG files (which are not high quality, by the way) are already compressed, so don't count on tape compression: you won't get more out of the tapes than their native capacity.

Store the extra copy of each tape at a different geographical location, you might even set up a second tape drive there as a backup.
You'll need to set up an archive database too so you know which tape contains what you want to retrieve.
Better have three copies of this database, the original, a local backup, a remote or cloud backup.
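
For the archive database, even something very small will do. Below is a minimal sketch using SQLite from Python's standard library; the table layout, function names and tape labels are all invented for illustration, and a real catalog would probably also want job numbers and dates.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open (or create) the catalog; one row per file per tape."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT NOT NULL, sha256 TEXT NOT NULL, tape_label TEXT NOT NULL,"
        " PRIMARY KEY (path, tape_label))"
    )
    return con


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so restores can be verified later."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record_file(con: sqlite3.Connection, path: Path, tape_label: str) -> None:
    """Note that a file was written to the given tape."""
    con.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
        (str(path), sha256_of(path), tape_label),
    )
    con.commit()


def tapes_holding(con: sqlite3.Connection, path_pattern: str) -> list[str]:
    """Return the labels of all tapes containing paths that match a LIKE pattern."""
    rows = con.execute(
        "SELECT DISTINCT tape_label FROM files WHERE path LIKE ? ORDER BY tape_label",
        (path_pattern,),
    )
    return [r[0] for r in rows]
```

Because the whole catalog is one SQLite file, keeping the original, a local backup and a remote copy is just a matter of copying that file.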
 
Thanks for the advice guys - going to look into the tape solution.

FWIW - these are not like medical records. Storing the images (the original scans) is more an issue of making subsequent jobs easier for us to do and edging competitors out. Losing these images internally will just create more work for us - not put us in a legally awkward position.
 
^^^
Tape backup is the best solution for you, as far as I know.

I used to work with a small company that did geospatial technology.
Many static files (TIFF, compressed files and CAD files) were archived to tape as a backup.
Some tapes were stored for months or years, and some were put into backup rotation.
^^ As told to me by the sysadmin while I worked at that company.
 