"RAID is not a backup" ..ok then, what is?

For home use, copying to another host (RAID) and/or off-siting is a good option.
For larger operations, LTO tape can be a good choice (but more expensive initially).
I generally keep my backups on RAID, DVDs (2 copies each) and a separate 2TB disk.
 
No one here is talking much about tape backups

I mentioned my disk-based backups for home. At work I do not use disk at all for backups. I now have 100 LTO2 tapes and a 2-drive, 24-slot LTO2 autochanger that I bought in 2006. I use the same software (Bacula) and a similar rotation procedure, though the rotation period is longer. LTO2 is beginning to get more expensive than disk, but I believe it is far more reliable and far easier to expand, even considering that 10 LTO2 tapes at $25 each cost more and take a lot more physical space than a single 2TB hard drive. Tapes are less fragile and can be stored for years without worrying about them dying on a shelf.
 
It's also a good idea to run md5sum checks every day/week/month/year (you pick) to make sure your data is still recoverable on hard drives/RAID arrays/DVD media.
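For the curious, a minimal Python sketch of what such a periodic check could look like, assuming an "md5sum"-style manifest was written when the backup was made (the paths and manifest layout below are placeholders):

[code]
#!/usr/bin/env python3
# Periodic integrity check: hash every file listed in a manifest and compare
# against the digests recorded earlier. Paths below are hypothetical examples.
import hashlib
import os

BACKUP_ROOT = "/mnt/backup"             # wherever the backup copy lives
MANIFEST = "/mnt/backup/checksums.md5"  # manifest written when the backup was made

def md5_of(path, chunk=1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def load_manifest(path):
    # Each line: "<hexdigest>  <relative path>", same layout md5sum uses in text mode.
    entries = {}
    with open(path) as f:
        for line in f:
            digest, _, rel = line.rstrip("\n").partition("  ")
            if rel:
                entries[rel] = digest
    return entries

def main():
    wanted = load_manifest(MANIFEST)
    bad, missing = [], []
    for rel, digest in wanted.items():
        full = os.path.join(BACKUP_ROOT, rel)
        if not os.path.exists(full):
            missing.append(rel)
        elif md5_of(full) != digest:
            bad.append(rel)
    print(f"{len(wanted)} files checked, {len(bad)} corrupt, {len(missing)} missing")
    for rel in bad + missing:
        print("PROBLEM:", rel)

if __name__ == "__main__":
    main()
[/code]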
 
What's your "backup" solution, since RAID is not/should not be one?

My important files are my digital photographs, which I keep an archived copy of on an HDD stored in the office of a family member in the next town over. The same files are also backed up online using Flickr (there are automated methods to download the files).

This backup solution is resistant to viruses, (many) acts of God, fire, theft, power surges, accidental deletion, corruption, etc. RAID does not help against any of these.

edit: for periodic backups, I keep my Flickr account up to date, and I have another backup that I swap every so often with the one in the family member's office. The files are always on my computer, two archived externals, and the web.
 
I have experience with LTO-2, 3, and 4. They are great and require no power once you have written to them, they are more reliable, and they read-write-verify all data written to them (that's part of the LTO technology).
 
It's also a good idea to run md5sum checks every day/week/month/year (you pick) to make sure your data is still recoverable on hard drives/RAID arrays/DVD media.

Thanks for the idea, I should schedule this on my tapes at work during periods of inactivity.
 
You guys did read about NASA's 40-year-old tapes, right?
http://pcworld.co.nz/pcworld/pcw.ns...ring-lunar-images-after-40-years-in-the-vault
http://www.computerworld.com/s/arti...years_in_the_vault?taxonomyId=12&pageNumber=2

"So far, all the tapes have proved usable. The data is read into a quad-processor Macintosh Pro workstation with 13GB of RAM and 4TB of storage. Data acquisition is done through a PCI Express card from Canadian firm AlazarTech that can read 180 million samples per second, although only 10 million are needed, Wingo says."

Tape is the way to go if you can afford it; stored properly, it will outlive you.
 
Thanks for the idea, I should schedule this on my tapes at work during periods of inactivity.

Generally not needed for LTO tapes (they read->write->verify), but it is always good to test the restore process regularly. The exception is when tapes are transported between onsite and offsite storage; I have seen tape damage from mis-shipment of tape media. Again, you need to treat all media (disks, tapes, etc.) properly.

Also test the software itself: in Debian Testing the 'dump' program actually backed up the filesystem successfully, or so it said! There was a bug in extfsprogs that created a corrupted dump, and you won't find that unless you test the restore!
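As a rough illustration of that kind of restore test (not tied to any particular backup program): restore into a scratch directory with whatever tool you use, then byte-compare it against the live tree. A Python sketch, with both paths as made-up examples:

[code]
#!/usr/bin/env python3
# Compare a freshly restored tree against the live data it was supposed to
# contain. Run your backup tool's restore into RESTORED first; both paths
# here are just placeholders.
import filecmp
import os
import sys

LIVE = "/srv/data"               # the original data
RESTORED = "/tmp/restore-test"   # where the test restore was extracted

def main():
    problems = []
    for dirpath, _dirs, files in os.walk(LIVE):
        rel = os.path.relpath(dirpath, LIVE)
        for name in files:
            live_file = os.path.join(dirpath, name)
            restored_file = os.path.join(RESTORED, rel, name)
            if not os.path.exists(restored_file):
                problems.append(("missing", live_file))
            # shallow=False forces a byte-by-byte comparison, not just a stat() check
            elif not filecmp.cmp(live_file, restored_file, shallow=False):
                problems.append(("differs", live_file))
    if problems:
        print("Restore test FAILED:")
        for kind, path in problems:
            print(f"  {kind}: {path}")
        sys.exit(1)
    print("Restore test passed: restored tree matches the live data.")

if __name__ == "__main__":
    main()
[/code]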
 
Tape is so yesterday. But until recent years it was the most economical way to store a lot of data offsite (you need a manual tape-change routine; a big robot alone isn't enough, as some people think). It is still probably the most economical way for enterprises to store rarely accessed data long term, because they don't use cheap 2TB SATA drives for storage.

For home users tape has no use, because disk storage is so cheap for us and tape is not.

I agree, tape is expensive, but it is not yesterday.

Huge firms and companies don't make a "backup" of data on 7200RPM SATA drives.

If they have a live backup, it is a RAID array consisting of SAS 15K RPM drives which are far more robust and durable than standard desktop drives.

Honestly, tape is great if, like you said, it is off-site and used for long-term storage.
 
I have experience with LTO-2, 3, and 4. They are great and require no power once you have written to them, they are more reliable, and they read-write-verify all data written to them (that's part of the LTO technology).

LTO libraries are awesome for management, and the tapes can hold very large amounts of data. The only downside is the price, but for budgets that need to be spent, it's a great way to continually add more long-term storage, especially when they use Fibre Channel.
 
I wonder how many off-site/remote data storage companies offer tape backup as opposed to HDD storage arrays... for consumers.
 
As others have said, RAID is a system designed to solve physical drive issues. A drive can go out and you can still have live access to the data while you solve the physical problems.

A backup is a system designed to solve data protection issues. No RAID system will protect against you accidentally deleting all of your pictures. It won't protect against you overwriting some document and later realizing you needed whatever you erased.

So a good backup is a copy of your data somewhere other than your live filesystem. That's why having additional disks and offsite backups is important: additional disks in case there is a physical disaster that the RAID cannot protect against (multiple drive failures, the building burning down).

So, "backup solution" means multiple external locations of important data..RAIDed or unRAIDed...

No one here is talking much about tape backups... which I heard was the only real backup solution.
 
I'd say that's far less than optimal. If all of a sudden you notice your data is corrupt, you have virtually zero options to recover it. You'd be better off Reed-Solomon encoding your data if you are worried about that sort of issue.
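For anyone curious, a toy Python sketch of what Reed-Solomon protection could look like, assuming the third-party reedsolo package (the package choice, parity size, and sample data are all just illustrative, not a recommendation of a specific tool):

[code]
#!/usr/bin/env python3
# Toy example of Reed-Solomon protecting a small blob of data.
# Assumes the third-party "reedsolo" package (pip install reedsolo);
# everything here is illustrative, not a production archiving tool.
from reedsolo import RSCodec

# 32 parity bytes per 255-byte block -> up to 16 corrupted bytes per block
# can be repaired without any other copy of the data.
rsc = RSCodec(32)

original = b"irreplaceable family photos, pretend this is file contents"
protected = rsc.encode(original)          # original bytes with parity appended

# Simulate silent corruption: flip a handful of bytes in the stored copy.
damaged = bytearray(protected)
for i in (3, 17, 29):
    damaged[i] ^= 0xFF

# decode() repairs the corruption; depending on the reedsolo version it
# returns either the message or a (message, message+ecc, errata) tuple.
result = rsc.decode(bytes(damaged))
recovered = result[0] if isinstance(result, tuple) else result

assert bytes(recovered) == original
print("recovered cleanly despite corruption")
[/code]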

It's also a good idea to run md5sum checks every day/week/month/year (you pick) to make sure your data is still recoverable on hard drives/RAID arrays/DVD media.
 
According to this, the only true "backups" are discs (which can get damaged), offsite storage (pay a subscription) and true tape drives (super expensive).
What's your "backup" solution, since RAID is not/should not be one?

When you think about it, backups are just data replication, preferably offsite.
RAID is redundancy, not replication.

I:
1. Keep copies on my webhost account
2. Burn DVDs and give them to my family at Christmas
3. Back up to family computers that I admin
4. Have photos printed at a cheap online photo service (few things last longer than real RA-4 photos)
5. Back up online
 
RAID = performance + reliability. That's it.
BACKUP = creating a duplicate copy of your files on a physically separate medium, whatever that medium is.
 
I wonder how many off-site/remote data storage companies offer tape backup as opposed to HDD storage arrays... for consumers.

I would bet a lot of them use disk then dump the disk to tape for cheaper and more reliable storage.
 
^ Yep, disk2disk2tape is pretty common in a lot of multi-level enterprise backup schemes these days. The shift from disk2tape backup to disk2disk backup has forced tape pricing downward. Tape is still pretty viable, though as recently as a few years ago cost-per-GB was still pretty outrageous, at least for the higher LTO levels, and a lot of people got fed up and moved to hard disks.
 
I wonder how many off-site/remote data storage companies offer tape backup as opposed to HDD storage arrays... for consumers.

I know that a lot of MMORPG game companies backup with server redundancy and then make weekly tape backups.
 
I'd say that's far less than optimal. If all of a sudden you notice your data is corrupt, you have virtually zero options to recover it. You'd be better off Reed-Solomon encoding your data if you are worried about that sort of issue.

You get this if you regularly run scrubbing (on by default with Linux mdadm; it has to be scheduled manually on my Adaptec RAID 5805 controller) on your RAID5/6. This is one of the "forgotten" but important parts of configuring your RAID setup.
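For Linux md specifically, a scrub is just a write to sysfs, so you can schedule it yourself. A small Python sketch, assuming a software-RAID array named md0 and root privileges (the device name is a placeholder):

[code]
#!/usr/bin/env python3
# Trigger a data scrub ("check") on a Linux md software-RAID array and wait
# for it to finish. Assumes an array named md0 and root privileges; adjust
# the device name for your setup.
import time

DEVICE = "md0"  # hypothetical array name
SYS = f"/sys/block/{DEVICE}/md"

def read(name):
    with open(f"{SYS}/{name}") as f:
        return f.read().strip()

def main():
    # "check" = read-only scrub; "repair" would also rewrite mismatched blocks.
    with open(f"{SYS}/sync_action", "w") as f:
        f.write("check\n")

    while read("sync_action") != "idle":
        print("scrubbing...", read("sync_completed"))
        time.sleep(60)

    # mismatch_cnt counts sectors whose parity/mirror copies did not agree.
    print("scrub finished, mismatch_cnt =", read("mismatch_cnt"))

if __name__ == "__main__":
    main()
[/code]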

Scrubbing does not, however, detect or correct a file that is changed in a "valid" way. For example, if another person changes your data on purpose, you need manual checksumming to detect that the data has changed since the backup was made. Manual checksumming is very CPU intensive and quickly becomes a problem if there are many files or a lot of data.
I tested this for my backup script, but both creating the checksums and checking them was way too heavy for my A64 3000+ across my 700GB of data each night (an individual checksum per file). And this would only give me detection, not correction, as stated. In other words, much work for no gain.
 
RAID is:
- convenient: zero downtime due to single/multiple disk failure (depending on RAID method)
- performance: high availability and high performance compared to single disks and standalone volumes
- redundancy: a level of protection against hardware failure (assume that hardware WILL fail)

Backup is:
- having multiple copies of data (redundant)
- a way to recover your data when the existing hardware is gone (fire, theft, etc)
- not foolproof; always "stress test" your backup solution, and _restore_ solution

The amount you spend on backup, and WHAT you backup should be proportional to how much you will suffer in the event of data loss. For example:
- Windows dying due to a virus: minimal; Windows can be reloaded, etc.
- Photos lost due to a virus: major; photos cannot simply be re-created

My own data (photos, documents, accounts, digitized paper) represents memories, hard work and lots of time spent organizing and entering information into the computer. As a result, I have a ZFS SAN that holds all of my information in a 20TB raid-z2 array ("RAID6"). This snapshots hourly, and I keep a schedule of snapshots (hourly, daily, monthly). I also replicate this SAN to my mother-in-law's place, where I have another identical ZFS-based NAS snapshotting in exactly the same way. This remote NAS also periodically dumps to tape (LTO4). Comprehensive and expensive, I know, but I would HATE to lose any of that data, or have to spend time re-creating it.
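If you want to roll something like that by hand, here is a stripped-down Python sketch of the hourly snapshot plus pruning, shelling out to the standard zfs commands from cron; the dataset name and retention count are made up for the example:

[code]
#!/usr/bin/env python3
# Take an hourly ZFS snapshot and prune old hourly snapshots, keeping the
# newest KEEP_HOURLY. Dataset name and retention are placeholder values;
# run from cron on the box that owns the pool.
import subprocess
import time

DATASET = "tank/data"   # hypothetical pool/dataset
KEEP_HOURLY = 48        # keep two days of hourlies

def zfs(*args):
    return subprocess.run(["zfs", *args], check=True,
                          capture_output=True, text=True).stdout

def main():
    stamp = time.strftime("hourly-%Y%m%d-%H%M")
    zfs("snapshot", f"{DATASET}@{stamp}")

    # List this dataset's snapshots oldest-first and keep only the hourlies.
    names = zfs("list", "-H", "-t", "snapshot", "-o", "name",
                "-s", "creation", "-d", "1", DATASET).splitlines()
    hourlies = [n for n in names if "@hourly-" in n]

    for old in hourlies[:-KEEP_HOURLY]:
        zfs("destroy", old)

if __name__ == "__main__":
    main()
[/code]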

Fundamentally, backup should prevent the need to re-create data, and allow for recovery in the event of a disaster. You need to determine what this means in your particular situation. Again, always assume that your hardware (disks, controllers, boards) WILL eventually fail.
 
A single hard drive is not, however. It is expected that >1% of your hard drives will fail each year. At work (where I usually have 200 to 400 drives spinning 24/7/365), over the last 15 years we have seen a 1% to 7% annual failure rate on drives. And these are not just 5+ year old drives; a number of failures were from drives that were less than 2 years old.


Do ya'll track these failures? Would be interesting to see that data.
 
I do amateur video work for my kids and their friends. And I'm paranoid...

My working PC has RAID for data integrity/single-drive failures.
- Large data array is RAID-6 for performance and simple data protection (8x 2TB Hitachi, RAID-6, Areca controller)
- System disk is 4x SSD RAID-0 for performance.

The system disk is ghost-imaged daily to a local, bootable copy on a 500GB hard drive, fully automated by Norton. I completely expect the RAID-0 to fail.
A system disk ghost image is also made weekly to a separate WHS, just in case I screw something up and need to go backwards. I usually keep the last 6 months or so plus selected older snapshots.

The data array gets a "sync" backup daily to the WHS server, automated at 3am. I happen to use GoodSync for this, but there are literally dozens of sync products on the market.

Monthly, the same sync program is used to copy all "new" video files on the WHS to a hard drive. I define "new" as <62 days old. The drive is taken to a storage shed away from home. I'm generating less than 500GB/month right now, and 500GB drives are dirt cheap.
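That monthly "grab everything newer than 62 days" pass is easy to script, too; a rough Python sketch with made-up source and destination paths:

[code]
#!/usr/bin/env python3
# Copy every file modified in the last 62 days from the server share to an
# external drive, preserving the directory layout. Both paths are examples.
import os
import shutil
import time

SOURCE = "/mnt/whs/video"          # hypothetical server share
DEST = "/mnt/offsite-drive/video"  # hypothetical external drive mount
MAX_AGE = 62 * 24 * 3600           # "new" = modified within 62 days

def main():
    cutoff = time.time() - MAX_AGE
    copied = 0
    for dirpath, _dirs, files in os.walk(SOURCE):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) < cutoff:
                continue
            dst = os.path.join(DEST, os.path.relpath(src, SOURCE))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)   # copy2 preserves timestamps
            copied += 1
    print(f"copied {copied} new files to {DEST}")

if __name__ == "__main__":
    main()
[/code]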

Every 3 or 4 months, I copy what I consider "high value" files, mostly my wife's digital photos, onto a Blu-ray and store them at the storage unit. I use Blu-ray because, unlike DVD, the recordable discs do not contain any organic material and are considered "archival" quality. Theoretical readable life is 50-100 years (vs. 4-5 years for DVD+/-R).

This somewhat paranoid protocol was put together after a RAID-controller failure completely wiped a 5x 1.5TB RAID array that I thought was completely safe. Safe from single drive failures, yes. Safe from a completely whacked RAID controller that decided to randomly write all over all 5 drives, not so much. Since I've been doing this it has saved my A$$ several times, usually from human error rather than equipment failure (e.g., "ah crap, why did I overwrite that file?").

You buy a new 500GB drive every month?
 
Do ya'll track these failures? Would be interesting to see that data.

Not anything very formal. This year we had 10 to 12 failures. One was a WDC Black 1TB less than 6 months old. I believe 4 or 5 were 750GB and 500GB Seagate 7200.10s and 7200.11s, and the rest were WDC 250GB SATA1 drives purchased between 2005 and 2006. I had more accurate numbers earlier this year when we had a much higher failure rate. For some reason it has been quiet for over 3 months; I am not complaining. Anyway, the total failures for this year were not too much more than any other year, though I do admit the number of less-than-3-year-old failures was higher this year than in the past.
 
Google has done recent large-scale studies on disk reliability. The paper is available on the internet: http://static.googleusercontent.com.../labs.google.com/en//papers/disk_failures.pdf

It's a recommended read! There are graphs there that show the failure rates at different ages. About 6% of the disks failed within their first year.
It is an interesting study, but please be careful not to project the conclusions on home users so quickly.

The tests, done in a laboratory environment, do not simulate a home environment. In particular, what is very wearing in home environments is the expansion and contraction of metal caused by heat variations, rather than a static temperature that always stays the same. The latter is much better for HDDs, which consist of a lot of metal and heat up quite rapidly without proper cooling.

Those home users that do cool their (often 7200rpm) HDDs often do so improperly, by only allowing part of the disk's surface to be cooled, creating an imbalance where the HDD has multiple temperature ranges throughout its metal body. I believe this is particularly troublesome for mechanical 3.5" HDDs.

So while this is invaluable if you run a datacenter yourself, not all conclusions have to apply to home users as well; just keep an open mind.
 
You expect nukes? :eek: Honest to goodness nuclear warheads?

My company used to be at 7 World Trade Center which had two entire floor datacenters. We all know what happened there. Almost 400 servers were crushed. Within 1 week we had the data recovered to servers at different sites. It wasn't RAID that allowed us to recover the data to another site but offsite backups. There was one server that we couldn't recover because it was misconfigured to back up to the silos at 7WTC instead of our other datacenter.

RAID is your uninterrupted backup for a hard drive failure (or two). It doesn't protect against catastrophic failures (or simple user error such as deleting the wrong file). It doesn't have to be an extreme such as a terrorist attack but could be something as simple as a broken water line, flooding the server room.
 
It doesn't have to be an extreme such as a terrorist attack but could be something as simple as a broken water line, flooding the server room.

Or the cleaning lady poking around where she doesn't belong. Or a family of mice, nature's stone killers, setting up shop in your case and chewing your wires - I've seen it. My friend brought over his PC after it "stopped working". I opened the case and the next thing I know I'm eyeball to eyeball with their leader, and he's looking at me with a mix of fascination and disgust, like "you got a fight comin'. and it's comin' today."

@SeanG: sorry to hear about your co. having people at 7WTC. crazy that next year will be 10 years.
 
It is an interesting study, but please be careful not to project the conclusions on home users so quickly.

The tests, done in a laboratory environment, do not simulate a home environment. In particular, what is very wearing in home environments is the expansion and contraction of metal caused by heat variations, rather than a static temperature that always stays the same. The latter is much better for HDDs, which consist of a lot of metal and heat up quite rapidly without proper cooling.

Those home users that do cool their (often 7200rpm) HDDs often do so improperly, by only allowing part of the disk's surface to be cooled, creating an imbalance where the HDD has multiple temperature ranges throughout its metal body. I believe this is particularly troublesome for mechanical 3.5" HDDs.

So while this is invaluable if you run a datacenter yourself, not all conclusions have to apply to home users as well; just keep an open mind.

You are aware that the study used the drives in Google's production systems, not a lab environment?
Google uses desktop-grade hard drives because they have so much redundancy in their system, so this paper is absolutely something you can extrapolate directly to your home environment.

It is not 100% accurate for laptop/desktop users, as you say, because those are regularly being turned off and on and exposed to more shocks and movement. For the file servers you use at home that actually store your data, a 24/7 scenario makes the most sense, and that's where this research is interesting.
 
You are aware that the study used the drives in Google's production systems, not a lab environment?
Google uses desktop-grade hard drives because they have so much redundancy in their system, so this paper is absolutely something you can extrapolate directly to your home environment.

It is not 100% accurate for laptop/desktop users, as you say, because those are regularly being turned off and on and exposed to more shocks and movement. For the file servers you use at home that actually store your data, a 24/7 scenario makes the most sense, and that's where this research is interesting.
Google may use desktop drives, but they also climate control their DCs. Once the drive is spinning the temperature probably doesn't vary by more than 2-3C between idle and load - and it is that stable for years. The same is not true of your garage.
 
Google may use desktop drives, but they also climate control their DCs. Once the drive is spinning the temperature probably doesn't vary by more than 2-3C between idle and load - and it is that stable for years. The same is not true of your garage.

Sounds like you aren't [H]ard enough.
 
Google may use desktop drives, but they also climate control their DCs. Once the drive is spinning the temperature probably doesn't vary by more than 2-3C between idle and load - and it is that stable for years. The same is not true of your garage.

That's great for google, but for other companies and organizations, I would rather have reliable SAS 15K HDDs with RAID 5/6, off-site backup, and incremental tape backups.

7200RPM non-server drives are crap for server environments unless they are in a low-usage SAN/NAS. They just don't have good enough seek times to accommodate heavy usage, and SATA sucks for network connectivity.
 
This CMU study suggests that 15K drives are statistically no more reliable than consumer-grade SATA.

www.cs.cmu.edu/~bianca/fast07.pdf

"Interestingly, we observe little difference in replacement rates between SCSI, FC and SATA drives, potentially an indication that disk-independent factors, such as operating conditions, affect replacement rates more than component-specific factors."
 
Google may use desktop drives, but they also climate control their DCs. Once the drive is spinning the temperature probably doesn't vary by more than 2-3C between idle and load - and it is that stable for years. The same is not true of your garage.
AC units go out, PSU fans fail. All of these things happen in a datacenter somewhere, even in Google's.
Google keeps their datacenters in the low 80s Fahrenheit btw


7200RPM non-server drives are crap for server environments unless they are in a low-usage SAN/NAS. They just don't have good enough seek times to accommodate heavy usage, and SATA sucks for network connectivity.

They're good for disk-based backups and simple OS drives. I typically start an unsized Linux VM's OS and Data/App partitions on a SATA LUN and then just move the Data partition to a SAS LUN if the response times get unreasonable.

Some units like the Compellent move LUN blocks from SSD<->SAS<->SATA as activity picks up on those blocks... really lets you keep the busy blocks where you need them without splitting up the filesystem across pools.
 
That's great for google, but for other companies and organizations, I would rather have reliable SAS 15K HDDs with RAID 5/6, off-site backup, and incremental tape backups.

7200RPM non-server drives are crap for server environments unless they are in a low-usage SAN/NAS. They just don't have good enough seek times to accommodate heavy usage, and SATA sucks for network connectivity.

You do realize the contradiction in your statements here, right? "That's great for Google"... "they don't have good enough seek times... heavy usage... SATA sucks...". Right. They're good enough for the largest data aggregator in the world, with the biggest and most heavily accessed databases ever created... but they are only good enough for low-usage SAN/NAS. Comical.
 
Everyone here has different experiences and expectations. Everyone gives different advice.

---

In regard to a question not directed toward me: "You buy a new 500GB drive every month?"

I upgraded my HTPC twice this year. I bought several 1.5TB drives early in the year and several 2TB drives later (my old drives went into backups). I can understand the need for people to buy 500GB every month, but I think 2TB drives are cheaper per gigabyte, though sometimes smaller drives solve the right problem.
 
You do realize the contradiction in your statements here, right? "That's great for Google"... "they don't have good enough seek times... heavy usage... SATA sucks...". Right. They're good enough for the largest data aggregator in the world, with the biggest and most heavily accessed databases ever created... but they are only good enough for low-usage SAN/NAS. Comical.

If you have enough of them, I suppose that's fine, but for high-usage situations within a firm, they absolutely will not do.

SATA is only half-duplex and is unreliable in high-usage environments.

As was stated, google uses a ton of them, so if one goes down, another can kick in.
Companies don't work this way and normally won't have hundreds/thousands of SATA drives, rather, a few good SAS drives for high reliability.

If you want to use SATA drives for a company's RAID array, then good luck.
Those drives were meant for single-user environments and, in a pinch, can be used for a small server.
They were never intended for heavy usage such as databases, accounting, video streaming, etc. in a many-user environment.

...and yeah, SATA does suck complete ass compared to SAS, and any IT admin worth his salt would know that.

Very comical, yeah, this coming from the "paranoid" guy who buys a new 500GB HDD every month. :rolleyes:
 
ZFS works fine with SATA?

It's not that SATA is bad as a whole, but it is not nearly as robust as SAS nor does it have the hardware features of it.

I'm using SATA in my server right now (sig), but I'm not doing anything majorly intensive either, nor is it a many-user environment.

SAS has much higher reliability and throughput than SATA since it is full-duplex, thus it can handle large loads of traffic simultaneously without being bottlenecked or having any wait time.

Seriously, the only SATA drives I would use in a mid/high-traffic instance would be in a SAN, and the HDDs would have to be enterprise-class drives; no desktop drives allowed.

The servers however would have SAS, whether they be 7200 or 15K RPM drives depends on the load the servers would have.
 
It's not that SATA is bad as a whole, but it is not nearly as robust as SAS nor does it have the hardware features of it.

I'm using SATA in my server right now (sig), but I'm not doing anything majorly intensive either, nor is it a many-user environment.

SAS has much higher reliability and throughput than SATA since it is full-duplex, thus it can handle large loads of traffic simultaneously without being bottlenecked or having any wait time.

Seriously, the only SATA drives I would use in a mid/high-traffic instance would be in a SAN, and the HDDs would have to be enterprise-class drives; no desktop drives allowed.

The servers however would have SAS, whether they be 7200 or 15K RPM drives depends on the load the servers would have.

I agree with you 100%.
But now we can use cheap disks thanks to ZFS, yeah?
Assuming drives WILL fail, doesn't it make sense to budget cost per array instead of cost per drive, and then take advantage of the inexpensive part of RAID?
I already pay too much for SSDs; I cannot justify the cost of enterprise drives. What else am I not understanding?
I look forward to your reply.
 
If you have enough of them, I suppose that's fine, but for high-usage situations within a firm, they absolutely will not do.

SATA is only half-duplex and is unreliable in high-usage environments.

As was stated, google uses a ton of them, so if one goes down, another can kick in.
Companies don't work this way and normally won't have hundreds/thousands of SATA drives, rather, a few good SAS drives for high reliability.

If you want to use SATA drives for a company's RAID array, then good luck.
Those drives were meant for single-user environments and, in a pinch, can be used for a small server.
They were never intended for heavy usage such as databases, accounting, video streaming, etc. in a many-user environment.

...and yeah, SATA does suck complete ass compared to SAS, and any IT admin worth his salt would know that.

Very comical, yeah, this coming from the "paranoid" guy who buys a new 500GB HDD every month. :rolleyes:

I was just ROTFL at the obvious contradiction in your post... it's good enough for Google but not good enough for "real" data storage problems... right!

As for the backup drives - they are cheap and worth it to keep my data safe. At the scale I add video, it's cheaper than all but the most fly-by-night network backup services. Besides, I've been rotating a dozen or so of them for a while now, not buying new, so it's even cheaper: $0 monthly marginal cost. No worries here.
 