Need help to build a green & fast home file server with serious data protection

MountainX · Aug 4, 2010

I'm looking to upgrade in order to achieve a much more power efficient system and I also want to simplify administration.

My current home file server consists of two big Ubuntu boxes. One is a 5U server-quality dual Xeon box with two Areca RAID cards and 8 SCSI drives + 4 SATA drives. The other box is an older Pentium 4 with two 3ware RAID cards and a bunch of PATA and SATA HDDs. The second box just backs up the main file server. I back up certain directories every 15 minutes, and I do full backups every day. The backups are done with storeBackup which is very space efficient.

My whole house is wired with cat6 cable, all computers have GigE NICs and all my computers have their /home directories NFS mounted on the file server. I want to keep doing it this way. Basically I keep no important data anywhere except the file server.

The system has been working well for a few years. However, I'd like to move to something a lot simpler and more power efficient. But I also want to have an even better data protection scheme.

Like this guy, I find Nexenta with zfs attractive. But unlike that guy, I don't have the expertise to manage a Nexenta system the way he does.

I have also thought about Ubuntu 10.04 LTS with btrfs. That would be more familiar to me, but going with btrfs might be risky...

However, a copy-on-write fs seems like the way to go...

I'd like some advice.
Which OS: Nexenta with zfs or Ubuntu 10.04 LTS with btrfs?
Which hardware?

Considering other options, is there an affordable pre-built appliance that has copy-on-write and good backup software?

My budget is a couple thousand dollars max. (I'm trying to save on my electric bill, but I doubt my current servers are consuming enough electricity to justify spending even a couple thousand dollars. So the upgrade needs to also get me some additional benefits such as automatic versioning of all file changes.)

aaronspink · Aug 4, 2010

MountainX said:
I have also thought about Ubuntu 10.04 LTS with btrfs. That would be more familiar to me, but going with btrfs might be risky...

btrfs is still pretty early in its development, I'd wait a bit before I trusted real data to it.

jonnyjl · Aug 4, 2010

I don't know if I have any advice, but I will say, before I built my OpenSolaris Server (Norco 4220) in December, I barely (and I mean barely) had any interaction with any *nix OS.

I think ZFS is the way to go if data integrity/protection is important.

I really do not back up my data, other than having most of it replicated from my Windows desktop.

My server (also hosts 3 VMs, 2 DCs, and Win7 that I use for various things) has been humming along nicely without downtime other than having to install more memory or another controller. I did have some odd glitch a few weeks back where 3 of my disks became detached (probably a controller error, since the entire vdev was on the same controller), but two of them came back, both of my spares kicked in, and there was zero downtime and no data corruption. Haven't seen any errors since *fingers crossed*.

I didn't really like the e-mails coming to me every 30 minutes (script I found in Sun's repository for zpool status checks and scrub) at work because I can't remote back home, but it was nice to know that when I came home, my server was still up and running.

drescherjm · Aug 4, 2010

aaronspink said:
btrfs is still pretty early in its development, I'd wait a bit before I trusted real data to it.

I agree with that. Do not put valuable data on btrfs. In either case, RAID is not a backup, so take that into account and keep at least a second copy of your valuable data that is not on the source RAID array.

drescherjm · Aug 4, 2010

MountainX said:
I'm looking to upgrade in order to achieve a much more power efficient system and I also want to simplify administration.

My current home file server consists of two big Ubuntu boxes. One is a 5U server-quality dual Xeon box with two Areca RAID cards and 8 SCSI drives + 4 SATA drives. The other box is an older Pentium 4 with two 3ware RAID cards and a bunch of PATA and SATA HDDs. The second box just backs up the main file server. I back up certain directories every 15 minutes, and I do full backups every day. The backups are done with storeBackup which is very space efficient.

Besides replacing the machines with a modern dual-core i3 or low-end AMD box, get rid of the RAID controllers. You do not need these if you run Linux or most other non-Microsoft operating systems. Then replace all the disks with 2TB green drives. For backups, use either 2TB externals, a hot-swap bay, or a separate RAID (possibly on a second machine that suspends between backups).

drescherjm · Aug 4, 2010

MountainX said:
but I doubt my current servers are consuming enough electricity to justify spending even a couple thousand dollars.

Depending on where you live that probably is a few years of electricity for running the current machines.
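
To put rough numbers on that, here is a minimal sketch of the arithmetic. Nobody in the thread posted measured power draw or a local electricity price, so the wattages (400 W for the two old boxes together, 60 W for a replacement) and the price ($0.12 per kWh) are pure assumptions; plug in your own readings from a watt meter.

```python
# All figures used here are illustrative assumptions, not measurements from this thread.
HOURS_PER_YEAR = 24 * 365


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of a machine running 24/7 at a constant draw."""
    return watts / 1000 * HOURS_PER_YEAR * price_per_kwh


def payback_years(old_watts: float, new_watts: float,
                  price_per_kwh: float, upgrade_cost: float) -> float:
    """Years until the electricity savings equal the cost of the upgrade."""
    savings = annual_cost(old_watts, price_per_kwh) - annual_cost(new_watts, price_per_kwh)
    if savings <= 0:
        return float("inf")  # the new box saves nothing, so it never pays back
    return upgrade_cost / savings


if __name__ == "__main__":
    old_w, new_w, price, budget = 400, 60, 0.12, 2000  # assumed values
    print(f"old servers: ${annual_cost(old_w, price):.2f} per year")
    print(f"new server:  ${annual_cost(new_w, price):.2f} per year")
    print(f"payback:     {payback_years(old_w, new_w, price, budget):.1f} years")
```

Under those assumed numbers the old machines cost a bit over $400 a year to run, so a $2000 budget is roughly five years of electricity, which is in line with the "few years" estimate above.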

MountainX · Aug 4, 2010

drescherjm said:
RAID is not a backup.

I agree under ext3/ext4 (and other fs). But consider coupling ZFS copy-on-write features plus RAID 1: why is that not a backup?

As I understand it, ZFS with copy-on-write gives me unlimited file versioning history. If I mirror such a ZFS volume, I've got a good backup, right?

Without copy-on-write, RAID (any level) is not a backup. But with full file versioning history, I fail to understand how RAID 1 wouldn't accomplish most of my backup goals. I do not need to go so far as offsite backups. And I do not need to ensure uptime at all costs. I just need to take reasonable steps not to lose my data. As far as I can see, a copy-on-write file system that is mirrored gets me most of the way there. I might want to go further, such as either mirroring again (2 mirrors total in the same machine) or autoCDP over IP to a second box (most likely a low end NAS). But since I'm not worried about uptime, I think there is little to gain by copying to a second box. Simply mirroring once or twice on separate storage devices in the same box seems sufficient. Am I missing something important?

nitrobass24 · Aug 4, 2010

RAID guards against one kind of failure: hardware failure. There are lots of failure modes that it doesn't guard against.

File corruption
Human error (deleting files by mistake)
Catastrophic damage (someone dumps water onto the server)
Viruses
Software bugs that wipe out data

For example, if you accidentally overwrite your PhD thesis with garbage, redundancy ensures that you have multiple copies of the garbage in case one goes bad. A backup ensures that you can restore your PhD thesis.

MountainX · Aug 4, 2010

drescherjm said:
Besides replacing the machines with a modern dual-core i3 or low-end AMD box, get rid of the RAID controllers. You do not need these if you run Linux or most other non-Microsoft operating systems. Then replace all the disks with 2TB green drives. For backups, use either 2TB externals, a hot-swap bay, or a separate RAID (possibly on a second machine that suspends between backups).

For hardware, it seems like I need a special ZIL device if I'm going to use ZFS. Can it be an ordinary SSD, or do I really need something like the DDRdrive X1? (see http://jmlittle.blogspot.com/2010/03/zfs-log-devices-review-of-ddrdrive-x1.html)

What do you recommend for getting a Linux box to suspend between backups? It doesn't seem trivial to get a Linux server to suspend and then wake on LAN.

It does seem a shame to replace my high-end server (ECC RAM, 15k RPM HDDs, etc.) with a low end green box. Maybe I need to focus on installing a copy-on-write capable file system and implementing sleep/suspend when there is no activity.
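
On the wake-up half of that question, a minimal sketch: a Wake-on-LAN "magic packet" is just 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The MAC address, broadcast address, and port in the usage note are placeholders, and this assumes Wake-on-LAN is already enabled in the server's BIOS and on its NIC (on Linux that is typically done with something like `ethtool -s eth0 wol g`, where the interface name is an assumption).

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A client could call `send_magic_packet("00:11:22:33:44:55")` (a made-up MAC) before mounting /home. Note that this only covers waking the server; deciding when it is idle enough to suspend is a separate problem that the sketch does not address.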

MountainX · Aug 4, 2010

nitrobass24 said:
RAID guards against one kind of failure: hardware failure. There are lots of failure modes that it doesn't guard against.

File corruption
Human error (deleting files by mistake)
Catastrophic damage (someone dumps water onto the server)
Viruses
Software bugs that wipe out data

For example, if you accidentally overwrite your PhD thesis with garbage, redundancy ensures that you have multiple copies of the garbage in case one goes bad. A backup ensures that you can restore your PhD thesis.

That's a canned reply. I've seen it a million times. Did you understand my point about COW? Am I right about COW?
Copy on write, as I understand it, guards against at least the common forms of the things you list:

File corruption - the earlier version is available
Human error (deleting files by mistake) - an earlier version remains available
Software bugs that wipe out data - an earlier version remains available

Am I right?
As to the other items on the list:
I don't have any virus problems in Linux (and I don't have Windows machines on my LAN). And I choose not to worry about catastrophic damage -- if I were concerned about that I would have to have offsite backups, right? If I am concerned about someone dumping water on a server, I have to be concerned about a flood or hurricane etc. that destroys all my local servers. By my reasoning, if I have two separate storage devices in one box, that's good enough provided that I have full versioning of all files and that I have continuous mirroring from one device to the other.

[LYL]Homer · Aug 4, 2010

6. Theft

nitrobass24 · Aug 4, 2010

I don't think that's how COW works.

It's my understanding that when writing, it will write the data to a new block with a pointer back to the data-in-use block (aka Uberblock); then, once it's done being written, it moves the pointer to the new block.

However, this does not prevent a virus or bugs from destroying your data, including the not-in-use data, nor does it prevent human error. If it's accidentally deleted, the pointer is gone. You can't roll back to something that no longer exists.

MountainX · Aug 4, 2010

[LYL]Homer said:
6. Theft

I'm not trying to protect against floods, earthquakes, theft or anything else like that. I want to protect against HDD failure and the other problems in the prior list such as user errors and anything else that can overwrite a good file with bad data.

The best SIMPLE solution I have thought of so far is a copy-on-write file system with continuous mirroring (autoCDP or similar). Mirroring with RAID 1 would make things really simple. No administration would be required. Replicating to a second box greatly increases the things that can go wrong as well as the admin requirements, and it doesn't buy me much extra value (given my goals).

I think the perfect solution for me might be:

ZFS (or any viable COW fs for Linux -- maybe even SVN)
RAID 1 mirroring
some way to sleep/suspend the server when there is no activity and wake it up instantly. I don't know the solution for this yet.

MountainX · Aug 4, 2010

nitrobass24 said:
I don't think that's how COW works.

It's my understanding that when writing, it will write the data to a new block with a pointer back to the data-in-use block (aka Uberblock); then, once it's done being written, it moves the pointer to the new block.

However, this does not prevent a virus or bugs from destroying your data, including the not-in-use data, nor does it prevent human error. If it's accidentally deleted, the pointer is gone. You can't roll back to something that no longer exists.

Your understanding and mine are different. As I understand COW, it provides a time-shifting interface that allows a (read-only) view of the file system as it existed at any previous point in time. This allows users to access their file system as it appeared in the past, including before a file was deleted or corrupted. If ZFS won't do that, other Linux COW file systems will. [And some file systems (I believe btrfs) also allow writing to those prior views, but that's not my interest at the moment.]

nitrobass24 · Aug 4, 2010

You may be right. I'm surprised Sub.mesa (ZFS-Nazi) hasn't cleared this up.

MountainX · Aug 4, 2010

nitrobass24 said:
You may be right. I'm surprised Sub.mesa (ZFS-Nazi) hasn't cleared this up.

I just looked up another versioning file system for Linux. This explanation seems to clear up the prior questions about features. I suspect that if NILFS will do all this, ZFS will too.
-------------------

The upcoming 2.6.30 kernel is loaded with a number of new file systems some of which are ext4 and btrfs. Another of the hot new file systems that is in 2.6.30 is NILFS. This file system is definitely one that you should test.

"NILFS is a log-structured file system supporting versioning of the entire file system and continuous snapshotting which allows users to even restore files mistakenly overwritten or destroyed just a few seconds ago."

NILFS is a new implementation of a log-structured file system (LFS) supporting continuous snapshotting. In addition to versioning capability of the entire file system, users can even restore files mistakenly overwritten or destroyed just a few seconds ago. Since NILFS can keep consistency like conventional LFS, it achieves quick recovery after system crashes.

NILFS creates a number of checkpoints every few seconds or per synchronous write basis (unless there is no change). Users can select significant versions among continuously created checkpoints, and can change them into snapshots which will be preserved until they are changed back to checkpoints.

There is no limit on the number of snapshots until the volume gets full. Each snapshot is mountable as a read-only file system. It is mountable concurrently with a writable mount and other snapshots, and this feature is convenient for online backup.

MountainX · Aug 4, 2010

http://www.linux-mag.com/id/7345 <-- good article

And this from Wikipedia, which reinforces my perception that adding mirroring to a versioning file system will address my needs really well:

Backup
A versioning file system is similar to a periodic backup, with several key differences.
Backups are normally triggered on a timed basis, while versioning occurs when the file changes.
Backups are usually system-wide or partition-wide, while versioning occurs independently on a file-by-file basis.
Backups are normally written to separate media, while versioning file systems write to the same hard drive (and normally the same folder, directory, or local partition).
Revision control system
Versioning file systems provide some of the features of revision control systems. However, unlike most revision control systems, they are transparent.
Journaling file system
Versioning file systems should not be confused with journaling file systems.

MountainX · Dec 21, 2010

I would like to revive this thread. It is time to replace my home office file server, but I still don't know the right solution. I would appreciate some new advice.

Currently I'm running Ubuntu on an Intel dual Xeon server with an Areca RAID controller and 8 SATA drives. It's loud and power hungry. It's about 6-7 years old.

I want to build a new system. My goals are:

continuous snapshots (see post 2 above this one)
very high performance with NFS over GigE (without expert tuning--I'm not an expert)
quiet and power efficient
will go to sleep according to a schedule or lack of activity
low sysadmin maintenance requirements (the closer to Ubuntu, the better)

I plan to purchase new hardware asap. I can take some time to evaluate the OS options and then transition from my current file server to the new NAS at a comfortable pace.

Although it is not my post, my requirements are similar to this:
http://serverfault.com/questions/99...le-snapshot-saving-crypto-backupping-raid-nas
I could go a few hundred dollars higher on the budget (e.g., up to $1200 or so).

All my files are kept on the NAS, so it impacts performance of everything. My whole house is wired with GigE. My client PCs are Ubuntu or OS X. I mainly just work on one Ubuntu client, so the file server usually only has to serve that one client. I do software development and technical writing and the usual mix of home user stuff (some videos, music, etc.). Again, all files are kept on the file server.

Thanks for any advice about file systems (NILFS, ZFS, btrfs, ext3cow, etc.), operating systems (FreeNAS, FreeBSD, NexentaStor, Ubuntu 11.04+btrfs) and related products (e.g., R1Soft).

Dangman · Dec 21, 2010

Have you thought about just trying out all those different file systems and OSes in virtual machines and see for yourself if they're good or not for your needs?

However the one FS I would not trust with live data is btrfs: It's still relatively experimental at this stage.

drescherjm · Dec 21, 2010

Danny Bui said:
However the one FS I would not trust with live data is btrfs: It's still relatively experimental at this stage.

Agreed. Stay away from btrfs for at least 1 more year, probably 2 more years, at the pace that it is being developed now.

MountainX · Dec 21, 2010

My guess is that trying the OS/file system in a VM would not be all that helpful. I couldn't do any meaningful benchmarks, could I? For example, I/O rates would depend partly on the host OS. Seems better just to try the OS on the bare metal.

BTW, I found out that I was wrong in my earlier post about continuous snapshot features. Neither ZFS nor btrfs supports the type of unlimited file versioning history that NILFS supports. But NILFS is even less ready than btrfs. I guess ZFS is the only (free) choice...

Currently I'm using EXT4 and storeBackup. I back up everything daily to an external drive, and I back up frequently used directories every 2 hours to a different drive. Now that I better understand the limitations of all the other options, my current option doesn't seem so bad. storeBackup is a really nice tool. It's very space efficient.

fmaster · Dec 22, 2010

Honestly it really seems like you want to pick ZFS and it is undoubtedly a great choice for the things that you are trying to do.

Now the question: what is the best, most reliable, most future-proof way to implement ZFS?

OpenSolaris - Questionable future so out

FreeBSD - Solid future but, I hear they are a few versions back and performance is very bad?

FreeNAS + ZFS - Great option, but not sure how it handles versioning.

OpenIndiana - Cast-off OpenSolaris community who are continuing the great work. I installed their latest build in a VM and it works great. They have a cool plugin for Nautilus where you use a slider to help choose between previous versions... very awesome. Problem is they aren't even at a beta yet, so I wouldn't want to trust my data to it yet.

Nexenta - Haven't used it but, maybe the best option of the bunch.

I am not sure how FreeNAS + ZFS handles versioning but, it might be the best/easiest solution.
I would probably go with either FreeBSD, FreeNAS or Nexenta as they seem the most stable projects.

BTW, if you are going to use ZFS in a networked setting, I hear performance is really poor unless you use an SSD.

MountainX · Dec 22, 2010

Thank you fmaster. That's interesting. I'll look closer at all those options. And I do plan to use an SSD. I was just reading this: http://www.nerdblog.com/2010/03/zfs-nas-followup-ssd-is-amazing.html

vraa · Dec 22, 2010

I have OpenIndiana machines that create snapshots every time a file is accessed through samba through ZFS

I am switching to ZFSGuru (FreeBSD + ZFS)

ZFS is open source

I am not worried

omniscence · Dec 22, 2010

While ZFS with snapshots can serve as a backup solution, it will not protect against a controller or OS that goes haywire and writes garbage to all disks or a failing power supply that puts 100V to your 5V lines and fries your harddrives. Just look at that recent firmware glitch of Samsung F4 drives. Not even ZFS can completely protect you against drives that write to corrupt blocks. Does ZFS feature a verify-after-write option?

While something like that never happened to me, I have a relative who lost a complete computer, including his 3 hard drives, to a lightning strike. Luckily he backs up his data to DVD-Rs regularly.

I really like to have a physically distinct copy of all of my data.
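
On the verify-after-write question: as far as I know, ZFS does not read a block back immediately after writing it. Instead it stores a checksum separately from each block and checks it on every read and during a scrub, repairing from a redundant copy when one exists. The toy model below sketches that idea only; the class and method names are made up and have nothing to do with ZFS's actual on-disk format.

```python
import hashlib


class MirroredBlock:
    """Toy model of one block on a two-way mirror with a separately stored checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]  # one copy per "disk"
        self.checksum = hashlib.sha256(data).digest()     # kept apart from the data

    def read(self) -> bytes:
        """Return verified data, repairing any copy that fails its checksum."""
        good = None
        for copy in self.copies:
            if hashlib.sha256(copy).digest() == self.checksum:
                good = bytes(copy)
                break
        if good is None:
            raise OSError("every copy failed its checksum; restore from backup")
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)  # self-heal from the good copy
        return good


if __name__ == "__main__":
    block = MirroredBlock(b"PhD thesis, chapter 1")
    block.copies[0][0] ^= 0xFF   # simulate silent corruption on one disk
    print(block.read())          # still returns the original data
```

The limits match the point above: a checksum only proves that a block is what the file system wrote. If garbage is written through the file system, or every copy is destroyed at once, the checksum offers no way back, which is why a physically separate copy still matters.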

brutalizer · Dec 25, 2010

1) ZFS cannot protect your data if you use a hardware RAID card. ZFS must have exclusive access to the disks. If you use ZFS, you must not use a hardware RAID card. Only JBOD.

2) Hw-raid does not protect your data against silent corruption, nor against bit rot. Read this carefully. Researchers show hw-raid does not protect your data:
http://en.wikipedia.org/wiki/Zfs#Data_Integrity

3) COW is a solution where all new data never overwrites old data. This means old data and new data exist on the disk, at the same time. (Some time later, the old data will be overwritten). This means that you can go back to the old data if the filesystem supports this (ZFS does support this via "Snapshot"). COW does not protect your data. It only means old data will not be overwritten, even if you make a small change. Several versions of data will exist on the drive.
