Best Storage Solution?

Personally I like mdadm RAID. You can add one disk at a time to expand, and you can even convert RAID levels to some extent, all live, without needing to take the data offline. Linux permissions can be a real pain in the ass, though: even with backups, restoring everything and getting all the permissions back properly would take a long time.
 
The nice thing about pooling and parity being separate is that you can choose best of breed for each, since they're modular. In my experience Stablebit DrivePool has been flawless on Windows Server 2012 R2. I have three pools with various numbers of drives, the Windows index service picks them up (which isn't true of some of the competing JBOD pooling software), and best of all, when I read or write a file to or from the pool it only spins up the one disk rather than all pool members. I've also configured it to fill up one disk at a time instead of round-robin, which would scatter files randomly across all disks; this all but eliminates folder splits and keeps related files together.


How exactly does DrivePool work? You select several hard drives and it maps them to one drive letter? Would I also have to be running SnapRAID in the background for parity?

Got my three 4TB drives in the mail, but I need to acquire a solid state drive for the OS on my server. My OS install decided to corrupt itself at the same time my RAID card went out. That, or that drive is going bad.
 
Personally I like mdadm RAID. You can add one disk at a time to expand, and you can even convert RAID levels to some extent, all live, without needing to take the data offline. Linux permissions can be a real pain in the ass, though: even with backups, restoring everything and getting all the permissions back properly would take a long time.

What? You can't add a single disk to an MDADM array with redundant protection as is being discussed with snapraid.
 
mdadm allows expansion with the --grow command.

Code:
datastore4 snapraid # mdadm --help --grow
Usage: mdadm --grow device options

This usage causes mdadm to attempt to reconfigure a running array.
This is only possibly if the kernel being used supports a particular
reconfiguration.  This version supports changing the number of
devices in a RAID1/5/6, changing the active size of all devices in
a RAID1/4/5/6, adding or removing a write-intent bitmap, and changing
the error mode for a 'FAULTY' array.

Options that are valid with the grow (-G --grow) mode are:
  --level=       -l   : Tell mdadm what level the array is so that it can
                      : interpret '--layout' properly.
  --layout=      -p   : For a FAULTY array, set/change the error mode.
  --size=        -z   : Change the active size of devices in an array.
                      : This is useful if all devices have been replaced
                      : with larger devices.   Value is in Kilobytes, or
                      : the special word 'max' meaning 'as large as possible'.
  --raid-devices= -n  : Change the number of active devices in an array.
  --bitmap=      -b   : Add or remove a write-intent bitmap.
  --backup-file= file : A file on a differt device to store data for a
                      : short time while increasing raid-devices on a
                      : RAID4/5/6 array. Not needed when a spare is present.
  --array-size=  -Z   : Change visible size of array.  This does not change
                      : any data on the device, and is not stable across restarts.

Edit: And no, I did not intentionally run that from a snapraid dataset (testing purposes) on top of a ZFS raidz2 x2 vpool. The PuTTY session just happened to be the top one on my second monitor.
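In case it helps anyone, here is roughly what actually adding a disk looks like in practice - a hedged sketch only, with made-up device names, and you'd want backups and a clean, synced array before reshaping:

Code:
# add the new disk as a spare, then reshape the array to use it
mdadm /dev/md0 --add /dev/sdf1
mdadm --grow /dev/md0 --raid-devices=5 --backup-file=/root/md0-grow.bak
# (the backup file may not be needed on newer mdadm versions when a spare is present)
cat /proc/mdstat        # watch the reshape progress
# once the reshape finishes, grow the filesystem on top, e.g. for ext4:
resize2fs /dev/md0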
 
I was under the impression grow was to grow the size of the array with the same number of disks (upgrading all disks). Thanks for that. Still can't add (and make use of) a larger drive though.
 
Actually, mdadm can do that too with the help of LVM. The disks are split into equal-sized chunks, the chunks are combined into RAID5 (or RAID6) sets, the leftover chunks on the larger disks form an additional RAID1 (or RAID6) set, and LVM then merges those arrays into the main volume. That lets you add and utilize the space on drives of different sizes in one array.

This is exactly what Synology does with their NAS software. They call it SHR and SHR2 for 1 and 2 disk redundancy. It manages your array with LVM+MDADM.

Check out their drive calculator to see what you can actually match up and use with MDADM.

http://www.synology.com/en-global/support/RAID_calculator

MDADM is very flexible. You can start with a single drive, convert to RAID 1, then convert to RAID 5 then convert to RAID 6, then grow the array disk by disk.
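A hedged sketch of that progression, with made-up device names (each reshape can take many hours, and depending on the mdadm/kernel version a --backup-file may be required):

Code:
# start with a "mirror" that has only one member
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing
# add a second disk to complete the RAID1
mdadm /dev/md0 --add /dev/sdc1
# convert the 2-disk RAID1 to RAID5, then add a disk and grow
mdadm --grow /dev/md0 --level=5
mdadm /dev/md0 --add /dev/sdd1
mdadm --grow /dev/md0 --raid-devices=3
# convert to RAID6 using one more disk
mdadm /dev/md0 --add /dev/sde1
mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=/root/md0-r6.bak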
 
So it (MDADM) can't? :p

I know LVM+MDADM is powerful; I wasn't aware that's how Synology was doing it. I'm guessing unRAID does something similar?
 
I'm not sure what you are asking "mdadm can't?". But if you are asking can it be done without LVM, just mdadm, then the answer is yes. You just partition your drives with sizes of the GCD (greatest common divisor). For example, if you have 1TB, 2TB, and 3TB drives, then fill all the drives with 1TB partitions. Then create multiple RAID5 or RAID6 volumes using the 1TB partitions. You will not have a single pool if you just use mdadm, but you will utilize all the capacity of your different-sized drives (minus redundancy, of course). Then you could add a same size or larger size drive and use OCE to expand the volumes. But you would have difficulty if you wanted to add a smaller sized drive (smaller than 1TB) if you are set up with 1TB partitions. SnapRAID would be much easier to use, and more flexible, than the complexity that mdadm would require.

As for unRAID, it uses its own technique, similar to RAID 4. Not very flexible, since it only supports single parity and requires a specific filesystem on each drive (ReiserFS).
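To make the GCD idea concrete, a hedged sketch with hypothetical 1TB/2TB/3TB disks (sdb, sdc, sdd), each carved into 1TB partitions:

Code:
# partition layout in 1TB chunks:
#   sdb (1TB): sdb1
#   sdc (2TB): sdc1 sdc2
#   sdd (3TB): sdd1 sdd2 sdd3
# only chunks present on 3+ disks can form a RAID5:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
# the second 1TB chunks on sdc and sdd can only form a RAID1 pair:
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
# sdd3 is left over with no redundancy until another large disk is added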
 
Thanks for the explanation.
 
How exactly does DrivePool work? You select several hard drives and it maps them to one drive letter? Would I also have to be running SnapRAID in the background for parity?

Got my three 4TB drives in the mail, but I need to acquire a solid state drive for the OS on my server. My OS install decided to corrupt itself at the same time my RAID card went out. That, or that drive is going bad.

You are right in what you've said. Drivepool just pools the drives together under a single letter. No redundancy there. You would need something else to provide parity such as snapraid.
 
Not technically correct: Stablebit DrivePool does do file duplication out of the box, which is 1:1 mirroring (think RAID1, but at the file level). Mirroring becomes a bit wasteful once you get past a certain number of drives and you're just storing, say, Blu-rays and DVDs, though.
 
How exactly does DrivePool work? You select several hard drives and it maps them to one drive letter? Would I also have to be running SnapRAID in the background for parity?

Got my three 4TB drives in the mail, but I need to acquire a solid state drive for the OS on my server. My OS install decided to corrupt itself at the same time my RAID card went out. That, or that drive is going bad.

It pools all your hard drives (or whichever hard drives you choose) into one drive letter. You can then set any level of file-level duplication for any folder you want, or for the whole pool.

For instance, setting it to 2 copies (the default) will make sure you have two full copies of all files in the folder you set it against. You can technically set the number to whatever you want; for example, if you have 5 hard drives and you want all your data on each of those hard drives, you set the duplication count to 5 and voila.

There's no parity, or snapshots, or anything like that. It's very much like Microsoft's original Drive Extender, except it works and scales extremely well.
 
Actually, mdadm can do that too with the help of LVM. The disks are split into equal-sized chunks, the chunks are combined into RAID5 (or RAID6) sets, the leftover chunks on the larger disks form an additional RAID1 (or RAID6) set, and LVM then merges those arrays into the main volume. That lets you add and utilize the space on drives of different sizes in one array.

This is exactly what Synology does with their NAS software. They call it SHR and SHR2 for 1 and 2 disk redundancy. It manages your array with LVM+MDADM.

Check out their drive calculator to see what you can actually match up and use with MDADM.

http://www.synology.com/en-global/support/RAID_calculator

MDADM is very flexible. You can start with a single drive, convert to RAID 1, then convert to RAID 5 then convert to RAID 6, then grow the array disk by disk.

Heck, if you wanted to, you could put in a drive that is twice the size, make two partitions, and then add the partitions instead of the drive. I can't imagine the performance would be very good if you did that, though. Ideally with RAID you always want same-size drives anyway.

What you CAN do is replace ALL drives with bigger ones, and then grow the array that way. For example: if you have a RAID 5 with four 1TB drives, you can replace the 1TB drives with 3TB ones (one at a time, rebuilding in between) and then triple the size of the array.
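A hedged sketch of one swap in that cycle, plus the final grow (device names made up):

Code:
# for each old 1TB member, one at a time:
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# physically swap in the 3TB drive, partition it, then:
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat        # wait for the rebuild to finish before touching the next drive
# only after ALL members have been replaced, expand the array and the filesystem:
mdadm --grow /dev/md0 --size=max
resize2fs /dev/md0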

It may not be the best thing out there, but I really like it for its simplicity and durability. I've done various tests on a non-production machine, such as unplugging the power cord in the middle of a rebuild, and it's very solid.
 
You've also totally missed the point of this thread. Home media server. Not enterprise storage with petabyte installations and 100s of disks. And there's more than one way to pool disks. So the fact that flexraid/snapraid/unraid/etc don't pool is not an issue. And they don't update files individually. One command and they all do. Low overhead for files that don't change on a constant basis.
The fact that flex/snap/etc don't pool is a very big issue when we talk about scaling upwards into the multi-petabyte realm. So you agree that flex/etc does not scale upwards into the multi-petabyte realm with billions of files changing constantly, on huge pools. Good.

I agree that flex/snap/etc are suitable for home users; I have recommended them earlier and will recommend them again for home users. But when we talk about scaling or data integrity, there are a lot of misconceptions in this thread. I have read the research papers and write down what current research says. I think it is good to have some informed posts on the latest research here, not just FUD or false claims. You should know what you are talking about, grounded in science, not subjective opinions ("snapraid gives full data integrity!" - Can you show us research proving that? No. In that case, it is just your opinion and wishful thinking).

Can someone else refer to academic research papers, besides me? Why am I the only one doing that?

I remember when I talked about data corruption years ago, after reading research papers - people thought I was just trolling. Especially the hardware RAID guys. They called me a troll, said I was spreading FUD, etc. Now they know better. It helps keep the level of discussion up if several people read the research papers instead of just saying "false" - what kind of academic argument is that? People are ignorant, and they don't read the research papers. And because they don't know, they call me a troll. When in fact, they don't know anything.


And I've built a snapraid with 4 parity and 30 data disks, so I'm not sure where you get the 10-disk number from either.
10 was just a number. Say 20 if that makes you happier. Or 30. My point is that you cannot handle 1000s of disks for large enterprise solutions with flex/snap/etc. You cannot manually handle huge arrays; it has to be done automatically.


False.
False.
Fine, if I am wrong that flexraid/etc does not scale upwards, please show me a multi-petabyte installation of any of flex/snap/etc with billions of files. If you cannot, then it is you who is wrong.

Regarding ZFS: it can always guarantee data integrity (provided the hardware is not faulty and you have enough redundancy). Here are some independent research papers proving that ZFS does guarantee data integrity:
http://en.wikipedia.org/wiki/ZFS#Data_integrity
Do you have independent research showing that flex/snap/etc does give data integrity? No. In that case, you are wrong. Again.


Anyway, pooling is beyond the scope of SnapRAID, and technically it's not actually trying to compete with ZFS in that regard, or anything else for that matter, since it's free and essentially a passion project for the developer.
Yes, there is, like, one random developer, probably with zero experience of big data and multi-petabyte storage. So there is no way that snap/flex/etc scales into large enterprise storage halls.

...in my experience Stablebit DrivePool has been flawless on Windows Server 2012 R2. I have three pools with various numbers of drives, the Windows index service picks them up (which isn't true of some of the competing JBOD pooling software), and best of all, when I read or write a file to or from the pool it only spins up the one disk rather than all pool members. I've also configured it to fill up one disk at a time instead of round-robin, which would scatter files randomly across all disks; this all but eliminates folder splits and keeps related files together.
You should do some checksums from time to time, too. That can help. And don't rely on ReFS, as it does not checksum the data by default; only the metadata is checksummed. Not safe either.
 
Here are some independent research papers proving that ZFS does guarantee data integrity:

False.

Yes, there is, like, one random developer, probably with zero experience of big data and multi-petabyte storage. So there is no way that snap/flex/etc scales into large enterprise storage halls.

Faulty logic (and unwarranted assumption).
 
Why are 1000s of disks ever even mentioned on these forums? It's totally irrelevant to anything going on here.
 
I run a redundant ZFS pool for the OS and day-to-day changing data like VMs, home dirs, mail dirs etc. and single-disk ZFS pools with SnapRAID on top for static data, all in the same system. Best of both worlds and you kinda "double check" SnapRAID using ZFS checksums. If SnapRAID should ever miscalculate something, you'd have a way to verify what's going on using ZFS's tried code.

So far, SnapRAID has been flawless for me. It really is the best way for static media collections.
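For anyone curious what that layout looks like, a hedged sketch - pool names, device paths and file locations are all just placeholders:

Code:
# one single-disk zpool per drive: ZFS provides the checksums, SnapRAID the redundancy
zpool create -o ashift=12 data1 /dev/disk/by-id/ata-DISK1
zpool create -o ashift=12 data2 /dev/disk/by-id/ata-DISK2
zpool create -o ashift=12 par1  /dev/disk/by-id/ata-DISK3

# /etc/snapraid.conf pointing at the pool mountpoints
parity /par1/snapraid.parity
content /var/snapraid/snapraid.content
content /data1/snapraid.content
data d1 /data1
data d2 /data2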

BTW, re: Flexraid:

"Flexraid uses checksums to validate files, but such checksums are not verified when data is read to update the parity. This means that any silent error present will propagate into the parity, making impossible to fix it later, even if it can be still detected comparing the file checksum.
You can get in a state where the "Validate" operation reports errors, but the "Verify" one reports no problem in the parity, making impossible to fix the errors. "

-- http://snapraid.sourceforge.net/compare.html

Yeah, no thanks.
 
The fact that flex/snap/etc don't pool is a very big issue when we talk about scaling upwards into the multi-petabyte realm.

blah
blah
blah

 
Why are 1000s of disks ever even mentioned on these forums? It's totally irrelevant to anything going on here.

So VERY, VERY right. This discussion was about the best storage for a small home media server. The only argument others want to give for why their solution is better is "1000s of disks" and such. Totally out of scope and missing the point of what was asked.
 
The fact that flex/snap/etc don't pool is a very big issue when we talk about scaling upwards into the multi-petabyte realm.

Again, missing the point of what we're after in this thread. Home media server. Hello? The fact that a piece of software designed to generate parity on data doesn't pool is not a big deal.

The reason being that, for one, there are other pooling solutions at this scale. Home media server scale.

And two, saying it's a very big issue that a piece of software doesn't do something it was NEVER written to do is, well, a little stupid. It's the same as saying the calendar app on your phone is flawed because it doesn't send SMS. Treating something totally outside its purpose as its flaw doesn't make much sense, really.
 
Why are 1000s of disks ever even mentioned on these forums? It's totally irrelevant to anything going on here.

Exactly. I didn't make this post for people to argue about petabyte nonsense when I'm not even hitting 20TB.
 
Discussions like this too often end up comparing features of different solutions that cannot be compared, as they target completely different problems - mainly questions that depend on filesystem features versus questions that depend on RAID features.

(rest of this informative message clipped)

Well said Gea!

I plan to move to a SnapRaid + ZFS solution for my non-critical media collection. Maybe in a few months or even a year down the road. For now, my RAIDZ2 has been running nicely for the last couple of years, and I use that for more critical data (with a backup). Power usage has been less than I expected, but my drives spin down when idle.

I cannot live without the end-to-end checksums - if I wasn't using ZFS when my old server had random PSU issues I would have lost all of my data, and that would have been a pain in the butt to recover!
 
Something that is not possible for almost all media fileservers, since the first end is usually an optical disc, and the other end is often a network-connected media player.

What's your point? I'd expect that a movie gets watched at least once after ripping it to storage. At the time of watching or even ripping, you verify that the movie is visually and audibly OK. From that point on, you want to protect against bit rot.

If that's your argument against ZFS for media storage, you need to try again. I'm using SnapRAID myself, but don't let the crusade against ZFS for media get ridiculous.
 
What's your point?

As I said, "end to end" data protection is not possible for almost all media fileservers. It is possible to protect data on a media fileserver with checksums in various ways (SnapRAID, ZFS, btrfs, file-level checksums, etc.), but it will not be "end to end" for the vast majority of setups, since you would have to generate the media on the fileserver and play it back with the player on the fileserver for it to be "end to end" (and even then it is arguable).
 
As I said, "end to end" data protection is not possible for almost all media fileservers.

Fine. Then "end to end" was bad wording. There are enough reasons why ZFS is overkill for media storage, no need to nitpick on words.

It's a fallacious argument anyway, as we are looking at storage as a closed system, not the whole chain of processing. Even if your BD drive never rips a movie correctly and your media player crashes after 5s, ZFS doesn't make any of this worse. It's neutral, and it protects your data, whatever your data is. So this is no argument against a full-blown ZFS for media.

Concentrate on the real points, which for me are 1) power usage through the number of spinning disks, 2) expandability, 3) support for mixed-size disks, and build your argument on that.
 
Then "end to end" was bad wording. There are enough reasons why ZFS is overkill for media storage, no need to nitpick on words.

No, it was much more than "bad wording", and certainly not a "nitpick".

"End to end data protection" has a specific meaning -- that data is protected from beginning to end. This almost never has any practical meaning when discussing a media fileserver. Yet some people keep bringing it up, as if there is some magical way to protect media from beginning to end on a media fileserver. There is not. It could be done manually, in the rare case where it was actually important. You'd need to read the data from an optical drive and immediately generate checksums (probably read it at least twice and compare checksums to minimize possibility of errors due to the path from the optical disc to RAM), then send the data to the media fileserver, then transfer the data to the playback computer and verify the original checksums, then play it back from the RAM buffer. However, it is hard to imagine a home media fileserver where such precautions are necessary.
 
End to end, in the checksumming context of modern filesystems like btrfs, ReFS or ZFS, always means protection from the moment data is under the control of the OS disk subsystem. It includes the disk driver, cabling, disk controller and the disk itself, and it includes the RAID layer if RAID is used. You can extend it to RAM if you use ECC. The term is used in contrast to on-disk checksumming, which does not really protect your data.

Real-time checksumming at this level is the only place where it can be done perfectly. If you do it afterwards at the application level, like with SnapRAID, it is not real-time and does not include the initial write and read - it is only better than not having checksums at all. It also does not protect on every read, only on a verification run.

These features of modern filesystems are a huge advantage. I expect them to be the default in a few years. Filesystems without such features are not filesystems I would prefer for terabytes of data - media or otherwise. They are relics from a time when disk capacity was measured in megabytes and the statistical error rate was not a problem, unlike now, when you have filers with dozens of terabytes or more.

In a few years, ext3/4, HFS+ or NTFS will be like FAT16 and other old filesystems - history. Filesystem checksumming and copy-on-write are the future and absolutely needed for the amount of data we use now.

The use of non-realtime RAID features for a media server is a completely different question. But for the filesystem itself, I would not use anything without these two core features, as they are the only way to trust the data and keep a filesystem consistent.
 
End to end, in the checksumming context of modern filesystems like btrfs, ReFS or ZFS, always means protection from the moment data is under the control of the OS disk subsystem.

In the context of a media fileserver, which is what we are talking about, "end to end" must mean from the original media data to playback of the media in order to make any sense. And in many cases, as-early-as-possible manual checksumming will catch more errors than those caught by a checksumming filesystem alone.

But again, it is hard to imagine a situation where such extreme measures are necessary for a home media fileserver.
 
Why stop at the BD? Did the mastering facility use ZFS? No one knows? Well, this must prove ZFS is not fit for a media server.

This reasoning is retarded.

You manage to piss off people who even agree with you in principle. Troll galore.
 
I find it hard to believe that everyone "rips" their media straight from discs for the majority of their media collection. Yes, I have done this some, and need to start doing it more for Blu-ray rips. But most of my media came from a program with a U icon, as I am sure is the case for most others as well.

So, for the end-to-end checksum thing you guys are arguing about: my first "end" is as soon as it passes its checksum in the download manager and then lands on the server RAID. Stop arguing about enterprise backups and stuff. I can just re-download a movie or show if it somehow becomes corrupted. None of my files are that important.

On the bright side, the cheap $50 RocketRAID card automatically recognized my RAID 5 array and all my data is safe.
 
I find it hard to believe that everyone "rips" their media straight from discs for the majority of their media collection.
Everyone might not, but I know of several that do, including myself. (.ISO FTW!) Oddly enough, all of my "mu" traffic has been legal for a very long time.
 
I find it hard to believe that everyone "rips" their media straight from discs for the majority of their media collection. Yes, I have done this some, and need to start doing it more for Blu-ray rips. But most of my media came from a program with a U icon, as I am sure is the case for most others as well.

So, for the end-to-end checksum thing you guys are arguing about: my first "end" is as soon as it passes its checksum in the download manager and then lands on the server RAID. Stop arguing about enterprise backups and stuff. I can just re-download a movie or show if it somehow becomes corrupted. None of my files are that important.

On the bright side, the cheap $50 RocketRAID card automatically recognized my RAID 5 array and all my data is safe.

I have a lot of material, such as pictures, that is non-replaceable. For these reasons, end-to-end checksums are important to me. For a media server storing just movies, they would be of little importance to me. Everyone's situation is different.
 
How exactly does DrivePool work? You select several hard drives and it maps them to one drive letter? Would I also have to be running SnapRAID in the background for parity?

Basically, StableBit DrivePool allows you to add a bunch of NTFS-formatted hard drives into one or more larger pools, which can be mapped to available drive letters. The DrivePool dashboard program will show a list of NTFS-formatted hard drives on your system, and you can select specific hard drives and add them to one or more pools. DrivePool also provides some plugins that let you choose how to balance files or duplicate files (mirroring) across your hard drives.

The advantage of this pooling solution is that you can access your files either through the pool or through the individual hard drives. And if any hard drive fails, you can still access the files on your other hard drives.

SnapRAID is a separate program you can use to add data parity to any hard drives on your system. It's basically a command-line executable you run to build and sync parity data or restore missing files. SnapRAID has no background service, but you can create simple command-line scripts and add them to your Scheduled Tasks to run once every night to update the parity data, or you can run a SnapRAID sync manually. The only thing SnapRAID needs is a proper configuration file, which you need to create.

Here is an example of how I use DrivePool and SnapRAID together.

Assuming you have 5 hard drives and you want to add 4 of them to DrivePool and use the remaining hard drive for SnapRAID parity.

1) Mount all drives to a few sub folders on an existing hard drive using Disk Management.
(You can mount them to drive letters too if you want to).

C:\Mounted Drives\Disk 1
C:\Mounted Drives\Disk 2
C:\Mounted Drives\Disk 3
C:\Mounted Drives\Disk 4
C:\Mounted Drives\Parity 1

2) Update your SnapRAID configuration file to reflect the drives you have mounted and the drive you want to use to store the SnapRAID parity data.
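A hedged sketch of what that configuration might look like for the layout above - the parity and content file locations are just illustrative choices, and I've quoted the paths because they contain spaces:

Code:
parity "C:\Mounted Drives\Parity 1\snapraid.parity"
content "C:\snapraid\snapraid.content"
content "C:\Mounted Drives\Disk 1\snapraid.content"
content "C:\Mounted Drives\Disk 2\snapraid.content"
data d1 "C:\Mounted Drives\Disk 1"
data d2 "C:\Mounted Drives\Disk 2"
data d3 "C:\Mounted Drives\Disk 3"
data d4 "C:\Mounted Drives\Disk 4"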

3) On your StableBit DrivePool dashboard, select hard drives you want to use for data and add them to a pool. DrivePool will automatically assign a drive letter to the pool but you can go to Disk Management and change the drive letter if you want.

When a hard drive is added to a pool, DrivePool will create a hidden subdirectory with a unique poolPart ID name.

For example, DrivePool will create a new subfolder with a poolPart GUID name on each of the data drives you added to the pool:

C:\Mounted Drives\Disk 1\fdde62f0-44cf-4a1b-8e13-89b1bc4045c1
C:\Mounted Drives\Disk 2\e1ef38d1-c137-47eb-b55f-de61b09566ff
C:\Mounted Drives\Disk 3\44ffda3a-38a4-48d7-9b1a-9fb6f7e42544
C:\Mounted Drives\Disk 4\439e1868-fde3-4035-9ed5-ab9978522b81

Only files and folders stored in these poolPart GUID folders are accessible through the pooled drive created by DrivePool. When you copy files to the pool, DrivePool will only add files and folders to these GUID folders on each data hard drive.

So if you already have files on your data drives and you would like them to be visible through the pooled drive, you will need to move those files from their original folders to the GUID folders.

For example:

Move files from
C:\Mounted Drives\Disk 1\My MP3 Collection
to
C:\Mounted Drives\Disk 1\fdde62f0-44cf-4a1b-8e13-89b1bc4045c1\My MP3 Collection

So that's basically how you use DrivePool.

For data parity protection, you will need to run the SnapRAID sync command to update your parity periodically (e.g. once per day, or whenever a large amount of data has changed). If SnapRAID is configured properly, it should only use the drive mount points to access the data drives, not the pooled drive.
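A hedged example of scheduling that with the built-in Task Scheduler - the snapraid.exe path and the times are just placeholders:

Code:
rem nightly parity update at 03:00
schtasks /create /tn "SnapRAID Sync" /tr "C:\snapraid\snapraid.exe sync" /sc daily /st 03:00
rem weekly scrub on Sunday mornings
schtasks /create /tn "SnapRAID Scrub" /tr "C:\snapraid\snapraid.exe scrub" /sc weekly /d SUN /st 04:00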

When you remove a drive through the DrivePool dashboard, DrivePool will migrate the files from that drive to the other data drives if space is available.

What happens when you physically remove a data drive without first removing it from the pool using DrivePool?

StableBit tracks the pool's data hard drives by disk GUID (not to be confused with the poolPart GUID folder), which is written to the hard drive when it is first *initialized*. Even if you reformat the partition or delete and re-create all partitions using the Disk Management utility, the hard drive will still have the same disk GUID. (The only way to change the hard drive's GUID is with the diskpart command.)

So if you physically remove a hard drive (e.g. unplug its cable), DrivePool will detect that one of its data drives is missing. And when you plug the same drive back in, DrivePool will automatically add it back to the pool because it recognizes the GUID of the hard drive.

This means that if you swap out a data drive with another hard drive which was not in the pool previously, DrivePool will treat the new hard drive as a non-pooled drive.
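If you ever need to look at (or deliberately change) that disk GUID, a hedged sketch of the diskpart side - the disk number and GUID are made up:

Code:
diskpart
DISKPART> list disk
DISKPART> select disk 2
DISKPART> uniqueid disk                    (shows the current disk GUID/signature)
DISKPART> uniqueid disk id={NEW-GUID-HERE} (sets a new one; GPT disks take a GUID, MBR disks a hex signature)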

So if one of the data drives fails suddenly and you want to replace the bad drive with a new one and rebuild the data from SnapRAID parity, you will need to do the following:

For example, if your Disk 2 data drive failed:
1) Remove the bad drive.
2) Plug in the new drive, format it and mount the drive to "C:\Mounted Drives\Disk 2"
3) Run your SnapRAID fix command to rebuild the files previously stored in "C:\Mounted Drives\Disk 2\e1ef38d1-c137-47eb-b55f-de61b09566ff" on the bad drive (see the sketch after these steps).
4) Add the new drive into DrivePool.
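A hedged sketch of the SnapRAID side of step 3, assuming the failed drive was defined as d2 in snapraid.conf:

Code:
rem rebuild everything that lived on the replaced disk d2, logging to a file
snapraid -d d2 -l fix.log fix
rem once the files are back (and moved into the new poolPart folder), bring parity up to date
snapraid sync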

Remember that the new drive will have a different disk ID, so DrivePool will add the drive as a new member disk and create a new poolPart GUID subfolder.

For example, DrivePool created the following poolPart GUID subfolder when the new replacement drive was added to the pool:
C:\Mounted Drives\Disk 2\22b2f0b9-f23d-48cf-81a5-018f9596dbd7

But wait a minute: SnapRAID rebuilt the files on this new drive into the old poolPart GUID folder used by the previous (bad) drive:
C:\Mounted Drives\Disk 2\e1ef38d1-c137-47eb-b55f-de61b09566ff

Obviously, the files rebuilt by SnapRAID are not accessible through DrivePool at this point.

So your next step would be:
5) Move the files re-created by SnapRAID from the old poolPart GUID folder to the new poolPart GUID folder.

SnapRAID is not just limited to rebuilding all files stored on a failed hard drive.

Other uses for SnapRAID include restoring files from the parity data of the last snapshot (SnapRAID sync).
So if you realize you accidentally deleted your "My MP3 Collection" folder, or if you have made unintended changes to your media files, you can run the SnapRAID fix command with an optional filename filter to "restore" files from the parity data of the last SnapRAID sync.

Of course, if you did a SnapRAID sync already before you realized some files were accidentally deleted or modified, you will not be able to rebuild them from parity data.
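A hedged example of that kind of targeted restore - the folder name is just the one from the example above, and the second form restores anything missing compared to the last sync:

Code:
rem restore just the deleted folder from the last sync's parity
snapraid -f "My MP3 Collection/" fix
rem or restore every file that is missing compared to the last sync
snapraid -m fix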

Finally, SnapRAID can scrub both your data files and your parity data.
If I remember correctly, SnapRAID stores at least two separate content list files containing a hash of every block, so it can also ensure the integrity of the parity data. And of course your data files will be checked against the parity data during scrubbing.

I think the SnapRAID approach is somewhat better than the scrubbing on some hardware RAID controllers, which only lets you choose either "assume data is good" or "assume parity is good" to correct silent errors.

So these are just some things I have learned from using StableBit DrivePool and SnapRAID.

IMO, DrivePool plus SnapRAID is a good choice for protecting a *huge* media library that doesn't change very often and can grow very large. (SnapRAID supports up to six parity drives, though I have only tested up to two parity drives.)

If you need more disk space, you have the option of replacing just a few hard drives first, starting with the parity drive(s).

You don't need to replace all the data drives at once when moving to larger-capacity drives, or worry about losing all your data to a hard drive failure during a RAID expansion/rebuild.

Also DrivePool has the option to duplicate files across data drives. So if you want a RAID1 type of protection (mirroring), you don't even need SnapRAID.

I would suggest you test SnapRAID and DrivePool in a virtual machine first to see if you are comfortable with SnapRAID configuration file and its command line interface and if DrivePool works for you or not.

Another available solution is FlexRAID, which provides both pooling and parity support. But from my experience testing FlexRAID, I don't like its GUI, which is confusing, and FlexRAID caused some problems when I simulated hard drive failure and removal in a virtual machine. IMO, StableBit DrivePool + SnapRAID is more reliable than FlexRAID.
 
Nice topic, full of good information! Thank you Saiyan for describing your choice in detail.
It does seem like hardware RAID is on its way out unless it can somehow hardware-accelerate or incorporate the features found in SnapRAID and DrivePool.

On the topic of error correction: does hardware RAID have the advantage that a bit flip in memory (data corruption in general) can be detected and corrected, since the RAM on a hardware RAID card is ECC?
(I am asking because ZFS cannot offer protection for data that gets corrupted in system RAM.)

Or does all data that is read from, or written to, a hard drive pass through system memory?
If the above assumption is correct, are there solutions for protecting data while it is traveling over system buses or while it resides in the CPU cache?
 
For a very large media collection (e.g. Blu-ray/DVD backups), traditional RAID may no longer be a good choice, because when you want to expand your storage space you don't want to go through the process of backing up files, replacing all the drives, creating a new RAID set and restoring the files.

If you don't have enough space to back up your files, you will have to repeatedly swap one drive for a bigger one and let the RAID rebuild the missing data until all the drives are replaced, or go through the RAID set expansion procedure if your controller supports it. But you risk losing your data if too many hard drives fail during the RAID rebuild or expansion.

So unless you need 24/7 uptime for your media library, pooled storage with snapshot parity and easy expansion is probably a better choice than traditional RAID.

If data picks up a bit-flip error in system RAM before it is written to a hard drive or storage device, I don't think any hardware RAID or file system would be able to detect the error. So unless the program which writes the data to the hard drive also does a read validation against a copy of the data that does not contain the bit error, it's unlikely the error can be detected.

Maybe someone here knows more about end-to-end data integrity and protection from system RAM to the storage device.
 
My definition of end-to-end checksumming is described by gea. It's good enough for me.

It really sucks watching a movie when it glitches because of bitrot (audio squawks are the worst, followed by image breakup that takes half a second or more to clear up). This was at its worst when my old HTPC had an NVIDIA SATA controller which randomly corrupted data when reading from and writing to the hard disk. Sometimes it would freeze the player completely; other times the image would break up. And it was random! Rewind to that scene, it works fine, then it screws up badly a minute later. Ugh.

In theory, bitrot shouldn't matter during media playback, but in practice, it does. :eek:
 