ZFS Crash Course

I'm in the process of planning and building a FreeNAS box. My problem is that I haven't dealt specifically with ZFS as a mode of data storage, and a number of its concepts are foreign to me. So, I'd like to find some way to resolve this, either by getting answers one by one or by getting references to sources that are worth looking over. The problem with links, for me, is that I can't interact with an article; I have to be able to ask questions. So, I don't know what the best way to tackle this is, but I can start by asking some questions that might (in addition to answering my own questions) help a number of other people searching around for answers.

1. I don't understand if a pool is supposed to be the container for a volume or not. Can you span a volume across multiple pools? What is the purpose of a pool?! (besides belly flopping into it :))
 
Watch this video, and also part 2.

http://www.youtube.com/watch?v=3-KesLwobps

If the links go dead in the future, do a Google search for "ZFS ninja".

:D Thank you for your quick reply. I have watched both of those - EXCELLENT videos, both of them. I'd still like to interact with you all and get direct answers to my questions, because it helps me build on those answers and get a much better understanding of how things work. In the meantime, I'll re-watch those videos and see if I can't get some of these concepts down.
 
A zpool is the data storage. It is built up from one or several vdevs. A vdev is a group of hard disks (or partitions, or files, etc). Typically you have a zpool consisting of one vdev. That vdev is configured as raidz1 (like raid-5), raidz2 (like raid-6) or raidz3 (survives three disk failures). You can have several vdevs in one zpool if you wish, which increases IOPS. One raidz2 vdev should consist of roughly 5 to 11 disks. You can add another vdev to a zpool, but you cannot change the number of disks in an existing vdev - it is fixed.
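To make that concrete, here is a minimal command-line sketch. The pool name "tank" and the daX device names are just placeholders - adjust them for your own system:

  # create a pool from one 6-disk raidz2 vdev
  zpool create tank raidz2 da0 da1 da2 da3 da4 da5

  # later, grow the pool by adding a second 6-disk raidz2 vdev
  zpool add tank raidz2 da6 da7 da8 da9 da10 da11

  # check the layout and health at any time
  zpool status tank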
 
I appreciate that answer a lot. Thank you. Here's my response.

So, if I have say:

pool
- vdev_a (mirror)
-- dev1
-- dev2
-- dev3

- vdev_b (mirror)
-- dev4
-- dev5
-- dev6

So, this diagram means that whatever data is on dev1 is the same as is on dev2 and dev3. Correct? It also means I can have dev2 and dev3 fail and still have the array intact, because dev1 is still available. Correct?

Question: If you have multiple vdevs, you are basically striping (or collecting data from both on reads). Do I understand this correctly?

My diagram above is not desirable because of the waste of capacity, as I understand it. Am I correct in saying this?

If I create a vdev with, say, 3 devices (like I have above), when I add a new vdev, I MUST add EXACTLY the same number of physical devices, correct?

*taking a breath*, baby steps, baby steps :).
 
Yes - within a mirror vdev every disk holds the same data, and writes are striped across the vdevs, so this is basically raid10 with 3-way mirrors. Desirable? It depends on your requirements. Keep in mind that random read IOPS will be roughly that of 6 disks, as ZFS can spread reads across all the disks in a mirror vdev. And no, you don't need to add exactly the same thing: you could add another mirror of 2 (or 4, or however many) disks, or even a single disk - though a single disk is not recommended.
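If it helps, here is a rough sketch of that diagram on the command line, next to the more common two-way-mirror layout (pool and device names are placeholders):

  # your diagram: two 3-way mirror vdevs (only 1/3 of raw capacity is usable)
  zpool create tank mirror da0 da1 da2 mirror da3 da4 da5

  # more common: three 2-way mirror vdevs from the same six disks
  #zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5

  # a later vdev does not have to match the earlier ones
  zpool add tank mirror da6 da7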
 
I cannot recommend running ZFS for anything important without doing actual exercises on a scratch array.

Pull a drive and rebuild with a different one. Add a drive to an existing array, expanding it the right way. Then try a drive-failure simulation again.

ZFS is pretty alien even to experienced Unix users, due to its unusual command-line idioms and because it merges all the traditional layers of raw device, redundancy, filesystem and snapshotting. Practice is required, or you might have to face difficult situations with live data later.
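To give an idea of what such an exercise might look like on a throwaway pool, here is a hedged sketch (pool and device names are placeholders; check each command against the man pages before trusting it):

  # simulate a failure by taking one disk offline
  zpool offline tank da3

  # the pool should now show as DEGRADED
  zpool status tank

  # "replace" the failed disk with a spare and watch it resilver
  zpool replace tank da3 da6
  zpool status tank

  # tear the practice pool down when you are done
  zpool destroy tank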
 
One vdev can be a raidz1, raidz2, raidz3 or a mirror. A mirror can have lots of disks, not just two. So you can make a mirror with 20 disks, which means all 20 disks hold the same information.

You can have any mix of vdevs in a zpool - one mirror, one raidz1, etc., with different numbers of disks. If one vdev fails, you have lost all the data in the zpool. Therefore, make sure each vdev has redundancy and, preferably, that the vdevs have identical configurations.

If you later happen to add a single disk to a zpool by accident, then if that single disk crashes you have lost all your data. In that case you need to rebuild the zpool from scratch: copy off all data, destroy the zpool and create it again.
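A hedged sketch of that mistake and the recovery, assuming a hypothetical second pool called "backuppool" with enough room to hold everything (pool and device names are placeholders):

  # the accident: adding a bare disk as its own vdev
  # (zpool warns about the mismatched replication level and requires -f)
  zpool add -f tank da9

  # recovery: copy everything off, destroy, recreate
  zfs snapshot -r tank@migrate
  zfs send -R tank@migrate | zfs receive -F backuppool/tank
  zpool destroy tank
  zpool create tank raidz2 da0 da1 da2 da3 da4 da5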
 
Hmm. Now you're making me second-guess myself. I was planning on dumping a full backup (which at first won't be anywhere near my full capacity) onto a single drive outside of the main server for just this type of situation. But still, that recovery process would be very slow, and that single drive might fail. I really am torn. So, you're saying ZFS is not used by any mainstream server platforms?

Just to be clear, I was going to do mock trials using VirtualBox, but that's not with live hardware. I'm really glad you mentioned doing drive-failure tests, etc.

If I set up an array and pull a drive out forcibly (unplug the SATA power connector), would that be a good test, or is that a bad idea? I do have a UPS, but wouldn't a power failure on the drive be a good way to test drive failure? How can I get a "dashboard" of real-time SMART data for each drive, or is that even possible in FreeNAS / ZFS / Debian?
 
"So, you're saying ZFS is not used by any mainstream server platforms?" No, that's not what he's saying. There are providers using zfs for serious data storage. The point is that this all comes from a solaris-centric background, so most implementations will have a steep learning curve for basic OS commands. On top of that, as others said, zfs lumps raid, volume and share management all in one piece, and the commands are not like you will have seen on any other Unix platform. smart has nothing to do with zfs - you can download and run smartctl for various platforms to dump out smart info for drives just fine. I'm not sure what you concern is, as any raid architecture will take some time to resync an array/pool/whatever if you have to replace drives. The specific gotcha being mentioned is that there is no real protection against adding a single drive as a new vdev, resulting in a single point of failure. In that case, yes, you would need to migrate the data off, rebuild the pool and migrate it back.
 
"If I set up an array and pull a drive out forcibly (unplug the SATA power connector), would that be a good test?"

I have done this type of testing for zfsonlinux before I went live with a 20TB+ server. Also, since I have collected a few unreliable drives, I have had a chance to test other failure modes on two smaller servers that do not contain critical data. For me, backups are on a tape archive.
 
"So, you're saying ZFS is not used by any mainstream server platforms?"

This is getting very meta. I'm quoting your quote of what I said :).

So, to be clear: that was not intended to be sarcastic at all. I just want to be clear on that point. I have a river of sarcasm that flows from my wife 24/7. Anyhow, I'm not really concerned about going live with ZFS, just concerned about not approaching it properly and having to hit the panic button. I'm really glad you brought up the point of bringing a server online with 20TB+.

The thing I AM concerned about is when I begin to run out of SATA connections on my motherboard and don't have enough storage to go around. What do people do in this case? I thought ZFS would provide me with some sort of elegant transition (I could have thought wrong). Do people just build a bigger server? Is there a way to bind a new server to an old server so that data can be pooled together? The scaling issue is something I have been trying to figure out, and I just haven't gotten a good answer on it anywhere.

I want to thank you all for your help and answering my questions. I can tell this is a really good place for technical people like me. I'll be spending a lot of time around here.
 
Scaling HW like this really has nothing to do with ZFS per se. You can get a couple of JBOD chassis, daisy-chain them and throw a crapload of drives on, using one or more HBAs. I would do that before I used SATA ports for more than a handful of drives...
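From the ZFS side, once the new shelf's disks are visible to the OS, growing the existing pool is just a matter of adding another vdev - a quick sketch (pool and device names are placeholders):

  # add a new raidz2 vdev built from the JBOD's disks
  zpool add tank raidz2 da12 da13 da14 da15 da16 da17
  zpool status tank

One thing to keep in mind: ZFS does not rebalance existing data onto the new vdev; new writes simply favor the emptier vdev over time.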
 
Could you link me to some examples of these two things you're talking about? I knew there was some sort of expansion chassis like this, but I didn't know what they were called until just now. You are not talking about SAS expanders, though, right?
 
If you search this forum you can find discussion of the Rackable Systems SE3016 JBOD chassis.
 
I'd like to stay away from SAS expanders. If I want to use multipliers, can I still use that chassis or do I need to look at something else? I'd like to keep all my drives connected in the same way (i.e. either all to the SATA bus directly or all indirectly via port multipliers, etc.). Should I have a main server box and then expand to fit my scale? Here's a server case I was considering.

This looks like a server case + hot swap bays. I want just the chassis for drives only and cooling.

http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021

Here's the server case I was originally planning on using.

http://www.newegg.com/Product/Product.aspx?Item=N82E16811147165

I do realize the conversation is moving away from ZFS at the moment. I hate to have two posts going. This one has good responses. Should I move my discussion about chassis to a different post?
 
What are your needs, what do you want with your storage? Workload? Projected use? How many TB?
 
(i.e. either all to the SATA bus directly or all indirectly via port multipliers, etc.)

You can use SATA ports but I would avoid SATA port multipliers and instead use SAS/SATA HBAs.
 
What are your needs, what do you want with your storage? Workload? Projected use? How many TB?

This unit will be used for backing up video editing files from my clients. It won't be purely video editing files, but that will be a component.

Needs:
  • Data integrity

Workload: Peak backup times with full utilization for approximately 2 to 3 hours. This will increase with each user coming online.

The number of TB will be determined by how modular I'm able to make my system (i.e. the more I can add, the more I will).
 