ZFS Server

Ripley

I am going to build a ZFS file server. I had already discovered the awesomeness that is ZFS before finding this forum but have learned quite a bit more after reading the posts here. I do still have some questions and could also use some recommendations though.

I plan to start with a single 10-drive RAID-Z2. I have a few 1TB drives now and will need to get 5 or 6 more. I believe, if I understand things correctly, that I could use 2TB drives and only use 1TB of each. Am I also correct that when I eventually replace all of the 1TB drives with 2TB drives, I would then be able to use all of the space on all the drives without having to recreate my array?

I would like to have an SSD as a cache device. Do I have to add this when I create the pool or can I add it later? Also, can 2 pools share a single cache device or do they need their own? What are the drive size guidelines or recommendations for cache devices? Last, is it possible to use a RAM drive as a cache device?

I am leaning towards an AMD Phenom II processor mostly due to price, with the alternative being an i5. Should I spend the extra money for the Intel? If not, where in the Phenom II range would you recommend (i.e. X2, X3, X4)?

After choosing the processor, what motherboard features are important? I know I need something that supports ECC memory. I would also like dual GigE ports as my switch supports Link Aggregation. What else should I look for? Specific model recommendations?

My eventual goal is 20 drives in 2 RAID-Z2 pools. I will need a standalone SAS HBA or two, or an expander. I was looking at the Supermicro AOC-SASLP-MV8. Is there something better? Should I plan on more than one, or go with the expander?

Last, power supply sizing. This is something I don't feel I have any real handle on, so any information or reading material you have would be very helpful.
 
If the LSI 2008 chipset is supported, think about a Supermicro X8SI6-F. Dual Intel LAN onboard, 6x SATA + 8x 6.0Gbps SAS = 14 onboard ports. Figure a Supermicro add-on card is $100 for 8 ports, and two Intel gigabit ports (since you don't want to be using Realtek or something else with that much hardware) are probably another $70. The difference is, of course, that the Supermicro board has IPMI 2.0, unlike consumer boards, so it has a dedicated management LAN with a remote KVM.

Add one 8-port HBA and you'll have 22 SATA ports, or enough for an OS SSD, a cache SSD (or two cache SSDs) plus 20 drives for a Norco case.

I moved my server over to an X8ST3-F from an Asus P6T7 WS Supercomputer and am very happy with the move. IPMI is great!
 
For a PSU, I'd recommend a quality 750W unit with a minimum of 60A on the +12V rail to be on the safe side. Something like the Corsair 750TX would be an excellent choice.
 
CPU-wise, either should be more than fast enough to saturate a 1Gbit connection. Decent onboard NICs are important (Intel or Broadcom); if you have onboard Realtek or Nvidia, then include money for an Intel Gbit card in your build. IPMI is a really nice feature, and you won't need an optical drive if you have it. Try the Supermicro X8SIL-F for an 1156 CPU or the Tyan S8005AGM2NR for AM3.

If you don't have staggered spin-up support, then plan on 1.75 to 3 amps on the 12V rail per HDD. This is on startup only, but without staggering they will all start at once. With staggered spin-up you can get away with as little as 0.5A per drive on the 12V rail. A 650W single-12V-rail Corsair will work with just about any single-CPU board and a full 20-drive complement without staggered spin-up. With staggered spin-up you can get away with a fair bit smaller PSU, as little as 400W.
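As a rough worked example of the difference (assuming ~2A per drive at spin-up, within the range above): 20 drives x 2A x 12V = 480W on the +12V rail at power-on without staggering, versus 20 x 0.5A x 12V = 120W with staggered spin-up, before counting the CPU, board, and fans.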
 
Those Supermicro boards are really nice. Definitely something I'd buy.
 
Don't bother with AMD; if you want low power, the i3 has the best power/performance.
 
p3n: you're making it sound like choosing a Core i3/i5 over a Phenom II will save power. As AMD chips only use a few watts when idling, this won't have any effect. Cool'n'Quiet works very well to reduce idle power on AMD chips, down to just 1.5-4W of idle drain.

Also, there might be a reason to prefer AMD over Intel in fileserver setups. AMD chips have higher compression and encryption speeds in my tests; though some newer Core i5s have hardware acceleration for AES (not SHA), it's likely not yet supported under OpenSolaris/BSD. This may be important if you frequently use either the live compression setting in ZFS or some AES-based encryption on (part of) your array.

Speaking of which, I'm missing the operating system details in the original post. Is this going to be a FreeBSD/OpenSolaris/FreeNAS box?

The Supermicro AOC-SASLP-MV8 does not work well with FreeBSD; if you want to use that OS, I recommend a different adapter based on another chipset that does work well:
http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm

This is a newer 6Gbps version of that board, but it is probably unsupported at this stage:
http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=E
This version lacks RAID support and might be a little cheaper; again, likely unsupported at this moment:
http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=E

I would choose between those three controllers. Besides the 3Gbps vs. 6Gbps SATA per port, the major difference is in the PCI Express bus bandwidth. The first generation is PCIe 1.0, thus 250MB/s per lane and 2GB/s total; the newer cards are PCIe 2.0 x8 and have 4GB/s full-duplex bandwidth.
 
Please also consider a different setup:

Make RAID-Z arrays of 4 disks each, giving you the equivalent of a RAID50:
RAID0 (RAID-Z + RAID-Z + RAID-Z)

Each RAID-Z will be 4 disks (3 data + 1 parity), leading to 25% overhead. This setup would be excellent to use in conjunction with copies=2 on select directories that you deem important. Those directories will, in addition to the single-parity protection, store a full copy of each file on a different array where possible. Thus if you have 3x RAID-Z and copies=2, at least two arrays will normally be used to store the copies, which automatically means you have double protection. This may well be safer than using a single RAID-Z2 (double parity) without copies=2.

The cool thing is that copies=2 can be set per filesystem (dataset), so you can decide per directory tree which data needs additional protection. That data will take double the space, however. The obvious choice is text documents, as these are often important but use little space.
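As a minimal sketch of that layout (the pool name tank, the ad* device names, and the documents dataset are just placeholder assumptions):

# one pool striped across three 4-disk RAID-Z arrays
zpool create tank raidz /dev/ad4 /dev/ad6 /dev/ad8 /dev/ad10 raidz /dev/ad12 /dev/ad14 /dev/ad16 /dev/ad18 raidz /dev/ad20 /dev/ad22 /dev/ad24 /dev/ad26

# separate dataset for the important stuff, storing every block twice
zfs create tank/documents
zfs set copies=2 tank/documents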

Another benefit is expansion: it's easy to add disks 4 at a time, even if the new disks are a different size. If you have 10x 1TB disks, what would you do with the 1TB disks after you bought 2TB disks? In this setup it's easy to mix 1TB and 2TB disks while retaining the capacity, and expansion stays easy. How many disks do you have at the moment? Say you have 5; if you buy three more you can make two four-disk RAID-Z arrays, and later you can buy 2TB disks and create additional arrays.

Another option would be to combine pairs of 1TB disks into 2TB volumes and mix them with regular 2TB disks: a 4-disk RAID-Z array of 2TB members, where some or all of the members are actually RAID0 arrays of two 1TB disks, which makes 2TB. Beware of small size differences; limiting each member's capacity to a 1GB boundary might make sense.
 
p3n: you're making it sound like choosing a Core i3/i5 over a Phenom II will save power. As AMD chips only use a few watts when idling, this won't have any effect. Cool'n'Quiet works very well to reduce idle power on AMD chips, down to just 1.5-4W of idle drain.

The 32nm process sips power both at idle and full load...

http://www.anandtech.com/show/2972/...-pentium-g6950-core-i5-650-660-670-reviewed/8

AMD is still on an older fab process. One can make the case that in some cases AMD is faster at compression/encryption (assuming you don't have the ability to use the AES-NI instructions on the i5s, in which case there's no contest). But AMD's power consumption on the low end is not comparable to the 32nm i3 and i5 line at this point: 32nm vs. 45nm.

sub.mesa has some really good ideas regarding the Raid-Z setup.
 
That article doesn't list its test setup; the motherboard used may explain the huge difference in idle power between the AMD and Intel setups. Other articles show a different story. Techreport lists power specs for AMD/Intel CPUs in different graphs precisely because they are not directly comparable.

FYI, I'm running 25W idle at the wall socket with an AMD dual-core + AMD 780G Mini-ITX + 4GB DDR2 + SSD, powered by a PicoPSU.

While Intel has the lead in process generation (generally they are one generation ahead), AMD does do pretty well in idle power consumption since the Phenom II. Where EIST makes little difference to Intel chips in reducing idle power consumption, Cool'n'Quiet is very effective for AMD. My AMD quad-core server system uses 43W idle at the wall socket with a full-size ATX board, though without any HDDs.

Keep in mind that video cards can be quite high on idle power drain, especially nVidia and older ATi series; 50W+ idle is normal for high-end cards. You don't need those in a NAS. Also, motherboards made for enthusiasts may have 10-phase PWM supplying power to the CPU. While that leaves extra room for overclocking at high electrical specs, it's not very good at reducing power consumption. Thus choosing a different class of motherboard might make sense.

I don't think any computer system should be higher than 100W idle. If you consume more than that while doing absolutely nothing, it's more like an electric heater than a computer IMO.
 
That article doesn't list its test setup; the motherboard used may explain the huge difference in idle power between the AMD and Intel setups. Other articles show a different story. Techreport lists power specs for AMD/Intel CPUs in different graphs precisely because they are not directly comparable.

It does, just on page 2.

http://www.anandtech.com/show/2972/...-pentium-g6950-core-i5-650-660-670-reviewed/2

My i5-650 pulls 27W on my matched pair of Kill A Watts (which are fairly inaccurate, but hey) using a 150W PicoPSU + 150W adapter, without enabling lower C-states, other BIOS tweaks, or installing any additional motherboard power-saving software.

The scary thing is that even doing x264 encoding, I can't get over 77W on that setup.

Is Cool'n'Quiet working in OpenSolaris yet? I know it has been working in FreeBSD since 6.0 or 7.0 and in Linux for a while.

The i3s and i5s have made me retire all of my LGA775, 771, and Socket AM2+/AM3 gear at this point. I do have a Sempron 140 that I might fire up one day just for fun, though. I have to imagine that it (or its dual-core sibling) would have fairly amazing power consumption.
 
Hardware RAID is overrated. I built a 192-drive ZFS server at work for backups. It is all based on software RAID on cheap HBA cards, Supermicro JBODs and a head node.
 
Without staggered spin-up, 192 x 3.5" class drives will mean roughly 192 x 30W ≈ 5760W just to spin up the disks. :D

Not even five 1kW power supplies would handle it, and you might even need seven because the base system is not included in that figure.

But honestly, I think he is using staggered spin-up or 2.5" class disks instead. With 2.5" disks you really can store a lot of them, not just because of space but also heat dissipation; 2.5" disks use only 0.8W idle compared to 3-4W for "green" disks or 6-9W for regular 7200rpm disks.

Sure would love to see pics though. ;-)
 
OS:
I can't decide between FreeBSD or OpenSolaris. I was leaning towards OS but after Oracle bought Sun I fear for its future. Is FreeBSD's implementation of ZFS as up to date as OS? I have also looked at kFreeBSD and Nexenta. kFreeBSD looks like it may be a good option soon. I am more familiar with Ubuntu/Debian than BSD.

On the HBAs:
Thanks for the info. Do you think there will be support for the newer cards anytime soon? Or should I get something that works now?

CPU:
After reading the posts and the AnandTech article I am still a bit confused on what to do here. sub.mesa seems to think the AMD is better and pjkenned is definitely in the Intel camp. From a price/performance standpoint which is better? And do both support ECC memory?

RAID:
Since I plan to start with 10 drives, is there some technological benefit to using smaller RAIDZ pools over a single RAIDZ2 pool?
 

The 192 drives are probably in 8x Supermicro 24-drive cases, so the power supplies wouldn't be a problem, but the power feed could be. Also, since they are probably spread across 8 cases, there would be a delay in pressing the power buttons unless you had 4 people there pressing them.
 
OS: I know FreeBSD quite well, but I'm not very familiar with OpenSolaris. FreeBSD 8.0 has ZFS version 13; 8.1 has ZFS version 14. It's running very stable for me. The biggest disadvantage of FreeBSD in this context is probably low Samba performance, as Samba is very Linux-optimized and a bloated project that needs to be rewritten. OpenSolaris has a nice kernel-level CIFS driver, and also a different iSCSI target daemon, if you're going to use that. NFS should work just as well, though.

HBA: I just mailed the developer responsible for adding support for the LSI 2008 (6Gbps) chips to FreeBSD.

CPU: I'm not saying AMD is a faster CPU; the Core iX family CPUs are great. Power consumption should be low with both, assuming you also have a power-efficient motherboard (3/4-phase PWM). For all normal applications the CPU would be overpowered, but with compression and encryption you might need the speed, and in that area AMD traditionally does better than Intel. So I would think both are suitable for serving a sleek ZFS box; the AMD platform is just cheaper.

ZPOOLs: with multiple arrays, you have the benefit of copies=2 generating a copy of each file on more than one array, which is even safer. It also makes expansion easier, as you're adding disks 4 at a time; I suggest you start with either 8 or 12 disks rather than 10. The HBA cards come in 8/16/24-port versions too, so 10 is kind of an odd number for this layout.

Aside from that, both setups should work. Generally having multiple arrays is favorable. I would favor two RAID-Zs over one RAID-Z2, as copies=2 means you have more protection in this case.
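To make the expansion concrete, a minimal sketch (the pool name tank and the ad* device names are placeholders carried over from the earlier example): once you have 4 new disks, you add them to the existing pool as another RAID-Z array and the extra space is available immediately:

# add a fourth 4-disk RAID-Z array to the existing pool
zpool add tank raidz /dev/ad28 /dev/ad30 /dev/ad32 /dev/ad34
zpool status tank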
 
sub.mesa: Please post back when you hear about the LSI SAS 2008 support. I'd really like to get the Supermicro board with that chipset and use it with FreeNAS, but obviously lack of support in FreeBSD is an issue right now.
 

I'm fairly sure that sub.mesa and I use two different types of boxes, so these are really two different perspectives; I tend to throw a lot of stuff into VMs and do video encoding. The Core i7/i5/i3s all do really well on x264 encodes, which is a big factor for me. I have a feeling that sub.mesa uses AMD CPUs in storage-specific appliances (much lower CPU utilization), which is better in some ways. My CPUs tend to run at 75%+ consistently.

AMD does have the advantage with ECC. Their CPUs support it; for Intel you'd need a Xeon variant like the Xeon L3406. Realistically, that makes Intel much more expensive if you want ECC. If you don't want ECC, AMD is marginally cheaper... at least if you live near a Fry's, where there are Core i3-540 + motherboard deals for $99 and Core i5 + motherboard deals for $120.

One thing to triple-check with Supermicro is the memory compatibility list. I still remember the day I bought 8GB of DDR3 Kingston ECC memory that was 4R instead of 8R and the board was as unhappy as could be. Supermicro makes great products, but you do want to be sure to check compatibility because their boards are known to be less tolerant than consumer boards.

On the OS, I would suggest FreeNAS over Nexenta. The free version of Nexenta only supports 12TB; if you have 10TB raw today, you will bump up against that limitation quickly. I am playing with EON Storage with napp-it, which seems to be a pretty solid alternative.
 
CPU: It's my understanding that ZFS strongly recommends ECC RAM. So that seems to leave me with AMD. I am leaning towards the X4 965 Black Edition. I would love some motherboard recommendations. Power efficiency, on-board video, and dual NICs are the only things I really need.

ZFS: The reason I was planning on starting with 10 was to fill one row of a Norco 4220. I could start with 8 or 12 just as easily I suppose. I am still not clear on why you suggest small arrays with single parity and copies=2, which seems to require more space, instead of one larger array with double parity. Also, any information on my cache/SSD questions would be much appreciated.
 
Pjkenned, the Lynnfield stuff doesn't support x4 registered RAM. It's not a Supermicro limitation, it's a socket/memory controller limitation. One of the reasons the dual-socket 1366 platform does support x4 is that, given DRAM die density, they knew x4 parts would be cheaper. Gainestown also added x4 chipkill options, and as far as I know, none of the other platforms really support chipkill on x4 modules. I agree though, and would likely go with an 1156 build for any 24/7 rig these days if it required more than Atom-level CPU performance. Heck, the 45W TDP L3426 should be a damn nice chip: 4 cores/8 threads, 8MB L3 and the northbridge for 45W TDP, what's not to love?
 
On the OS, I would suggest FreeNAS over Nexenta. The free version of Nexenta only supports 12TB; if you have 10TB raw today, you will bump up against that limitation quickly. I am playing with EON Storage with napp-it, which seems to be a pretty solid alternative.
Just want to ask for a quick clarification here on Nexenta, as I have been looking at this product lately. I was under the impression that Nexenta itself was free with no limitations, NexentaStor was paid, and NexentaStor Community Edition was free but limited to 12TB.
 

That's true. I much prefer things with web GUIs at this point, since it just makes things easier for me when using 4 different OSes on a daily basis.

And vladthebad, I'm not necessarily speaking just about Lynnfield and Supermicro here. In the past four years I've twice built systems on Supermicro boards where, for one reason or another, I didn't have memory on the compatibility list (wrong vendor/part number, etc.) and ran into issues.

Ripley, I'm not as up to date on the newest AMD boards, but my only suggestion is to make 100% sure whatever you use has Intel NICs. If you can't do that, buy a few cheap NICs like Intel Pro/1000 GTs (PCI) or Pro/1000 PT duals/quads for PCIe (I'm a huge fan of the PTs for no particular reason other than they work well and are fairly inexpensive for quads). Also, make sure you have onboard video, and if possible you do want IPMI 2.0.
 
CPU: It's my understanding that ZFS strongly recommends ECC RAM.
A mailing list query yielded this exchange:

Q: I've read in numerous threads that it's important to use ECC RAM in a ZFS file server.

A: It is important to use ECC RAM. The embedded market and server market demand ECC RAM. It is only the el-cheapo PC market that does not. Going back to some of the early studies by IBM on errors in PC memory, it is really a shame that the market has not moved on.

Q: My question is: is there any technical reason, in ZFS's design, that makes it particularly important for ZFS to require ECC RAM?

A: No.

Q: Is ZFS especially vulnerable, more so than other filesystems, to bit errors in RAM?

A: No. Except that ZFS actually does check data integrity, so ZFS can detect if you had a problem. Other filesystems can be blissfully ignorant of data corruption.

I would add that ZFS uses significantly more memory than other filesystems, making RAM errors more likely to affect ZFS. On the other hand, the checksumming may provide protection that corrects certain corruption caused by RAM errors. I used ZFS on my test server when it had some RAM errors; after benchmarking continuously I could see all disks having parts of their data 'resilvered' because the checksum said the data was invalid while in fact there was no on-disk corruption. However, the corruption didn't spread to (uncorrectable) data errors, so no permanent corruption there.

Especially with copies=2 and parity protection, you wouldn't really need ECC for a home server. If this is for business, then I would always opt for ECC.

So that seems to leave me with AMD. I am leaning towards the X4 965 Black Edition.
That chip has the same TDP as the new hexacore Phenom. I would opt for something a little more modest, such as the X4 945 @ 3.0GHz with a 95W TDP. FreeBSD is very threaded and scales very well with multiple cores, unlike most Windows applications and Windows core drivers.

I would love some motherboard recommendations. Power efficiency, on-board video, and dual NICs are the only things I really need.
I recommend a motherboard based on 760G/785G/SB750+ chips. Newer SB800/SB850 southbridges are becoming available at the moment, for example on the Gigabyte GA-890GPA-UD3H. It gives you double the PCIe bandwidth on the PCIe x1 ports, as these are handled by the southbridge while the x16 slots are handled by the northbridge. It also has double the bandwidth to the northbridge, SATA 6Gbps ports, and an internal Ethernet MAC which you probably don't want to use. I recommend you get two single-port or one dual-port PCIe Intel PRO/1000 PT NIC; that is PCI Express. DO NOT USE PCI!

I am still not clear on why you suggest small arrays with single parity and copies=2, which seems to require more space, instead of one larger array with double parity.
It requires more space, but offers much higher protection. Since copies=2 can be set per dataset, you only use it on data that is very important, such as text documents, not large files that are not important anyway.

Multiple arrays also have performance benefits. As ZFS checks parity and checksum integrity, all disks in an array are accessed; with multiple arrays, more parallel I/O is possible.

Also, it makes expanding your pool easier. You add disks 4 at a time, each group forming a RAID-Z array of its own. If you start with a single 10- or 12-disk array and later add a 4-disk array, the arrays are very different in size, so you only get part of the performance benefit.

Also, any information on my cache/SSD questions would be much appreciated.
If you're going to use the stable FreeNAS release, based on FreeBSD 7.2, you won't be able to use cache devices, I realise now. Cache devices were introduced in ZFS pool version 10, while FreeNAS uses ZFS pool version 6.

You may want to consider running ZFS v13/14 on plain FreeBSD or OpenSolaris, but that would leave you without the nice web GUI. You could also use FreeNAS without a cache device now and add one to the pool later, when you upgrade to a newer FreeNAS with a newer ZFS version. FreeBSD 7.3 also has ZFS version 13, so it should just be a matter of time before FreeNAS supports it, though you may need to configure the cache device from the terminal as there may not yet be GUI support.

Cache devices cause the most frequently accessed data to be stored on the SSD; read requests first check whether they can be served from the SSD, which has very low access times. This can significantly speed up random access on a large array of HDDs. It is similar to the "ReadyBoost" feature in Windows Vista and 7.
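To answer the earlier question about adding a cache later: on a ZFS version that supports it (pool v10 or newer), a cache device can be attached to, and removed from, an existing pool at any time. A minimal sketch, assuming the pool is called tank and the SSD shows up as da0 (placeholder names):

# attach an SSD as an L2ARC cache device to an existing pool
zpool add tank cache /dev/da0

# check how the cache is being used
zpool iostat -v tank

# a cache device can also be removed again at any time
zpool remove tank /dev/da0

Note that a cache device belongs to a single pool; to share one SSD between two pools you would have to partition it and give each pool its own slice.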
 
I'm just going to chime in here since no one ever mentions this.

ZFS uses TWO types of cache (you can use SSD cache drives here).

All the available RAM in the system except for 1GB is used for the ARC cache (random read caching) and the ZIL (write caching).

Now you can attach SSDs to enhance both of these: a read SSD serves as L2ARC, and a write SSD as a dedicated ZIL (log device). So make sure you size and spec the SSDs you are going to use for cache accordingly.
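A small sketch of the write side (the cache example earlier in the thread covers L2ARC), again assuming a pool named tank and a placeholder device da1:

# small, fast SSD as a dedicated ZIL/log device; mainly helps synchronous writes (e.g. NFS, databases)
zpool add tank log /dev/da1

A log device only needs to hold a few seconds' worth of incoming synchronous writes, so a few GB of fast, reliable flash is generally plenty, whereas the L2ARC benefits from being as large as you can afford.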


On the ECC subject: when you are writing/reading data to/from the server is when corruption would occur, though the rate is very low; assuming 1TB of reads/writes, you are looking at roughly 1 bit error per TB per year... ish.

Sure, ZFS checksums all your data when it is stored and accessed, but it does that while the data is in RAM/cache; after it leaves the cache, nothing more is checked since the data has already been checksummed. So if ZFS checksums the data and then your RAM hiccups while you are serving it out over the network, you could have just lost a bit or two of your data.

At the moment I would highly recommend ECC memory for anyone who is concerned about their memory, or for a production server.
 
I have a quick question regarding RAID-Z arrays. I have been reading these threads a lot, trying to get information on what I will eventually need for a build.

I have a few 1.5TB and 1TB/500GB disks I would like to use. Going with the method of 4 disks and 1 parity disk per array, would it be possible to somehow combine the 1TB and 500GB drives so they appear as one 1.5TB disk, and then have the RAID-Z array essentially show as 4x 1.5TB disks? Or is that not possible and have I read it wrong?

If that were the case, it would allow me to use some of the older drives I have lying around (they are all green drives).

And on the subject of ECC RAM, are there many AMD boards which support ECC registered memory? All I seem to find are AMD boards which support only unbuffered ECC (and I can't seem to find much unbuffered ECC here in the UK).
 

You could, on paper, make that setup work... though I don't think ZFS would let you directly, since the block totals would be different; you would have to create slices.

Stay away from doing things like that; it MAY work on paper, though in reality it would be a nightmare.
 
I have a few 1.5TB and 1TB/500GB disks I would like to use. Going with the method of 4 disks and 1 parity disk per array, would it be possible to somehow combine the 1TB and 500GB drives so they appear as one 1.5TB disk, and then have the RAID-Z array essentially show as 4x 1.5TB disks? Or is that not possible and have I read it wrong?
On FreeBSD that is easy: create a JBOD array of the two disks using geom_concat, then use it in a new zpool. For example:

# assume ad4 is 1TB and ad6 is 0.5TB; the rest are 1.5TB
gconcat label concat0 /dev/ad4 /dev/ad6
zpool create tank raidz /dev/concat/concat0 /dev/ad8 /dev/ad10 /dev/ad12

This would work without any issues; the GEOM I/O framework is very clean and the nicest I/O framework I've seen thus far. ZFS and GEOM also play nicely together. If you want encryption, which is not yet available natively in ZFS, you can also do that:

# encrypt just a few disks part of a RAID-Z array:
geli init /dev/concat/concat0
geli init /dev/ad8
geli attach /dev/concat/concat0
geli attach /dev/ad8
zpool create tank raidz /dev/concat/concat0.eli /dev/ad8.eli /dev/ad10 /dev/ad12

# encrypted ZVOL with its own filesystem
zfs create -V 10g tank/encrypted
geli init /dev/zvol/tank/encrypted
geli attach /dev/zvol/tank/encrypted
newfs -U /dev/zvol/tank/encrypted.eli
mount /dev/zvol/tank/encrypted.eli /mnt

The last example ends with a UFS2 filesystem with Soft Updates enabled, encrypted by the geom_eli layer, stored on a ZVOL (ZFS volume) on a RAID-Z pool. Isn't that sleek? ;-)

If that were the case, it would allow me to use some of the older drives I have lying around (they are all green drives).
I would use them all. But there is another way: use a plain striped pool (RAID0) instead, which on ZFS allows disks of different sizes. Then use copies=2 on anything you don't want to lose. That should give you almost mirror-like protection while just dropping disks in there. Plus you can expand with no trouble; you can add any number of disks of any capacity.

I would carefully test the protection this yields if you go this route. Though it could correct corruption, I'm not sure whether ZFS allows access with unavailable devices (detached or completely failed). That is still something on my shortlist to test and analyse.
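A minimal sketch of that striped-pool idea, assuming the mixed-size disks show up as ad4, ad6 and ad8 (placeholder names); note there is no parity at the vdev level, so copies=2 on the important datasets does all the protection work:

# plain striped pool of mixed-size disks (no redundancy at the vdev level)
zpool create mixed /dev/ad4 /dev/ad6 /dev/ad8

# keep two copies of everything you really care about
zfs create mixed/important
zfs set copies=2 mixed/important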
 
Sounds pretty good. I think I would want to at least test this method out (the first one you mentioned) for a while before I put any important data on it.

Is it much different in OpenSolaris? That is what I have been testing, trying to get used to it before I build the array.

Thanks for the tips; I will try that at the weekend.
 
Multiple arrays also have performance benefits. As ZFS checks parity and checksum integrity, all disks in an array are accessed; with multiple arrays, more parallel I/O is possible.

If all the disks in an array are accessed when checking parity and checksum then wouldn't a large array be better than a smaller one?
 

Not really, since the same number of disks is being accessed. With multiple nested/striped arrays you gain additional redundancy as well as performance: instead of one giant RAID5, 2 or 3 smaller striped RAID5s yield better performance, though at the cost of capacity.

Also, it is recommended not to put more than 7-8 disks in a single RAID-Z array.
 
Dang it, just when I had finally ended my month-long battle between hardware RAID and ZFS and decided on hardware, I read this thread and now I'm questioning everything again. :eek:
 

If you have the CPU to back it up, software RAID is basically on par, and more flexible in my experience. Though good hardware RAID is nice as well... it depends on your goals ;)
 

I have a Core 2 Duo, which is plenty since all this box will do is serve media. How forgiving is ZFS when it comes to moving hardware around? Can you move drives in an existing zpool from, say, a 4-port SATA card to an 8-port card? If you had to reinstall the OS, could you still use the existing zpool? ZFS seems pretty cool so far; perhaps the question is what can't I do, besides expand an existing RAID-Z?
 

Totally. ZFS does not care how the drives are connected or whether you need to reload the OS; hell, you can boot from the live disc and access your drives ;)

Move from one controller to another and back again, 3 drives on one controller and 1 drive on another... it all works.

ZFS is VERY resilient.
 
It doesn't matter how you connect the drives; you could connect them to another controller or another system. You could replace all the hardware including the HDDs (copy raw contents over to new disks) and import your array.

If you are going to move the array to another system, you need to use the zpool import command; otherwise ZFS won't touch it, as it sees the pool was last accessed by a different system.

I also tested this by connecting the disks to Linux and setting up a VirtualBox VM running FreeBSD with physical access to the disks, then sharing over the 'virtual' network; that works.
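A minimal sketch of such a move, assuming a pool named tank (exporting on the old box is the clean way; -f forces the import if the pool was not exported first):

# on the old system (optional, but clean)
zpool export tank

# on the new system, after connecting the drives to whatever controller
zpool import
zpool import tank

The bare zpool import lists any pools found on the attached disks; zpool import -f tank forces the import if the pool still looks like it belongs to another system.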
 
Very cool stuff. Not to mention FreeBSD has always had a special place in my heart; all I need is a very small excuse to run it :D

sub.mesa, I like your idea of small RAID-Z arrays inside a striped pool with copies=2 for the very important stuff. I think I will do 4-drive chunks, since most controllers seem to have port counts in multiples of 4.

Not that I'm looking forward to it, but what happens if two drives fail in a RAID-Z that is part of a stripe where I have folders with copies=2? If two drives fail, the entire pool should be toast; but how would you access the directories that had multiple copies? Would the non-degraded RAID-Zs of the stripe be accessible?
 