About to do "zpool create raidz" on 6x WD20EARX drives. Any last words from anyone?

Just attached 6x WD20EARX to a new Solaris 11 instance. Anything I should know before I start dumping data? I'm mostly talking about dealing with firmware updates, TLER, 4K sectors, etc etc
 
You should put a "2" in there to get a raidz2 instead, that for sure..
 
I'll third adding the 2, both for greater data integrity, and for slightly better performance.
optimal array sizes for raidz2 are: 4, 6, and 10 drives
optimal array sizes for raidz1 are: 3, 5, and 9 drives

basically power of 2 + 1 for raidz1
power of 2 + 2 for raidz2
power of 2 + 3 for raidz3

If the drives have 4K sectors, incorrect alignment might result in degraded performance. I remember alignment being a larger issue when you aren't using an optimal array size. I've not used Solaris 11; on FreeBSD, I used a gnop command to make the OS see the drives as having 4K sectors before creating the pool.
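
From memory, the gnop trick went roughly like this, so treat it as a sketch rather than gospel (the device names are placeholders for your six disks, and "tank" is just an example pool name):

Code:
# gnop create -S 4096 /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4 /dev/ada5
# zpool create tank raidz2 ada0.nop ada1.nop ada2.nop ada3.nop ada4.nop ada5.nop
# zpool export tank
# gnop destroy /dev/ada0.nop /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop /dev/ada4.nop /dev/ada5.nop
# zpool import tank

The .nop providers only exist to advertise 4K sectors at creation time; the pool keeps ashift=12 after that, so you can destroy them and import the pool on the raw devices.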

After creating the pool, you can check to make sure the alignment is correct for 4k sector drives, using the following command:
zdb | grep ashift
If you see: ashift=12 you're good
If you see: ashift=9 there might be some performance loss.

TLER on or off is not nearly as important for ZFS or software RAID as it is for hardware RAID, though in most multi-disk use cases turning it on is better.
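
If you want to check whether the drives even expose a TLER-style setting, smartctl can query SCT Error Recovery Control. This is just a sketch, assuming smartmontools is installed and with a made-up device path; many of the Green drives simply report it as unsupported:

Code:
# smartctl -l scterc /dev/ada0
# smartctl -l scterc,70,70 /dev/ada0

The second line would set the read/write recovery timeouts to 7 seconds (the usual TLER value), if the firmware allows it.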
 
I'll third adding the 2, both for greater data integrity, and for slightly better performance.
optimal array sizes for raidz2 are: 4, 6, and 10 drives
optimal array sizes for raidz1 are: 3, 5, and 9 drives

basically power of 2 + 1 for raidz1
power of 2 + 2 for raidz2
power of 2 + 3 for raidz3

+1.....
 
I'll third adding the 2, both for greater data integrity, and for slightly better performance.
optimal array sizes for raidz2 are: 4, 6, and 10 drives
optimal array sizes for raidz1 are: 3, 5, and 9 drives

basically power of 2 + 1 for raidz1
power of 2 + 2 for raidz2
power of 2 + 3 for raidz3

If the drives have 4K sectors, incorrect alignment might result in degraded performance. I remember alignment being a larger issue when you aren't using an optimal array size. I've not used Solaris 11; on FreeBSD, I used a gnop command to make the OS see the drives as having 4K sectors before creating the pool.

After creating the pool, you can check to make sure the alignment is correct for 4k sector drives, using the following command:
zdb | grep ashift
If you see: ashift=12 you're good
If you see: ashift=9 there might be some performance loss.

TLER on or off is not nearly as important for ZFS or software RAID as it is for hardware RAID, though in most multi-disk use cases turning it on is better.

Regarding having 6 disks in RAIDZ1: I understand the point about the optimal number of disks in a vdev for maximum performance, but I'd rather keep my extra 2TB. I'll do some benchmarks and see what I come up with.
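
Probably nothing fancier than a big sequential write and read-back with dd against a file on the pool; the path and sizes here are made up, and with compression off zeros are fine for this:

Code:
# dd if=/dev/zero of=/tank/bench.bin bs=1024k count=20480
# dd if=/tank/bench.bin of=/dev/null bs=1024k

It's a crude test, and the read-back will be skewed by whatever is still sitting in the ARC, but it should be enough to see whether the alignment is hurting.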

WD20EARXs do in fact have 4K sectors (Advanced Format) with 512-byte emulation. There is not much I can do about it, though. FreeBSD offers somewhat of a solution by allowing you to create a zpool with ashift=12, which I could then re-attach to my Solaris host, but FreeBSD won't work with my disk controller. So I'm stuck with the 512-byte emulation (ashift=9) for now.
 
I was not recommending RAIDZ2 due to the speed. In a best case scenario, it would barely be measurable.

The point of RAIDZ2 with 6 disks of 2 TB each is simply data protection. That is potentially a very large amount of data, which will take quite a while to rebuild, and the chance that one of the 5 remaining drives in a RAIDZ dies before the rebuild completes is higher than most people would be comfortable with.
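
To put a rough number on that (assuming the roughly 1-in-10^14-bits unrecoverable read error rate these consumer drives are specced at): a RAIDZ rebuild has to read all 5 surviving disks, about 5 x 2 TB = 10 TB, which is roughly 8 x 10^13 bits, so you would expect on the order of 0.8 unrecoverable read errors somewhere during the rebuild, and that is before considering the chance of a second drive dying outright.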
 
I concur; I see no use at all for single-parity RAID levels with modern 1+ TB HDDs. Rebuilds on such drives take hours, and the array is entirely too fragile during a rebuild. You've just lost one disk, and now you're putting a heavy workload on the entire array when it's one drive away from total data loss.

If you need reliability and capacity, run double- or triple-parity RAID (raidz2, raidz3, RAID 6). If you need reliability and performance, use RAID 1 or RAID 10/0+1/etc., or the ZFS mirror equivalents.

In this particular scenario, the possible performance improvement is just icing on the cake.
 
WD20EARXs do in fact have 4K sectors (Advanced Format) with 512-byte emulation. There is not much I can do about it, though. FreeBSD offers somewhat of a solution by allowing you to create a zpool with ashift=12, which I could then re-attach to my Solaris host, but FreeBSD won't work with my disk controller. So I'm stuck with the 512-byte emulation (ashift=9) for now.

Solarismen hosts binary-patched versions of Solaris 'zpool' where ashift is hardcoded to 12. Download the binary and use it to create your pool. The Solaris 11 EA (snv_173) version should work.
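
Usage would just be running the downloaded binary in place of the stock zpool when you create the pool, then verifying with zdb; the binary name and device IDs below are made up:

Code:
# ./zpool-ashift12 create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
# zdb | grep ashift

Everything after creation (status, scrub, and so on) can use the normal system zpool, since ashift is baked into the pool at creation time.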
 
I concur; I see no use at all for single-parity RAID levels with modern 1+ TB HDDs. Rebuilds on such drives take hours, and the array is entirely too fragile during a rebuild. You've just lost one disk, and now you're putting a heavy workload on the entire array when it's one drive away from total data loss.

If you need reliability and capacity, run double- or triple-parity RAID (raidz2, raidz3, RAID 6). If you need reliability and performance, use RAID 1 or RAID 10/0+1/etc., or the ZFS mirror equivalents.

In this particular scenario, the possible performance improvement is just icing on the cake.

Ah yes, I understand. Sorry about that. I've heard that opinion before, and it is certainly something I have to consider. So basically, if I want my extra 2TB, I increase my chances of failing a rebuild (which means 2 days of downtime while I restore ~9TB from backups over gigabit Ethernet).
 
Damn it. I can't believe you guys are trying to convince me to give up TWO disks to parity. A whole extra 2TB disk. You guys are nuts, no way am I doing that. That would leave me only 8TB instead of the 10TB I'd get from raidz1. I'll take my chances. This isn't production, it's a home storage server.
 
Damn it. I can't believe you guys are trying to convince me to give up TWO disks to parity. A whole extra 2TB disk. You guys are nuts, no way am I doing that. That would leave me only 8TB instead of the 10TB I'd get from raidz1. I'll take my chances. This isn't production, it's a home storage server.

Just don't get mad when we quote this post a year from now when you are on here complaining about wasting a weekend restoring all that from a backup. :p
 
Just don't get mad when we quote this post a year from now when you are on here complaining about wasting a weekend restoring all that from a backup. :p

If this is my fate I will bump this thread for sure

I now have a RAID-Z zpool with ashift=12 set up, and it looks like it's going well.
 
Damn it. I can't believe you guys are trying to convince me to give up TWO disks to parity. A whole extra 2TB disk. You guys are nuts, no way am I doing that. That would leave me only 8TB instead of the 10TB I'd get from raidz1. I'll take my chances. This isn't production, it's a home storage server.

Hey. I went with raidz2 on a 6-disk setup. I abandoned it and went to 10 disks instead. However, I decided to wait a bit because I wanted to migrate to better RAID cards first, and in the meantime HD prices shot through the roof.

Consider moving to 10 disks, or you're risking a lot of data. Losing 1/3 of your disks to redundancy is more than enough; I'm comfortable with 20% redundancy, so 10 disks was good for me.
 
No can do; I only have capacity in this chassis for 6 disks. I wouldn't want to spend the money on more drives anyway, plus the power they need and the heat they generate.

Can anyone post articles stating that it's a bad idea to run RAIDZ/RAID5 with more than a few 2TB drives?

I'm 99% sure that if one drive fails, the other 5 of my WD20EARXs can handle a rebuild. If they cannot, I have backups.
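
Either way, I'll keep regular scrubs scheduled so latent bad sectors get found while the redundancy is still intact rather than during a rebuild. Something like this in root's crontab, with a made-up pool name:

Code:
0 3 * * 0 /usr/sbin/zpool scrub tank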
 
I originally built my 10-drive raidz2 with 8 F4s and 2 F3s. Later I replaced the F3s with F4s, one at a time. The array was around 25% full when I replaced the disks.

The first replace command:

Code:
# zpool replace home6 /dev/ada1 /dev/ada3

status just after first replace command was run:

Code:
# zpool status
  pool: home6
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 1.21% done, 4h0m to go
config:

        NAME           STATE     READ WRITE CKSUM
        home6          ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            ada0       ONLINE       0     0     0  15.5M resilvered
            replacing  ONLINE       0     0     0
              ada1     ONLINE       0     0     0
              ada3     ONLINE       0     0     0  6.82G resilvered
            da3        ONLINE       0     0     0  15.3M resilvered
            da2        ONLINE       0     0     0  15.1M resilvered
            da1        ONLINE       0     0     0  15.4M resilvered
            da0        ONLINE       0     0     0  15.2M resilvered
            da7        ONLINE       0     0     0  15.4M resilvered
            da6        ONLINE       0     0     0  15.2M resilvered
            da5        ONLINE       0     0     0  15.4M resilvered
            da4        ONLINE       0     0     0  15.2M resilvered

Status after the 2nd replace completed:

Code:
# zpool status
  pool: home6
 state: ONLINE
 scrub: resilver completed after 8h14m with 0 errors on Thu Jan  6 09:07:23 2011
config:

        NAME        STATE     READ WRITE CKSUM
        home6       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ada0    ONLINE       0     0     0  1.58G resilvered
            ada2    ONLINE       0     0     0  393G resilvered
            da3     ONLINE       0     0     0  1.57G resilvered
            da2     ONLINE       0     0     0  1.56G resilvered
            da1     ONLINE       0     0     0  1.57G resilvered
            da0     ONLINE       0     0     0  1.56G resilvered
            da7     ONLINE       0     0     0  1.57G resilvered
            da6     ONLINE       0     0     0  1.56G resilvered
            da5     ONLINE       0     0     0  1.58G resilvered
            da4     ONLINE       0     0     0  1.57G resilvered

Those were the only logs I kept of that disk replacement. I'm pretty sure it took ~8 hours per disk.

I would guess array size only plays a minor role in resilvering time. Disk size/speed should dominate, since the disks can be read in parallel no matter how many there are (to a point, anyway). As the array fills up, resilvers should become slower, since there is more data that needs to be read and rebuilt.

So, in theory, any raidzX array of 2TB "green" disks that is 25% full would probably take around 8 hours to resilver.

It's possible that it might be slower if a disk failed outright instead of being replaced. Heavy use of the array while it is resilvering should make it take longer, possibly significantly so; for example, if a disk failed in the middle of a backup or on a busy day.

This shows another advantage of RAIDZ2/3 over RAIDZ1: you can be much more confident of success when swapping disks, even when the array is perfectly healthy, for instance if you wanted to up the capacity of an existing array by replacing all the disks one by one.
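
The rough sequence for that, sketched with my pool name and made-up device names (and assuming the pool version is recent enough to have the autoexpand property):

Code:
# zpool set autoexpand=on home6
# zpool replace home6 da0 da8

Wait for each resilver to finish (watch zpool status) before replacing the next disk; the extra capacity shows up once every member has been swapped to the larger size.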

Though admittedly, in that case, I'd almost always just build a new array and copy everything over; it should be faster and less error-prone. Of course, you can't do that if you don't have enough extra drive bays and controller channels.
 