Opinion - ZFS Spares?

bds1904

So I just had my first drive failure on ZFS and everything went pretty well. Lost a drive, it showed offline, the hot spare took over, the pool rebuilt, no data loss and no downtime.

The lowdown on the setup:

13x Hitachi Ultrastar 500GB HDD (ESXi Shared Storage, Sync-Always)
  2x 6 Drive RaidZ-2 Pool
  1x Hot Spare
  2x STEC Mach16 50GB SSD (ZIL-Mirror)
6x Hitachi Ultrastar 2TB Drives (Media Pool, non-essential data, Sync-Disabled)
  1x 5 Drive RaidZ-1 Pool
  1x Hot Spare
These 2 pools are not my form of backup, just regular everyday access. Backups of critical data occur every night to a Synology NAS and are replicated offsite.

For a home power user, my personal opinion is: always keep a hot spare and keep auto-replace on. Keep a cold spare if you can afford it. I'm okay with keeping a hot spare and taking the risk that it'll go bad in place, because I keep 2 completely separate backups of the data. RAID (or RAID-like setups) is a matter of convenience for me; no downtime is nice. Besides, if I lose multiple disks there is a good chance it's a controller issue anyway. At that point the data would be FUBAR.

That being said, what are everyone's thoughts on spares? Hot spare, cold spare or both? At what point do you increase the number of spare disks you have on hand? When do you use a hot spare versus a cold spare?
 
On your first pool with 2 vdevs, the hot spare makes sense, as it can auto-replace a faulted disk in either vdev.
On your second pool with 1 vdev, a hot spare is not a good idea: with the same six disks as a Z2 you would have a better security level. Think of it as "the spare is already initialized".
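
To put rough numbers on the second-pool comparison (5-disk RAID-Z1 plus a hot spare versus the same six disks as one RAID-Z2), here is a small Python sketch. It assumes the usual rule of thumb that a RAID-Z vdev gives up one disk's worth of space per parity level and ignores metadata and padding overhead. The function name `raidz_layout` is made up for illustration.

```python
def raidz_layout(disks_in_vdev, parity, hot_spares=0, disk_tb=2.0):
    """Rough usable space and fault tolerance for a single RAID-Z vdev.

    Simplification (an assumption, not exact ZFS accounting):
    usable space = (disks - parity) * disk size.
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAID-Z parity must be 1, 2 or 3")
    if disks_in_vdev <= parity:
        raise ValueError("need more disks than parity levels")
    return {
        "total_disks": disks_in_vdev + hot_spares,
        "usable_tb": (disks_in_vdev - parity) * disk_tb,
        # Failures the vdev survives at the same moment, i.e. before
        # any hot spare has finished resilvering.
        "simultaneous_failures_ok": parity,
    }


z1_plus_spare = raidz_layout(5, parity=1, hot_spares=1)
z2_no_spare = raidz_layout(6, parity=2)
print(z1_plus_spare)
print(z2_no_spare)
```

Under these assumptions both layouts use six disks and give the same usable space. The difference is that the Z2 survives two disks failing at once, while the Z1 is exposed until the spare has finished resilvering.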
 
I second what _Gea said.

One more thing to consider is that hot spares are not a good idea in some cases.

For example, let's assume you have a storage head server with 2 JBOD shelves attached to it. Each disk in JBOD A has a mirror in JBOD B.

You do not want hot spares in this case because it is possible that one full JBOD will go missing. (Like a cable being disconnected, or a full shelf not powered up). Even if you have a hot spare in the other JBOD, you do not want to start resilvering a disk from the missing JBOD.
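
The reasoning here can be sketched as a sanity check that a monitoring script might run before anyone kicks off a resilver. This is only an illustration: the disk-to-shelf mapping, the function name `looks_like_shelf_outage` and the 50% threshold are all assumptions, not something ZFS provides.

```python
from collections import Counter


def looks_like_shelf_outage(missing_disks, disk_to_shelf, threshold=0.5):
    """Return True if the missing disks point at a shelf, not a single disk.

    missing_disks: disk names that dropped out of the pool.
    disk_to_shelf: dict mapping every disk name to its shelf label.
    threshold: fraction of one shelf that must be missing before the
               shelf is blamed (an arbitrary illustrative value).
    """
    shelf_sizes = Counter(disk_to_shelf.values())
    missing_per_shelf = Counter(disk_to_shelf[d] for d in missing_disks)
    return any(
        count > 1 and count / shelf_sizes[shelf] >= threshold
        for shelf, count in missing_per_shelf.items()
    )


# Hypothetical layout: four disks in shelf A, each mirrored in shelf B.
layout = {"a0": "A", "a1": "A", "a2": "A", "a3": "A",
          "b0": "B", "b1": "B", "b2": "B", "b3": "B"}

single_disk = looks_like_shelf_outage(["a2"], layout)
whole_shelf = looks_like_shelf_outage(["b0", "b1", "b2", "b3"], layout)
print(single_disk, whole_shelf)
```

If the check says the whole shelf is gone, the sensible move is to fix the cable or power first and leave the spares alone.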

You should always ask yourself the question: can ZFS start auto-resilvering things I do not want it to?

Personally, I prefer to have the server email me if there is a problem, and I start all resilvers manually. Best to assume the auto selection of the disk to resilver will fuck up and resilver the wrong disk.
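
A minimal sketch of the "email me, then I decide" approach, in Python. It relies on `zpool status -x` printing `all pools are healthy` when nothing is wrong. The addresses are placeholders, the sample output is an abbreviated, made-up example, and actually sending the message (for instance via `smtplib`) is left out.

```python
import subprocess
from email.message import EmailMessage

HEALTHY_LINE = "all pools are healthy"


def pools_need_attention(status_output):
    """True unless `zpool status -x` reported that everything is healthy."""
    return status_output.strip() != HEALTHY_LINE


def build_alert(status_output, sender, recipient):
    """Wrap the raw zpool output in an email; sending is up to the caller."""
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool needs attention"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_output)
    return msg


def read_zpool_status():
    """Run `zpool status -x` and return its stdout (needs ZFS installed)."""
    result = subprocess.run(["zpool", "status", "-x"],
                            capture_output=True, text=True, check=False)
    return result.stdout


# Abbreviated, hypothetical output used in place of a real pool.
sample = "  pool: tank\n state: DEGRADED\n"
if pools_need_attention(sample):
    alert = build_alert(sample, "nas@example.invalid", "admin@example.invalid")
    print(alert["Subject"])
```

On a real box you would feed `read_zpool_status()` into the check from cron and only then decide by hand which disk to replace.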
 
Yes, this can be a real mess.
I have seen large pools with several hot spares.
Something flaky happens, maybe a cabling, backplane or PSU problem.

The result: a disk fails, a hot spare jumps in, this or another disk fails, the next hot spare jumps in,
and then one or the other disk comes back.
In the end you have a disaster of failed, resilvering and replaced disks.

If you do not panic, you can fix this with zpool clear or disk remove commands.
But indeed, one hot spare shared across several vdevs is a very good idea; more than that should be used under the admin's control only.
 