Introduction
I'm looking for some feedback on my upcoming ZFS build, but I'll get to that later. I have a very good understanding of the mechanics of ZFS, and the general requirements for each subsystem therein. My concern here is specifically the ZFS Intent Log, or ZIL.
Update One
Update Two
Update Three
Gallery
Background on the ZIL
The ZIL stores intended changes before they are committed to disk in the zpool. Writes to the ZIL act somewhat like a cache: random writes logged to the ZIL can later be committed to the zpool as contiguous writes. By default, the ZIL is interleaved with the rest of the storage space on the zpool, but you can assign a dedicated device to host it. The device used to host the ZIL is known as the SLOG. Ultimately, the IOPS of the SLOG become a bottleneck for synchronous writes to the zpool: a change must first be recorded in the ZIL before it is acknowledged, and only then written out to the zpool and committed. If the SLOG can't keep up with the zpool, aggregate performance will decrease. If performance decreases enough, ZFS will (if I understand correctly) stop using the dedicated SLOG.
NOTE: Should the ZIL be lost, ZFS may have difficulty recovering the zpool. For this reason, SLOGs should be mirrored if possible. EDIT: Apparently, this behavior has been fixed in v28, so mirroring may not be as critical.
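For reference, attaching a mirrored SLOG is a single operation. This is only a sketch; the pool name ("tank") and device names below are placeholders, not part of my actual build:

    # Add a mirrored log vdev to an existing pool (pool/device names are placeholders)
    zpool add tank log mirror /dev/slog0 /dev/slog1

    # Confirm the log vdev is attached
    zpool status tank

    # If the pool version supports log removal, the mirror can be detached again
    # by its vdev name as shown in zpool status (e.g. "mirror-1")
    zpool remove tank mirror-1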
So, which devices make the best SLOG? By conventional thinking, there are three classes of devices, listed here in order of maximum IOPS:
1) NVRAM: battery-backed DRAM devices. The IOPS here are off the charts, especially if the device uses the PCIe bus directly, rather than SATA.
2) SLC SSD: enterprise-grade SLC NAND flash devices. The IOPS are much lower than NVRAM, but still far better than traditional HDDs. A concern with these is the onboard DRAM cache: any ZIL writes still pending in that cache will be lost in the event of an unexpected loss of power. Ideally, the cache should be backed by a supercapacitor or a dedicated battery; 3rd-gen Intel SLC SSDs should meet this requirement.
3) 10k+ RPM SAS: enterprise-grade HDDs. These drives have the highest IOPS of conventional HDD designs because they prioritize IOPS and seek time over raw throughput (throughput can be aggregated across drives, but IOPS and seeks happen per drive). Faster than consumer-grade HDDs, but still roughly two orders of magnitude slower than SLC SSDs (rough math after this list).
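As a rough sanity check on that "two orders of magnitude" claim (my own back-of-envelope numbers, not vendor specs): a 15k RPM drive averages about 2 ms of rotational latency plus roughly 3 ms of seek per random write, while SLC SSDs are typically quoted in the tens of thousands of write IOPS.

    # Back-of-envelope IOPS for a 15k RPM drive, assuming ~3 ms average seek:
    #   rotational latency  = (60,000 ms / 15,000 RPM) / 2 = 2 ms
    #   per-op service time = 3 ms seek + 2 ms rotation    = 5 ms
    echo "scale=0; 1000 / (3 + 2)" | bc    # ~200 IOPS, vs ~20,000+ for an SLC SSD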
Comparisons
Deciding between these three technologies generally boils down to budgetary constraints: NVRAM devices are exceedingly expensive ($30+ per GB) and somewhat rare; SLC SSDs cost $10-20 per GB; SAS drives are usually around $3 per GB.
NVRAM devices are also the most volatile: any power outage that lasts more than a few hours will jeopardize the contents of the ZIL, the loss of which can corrupt the entire zpool. However, they are not worn out by write cycles. All SSDs have a limit on how many writes they can sustain, and while they use different strategies to reduce wear (compression, caching, over-provisioning, wear leveling, TRIM), it is the writes themselves that eventually put these devices into the recycling pile. Since the ZIL exists specifically to absorb synchronous writes, SSDs can be rapidly consumed by the very role we assign them.
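To put the wear issue in numbers (these figures are assumptions for illustration, not specs for any particular drive): take a 32 GB SLC device rated for roughly 100,000 program/erase cycles and feed it a sustained 100 MB/s of synchronous writes, ignoring write amplification:

    # Hypothetical endurance estimate: 32 GB SLC, ~100,000 P/E cycles,
    # sustained 100 MB/s of sync writes, no write amplification.
    #   write budget = 32 GB * 100,000     = 3.2 PB
    #   daily writes = 100 MB/s * 86,400 s = ~8.6 TB/day
    echo "scale=0; (32 * 100000) / (0.1 * 86400)" | bc    # ~370 days

That lands right in the 6-18 month replacement window I estimate below.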
While SAS drives are prone to read errors, can lose the contents of their DRAM cache on power failure, and offer comparatively few IOPS, they can sit through a loss of power indefinitely and sustain a near-infinite number of write operations. They just aren't quite fast enough to make an ideal SLOG.
My Thoughts
Interleaving the ZIL with the zpool data does reduce zpool performance; the HDD heads have to seek from data to ZIL, back to data, then back to ZIL, just to perform a simple change. The easiest solution is to move the ZIL to a dedicated HDD in the zpool, which eliminates that seek penalty on the data disks but limits ZIL IOPS and throughput to what a single drive can deliver. Using a special device for the ZIL role makes a lot of sense, but enterprise-grade solutions like NVRAM are not cost-effective outside of Fortune 500 companies, and SSDs will need to be replaced regularly (6-18 month lifespan) at a cost of at least $350 per drive, or a minimum of $700 to maintain a mirror.
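Whatever device ends up hosting the ZIL, one way to check that it is actually absorbing the synchronous write load (rather than becoming the thing that can't keep up) is to watch per-vdev activity while a sync-heavy workload runs; the pool name here is again a placeholder:

    # Per-vdev I/O every 5 seconds; the log vdev is listed separately, so its
    # write columns show how much of the sync traffic it is handling.
    zpool iostat -v tank 5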
Proposal
Two 2.5" 10-15k RPM SAS drives in a ZFS mirror will have greater IOPS than a normal 3.5" 5-7k RPM commodity storage drive. If WD VelociRaptors were used, a pair of the 150 GB model would be sufficient for capacity and could bring the total cost below $200.
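On capacity: as far as I understand it, the ZIL only has to hold the synchronous writes that arrive between transaction group commits, so even a generous rule of thumb (a few seconds of incoming write bandwidth) makes 150 GB enormous overkill. Assuming a 10-second worst-case commit interval and ~200 MB/s of incoming sync writes (two saturated gigabit links):

    # Rough upper bound on ZIL data in flight, under the assumptions above:
    echo "scale=1; 0.2 * 10" | bc    # ~2 GB

So capacity is a non-issue for these drives; the question is purely IOPS and latency.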
The Questions
Would the 2x SAS configuration I proposed above provide enough IOPS for ZFS to make good use of that vdev as a SLOG? Would it offer a significant performance improvement over the interleaved ZIL?
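A crude way to test the second question empirically would be to force synchronous behaviour on a scratch dataset and compare timings with and without the log vdev attached; the dataset name, mountpoint, and sizes here are placeholders:

    # Force every write on a scratch dataset through the ZIL
    zfs create -o sync=always tank/ziltest

    # Lots of small synchronous writes: worst case for an interleaved ZIL
    dd if=/dev/zero of=/tank/ziltest/testfile bs=8k count=100000

    # Repeat after adding/removing the log vdev and compare the dd timings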
If I have any misunderstanding of ZFS mechanics, please correct me. I want to understand this technology inside and out.