Finding suitable ZIL and L2ARC drives

We have a small ESXi cluster in our office running on a FreeNAS array with sync disabled...

I want to put a ZIL and L2ARC cache on the box and was wondering about purchasing two Samsung 840 Pro drives and partitioning both so that p1 is mirrored for the ZIL and p2 is striped for the L2ARC.

Does that seem feasible?

The server has 16GB of RAM and 16x 1TB SATA 7200rpm disks connected to an M1015 controller.
 
I don't think the 840 Pro is up to the task if you want a useful ZIL; it might be okay for L2ARC, though. It's also not considered best practice to slice up disks for multi-purposing, since you'll be cutting down on their usefulness as a ZIL while they're servicing L2ARC requests.

From looking at the napp-it thread, it looks like the Intel S3700 is the cheapest SSD that would provide acceptable latency and consistency for a ZIL in an enterprise environment.


Do you have the 16x SATA drives broken up into more than one RAID set or is it all one big array?
 
All 16 drives are in a single RAID-Z3 configuration.

Thanks for the tip on the Intel S3700. Seems like a great drive.

I purchased an Intel 320 40GB for the ZIL but didn't read the specs properly - it only supports 40MB/s writes!

The S3700 100GB edition supports 200MB/s writes whereas the 200GB edition supports 365MB/s writes. Is there any advantage to be had by getting the faster drive?

For the L2ARC would a 128GB Samsung 840 Pro suffice?

Thanks.
 
The 840 or 840 Pro would both be fine as L2ARC; the more the better in this case, as the single RAID-Z3 will be killing the IOPS available to you.
 
I purchased an Intel 320 40GB for the ZIL but didn't read the specs properly - it only supports 40MB/s writes!

40MB/s is more than you might think as it pertains to a ZIL device. As an example, if your record size is 8K, then 40MB/s is 5,120 IOPS.
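
To make that arithmetic explicit, here's a quick back-of-envelope sketch (plain division, nothing ZFS-specific):

```python
# Convert a sequential bandwidth rating into its IOPS equivalent at a
# given record size. Back-of-envelope only; real sync workloads are
# limited by per-operation latency, not just bandwidth.
def iops_from_bandwidth(bandwidth_mb_s: float, record_kb: float) -> float:
    return bandwidth_mb_s * 1024 / record_kb

print(iops_from_bandwidth(40, 8))    # 5120.0 -- the 5,120 IOPS quoted above
print(iops_from_bandwidth(40, 128))  # 320.0  -- far fewer ops at 128K records
```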
 
40MB/s is more than you might think as it pertains to a ZIL device. As an example, if your record size is 8K, then 40MB/s is 5,120 IOPS.

My understanding is that all synchronous data is written to the ZIL device, though. So even streaming writes would be limited to 40 MB/s.

OP: Check out zilstat and see if your workload takes advantage of the ZIL before worrying about it too much. (With sync turned off, you don't hit the ZIL, so turn it on temporarily to assess the performance.)
 
Indeed, yes: these are queue depth 1 sync requests. So 40MB/s @ 8K at 1QD would be very impressive; there aren't many drives out there that can accomplish that.

At larger block sizes flash SSDs do better, and some drives even pull past the ZeusRAM drives at block sizes larger than 32K.

My roundabout point is that I doubt the 40MB/s rating of that drive will ever be the bottleneck or limiting factor; the drive's ability to respond to 1QD requests will likely fall down before it runs out of bandwidth.
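
To put numbers on that point, here's a rough model, assuming each QD1 sync write must complete before the next is issued:

```python
# Rough queue-depth-1 model: throughput = record size / per-write latency,
# because a new sync write isn't issued until the previous one completes.
def qd1_mb_s(record_kb: float, write_latency_us: float) -> float:
    return (record_kb / 1024) / (write_latency_us / 1_000_000)

# Sustaining 40MB/s at 8K QD1 requires each write to finish in ~200us:
print(qd1_mb_s(8, 200))   # ~39 MB/s
# At larger block sizes the same latency budget yields far more bandwidth:
print(qd1_mb_s(32, 200))  # ~156 MB/s
```

This is why per-write latency, not the drive's bandwidth rating, is usually what falls over first.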
 
Thanks for all the replies.

Didn't realize that the L2ARC took such a toll on the ARC (the L2ARC's headers have to live in ARC RAM). A lot to consider when speccing a ZFS box. Seems a real balancing act.

I have 4x4GB ECC unbuffered DIMMs and the board only has four memory slots, so that's a bit of a bummer. The ZIL I purchased originally is an Intel 320 40GB SSD. The problem with it is that its writes are only 40MB/s. As a result I'm seeing NFS writes of 30MB/s with sync and 95MB/s with async. The slow write speed of the ZIL definitely seems to be hurting performance. Should've read the specs instead of the reviews, which only looked at the much faster 80GB+ models.

I'm hoping to build a box which serves VMs to vSphere over NFS and is able to saturate a dual Gigabit aggregated pipe. Hence writes will be synchronous.

My feeling is that my next steps should be:
1. Get a faster ZIL - the Intel S3700 100GB SSD looks a good bet with 200MB/s writes. The 200GB model offers 365MB/s writes, but I'm guessing the 100GB model will be fast enough to saturate two Gigabit links? (Rough link math is sketched below.)

2. Upgrade memory to 32GB so ARC doesn't suffer with the addition of L2ARC.

3. Get an L2ARC if performance is still lacking.

Does that sound about right?
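
Regarding point 1, a rough ceiling check (raw wire speed only, ignoring NFS/TCP overhead and the sync latency that the later posts show matters far more):

```python
# Raw payload ceiling of an aggregated link vs. a drive's rated
# sequential write speed. Wire speed only; protocol overhead ignored.
def link_ceiling_mb_s(gigabit_links: int) -> float:
    return gigabit_links * 1000 / 8  # 1 Gb/s ~= 125 MB/s

print(link_ceiling_mb_s(2))  # 250.0 MB/s: above the 100GB S3700's 200MB/s
                             # rating; the 200GB model's 365MB/s clears it
```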
 
After adding an Intel S3700 100GB partitioned as an 8GB ZIL, I'm only seeing writes hit 40MB/s with sync enabled over NFS. With sync disabled they're closer to 90MB/s.

Am I missing something?
 
What performance figures do you get without the ZIL SSD?

With large sequential I/O, you may find sync writes are just as quick (or slow depending on your viewpoint) without a separate ZIL log device.
 
Well, even beyond ZIL speeds, you have several issues here.

First, async NFS can write as fast as the network will allow it. Sync NFS will be limited by network latency, as each write has to wait for a confirmation; even if that confirmation is sent as fast as possible, the sending side will not send more write data until it receives that confirmation, so you will never get full network speed.

Next, add in some latency to write to your ZIL, and you have cut your speeds down some more, though hopefully not enough to be noticeable over the network delay.
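
A minimal sketch of that effect, assuming each sync write pays its wire time plus a network turnaround plus any SLOG write latency (the numbers are illustrative, not measurements):

```python
# Per-operation cost model for sync NFS over gigabit: wire time for the
# payload, plus the network turnaround, plus the SLOG device's write
# latency. Async skips the wait, so it runs near wire speed.
def sync_nfs_mb_s(write_kb: float, turnaround_us: float, slog_us: float) -> float:
    wire_us = write_kb * 1024 * 8 / 1000  # payload time on a 1 Gb/s wire
    per_op_us = wire_us + turnaround_us + slog_us
    return (write_kb / 1024) / (per_op_us / 1_000_000)

print(sync_nfs_mb_s(8, 40, 0))   # ~74 MB/s with network latency alone
print(sync_nfs_mb_s(8, 40, 40))  # ~54 MB/s once SLOG latency is added
```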
 
What performance figures do you get without the ZIL SSD?

3MB/s... :eek:


Well, even beyond ZIL speeds, you have several issues here.

First, async NFS can write as fast as the network will allow it. Sync NFS will be limited by network latency, as each write has to wait for a confirmation; even if that confirmation is sent as fast as possible, the sending side will not send more write data until it receives that confirmation, so you will never get full network speed.

Next, add in some latency to write to your ZIL, and you have cut your speeds down some more, though hopefully not enough to be noticeable over the network delay.

OK, so what kind of changes would I need to make to the below spec to saturate a gigabit link? (Hypothetical question, since the 40MB/s I'm getting is fine for my small ESXi setup.)

System spec is:
  • Xeon 1220L v2
  • 16GB DDR ECC Unbuffered
  • 2x LSI 9211-8i HBAs
  • ZIL - Intel S3700 100GB SSD
  • L2ARC - Samsung 840 Pro 256GB SSD
 
Well, at gigabit speeds, with sync and an SLOG, the max speed you will be able to get is 45.897MB/sec according to my math, due to network latency and SSD latency.

If you were to remove the SSD latency, I'm getting numbers around 88MB/sec, with gigabit network turnaround time being about 0.04ms and SSD latency being about the same.

Like I said above, this leaves you with two options: don't use sync, or decrease your latency.

The only options for decreasing latency are a faster network or a faster SSD, and I'm not sure they make any 0.01ms-latency SSDs.
 
Well, at gigabit speeds, with sync and an SLOG, the max speed you will be able to get is 45.897MB/sec according to my math, due to network latency and SSD latency.

If you were to remove the SSD latency, I'm getting numbers around 88MB/sec, with gigabit network turnaround time being about 0.04ms and SSD latency being about the same.

Like I said above, this leaves you with two options: don't use sync, or decrease your latency.

The only options for decreasing latency are a faster network or a faster SSD, and I'm not sure they make any 0.01ms-latency SSDs.

Hi patrickdk, just curious, but what equations are you using to derive those figures?

thanks!
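
For what it's worth, a guess at the shape of that math (patrickdk didn't post his exact equation; the 4K write size below is an assumption): per-operation time is the sum of the latencies, and throughput is the write size divided by that time.

```python
# Latency-bound throughput: one sync write per round trip, so
# throughput = write size / (network turnaround + SSD latency).
# The 4K write size is a guess; the latencies are from the post above.
def latency_bound_mb_s(write_kb: float, total_latency_us: float) -> float:
    return (write_kb / 1024) / (total_latency_us / 1_000_000)

print(latency_bound_mb_s(4, 40))       # ~98 MB/s, network turnaround only
print(latency_bound_mb_s(4, 40 + 40))  # ~49 MB/s with SSD latency added
```

Those land in the same ballpark as the 88 and 45.897 MB/sec figures; the exact values depend on the assumed write size and overheads.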
 
I'm very curious why an 840 Pro isn't considered a useful drive for ZIL. Also, are there any better alternatives to the Intel S3700 for ZIL? I'm looking for the best possible within reason.
 
It has all the speed it needs to be worthy of an SLOG, but it lacks a supercap, so you can't actually be guaranteed that writes will be preserved on a power-loss event.
 
I'm somewhat curious about this. A missing supercap is, in theory, nothing that should keep an SSD from committing a block to stable storage; it will just take longer than an SSD with a supercap. I did several power-loss tests with an 840 Pro, and nothing I observed would justify saying it does not properly honour flush commands.
 
I'm somewhat curious about this. A missing supercap is, in theory, nothing that should keep an SSD from committing a block to stable storage; it will just take longer than an SSD with a supercap. I did several power-loss tests with an 840 Pro, and nothing I observed would justify saying it does not properly honour flush commands.
In theory, yes, you're correct. Since the SLOG is single queue depth and operations involving the SLOG will be O_SYNC-type workloads, then yes, any single IOP has to first be written to the ZIL and should in theory be there on reboot, regardless of supercap.

However, some SSDs do strange stuff internally in their own firmware. Some SSDs will lie and say "yeah, got it, wrote it to disk" when really it's cached and the drive is still looking for blocks in the free-space map. Granted, this happens extremely fast, which is why some drives allow it, but there is still a possibility of data loss on a power event.

I personally haven't seen it, but some hardware partners of mine have run into quite a few non-supercap drives that had issues in testing.
 
In theory, yes, you're correct. Since the SLOG is single queue depth and operations involving the SLOG will be O_SYNC-type workloads, then yes, any single IOP has to first be written to the ZIL and should in theory be there on reboot, regardless of supercap.

However, some SSDs do strange stuff internally in their own firmware. Some SSDs will lie and say "yeah, got it, wrote it to disk" when really it's cached and the drive is still looking for blocks in the free-space map. Granted, this happens extremely fast, which is why some drives allow it, but there is still a possibility of data loss on a power event.

I personally haven't seen it, but some hardware partners of mine have run into quite a few non-supercap drives that had issues in testing.

With that being said, does that make the 840 Pro a suitable cheap ZIL drive in a non-production environment if someone were to pack some lithium batteries onto the SATA power pins, as well as a battery backup for the server, in the event of power loss? Or is there more latency on this drive than on the Intel S3700 100/200GB drives?
 
With that being said, does that make the 840 Pro a suitable cheap ZIL drive in a non-production environment if someone were to pack some lithium batteries onto the SATA power pins, as well as a battery backup for the server, in the event of power loss? Or is there more latency on this drive than on the Intel S3700 100/200GB drives?

This is exactly what I'm curious about. If I'm prepared to take the risk of a potential issue during a power failure, is the 840 Pro the fastest/best drive available for ZIL?
 
I wouldn't fuck with the pins. You're better off just buying a suitable inexpensive UPS just for the filer or JBOD that houses the SLOG drives.

Any SSD is 'suitable' for SLOG duties in a non-prod environment, depending on what you classify as 'suitable'. Wintec used to make an 8GB SLC drive that could only do something like 2K IOPS, and even that would smoke mechanical drives, right?

If it's non-prod, though, you really don't 'need' an SLOG. Using my previous example, an inexpensive UPS attached plus zfs set sync=disabled on the pool/datasets functionally accomplishes the same thing. Granted, there's even less data integrity, since there is no way to replay the data, but you already said non-prod, right? I haven't seen many non-prod environments where data integrity ranks highly.
 
This is exactly what I'm curious about. If I'm prepared to take the risk of a potential issue during a power failure, is the 840 Pro the fastest/best drive available for ZIL?

If you're willing to risk your data, why not just run with sync disabled?
 
Well, I would mess with the pins on the plastic plug for the SATA power, not the drive itself. I already have a nice APC UPS, so I am not concerned about it.

Production or not, I value the servers that I run, especially playing with SQL and Exchange, as I use them for my own personal use.
--

Running with sync disabled is, I believe, riskier than using the 840, is it not? Hell, why not do both if you truly need some I/O :p
 
My position: since I am only running HA anyway, any failure of my ESXi server will result in potential data loss anyway. To mitigate that, I run Veeam B&R to make a replica every hour on the hour. So with sync=disabled, if I get some awful crash, I can just roll back the VM(s) and not lose much. Beats spending a ton of dough or having sucky performance. YMMV, of course.
 
That sounds like a good deal; however, I do not have the resources for Veeam licensing (well, I have MCALs for regular backups for family and a few business relations), so that option isn't feasible in a whitebox environment with limited storage, using a small ZFS NAS for everything.
 
Running with sync disabled is, I believe, riskier than using the 840, is it not? Hell, why not do both if you truly need some I/O :p

sync=disabled bypasses any SLOG that might be present.

Remember: the ZIL always exists, the SLOG doesn't.
 
SLOG is nothing more than storing the ZIL on a dedicated drive. Without an SLOG, your ZIL is stored on the main pool instead. sync=disabled means that sync writes will not be written to the ZIL (intent log) first; whether you are using an SLOG (dedicated ZIL) or the regular ZIL does not matter in this case. The ZIL is used to guarantee the order of synchronous writes: sync write #2 may not be written before sync write #1. They must be kept in order not just with other sync writes, but also in relation to regular (async) writes that occur.

You do not need a ZIL; you can actually disable it. That will not damage or hurt ZFS, but it can damage your data consistency in numerous cases. For example, storing an iSCSI image with an NTFS filesystem on ZFS might cause the NTFS filesystem to be corrupted when sync writes are disabled, as the order of sync writes is no longer guaranteed.

Few drives are suitable for ZIL/SLOG duty. Among (semi-)consumer SSDs, the best are the Intel 320 and Intel S3500; second comes the Crucial M500. All three have power-safe capacitors. The M500 is inferior to the Intels due to its larger DRAM write-back cache, which you would lose upon power failure. This makes the M500 a worse drive for ZIL/SLOG, but still much, much better than other SSDs. In particular, Samsung SSDs are not suitable for the SLOG function.
 
I'm fairly certain the ZIL cannot actually be disabled: the intent log is always stored in RAM, and the SLOG is meant to guarantee the ZIL can be replayed after a crash.

sync=standard (over, say, NFS) or sync=always means ZFS will flush to the pool before accepting another write if an SLOG is not present. sync=disabled says "I don't care about my data in the event of a power loss, just send writes as fast as you can."
 
On BSD platforms there are the tunables.
vfs.zfs.zil_disable
vfs.zfs.zil_replay_disable

However, disabling the ZIL altogether is not recommended, and I believe this tunable was removed in recent releases. It's not really required now that you can tune sync-write behavior on specific filesystems with the 'sync' property.

Disabling the ZIL doesn't cause data loss in the event of power loss, at least not on the ZFS side. If your data contains other filesystems, say an NTFS filesystem on a ZVOL used for iSCSI, then you definitely want/need to obey sync writes (sync=standard). Otherwise, a power failure might leave ZFS corruption-free, but the ZVOL in a state where the NTFS filesystem is corrupt.

In most cases, however, sync=disabled works just fine for casual files. If the power is interrupted during a file transfer, you might lose a few more seconds of history, but that won't matter anyway, since the file transfer would need to be restarted. Data that is not being modified is not affected at all.
 