ZFS boot/zil/l2arc all off of one SSD?

AceXsmurF

First off, I know this isn't ideal in any sense of the word, but right now I have one 128 GB SSD and three 2 TB HDDs that I am trying to use in this ZFS box.

I looked around a bit and it sounds like it is possible to use a single drive for both a ZIL and an L2ARC.
http://constantin.glez.de/blog/2011...uestions-about-flash-memory-ssds-and-zfs#both

I have everything installed and running right now, with boot on the SSD (10 GB partition) and the three 2 TB HDDs in a ZFS pool.

Right now I am looking for more information about how to create two more partitions on that SSD and expose them for a ZIL and L2ARC. I looked at the format command and did some fdisking to create two more Solaris2 partitions on that SSD, but I am not able to figure out how to expose those two partitions so I can attach them to the zpool as a ZIL and L2ARC. Anyone have any recommendations?
 
A ZIL probably isn't worth it, but L2ARC would be very nice, since this storage is being used as VMware lab storage for a couple of tests over NFS.
 
Well it would be nice to be able to have both, and then figure out by actually testing to see what differences I really see. :)
 
Unless you disable the ZIL, I can tell you now without testing that it makes a huge difference for NFS. L2ARC is nice to have unless you want to turn on dedup (a big no); a ZIL is a must-have for NFS.
 
A ZIL probably isn't worth it, but L2ARC would be very nice, since this storage is being used as VMware lab storage for a couple of tests over NFS.

For NFS it's just the opposite: a ZIL is very much worth it, unless it's a pool of SSDs.
 
The ZIL is a showstopper for NFS writes only if the client does sync writes. Unfortunately, ESXi does force sync, so you either need a ZIL or have to disable sync altogether, or your write performance will suck.
 
Oh cool. Thanks danswartz. I was just about to ask why, and you answered that question before I had a chance to ask it.
 
The ZIL is a showstopper for NFS writes only if the client does sync writes. Unfortunately, ESXi does force sync, so you either need a ZIL or have to disable sync altogether, or your write performance will suck.

When you say "suck", how much suck are we talking about?

And am I the only one who gets skeeved a bit about async NFS writes? I'm the only person on my lonely LAN and even then I'm wary of doing async writes.
 
When you say "suck", how much suck are we talking about?

And am I the only one who gets skeeved a bit about async NFS writes? I'm the only person on my lonely LAN and even then I'm wary of doing async writes.

The suck goes up dramatically the more VMs you run and the more disk writing they do; it's pretty workload-dependent, at least in my experience.

Do you want to run a bunch of VMs or have VMs that do a lot of writing?
 
When you say "suck", how much suck are we talking about?

And am I the only one who gets skeeved a bit about async NFS writes? I'm the only person on my lonely LAN and even then I'm wary of doing async writes.

Just going off memory here, but I recall sustained throughput being 1/3 to 1/4 the speed :(
 
So I guess my original question stands.

Is there a way to use a single SSD for boot partition, ZIL, and l2arc?
 
If all you're doing is a lab, then there is no reason. Disable sync globally and just run your labs. ZFS can use partitions, but performance will be slightly worse than if you just feed it whole disks.
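
For reference, a minimal sketch of what "disable sync globally" looks like on the ZFS side - the pool name "tank" is just a placeholder, so substitute your own. Setting the property on the top-level dataset lets the child datasets inherit it:

# disable synchronous write semantics pool-wide (child datasets inherit it)
zfs set sync=disabled tank
# confirm the setting across the pool
zfs get -r sync tank
# revert to the default behaviour later if you change your mind
zfs set sync=standard tank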
 
So I guess my original question stands.

Is there a way to use a single SSD for boot partition, ZIL, and l2arc?

You can slice/partition disks; you can even create disks from files to use for ZIL and L2ARC.
You can, but you should not - there is nothing to gain.

ALWAYS use whole disks!
 
Gea, can you provide references for that?

I assure you there is much to gain from partitioning in some situations... for SSDs you MASSIVELY increase their lifespan by only using part of the disk, and gain considerable performance on lots of models as well. Intel has several white papers suggesting you should use only part of your SSD (via partitions or HPAs) in enterprise or heavy-use situations. Look at the "Over Provision" section of this one, for example: http://cache-www.intel.com/cd/00/00/49/23/492354_492354.pdf

You can also drastically reduce seek times on spinning disks by partitioning them to use only the outer tracks ("short stroking"). This used to be a very common tactic for IOPS-sensitive workloads.

I'm not advocating splitting up a disk the way the OP is asking about, but you're extremely adamant about never using partitions, when they do work and can provide benefits. Just because it's a PITA on Solaris doesn't mean it's taboo.
 
First off, I know this isn't ideal in any sense of the word, but right now I have one 128 GB SSD and three 2 TB HDDs that I am trying to use in this ZFS box.

I looked around a bit and it sounds like it is possible to use a single drive for both a ZIL and an L2ARC.
http://constantin.glez.de/blog/2011...uestions-about-flash-memory-ssds-and-zfs#both

I have everything installed and running right now, with boot on the SSD (10 GB partition) and the three 2 TB HDDs in a ZFS pool.

Right now I am looking for more information about how to create two more partitions on that SSD and expose them for a ZIL and L2ARC. I looked at the format command and did some fdisking to create two more Solaris2 partitions on that SSD, but I am not able to figure out how to expose those two partitions so I can attach them to the zpool as a ZIL and L2ARC. Anyone have any recommendations?


Solaris needs a Solaris2 partition, but ZFS doesn't - and really there should only be one Solaris2 partition per drive.


If you still want to do this, try using gparted instead of format/fdisk.
Create two new unformatted partitions in the free space (e.g. p2 and p3), save the changes in gparted, and then add these directly to your pool using the "zpool add" command. The ZIL log need not be that big - a few GB is usually fine!

Can't say what performance increases you'll get, but it might be worth a try!!
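
For anyone following along, the resulting commands look something like this; the pool name and the p2/p3 device names below are just examples, so check format or /dev/dsk for your actual partition device nodes:

# add the SSD's second fdisk partition as a dedicated log (slog) device
zpool add tank log c0t0d0p2
# add the third partition as an L2ARC cache device
zpool add tank cache c0t0d0p3
# verify that the new log and cache vdevs show up
zpool status tank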
 
Thanks, that worked!

So what has your experience been with this setup over the last few months?

I'm considering doing something similar, but my setup is different than yours.

I'll be running FreeNAS as a guest on my ESXi server. The controller will be passed through natively for my four 3 TB drives, but for ZIL and L2ARC I figured I'd just add VMware virtual disks on my existing SSD (used for boot and as a datastore on the server) until I save up some cash and get dedicated drives for those tasks.

Long term I'm thinking a 20GB Intel 313 drive for the ZIL (since it is SLC) and probably a consumer 128GB MLC drive for the L2ARC.

My plan is to create an iSCSI target on the FreeNAS install and have another guest on the same ESXi server hooked up directly to it via a vswitch that is not connected to the outside world.
 
I would also like to know how much faster it gets. I am considering doing something similar...
 
I find it somewhat concerning that the prevailing wisdom seems to be you only need a slog device for NFS, or that only with NFS does it become a necessity.

It is just as necessary on iSCSI. If you don't think that, then you've been running iSCSI with the writeback cache enabled, and you're not power-safe from data loss. You've been doing so because at the moment most implementations do that by default, as sad and scary as that is. You'll get just as much of a performance benefit from a fast slog on iSCSI as on NFS, assuming you've set up iSCSI to be safe. If you haven't, and you don't care (the data isn't important, you keep regular backups and can handle service interruptions, etc.), then you should set your NFS mount options to 'async', as that's the equivalent of iSCSI with the writeback cache enabled.

For home use, I can't imagine it'd be overly bad to mix uses on an SSD, but do bear in mind slog traffic is nasty and can, depending on the SSD and firmware, significantly reduce the life expectancy of the drive. I'd more wonder if it's worth it, simply because in my experience L2ARC is rarely worth it. In general, I don't even recommend L2ARC anymore, simply because its hit rate is often so much poorer than the ARC's that, even with the 'balloon' effect you get with it (X GB of RAM used for L2ARC equals much more than X GB of extra read cache, thus a balloon over what you'd get with RAM alone), it ends up having a lower hit rate than if you'd just left it out and let the ARC use that RAM. Of course, I do live in a world where the smallest boxes have 64 GB of RAM, so this logic may not be applicable in the home space, where 4-16 GB is a more common maximum; but as a lot of the reason for this advice hinges on how the L2ARC is fed, depending on your workload at home it may still apply.
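
If you want to sanity-check whether an L2ARC is actually earning its keep, the ARC and L2ARC counters are exposed through kstat on Solaris/illumos (the stat names below come from the zfs:0:arcstats group; adjust if your platform exposes them differently). Comparing hits to misses for each tier gives a rough hit rate:

# ARC hit/miss counters
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses
# L2ARC hit/miss counters
kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses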
 
Nex7, are COMSTAR targets written to in sync mode then? That's what absolutely kills NFS write performance from ESXi, since it forces sync mode. I do hourly replications and daily backups of my VMs, so I am not worried about disabling sync on those datasets.
 
Nex7, are COMSTAR targets written to in sync mode then? That's what absolutely kills NFS write performance from ESXi, since it forces sync mode. I do hourly replications and daily backups of my VMs, so I am not worried about disabling sync on those datasets.

No, just the opposite, which is why people think iSCSI is faster. Out of the box on the illumos derivatives I'm aware of, COMSTAR enables "Write Back Cache" by default on LUs. This "feature" basically says: unless the client calls for synchronous writes, I will assume asynchronous writes.

This sounds sane, until you realize nearly no iSCSI client does this. In fact, in some hypervisor-in-the-middle worlds (like VMware), they may actively eat the sync flag from the writes at the VM layer before passing it on to the iSCSI target. In 99% of the environments I walk into that were unaware of this issue and using iSCSI due to a perceived performance improvement, even in environments where they had spent $1000's on STEC ZeusRAMs for slog use, they were not utilizing the slog at all and were completely unprotected from in-transit/uncommitted data loss on a power event, etc. I'd also go on to state that easily 75% (probably way higher, but I'll be conservative) of those environments, after disabling the write back cache setting or setting sync=always on the zvols, found that the performance of their iSCSI LUNs was now in line with their NFS shares (or the data they had from testing it in the past before going iSCSI on the assumption it was faster). In maybe 25% of those environments, their NFS shares or NFS performance data showed NFS was /faster/ than iSCSI once the playing field was even.

NFS's default is sane - assume synchronous unless told otherwise. COMSTAR's default is exactly the opposite - assume asynchronous unless told otherwise. And many people don't realize it. IMHO, this is where the "common wisdom" that iSCSI is way faster than NFS comes from. They don't realize they're comparing safe to unsafe.

edit: And, for the record, if you do hourly replications (that are consistent and good!) and daily backups, and are in an environment that can tolerate a power event causing some number of disruptions in service after coming back up (due to lost files, corrupted filesystems requiring fscks, etc.), then by all means disable sync: set sync=disabled on your datasets and call it a day. You will no longer require a slog device, AND your data devices won't do double duty as slog space for the ZIL. Everything will go only into the in-memory ZIL construct and then out to disk as part of a txg commit, and that's it. Honestly, if you're in a small business/office environment and doing the aforementioned replications and backups, I don't see that it's that insane. There are any number of scenarios where I wouldn't cringe to hear it was being done this way. There are also a huge number of scenarios where it is being done this way that are terrifying, and in a few cases quite honestly likely illegal or in breach of contract, and it's just that nobody is aware.
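
For anyone wanting to check or change this on an illumos/COMSTAR box, something along these lines should do it; the LU GUID and dataset name are placeholders, and wcd is the "write cache disabled" LU property:

# list LUs and their properties; look at the "Writeback Cache" line
stmfadm list-lu -v
# turn the write-back cache off for a given LU so sync semantics are honoured
stmfadm modify-lu -p wcd=true <lu-guid>
# or enforce it at the ZFS layer regardless of what the initiator asks for
zfs set sync=always tank/iscsi-vol01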
 
Thanks for the explanation. I was not clear though (my fault). What I meant was: if you disable the writeback cache, does it do sync writes by default? And if so, is it as god-awful slow as NFS? To answer your implied question: I am using Veeam Backup & Replication, which uses VMware snapshots in conjunction with VMware Tools filesystem quiescence.
 
Thanks for the explanation. I was not clear though (my fault). What I meant was: if you disable the writeback cache, does it do sync writes by default?

Effectively, yes.

And if so, is it as god-awful slow as NFS?

Yes. More so if you go with the defaults, as your zvol will have a significantly lower average block size than your NFS share.
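
For what it's worth, a zvol's block size can only be chosen at creation time (the default was 8K on systems of this era, if memory serves), so it's worth deciding up front when carving out a new LU. A sketch, with placeholder names:

# create a 100 GB zvol with a larger volblocksize for VM storage
zfs create -V 100G -o volblocksize=64K tank/vm-lun01
# volblocksize is read-only after creation; confirm it with
zfs get volblocksize tank/vm-lun01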

To answer your implied question: I am using Veeam Backup & Replication, which uses VMware snapshots in conjunction with VMware Tools filesystem quiescence.

Good, then you should be in good shape, assuming you're OK with having a period of disruption and controlled chaos after a power loss (which you may very well be). :)
 