SSD drive to boot ESXi and Solaris VM?

I imagine that reliability of the SSD is your real priority, as performance will only be affected on boot (which should happen rarely in a server). Once ZFS is running, the SSD will not be read from or written to very often.

As I understand it, Intel SSDs have the lowest failure rate on the market at this time.

I am in a similar position - I need a disk (or RAID 1) for ESXi/OI for my upcoming all-in-one napp-it build.

It sounds like in a worst-case scenario where this disk fails, you replace it with a new disk and:
1. reinstall ESXi
2. reinstall OI/napp-it
3. import the pool
4. reconfigure the VMs whose images live on the pool, which is now available again (rough commands for steps 3 and 4 are sketched below)
5. and you're back up and running
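
If it helps, here is a minimal sketch of steps 3 and 4 from the command line, assuming a pool named "tank" and a VM path on a datastore called "vmstore" - both are just placeholders:

    # on the freshly reinstalled OI/napp-it VM: find and import the existing pool
    zpool import          # lists pools visible on the attached disks
    zpool import -f tank  # -f only if the pool was never cleanly exported

    # on the ESXi host: re-register each VM from its .vmx file on the datastore
    vim-cmd solo/registervm /vmfs/volumes/vmstore/somevm/somevm.vmx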
 
You are correct. I am looking at an SSD because of reliability plus low power/heat/noise, and it seems to be the middle ground for cost/reliability between booting from a single drive and buying one of those RAID units I posted in the link.

And I'd just like to say that your "worst case scenario" plan is necessary in any case. Even if I had a hardware RAID card with battery-backed cache and enterprise Fibre Channel drives to boot from, the recovery plan would still be needed. RAIDs can fail. Controllers can fail. Entire systems can fail. The reliability I am looking to get from an SSD boot drive is just to mitigate the risk as much as I can.
 
Honestly, I'd skip the SSD for ESXi and OI. I have seen higher failure rates with SSDs than with normal drives in machines that stay on perpetually, and it's not a big performance gain. If your power is stable, then a small traditional drive is just as good for ESXi; ESXi can essentially be booted from USB if need be. The only issue with the all-in-one is that OI has to boot from a local datastore before the data/NFS/SMB store is available to the other VMs.

Now, if you absolutely want the SSD, go for it. But I think the SSD would be better used as a read cache or ZIL log - though you'd have to get an enterprise SSD to be safe as a ZIL log.
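
For reference, adding an SSD in either role is a one-liner; a minimal sketch, assuming a pool named "tank" and placeholder device names (check yours with format):

    zpool add tank cache c2t1d0   # SSD as L2ARC (read cache)
    zpool add tank log c2t2d0     # SSD as a dedicated ZIL (log) device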

Now, in terms of backup, you can keep a copy of the OI image on the local datastore as well as on the NFS datastore. That way, once ESXi is up and running, you can pretty much grab the working image and be back up and running again.
 
Good points. Additionally, if you go the all-in-one route, don't bother with a ZIL; just set sync=disabled on the data pool you create. Since everything, including the SAN, runs on the same physical hardware, there's no likely scenario where the SAN crashes but nothing else (including the ESXi host) does. And if you go this route, make sure to put the SAN datastore on NFS, not iSCSI. I found that VMs on the SAN datastore set to auto-start didn't work well with iSCSI - by the time OI was up and running, ESXi's iSCSI code had already timed out - while the NFS code seems much more robust in that respect. Speaking of ESXi/NFS, that's another reason to use either a ZIL or sync=disabled: ESXi's NFS client does all writes in sync mode with no way to change that, so your write performance will suffer badly if you don't.
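
In case the exact commands are useful, a minimal sketch (ESXi 5.x syntax), assuming a dataset named "tank/vmstore" and a storage VM at 192.168.1.10 - both placeholders:

    # on the OI/napp-it VM: share the dataset over NFS and drop sync writes
    zfs set sharenfs=on tank/vmstore
    zfs set sync=disabled tank/vmstore

    # on the ESXi host: mount it as an NFS datastore
    esxcli storage nfs add -H 192.168.1.10 -s /tank/vmstore -v vmstore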
 
sync=disabled is a necessary option if you're not using a ZIL. However, a dedicated ZIL gives you a performance boost as well and lays the data out better for the drives: by aggregating the writes (essentially in memory) and then batch-writing them out, the drives get used in burst/sequential mode and the writes don't get in the way of any reads that are also happening.

We find the read performance is slightly improved on heavily used machines when a ZIL log is in place.

That being said, for most home uses a ZIL log is definitely a luxury (and so is an L2ARC, for that matter).
 
Hmmm, not doubting you, but do you have a cite for the batching of writes? My understanding was that ZFS did that on its own anyway. Always glad to be proven wrong, though :) As far as a ZIL improving read performance, that is kind of odd, given that (AFAIK) ZFS never reads from the ZIL. Maybe that is corroboration of your statement that running writes thru the ZIL allows optimization?
 
I don't have a benchmark for it, but the "response" while using it seems to corroborate it. What I'm talking about is this: if you don't specify a dedicated ZIL, ZFS will do this automatically, but it carves out a section of your main vdev to do it. My understanding is that it writes to this log constantly and then "flushes" (for lack of a better term) to the actual datastore every x number of seconds (5?). So you see the writes hitting the log constantly as they happen, and then they get flushed out in a more optimized fashion (if possible).
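
If you want to watch this happening, a quick sketch, assuming an illumos/OI box and a pool name ("tank") that is just a placeholder:

    zpool iostat -v tank 1   # per-vdev I/O at one-second intervals: you can see
                             # the steady log traffic plus the larger burst when
                             # the transaction group flushes, roughly every 5
                             # seconds by default (zfs_txg_timeout on illumos,
                             # if I remember right)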

The read performance gain is that when we run systems with dedicated ZIL logs, the main drives only see writes every 5 seconds or so, which leaves them essentially free to handle random (or sequential) reads for any request that comes along. This could obviously be tied to the fact that you're no longer hitting the ZIL on the main datastore (it's now offloaded to a dedicated device), which means everything, both read and write, should theoretically be faster.

If you want to benchmark it (unfortunately none of my 4 servers are in a state to benchmark right now, as they are live): write out a 25 MB file every second onto a non-ZIL setup, then try some large random I/O loads (load up 10 VMs, or even something light like streaming 5 high-def streams). You'll notice that the 25 MB file writes 'get in the way', in that they need to be serviced immediately. With a dedicated ZIL log, you'll see the main drives stay very quiet, essentially only servicing the read requests. The writes from the ZIL log go out about every 5 seconds, but much quicker, since they are processed more optimally (it's much faster for ZFS to write out 5 x 25 MB (125 MB) at once than to write it out every second, one at a time).
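
A rough way to script the write half of that test; a sketch only, with a placeholder path and the 25 MB / 1 second numbers from above:

    # write a 25 MB file into the pool once a second; run the streaming reads
    # or VM load in parallel and watch 'zpool iostat 1' in another terminal
    while true; do
        dd if=/dev/zero of=/tank/test/junkfile bs=1024k count=25
        sleep 1
    done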
 
Semi-thread jack - can someone briefly explain what sync=disabled means practically?
Edit: Anthem, thanks for the clear description of how ZIL functions!
 
I am confused. Setting 'sync=disabled' completely eliminates the write log - the moment the client issues the write, it is told "you're all set, carry on...". I thought you were saying a ZIL was faster than sync=disabled, which I have trouble believing. I'm not clear from your last post whether you were addressing that or not?
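
For reference, sync is a per-dataset property you can check and flip on the fly; the dataset name here is a placeholder:

    zfs get sync tank/vmstore            # standard | always | disabled
    zfs set sync=standard tank/vmstore   # honor client sync requests (default)
    zfs set sync=disabled tank/vmstore   # ack immediately; risk losing the last
                                         # few seconds of writes on a crash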
 
hey danswartz,

I may be misunderstanding this, but it sounds like the performance improvement of a ZIL on a dedicated SSD shows up when you have multiple disk I/O requests competing with writes. By using a discrete ZIL device, write requests are stored on the SSD/dedicated ZIL drive and only written out to the array every 5 seconds or so. This means the storage array is mostly idle, so random access is vastly improved (since it can presumably happen in the 5-second window between writes).

If you have many clients, this effect is amplified. If you have only 1 client, there may be no noticeable difference in speed, because there are no other operations competing for disk time.

If sync=disabled means that the ZIL is eliminated and the disks are written to without this intermediary step, then depending on the environment, a dedicated ZIL could be much faster or could make no difference in speed.

Don't take this to be 100% accurate. This is only my understanding!!! Please, someone correct me if I'm wrong!
 
My understanding was that having no ZIL implied that the buffer writing code batches things up. I have no idea how to prove/disprove this :)
 