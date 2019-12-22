Virtualization and ZFS

    I would be interested in your opinion because I am migrating all my machines to virtualized solutions (mostly Proxmox VE; I still have one ESXi box).

    For the Host (or AIO using OmniOS + Napp-IT) filesystem I'm without doubt using ZFS.

    Does it matter which FS the guest uses? Or should it be ZFS as well?
    My question is basically: is all data corruption prevented at the host level (i.e. the guest can use any FS it wants), or is it the weakest link of the two that determines the reliability of the saved data (i.e. both must be ZFS, otherwise it's almost "pointless")?

    In a napp-it AiO setup you use ZFS as the filesystem. Due to Copy on Write, ZFS is always consistent, as an atomic write (write data and update metadata, or write a whole write stripe) is always either done completely or discarded. With checksums, ZFS can always guarantee filesystem and file consistency.

    From the ZFS point of view, a VM filesystem is a file (or zvol). ZFS cannot guarantee consistency or atomic writes for VMs per se. The most important problem is the ZFS write cache (which can be several GB): it confirms writes to a VM immediately but puts the small random writes on disk with a delay of a few seconds (to increase performance, as the writes are then no longer small/slow random but fast/sequential). A crash in the meantime can mean an inconsistent file state from the view of a VM. This is why you need ZFS sync write to guarantee that any committed VM write is on stable disk. The performance degradation of sync can be limited with an Slog.
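
    To make the crash window concrete, here is a minimal toy model in Python. It is not ZFS code: the class ToyStore and the function acked_but_lost are made up for illustration, and a real sync write is logged to the ZIL/Slog rather than appended straight to the pool. It only shows which acknowledged writes survive a crash with and without sync.

```python
class ToyStore:
    """Toy model of a host write cache in front of a disk (NOT real ZFS)."""

    def __init__(self, sync: bool):
        self.sync = sync
        self.ram_cache = []  # acknowledged to the VM, not yet on disk
        self.disk = []       # the only thing that survives a crash

    def write(self, block: str) -> str:
        if self.sync:
            # Sync: the block is on stable storage before the ack is returned.
            self.disk.append(block)
        else:
            # Async: ack immediately, flush to disk a few seconds later.
            self.ram_cache.append(block)
        return "ack"

    def flush(self) -> None:
        """Periodic flush of the cache to disk."""
        self.disk.extend(self.ram_cache)
        self.ram_cache.clear()

    def crash(self) -> None:
        """Power loss: everything still in RAM is gone."""
        self.ram_cache.clear()


def acked_but_lost(sync: bool) -> list:
    """Return the writes the VM saw acknowledged that are missing after a crash."""
    store = ToyStore(sync)
    blocks = ["journal-1", "data-1", "journal-2"]
    acked = [b for b in blocks if store.write(b) == "ack"]
    store.crash()  # crash happens before the next flush
    return [b for b in acked if b not in store.disk]


if __name__ == "__main__":
    print("async, lost after crash:", acked_but_lost(sync=False))
    print("sync,  lost after crash:", acked_but_lost(sync=True))
```

    In the async case the guest believes all three blocks are on disk although none of them are, which is exactly the inconsistent state described above. With sync nothing acknowledged is lost.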

    If the VM does its own write caching, you must take care of this cache as well. If the VM uses ZFS, you want sync there as well, or use a filesystem without a write RAM cache (slow).
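
    On the guest side, an application can ask the guest OS to flush its own write cache for a file. A minimal sketch in Python, assuming a POSIX-like guest; the helper name durable_write is made up for illustration. Note that os.fsync only covers the guest's cache; whether the data is then on stable disk still depends on the host honoring sync writes as described above.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after the guest OS has flushed it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the guest OS to push its cached data for this file to the
        # (virtual) disk before we report success.
        os.fsync(fd)
    finally:
        os.close(fd)
```

    Databases and journaling filesystems in the guest do the equivalent of this internally, which is why a host that acknowledges such flushes without sync can still corrupt them after a crash.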
     
    Not sure if this is the right thread, but here goes. I have a two-host ESXi 6.7 cluster. I'd like to do some flavor of your 'cluster in a box', but I don't care about SAS/SATA shared storage. Each host has two 1TB NVMe cards, and I'd like to do something using those (shared nothing). Does this work with your cluster in a box?
     
