Deduplication and cloned VMs

So, forgive me in advance for still being somewhat new to this field. I'm toying with the idea of running several similar VMs that would either be direct clones or clones with minor changes. I'm only using a single ESXi host since this is limited to my home setup, so no external SAN or anything else too "fancy." Would the following be possible, and/or is there a better way to accomplish it?

Set up a ZFS VM with an RDM'd SSD (or potentially two for a mirror - I suppose I could use a PCIe controller and pass that through, but would prefer not to if possible). Export this back out as an iSCSI target to the ESXi host with deduplication enabled. Then set up all of the similar cloned VMs on this datastore, where presumably the used space would not grow much past the size of a single VM because of deduplication. And deduplication on a smaller SSD arguably wouldn't chew through too much RAM if I'm staying at 50-100 GB (or less) of unique data.
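For what it's worth, the ZFS side of that plan boils down to only a few commands. A minimal sketch, where the pool/zvol names and device IDs are made up for illustration:

zpool create vmpool mirror c2t0d0 c2t1d0          # mirrored SSD pool
zfs create -V 100G -o dedup=on vmpool/esxistore   # block volume (zvol) with dedup on
# The zvol then gets exported over iSCSI and ESXi points at it as a datastore; the
# exact export commands depend on the storage VM's OS (COMSTAR's stmfadm/itadm on
# Solaris-derived systems, targetcli on Linux, and so on).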

Is this just a completely terrible idea that wouldn't work, or is there a more effective way to run cloned VMs with some form of deduplication?
 
Be very careful using deduplication with ZFS. If you don't know *exactly* what you are doing, it can bite you hard (snapshot deletions taking days, etc.).
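If you do go down that road, at least keep an eye on how big the dedup table actually gets; ZFS will report it on a live pool (the pool name here is just the placeholder from the sketch above):

zpool status -D vmpool    # DDT entry counts and in-core/on-disk sizes
zdb -DD vmpool            # more detailed DDT histogram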
 
Thanks for the warning. I'm planning to play with this first in a more experimental/development environment to see if it's doable and to work past the hurdles you're describing before it's ever used for anything real, much less critical. To be clear, none of what I'm doing here is or will be critical to the point where I couldn't afford downtime or having to fiddle with it.
 
Thanks for the links. The shortcomings pointed out there, and that I've read about previously, really key in on the fact that in a typical storage array you either need massive amounts of RAM to keep the DDT in memory, or you're going to hit massive performance penalties. However, my thought was that if you're only deduplicating a relatively minor amount of data (not TBs, but literally enough for a cloned VM, i.e. ~50 GB give or take), it wouldn't be all that hard to keep the DDT in RAM without a major commitment. Is this logic way off base?
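Back-of-the-envelope, assuming roughly 320 bytes of core memory per DDT entry (the exact figure varies by ZFS version) and the default 128K dataset recordsize:

# 100 GB of unique data / 128 KB per block  ~= 800,000 blocks
# 800,000 blocks * ~320 bytes per DDT entry ~= 250 MB of RAM
echo $(( (100 * 1024 * 1024 / 128) * 320 / 1024 / 1024 ))    # prints 250 (MB)

Note that a zvol exported over iSCSI would dedup at its (much smaller) volblocksize rather than the 128K recordsize, which inflates the entry count proportionally.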

Maybe I'm approaching this the wrong way, but my end goal is more or less to run several cloned VMs simultaneously. Rather than storing a massive amount of repeated data for these VMs, deduplication seems like a reasonable fit, since it wouldn't be working at a dedup ratio of 1-2x but rather a whole multiple for each active cloned VM. Is there a better way to accomplish this?
 
No, I think it's sound in theory. If, say, you went with a small pool in the couple-hundred-gig range, enabled dedup on it, and used it for your linked-clone store, then yeah, you'll probably see good results.
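If you want to sanity-check that before committing, zdb can simulate dedup against data already sitting in a pool and estimate the ratio and table size (pool name matches the earlier sketch):

zdb -S vmpool    # simulated dedup statistics: expected ratio plus a DDT size histogram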
 
I think the VMware guys here would tell you you're using the wrong tool. What you want is linked clones or some such. They share a common base disk, with deltas for the blocks that diverge.
 
Just for the sake of anyone wondering or looking at this later: simple linked clones were the solution I needed, and they avoid the whole mess with ZFS deduplication. While this is usually a vCenter feature (and I'm still working with the free vSphere license), I did find this helpful little article for setting up linked clones manually:

http://sanbarrow.com/linkedcloneswithesxi.html

It took all of 5 minutes to get a test VM up and running correctly. Definitely useful, and hopefully I'm not missing anything by going this route as opposed to the supported vCenter GUI route.
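For anyone skimming later, the gist of the manual approach as I understand it (the article has the authoritative steps; the paths, names, and VM IDs below are just placeholders):

# 1. Build the master VM, then snapshot it so its base disk is never written again
#    (get the vmid from `vim-cmd vmsvc/getallvms`):
vim-cmd vmsvc/snapshot.create <master-vmid> base
# 2. For each clone, create a new directory holding a copy of the master's .vmx plus a
#    fresh vmfsSparse delta disk whose descriptor points back at the shared base, e.g.:
#        createType="vmfsSparse"
#        parentFileNameHint="/vmfs/volumes/datastore1/master/master.vmdk"
# 3. Register the clone with the host:
vim-cmd solo/registervm /vmfs/volumes/datastore1/clone01/clone01.vmx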
 