ESXi / napp-it / OmniOS all-in-one disk performance issues

I'm a newbie to building an all-in-one setup using ESXi 6.0 and OmniOS. My basic setup is as follows:

Supermicro X10SL7-F with onboard LSI SAS flashed to non-RAID (IT) firmware
Xeon E3-1232v3
32GB ECC RAM
ESXi 6.0
2x Intel GbE
8x Seagate 2TB drives (on the LSI SAS, passed through to OmniOS)
250GB Samsung 850 SSD (ESXi local datastore)

OmniOS/ZFS VM
I have a primary VM for OmniOS with 6GB memory and 4 cores; the onboard LSI SAS is passed through.
VMware Tools installed
LSI SAS, E1000, 30GB vdisk on the local SSD store
I created a ZFS pool with RAIDZ2 and have it shared via SMB and NFS.

Windows 10 VM
2 cores, 6GB memory, 120GB disk on the OmniOS NFS share with thin provisioning
E1000, LSI SAS, Paravirtual controller

Windows 7 VM
2 cores, 6GB memory, 120GB disk on the local datastore with thin provisioning
E1000, LSI SAS, Paravirtual controller

I get OK performance, ~80MB/s, when copying from the ZFS Windows (SMB) share to my laptop. That's a little better than I was getting with my older Windows system with hardware RAID6.

If I copy from the ZFS Windows share to the W7 machine on the local datastore I get 120MB/s, which is even faster.

I'm having performance problems on Windows VMs when the VM is located on the ZFS share.

If I try the same copy from the ZFS Windows share to the W10 VM desktop, the transfer starts out at ~50MB/s but very quickly falls to ~5MB/s. Overall Windows VM performance is sluggish, all pointing to slow disk. Running esxtop seems to indicate low overall system load and CPU utilization.

Basically I'm having severe performance issues when the VM disk is stored on the ZFS NFS share. Installs and bootup for these machines are slow and sluggish.

I'm not that experienced in debugging slow disk access in an all-in-one setup and would greatly appreciate any feedback or suggestions on how to proceed with debugging, plus any tuning tips for my setup.
 
A fast read from NFS over ESXi combined with slow writes indicates the effect of secure sync writes, which avoid data loss during a power loss. ESXi requests sync writes on NFS by default.

You can check this by setting sync=disabled on the NFS filesystem
(napp-it menu ZFS Filesystems).

If writes are faster then, you must decide. With sync disabled you are in danger of losing the last few seconds of writes on a crash. This can lead to a corrupted VM, especially when the guest uses an older filesystem like ext4 or NTFS. If you want the secure sync-write behaviour with better performance, you can add an Slog device (e.g. an NVMe drive or an Intel S3700). An Slog need not be larger than about 10 GB, but it needs very low latency, high write IOPS and power-loss protection.
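For example, from the OmniOS console (a minimal sketch; 'tank/vmnfs' stands in for your actual pool/filesystem and c1t0d0 for a future Slog device):

# check the current sync setting
zfs get sync tank/vmnfs
# disable sync writes for a test (risks losing the last few seconds of writes on a crash)
zfs set sync=disabled tank/vmnfs
# return to the default behaviour
zfs set sync=standard tank/vmnfs
# later, if you add an Slog device, attach it to the pool like this
zpool add tank log c1t0d0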

Next, you should use the vmxnet3 vnic instead of the e1000 as it offers better performance with less CPU load.
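If you prefer editing the VM config file directly instead of changing the adapter in the vSphere client, the adapter type is a line in the .vmx (a sketch; 'ethernet0' stands for whichever adapter entry your VM uses, and the guest needs the vmxnet3 driver from VMware Tools):

ethernet0.virtualDev = "vmxnet3"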

Third, you can do some tuning, mainly increasing the buffers for TCP, vmxnet3 and NFS, as the defaults are optimized for 1G networks with low RAM.

Last, you can give OmniOS more RAM for caching.
Count 2 GB for ESXi and 12 GB for your Windows VMs.
With 32 GB RAM you can use up to 16 GB for storage caching.
 
Thanks for the suggestions, it is much appreciated.

I tried disabling sync on my NFS share and the performance got much better, but it still looks suspicious. The transfer rate starts out at ~120MB/s (about the same as the VM stored on the local datastore) but then oscillates (it looks like a saw blade), eventually settling down at about 40MB/s. Overall it is much better, but I suspect there is still something suboptimal in the setup. Any ideas what might be causing the oscillations and the speed degradation?

I read up a little about the Slog device. I'm currently maxing out the ports on the LSI SAS and cannot pass through the motherboard SATA controller since it hosts the Samsung SSD used as the local datastore. I was thinking of possibly moving the ESXi install to USB, but since the OmniOS VM (which does writes) also lives on that datastore, that probably would not work well. If I go this route I'd probably need to add another (perhaps LSI SAS) card.
Of the different Slog devices the Intel S3700 seems the most economical, but it is still around $100. I guess that's not bad compared to some of the other solutions. For the short term I'm thinking I might just disable sync on the NFS share that holds the VM disks, make sure to back up the VMs regularly, and save up for the Slog device and SAS card.

Another issue I'm fuzzy on / still learning about: the VM swap file location and the Windows page file. I suspect my setup may not be ideal here.
While trying to debug performance on the Windows 10 machine I tried increasing the paging file to 6GB to match the RAM size, and after rebooting the machine almost came to a halt until I was able to decrease the size again. Currently I am using the default settings, storing the VM swap with the machine. Looking for suggestions on managing this.

Also, the napp-it all-in-one docs mention adding additional vswitch/VLAN config, but the details were not clear to me. From what I have seen in pro IT setups there would be a separate SAN switch that serves the disk images through iSCSI to the machines. From what I read, iSCSI is possibly faster but basically the same as sync=disabled. If I stay with NFS shares, would there be much benefit to adding a vswitch/VLAN or enabling jumbo frames, or is this mostly for a more pro setup where separate subnets improve security?

Lastly, if I want to tune, are there any suggested alternate settings for my setup, and where would I set these, i.e. in ESXi or OmniOS?

Greatly appreciate the help.
 
The "saw blade" regarding write performance is the effect of the ZFS write cache. It collects writes for a few seconds in RAM and writes it then as a large sequential write. This is ok.

The VM settings (swap etc.) are uncritical. Start with the defaults.
But switch the vnics on all VMs (OmniOS, Windows) from e1000 to vmxnet3 (much faster).

A vswitch or VLAN is not performance relevant. It lets you separate traffic, that's all.

With write caching enabled or disabled, iSCSI and NFS are similarly fast;
sync=disabled (NFS) and writeback cache enabled (iSCSI) are settings with the same result.
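If you ever switch to iSCSI with Comstar, the write-back cache is a property of the logical unit. A sketch (the zvol path and LU GUID are placeholders; napp-it's Comstar menus expose the same setting):

# create an LU with the write cache enabled (wcd = write cache disable)
stmfadm create-lu -p wcd=false /dev/zvol/rdsk/tank/vm_vol
# or toggle it on an existing LU
stmfadm modify-lu -p wcd=true <LU-GUID>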

MTU 9000 can give better values, but only over "real" ethernet.
For internal AiO transfers in software it is not relevant (and can cause problems).

Try some tunings, e.g. (vmxnet3 settings):
TxRingSize=4096,4096,4096,4096,4096,4096,4096,4096,4096,4096;
RxRingSize=4096,4096,4096,4096,4096,4096,4096,4096,4096,4096;
RxBufPoolLimit=4096,4096,4096,4096,4096,4096,4096,4096,4096,4096;
EnableLSO=0,0,0,0,0,0,0,0,0,0;

or (ip settings)
max_buf=4097152 tcp
send_buf=2048576 tcp
recv_buf=2048576 tcp

or (NFS settings)
nfs_max_transfer_size=32768
nfs_max_bsize=32768
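If you apply these by hand rather than through the napp-it tuning panel, my understanding is that the TCP buffers are ipadm properties, the vmxnet3 values live in the driver config, and the NFS values are kernel tunables; roughly (a sketch, please verify the exact names against chapter 21 of the manual linked below):

# TCP buffers (takes effect immediately)
ipadm set-prop -p max_buf=4097152 tcp
ipadm set-prop -p send_buf=2048576 tcp
ipadm set-prop -p recv_buf=2048576 tcp

# vmxnet3 ring/buffer sizes: edit /kernel/drv/vmxnet3s.conf, then reboot
# NFS transfer sizes: nfs module tunables in /etc/system, then reboot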

see http://napp-it.de/doc/downloads/napp-it.pdf
chapter 21

or http://napp-it.de/doc/downloads/performance_smb2.pdf
(SMB tuning)
 