Linux guests experience high load average (iowait) with ESXi/ZFS

I have two ESXi environments, set up this way so I can compare speeds between the two. One is a traditional configuration with a hardware RAID controller and 4x 15K SAS disks in RAID10. The other is the "all in one" style you are all familiar with on these forums, with the same model of 15K SAS disks in ZFS mirrors (RAID10 equivalent). Solaris shares the disks back to ESXi via NFS over the vmxnet3 adapter.
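For context, the Solaris side is nothing exotic; roughly the following, though the pool and device names below are placeholders rather than my exact layout:

# Two two-way mirrors striped together, ZFS's equivalent of RAID10
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

# Filesystem that holds the VMs, exported to ESXi over NFS
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore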

I was pleasantly surprised to find that, in terms of VMFS performance, the two are equally fast without any tweaking at all. I get full performance from each "RAID10", with the disks themselves being the bottleneck. The only problem is that the Linux guests in the "all in one" report high load averages; iowait can reach 100% at times.

This is not a HUGE problem. It is still just as blazing fast as a $500 hardware RAID controller with a battery-backed cache, and I can't complain about that. However, I have Cacti set up to monitor load averages, and Nagios notifies me when they reach high levels; now I have to disable those alerts because they are getting annoying.
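For now the choice seems to be between disabling the check and just relaxing the thresholds; the latter would look something like this in nrpe.cfg on the guests (plugin path and numbers are only illustrative):

# Widen the 1/5/15-minute load thresholds so iowait-driven spikes stop paging
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20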

Has anyone else experienced this? Does anyone know what could cause it? As far as I understand it, iowait rises when the CPU sits idle waiting for disk I/O to complete. Maybe a tweak is needed at the OS level.
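For anyone who wants to check the same thing in their own guests, the standard tools will show it; nothing here is specific to my setup:

# The "wa" column is CPU time spent idle while disk I/O is outstanding
vmstat 2

# Per-device view; watch await and %util (needs the sysstat package)
iostat -x 2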
 
So the NFS shares from Solaris, are those exported via sync or async? Do you have a ZIL device offloading the write log?

Have you checked the logs on the Linux guests for errors?
 
Async, since at this time the NFS options between the server (Solaris) and client (ESXi) are at their defaults. I did, however, set atime=off on the ZFS filesystem. I probably wouldn't change it to sync; that would make me nervous.

There is no ZIL device.

There are no errors on the Linux guests; everything looks good except for the iowait.
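For completeness, the relevant properties can be checked and set like this on the ZFS side (the filesystem name is a placeholder; atime=off is the only change I made from the defaults):

zfs get atime,sharenfs tank/vmstore
zfs set atime=off tank/vmstore

# On ZFS builds recent enough to have it, the per-filesystem sync property
# shows how synchronous requests are honored (standard / always / disabled)
zfs get sync tank/vmstore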

Switch to the noop scheduler in the Linux guests.

This looks interesting. I am going to check that out and post what I find. I found more on this here:

http://lonesysadmin.net/linux-virtual-machine-tuning-guide/

7. Set your disk scheduling algorithm to ‘noop’

The Linux kernel has different ways to schedule disk I/O, using schedulers like deadline, cfq, and noop. The ‘noop’ — No Op — scheduler does nothing to optimize disk I/O. So why is this a good thing? Because ESX is also doing I/O optimization and queuing! It’s better for a guest OS to just hand over all the I/O requests to the hypervisor to sort out than to try optimizing them itself and potentially defeating the more global optimizations.

You can change the kernel’s disk scheduler at boot time by appending:

elevator=noop

to the kernel parameters in /etc/grub.conf. If you need to do this to multiple VMs you might investigate the ‘grubby’ utility, which can programmatically alter /etc/grub.conf.

Source: myself, plus corroborating comments from VMware Communities participants.
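It looks like the scheduler can also be flipped at runtime through sysfs for a quick test before making it permanent (sda below is just an example; use whatever your guest's disk is):

# The active scheduler is the one shown in brackets
cat /sys/block/sda/queue/scheduler

# Switch to noop on the fly; this is lost at reboot, so make it permanent
# with elevator=noop on the kernel line as described above
echo noop > /sys/block/sda/queue/scheduler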
 
Doesn't matter how the share is exported - ESX only does sync NFS writes :)

You'll want a ZIL too.
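Adding one after the fact is just a matter of attaching a log vdev to the pool; something like this, with placeholder pool and device names, and assuming the log device is an SSD or other low-latency disk:

# Dedicated log device (SLOG) to absorb the sync writes ESX issues
zpool add tank log c2t0d0

# Mirror the log if the pool matters:
# zpool add tank log mirror c2t0d0 c2t1d0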
 
Welp, I already had the noop scheduler set. Good suggestion, though.
 
I have the same issue. I thought it was because I had too much running on that file-server box. I do have a ZIL, and noop is set.

Any suggestions?
 
Use iSCSI instead. NFS is poor for writes; read through the VMware forums and you'll see folks with 10GbE also complaining about horrible write performance.

Use NFS for convenience; use iSCSI, FC, or AoE for performance.
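If you want to try block storage from the same Solaris box, COMSTAR makes it reasonably painless; a rough sketch, assuming the stmf and iscsi/target services are already enabled, with placeholder pool/zvol names and size:

# Back the LUN with a zvol
zfs create -V 200G tank/esxi-lun

# Register it as a SCSI logical unit, then expose it to initiators
sbdadm create-lu /dev/zvol/rdsk/tank/esxi-lun
stmfadm add-view <GUID printed by sbdadm>

# Create an iSCSI target for ESXi's software initiator to log in to
itadm create-target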
 
That isn't what was being discussed. It's a valid point, but not relevant to your comment and my reply.
 