Linux guests experience high load average (iowait) with ESXi/ZFS

I have two ESXi environments, set up this way so I can compare speeds between the two. One is a traditional configuration with a hardware RAID controller and 4x 15K SAS disks in RAID10. The other is the "all in one" style you are all familiar with on these forums, with the same model of 15K SAS disks in ZFS mirrors (RAID10 equivalent). Solaris shares the disks back to ESXi via NFS over the vmxnet3 adapter.
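For context, the Solaris side is nothing exotic; roughly the following, though the pool and device names below are placeholders rather than my exact layout:

# Two two-way mirrors striped together, ZFS's equivalent of RAID10
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

# Filesystem that holds the VMs, exported to ESXi over NFS
zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore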

I was pleasantly surprised to find that, in terms of VMFS performance, the two are equally fast without any tweaking at all. I get full performance from each "RAID10", with the disks themselves being the bottleneck. The only problem is that the Linux guests in the "all in one" report high load averages; iowait can reach 100% at times.

This is not a HUGE problem. It is still just as blazing fast as a $500 hardware RAID controller with a battery-backed cache, and I can't complain about that. However, I have Cacti set up to monitor load averages, and Nagios notifies me when they reach high levels; now I have to disable those alerts because they are getting annoying.
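For now the choice seems to be between disabling the check and just relaxing the thresholds; the latter would look something like this in nrpe.cfg on the guests (plugin path and numbers are only illustrative):

# Widen the 1/5/15-minute load thresholds so iowait-driven spikes stop paging
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20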

Has anyone else experienced this? Does anyone know what could cause it? As far as I understand it, iowait rises when the CPU sits idle waiting for disk I/O to complete. Maybe a tweak is needed at the OS level.
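For anyone who wants to check the same thing in their own guests, the standard tools will show it; nothing here is specific to my setup:

# The "wa" column is CPU time spent idle while disk I/O is outstanding
vmstat 2

# Per-device view; watch await and %util (needs the sysstat package)
iostat -x 2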
 
So the NFS shares from Solaris, are those exported via sync or async? Do you have a ZIL device offloading the write log?

Have you checked the logs on the Linux guests for errors?
 
Async, since at this time the NFS options between the server (Solaris) and client (ESXi) are at their defaults. I did, however, set atime=off on the ZFS filesystem. I probably wouldn't change it to sync; that would make me nervous.

There is no ZIL device.

There are no errors on the Linux guests; everything looks good except for the iowait.
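For completeness, the relevant properties can be checked and set like this on the ZFS side (the filesystem name is a placeholder; atime=off is the only change I made from the defaults):

zfs get atime,sharenfs tank/vmstore
zfs set atime=off tank/vmstore

# On ZFS builds recent enough to have it, the per-filesystem sync property
# shows how synchronous requests are honored (standard / always / disabled)
zfs get sync tank/vmstore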

Switch to the noop scheduler in the Linux guests.

This looks interesting. I am going to check that out and post what I find. I found more on this here:

http://lonesysadmin.net/linux-virtual-machine-tuning-guide/

7. Set your disk scheduling algorithm to ‘noop’

The Linux kernel has different ways to schedule disk I/O, using schedulers like deadline, cfq, and noop. The ‘noop’ — No Op — scheduler does nothing to optimize disk I/O. So why is this a good thing? Because ESX is also doing I/O optimization and queuing! It’s better for a guest OS to just hand over all the I/O requests to the hypervisor to sort out than to try optimizing them itself and potentially defeating the more global optimizations.

You can change the kernel’s disk scheduler at boot time by appending:

elevator=noop

to the kernel parameters in /etc/grub.conf. If you need to do this to multiple VMs you might investigate the ‘grubby’ utility, which can programmatically alter /etc/grub.conf.

Source: myself, plus corroborating comments from VMware Communities participants.
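It looks like the scheduler can also be flipped at runtime through sysfs for a quick test before making it permanent (sda below is just an example; use whatever your guest's disk is):

# The active scheduler is the one shown in brackets
cat /sys/block/sda/queue/scheduler

# Switch to noop on the fly; this is lost at reboot, so make it permanent
# with elevator=noop on the kernel line as described above
echo noop > /sys/block/sda/queue/scheduler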
 
Doesn't matter how the share is exported - ESX only does sync NFS writes :)

You'll want a ZIL too.
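Adding one after the fact is just a matter of attaching a log vdev to the pool; something like this, with placeholder pool and device names, and assuming the log device is an SSD or other low-latency disk:

# Dedicated log device (SLOG) to absorb the sync writes ESX issues
zpool add tank log c2t0d0

# Mirror the log if the pool matters:
# zpool add tank log mirror c2t0d0 c2t1d0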
 
Welp, I already had the noop scheduler set. Good suggestion, though.
 
I have the same issue. I thought it was because I had too much running on that file-server box. I do have a ZIL, and noop is set.

Any suggestions?
 
Use iSCSI instead. NFS is poor for writes; read through the VMware forums and you'll see folks with 10GbE also complaining about horrible write performance.

Use NFS for convenience; use iSCSI, FC, or AoE for performance.
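If you want to try block storage from the same Solaris box, COMSTAR makes it reasonably painless; a rough sketch, assuming the stmf and iscsi/target services are already enabled, with placeholder pool/zvol names and size:

# Back the LUN with a zvol
zfs create -V 200G tank/esxi-lun

# Register it as a SCSI logical unit, then expose it to initiators
sbdadm create-lu /dev/zvol/rdsk/tank/esxi-lun
stmfadm add-view <GUID printed by sbdadm>

# Create an iSCSI target for ESXi's software initiator to log in to
itadm create-target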
 
That isn't what was being discussed. It's a valid point, but not relevant to your comment and my reply.
 