My ESXi iSCSI benchmarks (& questions)

spectrumbx

My setup:

ESXi server specs:
  • Dual Quad (2x 8356) @ 2.415 GHz
  • 16GB (4x 4GB) DDR2 667 MHz ECC REG w/ 4-bit parity [Chipkill]
  • 4x Gigabit NICs
  • Intel 160GB SSD G2 (the only local datastore)
  • 8+ TB storage (used by a single VM)

iSCSI target:
  • Win 2003 R2 VM (512MB RAM / 2 vCPUs) running on ESXi
  • SATA drives passed to the VM as physical RDMs
  • The first 50GB of each drive is partitioned to host an iSCSI virtual image file
  • Running Starwind (free edition)
  • 2 virtual NICs (one dedicated to iSCSI), both connected to the same virtual switch (which has 2 NICs for load balancing)

iSCSI initiator:
  • ESXi with a dedicated NIC

Network config:
ESXi-Network.PNG


SATA drives (used to host the iSCSI image file):
base-drives.PNG


Virtual drives on iSCSI target:
iSCSI-Compare.PNG


The tests for the virtual drives were conducted on an XP-based VM, and the OS for that VM was running off a third iSCSI target.
I have repeated these tests at least 5 times each, and the results (and anomalies) are fairly consistent.

In the images, "ASync" refers to an iSCSI target created with multi-threading support (multi-threaded file system capabilities), while the one labeled "Sync" refers to a target created without multi-threading.

I am not sure if the differences in the benchmarks above are due to the asynchronous vs. synchronous traits of the iSCSI targets or if there is a networking issue at play here.
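
As a toy illustration of the difference (this is not how StarWind itself is implemented, just the general idea of servicing multiple outstanding requests at once versus one at a time):

Code:
import queue
import threading
import time

def serve(num_requests, num_workers, service_time=0.005):
    """Simulate servicing I/O requests that each cost ~5 ms of 'disk time',
    with a given number of worker threads pulling from a shared queue."""
    q = queue.Queue()
    for i in range(num_requests):
        q.put(i)

    def worker():
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return
            time.sleep(service_time)  # stand-in for the actual disk I/O

    start = time.time()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print("1 worker  ('Sync'-like):  %.2f s" % serve(64, 1))
print("4 workers ('ASync'-like): %.2f s" % serve(64, 4))

The multi-threaded case only wins when the initiator actually keeps several I/Os outstanding, which is also why the Iometer results below depend so heavily on the queue depth.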

Questions:
1. Why are my write speeds taking a major hit above 2 MB transfer sizes?
2. Should I place the VM hosting the iSCSI targets on the same switch as the ESXi iSCSI initiator? Why?
3. Can someone give me some clues/comments on this whole async/sync iSCSI target thing?

Thanks.
 
Fire up iometer instead. 1024 outstanding IO, 1 worker, 16k and 32k tests, 50% read, give me the numbers you get :)
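
For context on what a queue depth like that implies (the IOPS figure below is hypothetical, not a prediction for this setup): with a fixed number of outstanding I/Os, queue depth, IOPS, and average latency are tied together by Little's Law.

Code:
# Little's Law for a constant-queue-depth benchmark:
#   outstanding_io = iops * avg_latency_seconds

def implied_latency_ms(outstanding_io, measured_iops):
    """Average I/O latency implied by the queue depth and the measured IOPS."""
    return outstanding_io / float(measured_iops) * 1000.0

def throughput_mb_s(measured_iops, block_size_kb):
    """Sequential throughput corresponding to a given IOPS rate."""
    return measured_iops * block_size_kb / 1024.0

# e.g. a hypothetical 2,000 IOPS at 16 KB with 1024 I/Os outstanding:
print("Latency:    %.0f ms"   % implied_latency_ms(1024, 2000))   # ~512 ms
print("Throughput: %.1f MB/s" % throughput_mb_s(2000, 16))        # ~31 MB/s

In other words, 1024 outstanding I/Os is a saturation test: it shows the ceiling of the target, at the cost of very high per-I/O latency.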
 
'tis just what I'm used to seeing :)

I might be doing something wrong here, but this is taking too much time and too much CPU out of my test VM.
I had to kill everything.

1. Are we talking about this Iometer? http://www.iometer.org/
2. How long should the test take?

The program behaved oddly on Win 2003 R2 and Win 7; it only seems to work fine on XP.
Nevertheless, it is taking too much time.

[Edit] I have downloaded the latest version from sourceforge and am retrying again.
I am also setting the outstanding IOs to 16 based on reading this: http://communities.vmware.com/docs/DOC-3961
 
Yeah, set it to 1024. The communities are useless. Trust me on this ;) It does take some time; you'll also need to set the display to actually refresh (the default is an infinite number of seconds between refreshes).
 
Well, my patience with the tool has worn out.
It's been hours now and I don't understand the need for it to take that long.
I am giving up on it. Sorry.

Anything else?

I am more concerned with throughput and the anomalies I posted above.
 
That's the tool and the results I know and work with, and what we use. You deleted the second worker, right? Only had a single worker? Try it on a second disk instead of the boot volume (that sometimes causes problems, although it's rare)?

It's SATA though, it's not going to be all that fast... SATA disks don't have that many IOPS.
 
Ha... by default Iometer will make a test file that fills any free space left on the disk. Restrict the size of the test file. It is taking forever because the test file is being generated.

You probably realize your setup is not ideal, but I'll go with it as an academic pursuit.

You should have redundant pNICs on the iSCSI network, but if you are just using it internally, add a second NIC to the VM in question and put it on your iSCSI switch; that way it will have non-blocking access into the network (zero latency on the virtual switch and only bus-limited bandwidth).

I really hope this is a lab environment, and not a production system.
 
Okay, I re-did it.
However, I only let each test run for a minute before stopping it.

Asynch: IO=16 / Access spec= 16K-50% read- 0% random
s-16IO-16k.PNG


Asynch: IO=16 / Access spec= 32K-50% read- 0% random
s-16IO-32k.PNG



Synch: IO=16 / Access spec= 16K-50% read- 0% random
r-16IO-16k.PNG


Synch: IO=16 / Access spec= 32K-50% read- 0% random
r-16IO-32k.PNG


Ha... by default Iometer will make a test file that fills any free space left on the disk. Restrict the size of the test file. It is taking forever because the test file is being generated.
...
Yeah, I noticed that.
However, it still took a long time after I restricted it to 200000 sectors.
In any case, my quick run results are above.
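
For reference, assuming standard 512-byte sectors, a limit of 200000 sectors works out to a fairly small test file:

Code:
sectors = 200000
sector_bytes = 512  # assuming standard 512-byte sectors
size_mib = sectors * sector_bytes / (1024.0 ** 2)
print("Iometer test file: %.1f MiB" % size_mib)  # ~97.7 MiB

A file that small can sit largely in RAM or cache somewhere along the path, which is worth keeping in mind when reading the numbers.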

You probably realize your setup is not ideal, but I'll go with it as an academic pursuit.

You should have redundant pNICs on the iSCSI network, but if you are just using it internally, add a second NIC to the VM in question and put it on your iSCSI switch; that way it will have non-blocking access into the network (zero latency on the virtual switch and only bus-limited bandwidth).

I really hope this is a lab environment, and not a production system.
Yeah, this is my home setup.
I do have a second vNIC (dedicated for iSCSI) on my VM, but it goes to the same vSwitch as the first vNIC.
I thought of placing that second vNIC on the iSCSI vSwitch, but I fear that the ESXi initiator communication would still hit the physical NIC associated with that vSwitch.
Otherwise, I could just create a vSwitch without any physical NIC attached to it.

Also, why a redundant NIC on the iSCSI side? If anything affects one NIC, it will affect any other NIC as well. Or is it for load balancing?
 
numbers look pretty solid for a windows box with starwind on SATA. Actually, some of the iops numbers look really good. It's doing a not-at-all-shabby job of caching :) I bet your prior numbers past a certain size are due to the limitations on the caching that the software is doing.
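
If some layer in the path (Windows file cache, controller cache, or the target software) is absorbing the start of each transfer, the fall-off above a certain transfer size looks roughly like this toy model (all numbers below are invented for illustration):

Code:
def effective_write_mb_s(transfer_mb, buffered_mb=2.0,
                         buffer_speed=110.0, spindle_speed=60.0):
    """Blend the fast (buffered) and slow (spindle-bound) portions of a
    single transfer into one effective rate. Purely illustrative numbers."""
    fast = min(transfer_mb, buffered_mb)
    slow = max(transfer_mb - buffered_mb, 0.0)
    total_time = fast / buffer_speed + slow / spindle_speed
    return transfer_mb / total_time

for size in (0.5, 1, 2, 4, 8):
    print("%4.1f MB transfer -> ~%5.1f MB/s" % (size, effective_write_mb_s(size)))

Once the transfer size passes whatever is being buffered, the effective rate slides toward the raw spindle speed, which is the same shape as the write drop-off asked about above 2 MB.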
 
numbers look pretty solid for a windows box with starwind on SATA. Actually, some of the iops numbers look really good. It's doing a not-at-all-shabby job of caching :) I bet your prior numbers past a certain size are due to the limitations on the caching that the software is doing.

Actually, I am doing no caching.
That was going to be another area of testing (more for curiosity).
I don't think I will ever enable caching (not worth the risk), but it should be fun to see what impact it has.

What about the ASync one performing worse than the Sync?
Both targets are on identical drives (WD 1TB Black), with the same first-50GB partition, the same size, and the same controller.
 
Okay, moving the second NIC of the iSCSI target VM to the iSCSI vSwitch did improve things a bit (note the 4MB and 8MB results).
Here are the latest bench results:

atto.PNG
 
To be clear - did you test with outstanding IOs at 16 or 1024 as lopoetve requested? Your results may be skewed in error because of this.

I'm not sure if you're familiar with lopoetve, but if I remember correctly he's a storage engineer with VMware. You're asking one of the best ;)
 
To be clear - did you test with outstanding IOs at 16 or 1024 as lopoetve requested? Your results may be skewed in error because of this.

I'm not sure if you're familiar with lopoetve, but if I remember correctly he's a storage engineer with VMware. You're asking one of the best ;)

Synch: IO=16 / Access spec= 16K-50% read- 0% random

So, yeah.
 
Also, why a redundant NIC on the iSCSI side? If anything affects one NIC, it will affect any other NIC as well. Or is it for load balancing?

It is best practice for both redundancy and load balancing to have redundant network connections for all of the services running production workloads. You also want to make them redundant between cards, so that even if an entire network card fails, everything still runs.

With vSphere 4 and software iSCSI you can now round-robin load balance and break the ~160 MB/s barrier from 3.5, but you need multiple (if possible 4+) ports.
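
Rough math on why multiple ports are needed (the 90% efficiency figure is my own rough allowance for TCP/IP and iSCSI overhead, not a measured value):

Code:
def iscsi_payload_mb_s(gige_links, efficiency=0.90):
    """Approximate usable iSCSI payload across N round-robined GigE links.
    1 Gbit/s is 125 MB/s raw; 'efficiency' is a rough allowance for
    TCP/IP + iSCSI framing overhead (assumed, not measured)."""
    raw_per_link = 125.0  # MB/s
    return gige_links * raw_per_link * efficiency

for links in (1, 2, 4):
    print("%d GigE link(s): ~%.0f MB/s" % (links, iscsi_payload_mb_s(links)))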

Since this is just a lab, don't worry about it too much.

Since you are not running iSCSI outside of the ESX host, I don't think you even need to bother with a physical NIC mapping on the iSCSI switch.
 
Ok.

I tried removing the physical NIC from the iSCSI switch, but the IP assignment became a bit troublesome.

Since there is no gateway, setting the IP on the initiator produces a complaint about not specifying a gateway.

Lastly, the CPU load is fairly high on the iSCSI target VM (40-60%).
It does not look like I am getting any TCP offloading. :(
 
Ok.

I tried removing the physical NIC from the iSCSI switch, but the IP assignment became a bit troublesome.

Since there is no gateway, setting the IP on the initiator produces a complaint about not specifying a gateway.

Lastly, the CPU load is fairly high on the iSCSI target VM (40-60%).
It does not look like I am getting any TCP offloading. :(

It shouldn't complain... as long as you've got a VM Network port group and a VMkernel port on there, it should be perfectly happy. There's a pile of docs out there that use that exact setup for testing labs (I use Workstation, personally, but I like simulating the fabric as well).
 
To be clear - did you test with outstanding IOs at 16 or 1024 as lopoetve requested? Your results may be skewed in error because of this.

I'm not sure if you're familiar with lopoetve, but if I remember correctly he's a storage engineer with VMware. You're asking one of the best ;)

This too, although to be honest, those numbers aren't shabby at all for a VMed iSCSI target. :)
 
As an update, I have to say that the VMs deployed on the iSCSI datastores are performing very well.
Big kudos to Starwind.

The last benchmarks actually show how close the iSCSI datastore performance is to the raw SATA drive performance.

Thanks all for your responses and inputs. :)
 