Fibre Channel (FC) Attached Datastores for ESXi

Tiporaro

I'll begin by apologizing a) for my ignorance and b) if this belongs more in the networking forum than the VM one. That out of the way, I'm looking to build a ZFS NAS/SAN that will host datastore(s) for 2 ESXi hosts sitting in the same rack (so distance isn't an issue) in my home lab. I'm trying to find a network solution faster than standard 1 Gbps, since that would choke at certain times, and the most economical path suited to my needs seems to be 4 Gbps FC.

Which brings me to my main question: if I were to drop a 4-port PCIe HBA like the QLE2464 into the ZFS box, could I run multiple direct lines to the ESXi hosts and have functioning shared storage, or am I missing functionality without an FC switch? While I wouldn't mind getting the FC switch, if it doesn't make a big difference I'd happily hold off until there was a more pressing need.

Also, in either case, does bonding/teaming work well (or at all) in this scenario to provide increased one-to-one throughput? I.e., if there were 2-port HBAs in the ESXi hosts, could I get dual-linked 8 Gbps transfers for each host?

Thanks for any help guys!
 
Why not bond some copper 1Gb ports for iSCSI?
 
Well, I had considered it, but a few things are steering me away from that:
1) A quad-port bond would at most give me a total bandwidth of 4 Gbps aggregate, best-case scenario (rough numbers in the sketch below).
2) It's arguably more expensive than 4 Gb fibre HBAs, with less potential.
3) Bonding to get one-to-one throughput above 1 Gbps doesn't work in many situations, and from what I've seen it would require a decent managed switch in the cases where you can get it to work.
4) From everything I've read, FC has much better latency than iSCSI. Not a deal breaker, but like most of the other points: if FC is cheaper and has the benefit, why not?
5) Last but not least, I wouldn't mind getting the chance to play with FC in my home lab.
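
To put rough numbers behind points 1-3, here's a quick back-of-envelope sketch (the efficiency factors are just my own ballpark guesses, not benchmarks):

```python
# Back-of-envelope throughput comparison, single stream vs. aggregate.
# Line rates are nominal; efficiency factors are rough guesses, not benchmarks.

GBPS_TO_MBPS = 1_000_000_000 / 8 / 1_000_000   # 1 Gbps = 125 MB/s (decimal MB)

options = {
    # name: (Gbps per link, links one stream can use, efficiency guess)
    "single 1 GbE (iSCSI/NFS)":     (1, 1, 0.90),
    "4x 1 GbE bond, single stream": (1, 1, 0.90),  # LACP hashes one flow onto one link
    "4x 1 GbE bond, aggregate":     (1, 4, 0.90),
    "4 Gb FC, single stream":       (4, 1, 0.80),  # ~400 MB/s is the commonly quoted figure
}

for name, (rate, links, eff) in options.items():
    print(f"{name:30s} ~{rate * links * GBPS_TO_MBPS * eff:4.0f} MB/s")
```

So even in the best case, a bond only matches 4 Gb FC in aggregate across multiple streams, not for a single transfer.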
 
For a home lab, eBay makes FC HBAs pretty affordable, and 4 Gbps FC is a pretty good value (quads around $100, duals or singles significantly less).
 
FC switches are pretty pricey. Why not an eSATA enclosure?

Never mind, on eBay they aren't.
 
FC switches are pretty pricey. Why not an eSATA enclosure?

Never mind, on eBay they aren't.

eSATA enclosures don't really help with setting up an expandable, high-performance shared storage pool, unless I'm missing something.

And yes, even if I did have to get an FC switch, they aren't that prohibitive second-hand. It's just that, from what I've parsed from various discussions, for a smaller setup I could probably skip the switch and run multiple direct FC connections to the ZFS box; if I could get some confirmation of that, it saves me the added cost of the switch, and I can always drop an FC switch in later.

And then, of course, I was curious how bonding/trunking/multipathing/etc. works over FC.
 
I missed the "2 hosts" in the first post. You do need shared storage for migration between hosts.
 
I very much doubt you will see any improvement over the setup you have now, let alone over a multipathed iSCSI setup. As a generalisation, IOPS will be holding you back, not bandwidth.
 
Using a QLE2462 in the Solaris server and a QLE2460 in the ESXi box with a direct host-to-host connection :)
 
One thing: when you set up the Solaris COMSTAR server, remember to configure the target card as a target before actually connecting the fibre to the ESXi box.
It causes a kernel panic on the Solaris side if both sides are in initiator mode :D
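
If you want a quick sanity check before cabling up: the usual way to flip a QLogic HBA into target mode for COMSTAR is to bind it to the qlt driver instead of the qlc initiator driver. Here's a rough sketch (my own, not an official tool) that reads /etc/driver_aliases and reports which driver the card is bound to; the PCI alias is what I'd expect for the QLE246x/ISP2432 family, so double-check yours with prtconf:

```python
# Sketch: is the QLogic HBA bound to the initiator driver (qlc) or the COMSTAR
# target driver (qlt)?  The PCI alias is an assumption for the QLE246x/ISP2432
# family -- verify the vendor,device id on your box (e.g. prtconf -pv).

PCI_ALIAS = "pci1077,2432"   # assumed alias for QLE2460/2462/2464; adjust as needed

def bound_driver(alias, aliases_file="/etc/driver_aliases"):
    """Return the driver name bound to the given PCI alias, or None."""
    with open(aliases_file) as fh:
        for line in fh:
            parts = line.split()
            # each line looks like: <driver> "<alias>"
            if len(parts) == 2 and parts[1].strip('"') == alias:
                return parts[0]
    return None

drv = bound_driver(PCI_ALIAS)
if drv == "qlt":
    print("HBA bound to qlt (target mode) - OK to cable up to the ESXi initiators.")
elif drv == "qlc":
    print("HBA still bound to qlc (initiator mode) - rebind to qlt before connecting.")
else:
    print(f"Alias {PCI_ALIAS} not found; check the id with prtconf.")
```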
 
I very much doubt you will see any improvement over the setup you have now, let alone over a multipathed iSCSI setup. As a generalisation, IOPS will be holding you back, not bandwidth.

From a lot of the benchmarks and other people's experiences, there's a fair amount of difference between 1 Gb Ethernet and 4 Gbps FC, not only in throughput but in latency, which would actually make a difference when booting VMs off the datastores.

I do agree that IOPS is often the quicker limiting factor, though, and that's definitely already been considered. SSDs in a variety of setups can quickly cause a single 1 Gbps link to choke, and multipathing standard Ethernet still doesn't provide increased one-to-one performance as far as I've seen; it's more for redundancy/load-balancing across multiple hosts. (The exception is some of the more unusual configurations that I've heard can sometimes get improved one-to-one throughput, depending on the situation and everything playing nicely with round-robin, but that doesn't seem to be a guarantee.)
 
One thing: when you set up the Solaris COMSTAR server, remember to configure the target card as a target before actually connecting the fibre to the ESXi box.
It causes a kernel panic on the Solaris side if both sides are in initiator mode :D

Haha, this sounds suspiciously like a lesson learned from experience =) which we can all agree makes for the best kind, the ones you never forget. I definitely appreciate the heads-up, and I'll keep it in mind once I get the cards in and can start playing with them.
 
Using a QLE2462 in the Solaris server and a QLE2460 in the ESXi box with a direct host-to-host connection :)

It doesn't have to be Solaris; pretty much anything that can act as an FC SCSI target will work.
I've been using Quadstor. It works fine on Linux/FreeBSD with the aforementioned HBAs.
 
Pointless - you'll run out of IOPS well before you run out of bandwidth. :)

This is the biggest mistake most people make with storage for VMs - IOPS matter, bandwidth not so much, unless you're doing sequential large block reads/writes.

From a lot of the benchmarks and other people's experiences, there's a fair amount of difference between 1 Gb Ethernet and 4 Gbps FC, not only in throughput but in latency, which would actually make a difference when booting VMs off the datastores.

I do agree that IOPS is often the quicker limiting factor, though, and that's definitely already been considered. SSDs in a variety of setups can quickly cause a single 1 Gbps link to choke, and multipathing standard Ethernet still doesn't provide increased one-to-one performance as far as I've seen; it's more for redundancy/load-balancing across multiple hosts. (The exception is some of the more unusual configurations that I've heard can sometimes get improved one-to-one throughput, depending on the situation and everything playing nicely with round-robin, but that doesn't seem to be a guarantee.)

If you're seeing significant latency differences between Ethernet and FC, you're doing it wrong or have substandard components. In a real-world setup, switched Ethernet should be within 1 ms of FC (Violin/Whiptail-type arrays aside, or InfiniBand).

SSDs are great for many things, but for the large-block reads/writes that could actually saturate 1 GbE, they're not going to perform as well (most aren't designed for that, since very few workloads actually do it) - enterprise SSDs aside (which I doubt you're using, are you?).

For small-block reads/writes, which is what most VMs send, IOPS is still the limiter in the end - you'll run out of 4k commands before you run out of headroom on the link with real-world workloads, assuming at least two links and round robin (which, when you have multiple hosts and multiple VMs, will effectively grant you the full bandwidth of both).

Heck, in my performance-tuning work on enterprise arrays, we run out of commands before we run out of 1G link more often than not, and that's with dozens of disks backing them. The only time you don't is with certain workloads, and most people don't run those at home (Oracle tuned for large blocks, etc.).
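
To put some rough numbers on that (generic ballparks, not figures from your setup):

```python
# Rough illustration: how many 4 KiB random IOs it takes to fill one 1 GbE link,
# versus what typical back ends can actually deliver.  All figures are ballparks.

link_bytes_per_s = 1_000_000_000 / 8        # 1 Gbps ~ 125 MB/s raw
io_bytes         = 4 * 1024                 # small-block IO typical of VM workloads

iops_to_saturate = link_bytes_per_s / io_bytes
print(f"4 KiB IOPS needed to saturate 1 GbE: ~{iops_to_saturate:,.0f}")

backends = {                                # rough random-IOPS ballparks
    "single 7200 rpm disk":          150,
    "8x 7200 rpm disks (RAID 10)": 1_200,
    "consumer SATA SSD":          20_000,
}
for name, iops in backends.items():
    print(f"{name:30s} ~{iops:>7,} IOPS -> {iops / iops_to_saturate:6.1%} of the link")
```

Spinning disks don't come close to filling the pipe with small random IO; only an SSD-heavy pool starts to.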


THAT BEING SAID - what you're doing will work fine, if you really want to.
 
Thanks for the answer, lop. First and foremost, I'll say that this is partly a learning experience, and not a critical application where I'm truly worried about 1 Gbps flat-out not working at all.

That said, the logic I keep coming back to is that if I were doing a large read/write (for example, media rendering/encoding, or even simple transfers) from a single VM to even a single disk over the network, the 1 Gbps link (with overhead, maybe around 110-120 MB/s?) would become the limit before a single drive capped out on sustained throughput (or thereabouts). If I'm able to saturate a 1 Gbps network with a single file transfer, wouldn't that leave other VMs depending on that link choked during those periods?
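
(Just to sanity-check that math; the drive figure is a typical number I've seen quoted, not something I've measured:)

```python
# Quick check of the sequential case: usable 1 GbE vs. one drive's sustained rate.
# The 0.9 efficiency and the 150 MB/s drive figure are ballpark assumptions.

link_mbps  = 1_000_000_000 / 8 / 1_000_000 * 0.9   # ~112 MB/s usable over iSCSI/NFS
drive_mbps = 150                                    # typical 7200 rpm sequential rate

print(f"1 GbE usable : ~{link_mbps:.0f} MB/s")
print(f"single drive : ~{drive_mbps} MB/s sequential")
print(f"link is the bottleneck for one big stream: {drive_mbps > link_mbps}")
```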
 
Thanks for the answer, lop. First and foremost, I'll say that this is partly a learning experience, and not a critical application where I'm truly worried about 1 Gbps flat-out not working at all.

That said, the logic I keep coming back to is that if I were doing a large read/write (for example, media rendering/encoding, or even simple transfers) from a single VM to even a single disk over the network, the 1 Gbps link (with overhead, maybe around 110-120 MB/s?) would become the limit before a single drive capped out on sustained throughput (or thereabouts). If I'm able to saturate a 1 Gbps network with a single file transfer, wouldn't that leave other VMs depending on that link choked during those periods?

Depends on what commands they're sending and what size of IOs your guest is issuing, which depends on the guest, etc.

In other words, you may be surprised - most of those things won't use large-block writes/reads (LBW/LBR); they'll be doing smaller reads/writes, which will saturate IOPS first. You'd have to look at the actual commands they send, but large media encoding and the like is highly sequential as well, and not a typical workload by comparison.

I'll find a way for you to measure the size of the IO.
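
In the meantime, if the guest is Linux, a rough sketch like this against /proc/diskstats will give you the average request size over a sampling window (the device name is just an example; point it at whatever disk the workload actually hits):

```python
#!/usr/bin/env python3
"""Estimate average IO size for a Linux block device from /proc/diskstats.

Uses the documented fields: reads completed, sectors read, writes completed,
sectors written.  DEVICE is only an example -- change it for your guest.
"""
import time

DEVICE   = "sda"    # example device name
SECTOR   = 512      # /proc/diskstats counts 512-byte sectors
INTERVAL = 10       # seconds to sample

def snapshot(dev):
    with open("/proc/diskstats") as fh:
        for line in fh:
            f = line.split()
            if f[2] == dev:
                # f[3]=reads completed, f[5]=sectors read,
                # f[7]=writes completed, f[9]=sectors written
                return int(f[3]), int(f[5]), int(f[7]), int(f[9])
    raise SystemExit(f"device {dev} not found in /proc/diskstats")

r0, rs0, w0, ws0 = snapshot(DEVICE)
time.sleep(INTERVAL)
r1, rs1, w1, ws1 = snapshot(DEVICE)

ios     = (r1 - r0) + (w1 - w0)
sectors = (rs1 - rs0) + (ws1 - ws0)
if ios:
    print(f"{ios} IOs in {INTERVAL}s, average size ~{sectors * SECTOR / ios / 1024:.1f} KiB")
else:
    print("no IO completed during the sample interval")
```

On the ESXi side you can get at roughly the same thing by dividing throughput by commands per second in esxtop's disk views.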
 