Looking at revamping my VM/storage setup (open source only), looking at options

Red Squirrel

My current setup is as follows:

Storage: a single 24-bay server acting as a NAS, with NFS shares for various purposes. For now I'll just concentrate on VMs, so I have one share per LUN. (A LUN at this point is basically just a RAID array.)

VM: VMware ESXi - I just put that together real quick as I needed a turnkey solution, and KVM requires quite a lot of reading and setup to make it work.

My goal is to move towards a SAN setup where the VMs use block storage directly. I also want the block storage to be shared, as I may want to add more VM servers over time and I want to be able to do live migrations etc.

I know you normally can't share block storage between servers, but I want to be able to do it somehow. I'm not sure how a commercial VMware solution does it, though I've seen it done. Basically I'm looking to build an open source equivalent to that.

Just looking for what solutions I need, and some reading material to get me started. Basically I want to reduce the overhead of NFS and move towards block storage. For actual file sharing, I'll just have a file server VM instead of accessing the file server directly, as the storage will be moved onto its own network and the VM servers will have dedicated NICs for it. Like a real SAN.

Does KVM/QEMU support the ability to assign an iSCSI LUN directly to a VM and have it be treated as a hard drive? I'm thinking that may be the easiest solution. Rather than having LUNs mapped to the VM hosts with multiple VMs per LUN, each VM would have its own LUN that it talks to directly. Then I can move the VM to any host all I want and it won't matter, as that LUN is only being used directly by the VM software and not the physical system. Is this possible to do?
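
Something like this is what I'm picturing, if QEMU supports it (just a sketch; the portal IP and IQN are made up, and it would need QEMU built with libiscsi):

Code:
# attach a whole iSCSI LUN as the VM's virtual disk, with no filesystem on the host side
qemu-system-x86_64 -m 2048 -enable-kvm \
  -drive file=iscsi://192.168.50.10/iqn.2016-02.local.san:vm01/0,format=raw,if=virtio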
 
You certainly can share iSCSI between devices (in this case, ESXi servers). Just have to make sure the filesystem is cluster aware, which VMFS is, so no worries there.

You can also present iSCSI LUNs to VMs within VMware. Create the LUN on the SAN and set up the iSCSI initiator on the guest OS to access that LUN. The idea you present of having an iSCSI LUN per VM sounds like a bad idea with a ton of overhead and headache for no purpose.
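
For the guest-initiator route, on a Linux guest it's basically just this (a rough sketch; the portal IP and IQN are placeholders):

Code:
# discover targets on the SAN portal, then log in to the one for this VM
iscsiadm -m discovery -t sendtargets -p 192.168.50.10
iscsiadm -m node -T iqn.2016-02.local.san:vm01 -p 192.168.50.10 --login
# the LUN then shows up as a normal block device (e.g. /dev/sdb) inside the guest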

Unless you have a compelling reason, I don't see a need to move to iSCSI from NFS. NFS is just fine for everything from tiny home labs to massive deployments.
 
I will be switching off VMware though, as the free version is quite limited. So is there a cluster-aware FS for Linux? I also don't want to present the LUN to the VM itself, as in to the OS, as that would defeat the security aspects of the VLANing. I don't want the VM OS to even be aware that the SAN exists. I am wondering if I can do it at a "hardware" level though, where the VM software emulates it as a hard drive.

I find myself having a lot of performance issues with NFS, so that's one of the reasons I want to switch off it. Being a file system, it adds overhead, because each VM will also have its own virtualized file system on top of it.
 
You need to be looking at XenServer with Xen Orchestra. It can do everything vSphere can do, only free.

Why a SAN? Why not consolidate everything inside that 24-bay case and run the VMs from local storage? It's a hell of a lot easier (and much cheaper) to set up. The cheapest 10GbE switch is $800, and then you need the NICs.
 
Is your goal performance, redundancy, flexibility with hosts, etc.?

I actually run pure local storage with my Hyper-V hosts (3) because it is cheaper to get higher performance and I do not need cluster real-time HA. I use Hyper-V's replication, which gives me a cold-spare setup for critical VMs.
 
Pretty much what Grentz said. A SAN complicates things greatly and adds cost/complexity while subtracting speed when compared to local storage. If you need HA then you could replicate to another host with Veeam/Unitrends/Nakivo, OR you could use a hyperconverged solution like StarWind VSAN. All of these options are cheaper and better than a SAN, in my opinion, for a handful of hosts.

If this is for a home lab then more power to ya. I wish I had the room for something like this to play with at home and would do it if I could.
 
Is this a work production environment?

Do you need a SAN? How many hosts?

As noted, you should not be thinking about a SAN; you do not need one. Local storage will always be faster, and you can get more of it by spending that SAN money on local hardware or more hosts and doing VSAN.

http://www.smbitjournal.com/?s=SAN
I find that most people looking to implement a SAN are doing so under a number of misconceptions.

Adding a single SAN gives you a single point of failure, something most try to avoid.
 
Unless you're doing a dual-controller SAN, I echo everyone else saying just avoid it. NFS is a great way to expose storage to VMware. It's about the only reason I'd consider getting a NetApp.

If you're just wanting a SAN to play with, then you can look at StarWind; they have a free version, although I'm not sure of the size limits. That would give you some experience with a block-level SAN, iSCSI, etc.
 
I want a SAN to have a central point of high-performance storage, as I do want to add more hosts, plus the ability for any host to run any VM so I can have redundancy at the host level. Eventually I will want to look at clustered storage as well, but for now I will live with the SAN being the point of failure - right now I barely have any redundancy, so I'm no worse off. The SAN uses RAID obviously (it would be moronic not to) and dual PSUs. I will probably add a multi-port NIC at some point so I can do teamed Ethernet. I don't want to do local storage, that's too limiting, especially if I change to a completely different VM platform that can't do mdraid.

For the VM solution I'm leaning towards KVM/QEMU, I just mostly want to know how it is typically set up in a multi-host/SAN environment so that the storage can be shared. Keep in mind that when I say SAN, I will still be using my custom box, not an actual SAN; I will just reconfigure it as block storage. There will probably be a transition period where it's doing both. What about ZFS on Linux, is that production worthy now?
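
From what I've read so far, with libvirt you'd define the same iSCSI target as a storage pool on every host, roughly like this (names and IPs are just placeholders, not tested):

Code:
# define an iSCSI-backed storage pool pointing at the SAN target, on each VM host
virsh pool-define-as vmstore iscsi \
  --source-host 192.168.50.10 \
  --source-dev iqn.2016-02.local.san:vmstore \
  --target /dev/disk/by-path
virsh pool-start vmstore
virsh pool-autostart vmstore
# each LUN on the target then shows up as a volume that can be assigned to a VM
virsh vol-list vmstore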

The issue I find with NFS is performance. Whenever I have backup jobs running or anything somewhat I/O-heavy going on, it brings the entire server to its knees. The load shoots up way past 1.0, and eventually I start getting kernel errors like "task blocked for more than 120 seconds" and stuff crashes. I figure it's probably due to all the overhead of having a file system on top of a network file system on top of another file system, so I'm trying to eliminate any overhead I can. As another example, I was taking image backups of my VMs, and it was SLOWWWWWW. It took HOURS just to ghost the OS drive. I'm hoping going iSCSI will eliminate a lot of that overhead and make performance better.
 
I can't help you with a SAN. I have no need for one, as I find it too limiting compared with local storage for my production usage. Hopefully others familiar with running a SAN can help.

But if you are going to drop this amount of coin on building out a SAN, then get off fake/software RAID. Get a quality LSI controller, throw in 4 drives, and go OBR10.
 
I won't be buying a SAN, just reconfiguring my existing hardware (well, mostly the software). I'm trying to stay away from anything proprietary, so I will stick with mdraid/Linux or maybe ZFS. The problem with hardware RAID is that you have a hardware point of failure; if the card dies you're mostly screwed. That, and you can't add new drives or make new arrays live, and have to go into the BIOS. You also tend to be limited to a small number of drives; I'm not aware of any RAID card that can handle 24 drives. But on that note, are there any tweaks I can do so that more CPU resources go towards software RAID (something like the knobs below, maybe)? Since this is a dedicated storage box I pretty much want every single resource going to storage. Perhaps my performance issues can be fixed by tweaking NFS instead of moving off it. Though I still want to look at dedicated storage NICs/VLAN. That would require a reboot though, so I may skip that part.
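
These are the md knobs I've come across so far (values are only examples; the first two only affect resync/rebuild speed, not normal I/O, and md0 is just an example device):

Code:
# raise the md resync/rebuild speed limits (defaults are quite conservative)
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 500000 > /proc/sys/dev/raid/speed_limit_max
# a bigger stripe cache can help RAID5/6 write performance
echo 8192 > /sys/block/md0/md/stripe_cache_size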

What I'm toying with for the VM hosts is either KVM/QEMU directly, or Proxmox. Though when I tried Proxmox originally I was not happy with it: you can't specify a folder as an ISO store and have all the subfolders show up too. But maybe that has changed now. I also can't recall if it had proper VLAN support. That is a must for any solution, as I have various VMs on various VLANs for security purposes (ex: VMs that serve internet stuff are isolated from my main network, and so on).
 
Oh yes, there are plenty of RAID controllers that support well in excess of 24 drives. Hell, many support 16 drives on just a single port. And you very much can resize/add/delete drives/arrays live with 'real' RAID cards, without rebooting and going into their config BIOS. You just have to use quality controllers that have a cache module. All the high-end LSI cards ($350+) should have these features. You can use CLI tools to configure/change already-created arrays, or even create them if they're not there, all without rebooting.

But I'm off topic here. Sorry about that.
 
Good to know, but I don't think the bottleneck is my storage anyway; it's either NFS or the network, and I'm leaning towards NFS, as iftop does not show any significant network pegging during the slowdown periods.

Local iozone:

Code:
[root@isengard temp]# iozone -s 1G -T 4
	Run began: Mon Feb  8 06:17:24 2016

	File size set to 1048576 kB
	Command line used: iozone -s 1G -T 4
	Output is in kBytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 kBytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         1048576       4 1685865 2851009  5394352  5453894 4142980 2752982 4352886  4667224  4245879  2344830  2974018 5114805  5229445

iozone test complete.
[root@isengard temp]#

On a remote server via NFS (on the same RAID array):

Code:
[root@borg tmp]# iozone -s 1G -T 4
	Run began: Mon Feb  8 06:18:32 2016

	File size set to 1048576 KB
	Command line used: iozone -s 1G -T 4
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd  record  stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
         1048576       4  127169   91313    76003  2576912 2050775   25299    4367 1923622 2180608    67680   113799   38674  2509172

iozone test complete.

I'm not sure how well I trust those numbers though; on NFS random read I'm getting higher than gigabit, which does not make sense. This was not even pegging the network, and it also does not reflect normal usage: if I try to transfer a file via NFS I get maybe 20MB/sec at the very most. It's dog slow. I suppose there may be some NFS-specific tweaking I can do before I try to switch?
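
The kind of tweaks I've seen suggested so far look like this (values and paths are just examples, not tested yet):

Code:
# client side: mount with bigger rsize/wsize over TCP
mount -t nfs -o rsize=1048576,wsize=1048576,hard,tcp isengard:/vmstore /mnt/vmstore
# server side: bump the number of nfsd threads (default is often only 8),
# e.g. RPCNFSDCOUNT=16 in /etc/sysconfig/nfs, then restart the nfs service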

Also, would I benefit if I still treated the file server as a SAN? Ex: put in a 2-4 port NIC and a dedicated switch, and have the hosts get their own NICs for storage as well. Use it strictly for VMs, and for file sharing have a file server VM that then shares files. Would that be better? Right now it's acting both as VM storage and regular file storage. NFS also has laughable security, so perhaps it should be kept on a separate network anyway, and for regular file sharing I can use SMB. At least that actually honours the concept of a username and password, not just an ID.
 
NFS is plenty fast and is used for datastores all the time. Your issue lies elsewhere, I'm afraid.
 
StarWind has a nice free version.

It's failover NFS and not HA iSCSI though.

PS: how do you like Nakivo? Are they ex-Vizioncore guys?

 
+1 to StarWind free for this scenario.
Their NFS implementation is pretty fast, so it's worth considering.
BTW: if you're an IT pro or an active online community contributor, they can easily give you a full-blown NFR license for free! That one supports unified fault-tolerant SMB3/NFS/iSCSI storage.

If that doesn't work then SAM-SD may be an option, but in my experience it ran slower on the same hardware. Be aware of that if you're aiming for top performance.
 
If you use a given RAID array as DAS storage, you get the best sequential performance.
If you use the same array in a SAN environment you will see higher latency and you are limited by the network performance,
but there are huge advantages to a good SAN environment, especially if you include newer filesystems like ZFS, for example (see the short sketch after this list):

- a Copy-on-Write filesystem (crash-resistant, with unlimited snaps/versioning of your VMs)
- data security with checksums and data scrubbing against bitrot problems
- increased read performance/IOPS due to a large RAM-based read cache (ZFS ARC)
- safe and fast write behaviour with sync writes over SLOG devices
- no RAID write-hole problems like a conventional RAID
- ZFS replication between filesystems/appliances
- storage virtualisation/pooling, no fixed partitions
- easy and fast cloning/backup/restore of VMs over NFS/SMB
- COMSTAR, an enterprise-class stack for FC/iSCSI targets
- Crossbow for virtualised networking with VLANs, vnics and vswitches
- appliance, storage and RAID management with a SAN management software
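
A minimal sketch of what that looks like in practice (pool, filesystem and device names are only examples):

Code:
# create a mirrored pool and a filesystem for VM storage
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
zfs create tank/vmstore
zfs set sync=always tank/vmstore             # safe sync writes; add an SLOG device for speed
zfs snapshot tank/vmstore@before-upgrade     # instant snapshot of every VM on the share
zfs send tank/vmstore@before-upgrade | ssh backupbox zfs recv backup/vmstore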

I do not understand why you want to move away from NFS, as it is as fast as iSCSI when you use optimized or comparable settings regarding sync write/writeback cache. While you can use iSCSI with a SAN, I see no advantages but many inconveniences: you must use a target per VM, must use cloning if you want to go back to a certain version, must deal with partition sizes and provisioning, and worst of all, you lose the ability to copy/move/clone with a simple NFS/SMB file transfer.

What I use and suggest

- Use a SAN OS from the Solaris family (where ZFS and NFS come from; ultra-fast, with the best integration of OS, ZFS and file services like iSCSI, NFS and SMB, and no 3rd-party software), for example OmniOS, an open source Solaris fork
- think about 10G networking
- optionally virtualise the SAN part, see my All-In-One howto http://www.napp-it.org/doc/downloads/napp-in-one.pdf
 

Bingo.

I think the issue he is having is with performance, but I feel that is a software issue and has nothing to do with NFS. It's probably that fake RAID/mdraid garbage that people try to use. If he moved to an enterprise-level RAID controller, I bet his issues would be gone. No way in hell would I trust software RAID for that many disks. Really, any disks for that matter...
 
Well, as a vSphere user, there are a couple of reasons I use iSCSI instead of NFS.

1. I can tell the software iSCSI adapter to connect to the SAN using a 10GbE NIC, with a 1Gb NIC as the backup interface (roughly the binding sketched after this list). I don't know of any (non-trivial) way to do this with NFS.

2. I back up my VMs using Veeam. The default proxy method is NBD (network block device), which is the worst-performing of the three alternatives Veeam provides. Hot-add works with SAN or NFS, but can be problematic. The best-performing method is Direct SAN, which maps the LUN (recommended to present it read-only), which obviously requires a SAN.
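
The binding part of point 1 looks something like this (a sketch; the adapter and vmk names are whatever your host uses, and the path selection policy then decides which path is preferred):

Code:
# bind the software iSCSI adapter to both vmkernel ports (10GbE and 1GbE)
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2
esxcli iscsi networkportal list --adapter vmhba33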

If you use something like SCST on the target, configuration is more complicated than NFS, but still almost trivial...
 

Locally the speeds are fine; it's anything over NFS that is slow. I prefer to stick with mdraid, as it is not hardware-dependent. I can move the drives to any system on any controller or HBA and they'll be recognized. I can also do all management from within Linux, without having to worry about some kind of special software that will probably be Windows-only, or having to boot into a BIOS. Rebooting this server is not even an option at any time.

I'm open to any recommendations on how to make NFS faster though.
 

Have you tested the network with iperf to confirm you can get higher throughput than what you are seeing in the NFS test? I'd check into this to rule out a network issue before putting effort into switching your storage protocol.
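
Something like this, between a VM host and the storage box (the hostname is just whatever your box resolves as):

Code:
# on the storage box
iperf -s
# on a VM host: 30-second test with 4 parallel streams
iperf -c isengard -t 30 -P 4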
 
Yeah, I did iperf tests between various servers and the file server; it all checks out. The lowest speeds were maybe in the 800Mbps range and most were up to 930Mbps or so. Tried both up and down.
 