Windows 2012 R2 Failover Cluster with Virtual RDM - Cluster Across Boxes?

KapsZ28 (2[H]4U, joined May 29, 2009, 2,114 messages)
It seems difficult to find a solid answer to this question. VMs are running on vSphere 5.5 across multiple ESXi hosts. VMware documentation says both physical and virtual compatibility mode RDMs are supported in this configuration, but physical is recommended. The same documentation also states, "Clusters across physical machines with non-pass-through RDM is supported only for clustering with Windows Server 2003. It is not supported for clustering with Windows Server 2008." But I am interested in 2012 R2.

The main reason I am looking to use virtual RDMs is to run native Veeam backups. I am not overly thrilled about having to use the Veeam agent backup because of its limitations and its reliance on Microsoft's VSS writer.

Does anyone know what is officially supported by both VMware and Microsoft, and what works best, especially when you want to run snapshot-based backups?

Also, the cluster is being used for file servers.
 
Are you referring to a windows cluster with RDM for a shared disk?
 
Thanks for the article, but that seems to be the same as the 5.5 article I was reading. Virtual mode only for cluster in a box and physical mode for cluster across boxes. VMFS also only supports cluster in a box. Cluster in a box is only good for guest OS failures and maintenance. No protection against ESXi host failure. Kind of sucks. Another reason more storage vendors need to build in CIFS with AD authentication.
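The support rules described in the two posts above can be summarized as a small lookup. This is a hypothetical Python sketch, not a VMware tool: the `Layout` and `Disk` names and the `is_supported` function are illustrative, the table encodes only what this thread quotes from the vSphere 5.5 documentation, and any combination the thread does not mention is conservatively treated as unsupported.

```python
from enum import Enum


class Layout(Enum):
    CLUSTER_IN_A_BOX = "cib"        # all cluster nodes on one ESXi host
    CLUSTER_ACROSS_BOXES = "cab"    # cluster nodes on separate ESXi hosts


class Disk(Enum):
    VMDK = "vmdk"                   # virtual disk on a VMFS datastore
    VIRTUAL_RDM = "virtual_rdm"     # non-pass-through RDM
    PHYSICAL_RDM = "physical_rdm"   # pass-through RDM


# Hypothetical table: only the combinations this thread says are supported.
# Anything not listed here is treated as unsupported (an assumption).
SUPPORTED = {
    (Layout.CLUSTER_IN_A_BOX, Disk.VMDK),
    (Layout.CLUSTER_IN_A_BOX, Disk.VIRTUAL_RDM),
    (Layout.CLUSTER_ACROSS_BOXES, Disk.PHYSICAL_RDM),
}


def is_supported(layout: Layout, disk: Disk, guest_os: str) -> bool:
    """Return True if the shared-disk combination is supported per the quoted rules."""
    # Quoted exception: virtual RDMs across hosts are allowed only for Windows Server 2003.
    if layout is Layout.CLUSTER_ACROSS_BOXES and disk is Disk.VIRTUAL_RDM:
        return guest_os == "Windows Server 2003"
    return (layout, disk) in SUPPORTED


if __name__ == "__main__":
    # The case asked about in this thread: 2012 R2 across hosts with a virtual RDM.
    print(is_supported(Layout.CLUSTER_ACROSS_BOXES, Disk.VIRTUAL_RDM,
                       "Windows Server 2012 R2"))  # False
```

Under these rules, the only across-boxes option for 2012 R2 is a physical mode RDM, which is exactly the backup trade-off discussed in the rest of the thread.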
 
Physical only. Don't use virtual RDMs. :) Why would a physical mode RDM be a problem?
 
Backups... Have a customer with many physical RDMs and on more than one occasion the VSS writer could no longer take snapshots and therefore Veeam Agent backups would fail. With a virtual RDM I don't have to worry about Microsoft's VSS writer since VMware snapshots can be used with a typical Veeam backup job.

They have all this clustering set up because there can't be any downtime. No downtime means I can't even run CHKDSK to fix the issue. I am curious whether this is a Microsoft problem or something with the storage they use. Three LUNs on three different servers have experienced this problem within six months, and it always seems to happen when they fail over the LUNs during patching. All the servers have 3-4 physical RDMs each, and only one volume is affected. So it seems less likely to be an OS VSS issue and more likely a corruption problem on the volume.
 
Snap shorting a PGR SCSI-3 reserved volume in use by two or more machines is a very bad idea. Same as snap shorting anything having to do with the OS involved in that.

In other words, that already doesn't work like you really want it to (and you can't restore it), so don't bother. App based backups!

What's the storage?
 
So although virtual RDMs support snapshots, it is not recommended with bus sharing? What app based backup would you recommend?

I believe the physical storage is 3PAR, but they are using DataCore SANsymphony.
 
Depends on what the app is. If it's SQL (90% of the clusters), just use SQL backups to a share that Veeam (or any other backup software) then picks up.

EXTREMELY not recommended with bus sharing - it honestly shouldn't work, but sometimes does by sheer luck. Remember, VMware snapshots are not coordinated between machines like you'd want when you're doing complex SCSI things with a shared volume; they're unaware that any of that is going on.

Virtualizing storage like SANsymphony does always makes things interesting. If they're having corruption on physical mode RDMs, that means the software somewhere in there is mangling the logical unit and/or the abstraction between the physical blocks and the logical blocks. I'd... well, not do that in the first place, personally, at least not at a block level. You do that at an object level or somewhere else, but not block... and not block when you're already doing abstraction there with VMs. But that's me. Either way, it's doing something ~wrong~ in there.

And damn the autocorrect - snap shorting? lawl.
 
The VSS issue turned out to be a VMware issue with multi-pathing. They disabled multi-pathing and now it is working correctly.
 