SCSI-3 persistent reservations and SAS NL disks?

danswartz

So I'm trying to stand up a Pacemaker cluster. The back end is a Supermicro 826 JBOD; storage is existing 7200 RPM 1TB Constellation spinners. I wanted to use SCSI fencing, but when I run the fence_scsi command against the 8 drives, I get this:

[ 2020-01-12 15:34:56,313 ERROR: Cannot get registration keys ]

This happens for all 8 SAS-NL drives. These are not new (close to 10 years old by now). Note that I have a ZeusRAM drive as SLOG, and fence_scsi works just fine for that one. Am I just SOL here? I'm loath to replace all of these, as they have been working flawlessly. I suppose I can stick with just the reservation on the SLOG drive, but not wild about that...
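Before writing the spinners off, it may be worth checking whether each drive even claims SCSI-3 persistent reservation support. A minimal sketch using `sg_persist` from the sg3_utils package (the device names here are hypothetical; adjust to your JBOD):

```shell
#!/bin/sh
# Probe each back-end disk for SCSI-3 Persistent Reservation support.
# Device names are hypothetical; sg_persist comes from sg3_utils.
probe_pr() {
  for d in "$@"; do
    echo "== $d =="
    # PERSISTENT RESERVE IN / REPORT CAPABILITIES: errors out on
    # drives (or firmware/expander paths) that don't implement PR.
    sg_persist --in --report-capabilities "$d" 2>/dev/null \
      || echo "no PR support reported by $d"
  done
}

# Example (placeholder device list):
# probe_pr /dev/sdb /dev/sdc /dev/sdd
```

If the ZeusRAM answers this and the Constellations don't, that would point at the drives (or the expander path to them) rather than at fence_scsi itself.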
 
I don't have much experience with Pacemaker, but I do have a lot of (painful) experience dealing with SCSI-3 locking issues, mostly with Windows clustering. The drives themselves don't specifically handle the locking; the storage controller / storage subsystem you are using is supposed to handle it, and the commands are passed to it by the OS.

I had a physical 2012 cluster poop the bed on me behind a VMAX because the admins didn't set up clustering right. They did "something" that flipped the SCSI-3 bit on the volumes, and I had to completely de-provision the stupid volume from all the nodes. That removed the lock array-side. Then I had to reboot the hosts and re-provision the volume back to the cluster.

I googled the controller you are using. It's no slouch by any means, but I don't think it has the functionality you need. Suckage. =(
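For the record, when the target actually honors PERSISTENT RESERVE OUT, a stuck reservation can sometimes be cleared host-side instead of de-provisioning and re-provisioning the volume. A hedged sketch with `sg_persist` (device and key are placeholders):

```shell
#!/bin/sh
# Clear a stuck SCSI-3 persistent reservation host-side.
# WARNING: CLEAR drops the reservation AND every registration on the
# device - only safe when all other cluster nodes are down or fenced.
clear_reservation() {
  dev=$1
  key=$2   # a key already registered on this path, as listed by
           # `sg_persist --in --read-keys <dev>`
  sg_persist --out --clear --param-rk="$key" "$dev" 2>/dev/null \
    || echo "clear failed on $dev (target may not support PR OUT)"
}

# Example with placeholder values:
# clear_reservation /dev/sdb 0x8ae70001
```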
 
Thanks for the info! Very strange, as the ZeusRAM is on the same controller (literally in the same JBOD), and it *seems* to work. At the moment, I am using fence_vmware_soap, since these are virtualized storage appliances. I wanted to use fence_vmware_rest, but the CentOS 7 implementation seems completely broken (at any rate, it doesn't seem to interact correctly with the vCenter Server Appliance), so I am sticking with the SOAP agent for now. Sounds like you had a LOT of fun :)

EDIT: so of course, when I tried fence_vmware_rest again, it worked :)
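For anyone landing here later, the fence_vmware_rest stonith resource ends up looking something like this config fragment (the vCenter hostname, credentials, and host map are all placeholder values):

```shell
# Placeholder values throughout - adjust to your vCenter and nodes.
pcs stonith create vmfence fence_vmware_rest \
    ip=vcenter.example.lab \
    username='fence@vsphere.local' \
    password='secret' \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="node1:node1-vm;node2:node2-vm"
```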
 
When clustering storage, whether it's multiple Linux hosts sharing a JBOD or servers sharing a LUN, it's really best to look at the storage vendor and OS best practices and recommendations. SCSI-3 has been in the T10 standards for a while, with specific behavior mapped out depending on the requests seen.

I've had to fix persistent reservation conflicts on supercomputer clusters that turned into a nightmare because random hosts had been added to the cluster without matching settings. Thank goodness for scripting to search out the specific .conf files that didn't match!

Misconfigured clustering can lead to even worse things than conflicts, like corrupted data from hosts locking and overwriting data willy-nilly.
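That .conf-matching trick can be sketched in a few lines of shell. The file-collection step (one copy per node) is only hinted at in a comment, and all node names and paths are hypothetical:

```shell
#!/bin/sh
# Flag copies of a config file that differ across cluster nodes.
# First pull one copy per node, e.g. (hypothetical names):
#   for n in node1 node2 node3; do
#     scp "$n:/etc/multipath.conf" "$n.multipath.conf"
#   done
check_match() {
  # Compare every file's checksum against the first one given.
  first=$(md5sum "$1" | cut -d' ' -f1)
  status=0
  for f in "$@"; do
    sum=$(md5sum "$f" | cut -d' ' -f1)
    if [ "$sum" != "$first" ]; then
      echo "MISMATCH: $f"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "all copies match"
  return "$status"
}

# check_match node1.multipath.conf node2.multipath.conf node3.multipath.conf
```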
 
There isn't a vendor, per se. This is for a home lab. It's a ZFS cluster serving NFS to vSphere. I was following ewwhite's blog post on this. Seems to work fine, except for the spinners not cooperating...
 