VMFS Resignature Issues

Hi!

Hopefully someone will be kind enough to lend a hand. I had an issue with a RAID array last night that forced me to move my iSCSI LUNs to another RAID array and reconnect them to vSphere.

When I rescanned the HBAs and added the storage, it asked me if I wanted to mount using the existing signature or to resignature the datastore. I chose to resignature the datastore. Unfortunately, after I did this for each of my LUNs, it looks like I've lost many of the VMDKs in those LUNs. I've also lost the ability to register a vmx on the ESXi box through the vSphere Client. The option is grayed out when I right-click the vmx file. When I look at some of the folders where my VMs were in the datastore browser, they look empty. When I try to cd into the datastore from SSH on my ESXi console, I get an error message "Invalid Argument". Lastly, there's one specific VMDK that I can see in the datastore browser, but I'm unable to add it manually to a VM.

Did I totally ruin the contents of my datastores? I do have backups (some of them not as recent as I'd like). Is there anything I can do to get access to my VMDKs (including the one that I see in the datastore browser but cannot add to a VM)?

Thanks!
Dark Diamond
 
That SO should not have happened. Resignaturing just matches the VMFS LVM signature to the array LUN signature.

Run:
vmkfstools -V

Then
ESXi 5/5.1
cat /var/log/vmkernel.log

ESXi 4.X
cat /var/log/messages | grep -i vmkernel | grep -iv aam

Post the most recent 10 lines, then cd into the datastore to trigger the "Invalid Argument" error, run the same cat again, and post the last 10 lines of that too.
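If you only want the tail end rather than the whole log, something along these lines should do it. This is a minimal sketch: `last_lines` is just an illustrative helper name (not an ESXi command), and the path in the comment is the ESXi 5.x log path given above.

```shell
# Print the last N lines of a file (N defaults to 10).
# last_lines is a hypothetical helper name, not an ESXi command.
last_lines() {
    tail -n "${2:-10}" "$1"
}

# ESXi 5/5.1 (log path as given above):
# last_lines /var/log/vmkernel.log
```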
 
This is the last few lines in the kernel file. Looks ominous :(

5210f2-46f2-001b216b5406 ("Servers 2") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (193) and freeResources (61) (cluster 264).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 202 (cluster 271).[type 1] Inconsistency between bitmap (192) and freeResources (202) (cluster 271).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 211 (cluster 276).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 205 (cluster 285).[type 1] Inconsistency between bitmap (186) and freeResources (205) (cluster 285).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (199) and freeResources (55) (cluster 296).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (184) and freeResources (75) (cluster 313).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 245 (cluster 328).[type 1] Inconsistency between bitmap (7) and freeResources (245) (cluster 328).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (8) and freeResources (2) (cluster 335).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 211 (cluster 340).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 218 (cluster 349).[type 1] Inconsistency between bitmap (13) and freeResources (6) (cluster 349).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (136) and freeResources (118) (cluster 360).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (193) and freeResources (60) (cluster 377).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (1) and freeResources (142) (cluster 490).2012-11-11T01:06:13.607Z cpu4:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (193) and freeResources (61) (cluster 264).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 202 (cluster 271).[type 1] Inconsistency between bitmap (192) and freeResources (202) (cluster 271).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 211 (cluster 276).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 205 (cluster 285).[type 1] Inconsistency between bitmap (186) and freeResources (205) (cluster 285).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (199) and freeResources (55) (cluster 296).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (184) and freeResources (75) (cluster 313).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid freeResources 245 (cluster 328).[type 1] Inconsistency between bitmap (7) and freeResources (245) (cluster 328).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (8) and freeResources (2) (cluster 335).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 211 (cluster 340).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Invalid nextFreeIdx 218 (cluster 349).[type 1] Inconsistency between bitmap (13) and freeResources (6) (cluster 349).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (136) and freeResources (118) (cluster 360).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (193) and freeResources (60) (cluster 377).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
[type 1] Inconsistency between bitmap (1) and freeResources (142) (cluster 490).2012-11-11T01:06:13.614Z cpu6:5095)WARNING: Res3: 3249: Volume 509eb5e4-3ac894ea-808e-001b216b5406 ("Servers 1") might be damaged on the disk. Resource cluster metadata corruption has been detected.
~ #
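Since the same warning repeats for every cluster, it may help to boil the log down to which volumes are flagged and how often. A rough sketch, assuming the warnings keep the exact wording shown above; `damaged_volumes` is a made-up helper name.

```shell
# Count "might be damaged" warnings per volume in a vmkernel log.
# damaged_volumes is a hypothetical helper; it assumes the warning text
# looks like: Volume <uuid> ("<label>") might be damaged
damaged_volumes() {
    grep -o 'Volume [0-9a-f-]* ("[^"]*") might be damaged' "$1" |
        sed 's/ might be damaged$//' |
        sort | uniq -c
}

# Example (ESXi 5.x log path):
# damaged_volumes /var/log/vmkernel.log
```

Each output line is a count followed by the volume UUID and label, so a volume that shows up thousands of times stands out immediately.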
 
This is what appears when I try to cd into one of the VM folders... I noticed something else odd: there are timeouts to an IP address that doesn't exist on my network.


2012-11-11T01:12:26.539Z cpu4:625079)Vol3: 2359: Failed to get object 28 type 3 uuid 509eb57f-2b5210f2-46f2-001b216b5406 FD 3408ac4 gen 15 :Not found
2012-11-11T01:12:26.539Z cpu4:625079)Vol3: 2359: Failed to get object 28 type 3 uuid 509eb57f-2b5210f2-46f2-001b216b5406 FD 3808ac4 gen 16 :Not found
2012-11-11T01:12:26.540Z cpu4:625079)Vol3: 2359: Failed to get object 28 type 3 uuid 509eb57f-2b5210f2-46f2-001b216b5406 FD 3808ac4 gen 16 :Not found
2012-11-11T01:13:17.894Z cpu6:4102)WARNING: Hbr: 534: Connection failed to 192.168.5.113 (groupID=GID-eaa0eda4-3af0-4ff8-8361-87bbd3282358): Timeout
2012-11-11T01:13:17.895Z cpu6:4102)WARNING: Hbr: 4322: Failed to establish connection to [192.168.5.113]:44046(groupID=GID-eaa0eda4-3af0-4ff8-8361-87bbd3282358): Timeout
2012-11-11T01:13:19.908Z cpu0:235997)Vol3: 2359: Failed to get object 28 type 2 uuid 509eb57f-2b5210f2-46f2-001b216b5406 FD 6008ac4 gen 37 :Not found
 
Oh fuckballs.

What's the filer?

If you're on 5.1, run:
esxcfg-scsidevs -m

This will give you the datastore to naa mapping (naa.ABJAEHADFASDFA:1)
then run:
voma -m vmfs -f check -d /vmfs/devices/disks/naa.BLAHFROMABOVE:1
(don't forget the :1)
And save the output to a file or something, and paste it here.
 
I'm running Starwind for an iSCSI target. Prior to all of this, my RAID adapter started dropping drives out of my RAID set. Eventually I lost two drives on the RAID 6. I copied everything from that onto a different array. Even though I had the hosts offline and the LUNs unmounted during the copy, I wonder if the failing array scrambled the data somehow.

Thanks for replying though :) I unmounted the LUNs after I posted those log entries, so I'll run this command later on today once I remount them and I'll post the results.
 
It's a good idea to use a 3-way mirror with StarWind and either no hardware or software RAID on the individual nodes, or RAID 0.

In that case you just need to throw in a spare drive and re-initiate the StarWind sync process.

No RAID rebuild is required, so it's much faster.

 
I ran the commands you asked me to after I remounted one of the potentially corrupted datastores (choosing Keep Existing Signature this time). This was the result:

voma -m vmfs -f check -d /vmfs/devices/disks/eui.a8dd3a0ad3343162:1

Checking if device is actively used by other hosts
Found 1 actively heartbeating hosts on device '/vmfs/devices/disks/eui.a8dd3a0ad3343162:1'
1): MAC address 00:1b:21:6b:54:06
/vmfs/volumes/509eb251-055cd3b7-1e17-001b216b5406/Media # ls -ltr
-rw------- 1 root root 2147483648000 Oct 2 07:50 Media_2-flat.vmdk
-rw------- 1 root root 606 Nov 6 01:07 Media_2.vmdk
-rw------- 1 root root 8192512 Nov 10 00:54 Media_2-ctk.vmdk
/vmfs/volumes/509eb251-055cd3b7-1e17-001b216b5406/Media #

I'm able to see three files (the ones in the snippet above). However, I cannot add this vmdk file to another VM. When I try to browse to it in the "Browse Datastores" dialog box, it doesn't appear in the list to select. Given that I can see it via the command line, is it possible this isn't totally fubar? Any way I can force mount this disk to another VM to copy the data off of it?





 
I keep forgetting that - I never run it live...

so, you can shut down every vm and all but one host and run it, or you can do this:

dd if=/vmfs/devices/disks/eui.a8dd3a0ad3343162:1 of=/vmfs/volumes/yourlocalstoragepleasehavesome/eui.bin bs=1M count=1200

and then run voma on the resulting eui.bin file. Dump it to local storage though, not the array.
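For sizing the dump: dd writes at most bs times count bytes, so with bs=1M count=1200 that's 1200 MiB, a bit under 1.2 GiB. A quick sketch of the arithmetic, assuming dd treats 1M as 1024*1024 bytes; `dd_output_bytes` is just an illustrative name.

```shell
# Upper bound on the size of a dd copy: block size (in MiB) times block count.
# dd_output_bytes is a hypothetical helper, only here to show the arithmetic.
dd_output_bytes() {
    bs_mib=$1
    count=$2
    echo $(( bs_mib * 1024 * 1024 * count ))
}

dd_output_bytes 1 1200   # bs=1M count=1200 -> 1258291200 bytes
```

So the local datastore needs roughly 1.2 GB free, plus a little headroom.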
 
How big will this bin file be? Just want to make sure I have enough storage available...

 
Sorry my response took a few days. Got slammed at work getting a bunch of software projects wrapped up before the holiday...

I ran VOMA against the volume. For some reason I wasn't able to run it against the eui.bin file you mentioned before. Here's a sample of what I saw:


Phase 1: Checking VMFS header and resource files
Detected file system (labeled:'Media 1') with UUID:509eb251-055cd3b7-1e17-001b216b5406, Version 5:54
ON-DISK ERROR: Cluster 3904 free count 243 should be 2
ON-DISK ERROR: Cluster 10183 free count 39 should be 53
ON-DISK ERROR: Cluster 10184 free count 100 should be 125
ON-DISK ERROR: Cluster 10185 free count 100 should be 125
Found stale lock [type 10c00002 offset 16787456 v 0, hb offset 4136960
gen 25, mode 0, owner 00000000-00000000-0000-000000000000 mtime 9338380
num 0 gblnum 0 gblgen 0 gblbrk 0]
ON-DISK ERROR: Cluster 10186 free count 100 should be 125
ON-DISK ERROR: Cluster 10187 free count 100 should be 125
Found stale lock [type 10c00002 offset 16789504 v 0, hb offset 4136960
gen 25, mode 0, owner 00000000-00000000-0000-000000000000 mtime 9338376
num 0 gblnum 0 gblgen 0 gblbrk 0]
ON-DISK ERROR: Cluster 10188 free count 100 should be 125
Found stale lock [type 10c00002 offset 16790528 v 0, hb offset 4136960
gen 25, mode 0, owner 00000000-00000000-0000-000000000000 mtime 9338382
num 0 gblnum 0 gblgen 0 gblbrk 0]
ON-DISK ERROR: Cluster 10189 free count 100 should be 125
ON-DISK ERROR: Cluster 10190 free count 100 should be 125
--More--



along with lots of these:
ON-DISK ERROR: <FD c62 r11> : Invalid linkCount 245
ON-DISK ERROR: <FD c62 r11>: invalid address PB2 cow 1 cnum 475473 rnum 15
ON-DISK ERROR: <FD c62 r11>: invalid address PB2 cow 1 cnum 475783 rnum 15


and literally thousands of lines similar to these:

Phase 5: Checking resource reference counts.
ON-DISK ERROR: PB inconsistency found: (775,1) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (775,2) allocated in bitmap, but never used
ON-DISK ERROR: PB inconsistency found: (775,3) allocated in bitmap, but never used
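With that many lines it's easier to count the errors by kind than to read them. A sketch, assuming the VOMA output was saved to a file first (the path in the comment is hypothetical) and that the messages are worded like the samples above; `voma_error_summary` is a made-up name.

```shell
# Count ON-DISK ERROR lines in saved VOMA output, grouped by kind.
# voma_error_summary is a hypothetical helper; the patterns only cover
# the message shapes seen in the samples above, everything else is "other".
voma_error_summary() {
    awk '
        /^ON-DISK ERROR: Cluster .* free count/ { n["cluster free count"]++; next }
        /^ON-DISK ERROR: .*Invalid linkCount/   { n["invalid linkCount"]++; next }
        /^ON-DISK ERROR: .*invalid address/     { n["invalid address"]++; next }
        /^ON-DISK ERROR: PB inconsistency/      { n["PB inconsistency"]++; next }
        /^ON-DISK ERROR/                        { n["other"]++ }
        END { for (k in n) print n[k], k }
    ' "$1" | sort -k2
}

# Example, with the output saved to a hypothetical file:
# voma_error_summary /tmp/voma-output.txt
```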


I'm guessing FUBAR is in order?

Dark Diamond
 
I agree, when he says "Oh fuckballs" that is really bad. I was in a similar situation to this a while back. I HAD to re-signature some of my files due to a SAN upgrade. And one of them had a Snapshot in place......

1 full day and a lot of sweat later I finally rebuilt the VM. This may not help you, but what I had to do was manually pull the files off and straighten out the signatures. As it was, the VMX and VMDK files were pointing to the previous iSCSI/LUN signatures. By manually editing them in Notepad and then re-uploading them, I was able to fix them. Even then I still had some filesystem corruption that a chkdsk had to fix.
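The Notepad edit could also be scripted. A rough sketch only, assuming the text files reference the datastore by a /vmfs/volumes/<UUID>/ path (yours may not); `retarget_datastore_uuid` and the UUIDs in the comment are made up for illustration. It should only be pointed at the small text files (the .vmx and the .vmdk descriptor), never the -flat.vmdk data file.

```shell
# Replace an old datastore UUID with a new one in a text descriptor,
# writing to a new file so the original stays untouched.
# retarget_datastore_uuid is a hypothetical helper name.
retarget_datastore_uuid() {
    old_uuid=$1
    new_uuid=$2
    src=$3
    dst=$4
    sed "s|/vmfs/volumes/$old_uuid/|/vmfs/volumes/$new_uuid/|g" "$src" > "$dst"
}

# Example with placeholder UUIDs and file names:
# retarget_datastore_uuid 11111111-22222222-3333-444444444444 \
#     aaaaaaaa-bbbbbbbb-cccc-dddddddddddd myvm.vmx myvm.fixed.vmx
```

Writing to a separate output file means the result can be diffed against the original before anything gets uploaded back.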

Lesson learned, re-signature at your own risk. Or as an excellent form of payback to someone you don't like. ;)

Best of luck to ya!
 
Do you think that may solve my problem? Crappy thing is the LUN I can't access is 2 TB in size. That will take a while to download, not to mention I'll need to buy a hard drive.


 
Nope, his was a VMX issue (resignaturing doesn't update the links in the files).

Yours is "your volume looks like spaghetti".

Restore from backup, and see if there are any errors on the transfer - something went wrong. That's unfixable.
 