Need help: Linux MD + LVM2 data recovery

Surly73

I'm looking for a little help regaining access to an LVM2 area on a spindle that came out of a crippled machine. Google has stopped helping me, so I thought I'd ask for advice. The system was running Ubuntu 8.04 LTS and was configured with two pairs of disks as MD RAID-1 mirrored partitions. The filesystems for /boot and / were regular ext3 on MD; the rest of the system was LVM2 on top of MD.

I used amanda to back up to tape. I have the amanda files in my "offsite" format on disk from a couple of weeks before things went haywire, so I'm not totally screwed, but I'd really like access to the latest versions of the files from the disk if I can. Using my daily tapes without a working amanda server could be possible, but it may be more trouble than it's worth. Right now I have the spindle with the desired data connected to another old desktop booting an Ubuntu rescue environment.

I'm a little puzzled about what happened. The two spindles with /, /boot, /home, /var and /project (a place where I stuffed working files requiring multi-user access) were on separate power rails, separate PATA (yes, PATA) controllers, etc. A hardware issue put a crater in one of the ICs on the first spindle, and the system has refused to finish booting ever since. GRUB starts to boot, but none of the MD devices come up "running". The partitions are still there, and if I mount /boot and / as read-only ext3 the data is there. It looks like all of the MD metadata is gone: the UUIDs are all zeros and the MD arrays cannot be detected with "mdadm --examine --scan". Is this what it would look like if, at the time of the failure, the array was resilvering and what is now the remaining good device was considered out of sync? Or did something else happen?

As mentioned, I can access /boot and / by simply mounting the MD member partitions as ext3. I cannot, however, get LVM2 to recognize anything on the remaining large partition. I changed the partition type from fd to 8e (Linux LVM) and still nothing. I dd'ed the first few sectors of the partition and can see LVM2 metadata in the plain-text portions.
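For reference, this is roughly what I've been running so far (the device name is just the example from my setup):
Code:
# Roughly what I've run; /dev/sda5 is an example member partition.
mdadm --examine /dev/sda5                 # member superblock -- UUIDs come back all zeros
mdadm --examine --scan                    # detects no arrays
dd if=/dev/sda5 bs=512 count=16 2>/dev/null | strings | less   # LVM2 text metadata is visible here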

What's my best course of action here?

  • Should I force MD to build another RAID and consider this remaining good partition to be a member while the other is failed? Doesn't this risk overwriting good data?
  • Is there some LVM2 foo I'm missing to use that remaining MD partition without running MD any longer? I know that MD and ext3/4 were designed not to interfere with one another, so a broken MD mirror can simply be mounted as ext3/4. I'm not sure whether LVM2 and MD play as nicely, or whether the MD metadata confuses LVM2, which is why it's not working.
  • I have read http://www.linuxjournal.com/article/8874. The problem is that none of the metadata is found using mdadm --examine or vgchange -a y.
  • Any idea why a failure of one disk caused all my UUIDs and metadata to be lost? I tested all this stuff when I built the box, but a real failure left me hanging.

Thanks,
 
Before you start any recovery attempts, make block-level duplicates of your disks if you care about your data. (This is always the first step of RAID recovery.)

You can then try to recreate the array with one of the drives "missing" and use --assume-clean to avoid a resync. You can do that with both drives separately to find the better one. Depending on the superblock version those steps may not even be necessary, since the members can work as standalone devices. That can also be a problem, though: if md does not claim the drives first, LVM can "claim" them, preventing md from using them.
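Something along these lines, for example (device names and the metadata version are assumptions - use whatever the original array had):
Code:
# Sketch only -- device names and metadata version are assumptions.
# Recreate a one-legged RAID-1 over the surviving member without triggering a resync:
mdadm --create /dev/md2 --metadata=0.90 --level=1 --raid-devices=2 --assume-clean /dev/sda5 missing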
 
Hmm. A couple of extra things I realize I should have mentioned:

- I'm not trying to recover the system; I just want to copy off some data from /home and move on. I've already replaced the server with a Synology. It doesn't bother me if the mirrors aren't rebuilt, or if LVM2 asserts control over the entire partition - I just want to be able to copy my data off, then DBAN the drive and bin it.

- This is the only drive. The other has a literal crater in an IC. I thought momentarily about swapping controller boards to see if the metadata was intact on the platters but I haven't gone there yet. I figure there are some risks associated with doing that (to the single remaining good controller board) and there's no guarantee the data is in any better condition.

- I will be doing a block-level dupe before doing anything more complicated than what I've already tried (see the sketch after this list). I presume a dd with bs=512 is best practice.

- In theory I can gain access to the original mdadm.conf in /etc, since it's not in the LVM2 area of the disk and I've been able to read it. Since the UUIDs appear to be wiped on the live disk, is it of any real value?
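The sketch I mentioned above - what I have in mind for the block-level dupe (device and target paths are made up; I gather a larger block size with conv=noerror,sync, or GNU ddrescue, is the usual approach rather than bs=512):
Code:
# Device and output paths are made up -- point them at the real disk and a big enough target.
dd if=/dev/sda of=/mnt/external/sda.img bs=64k conv=noerror,sync   # keeps going past read errors
# or, if ddrescue is on the live CD:
ddrescue /dev/sda /mnt/external/sda.img /mnt/external/sda.map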


@omniscience:

You said "create". From what I've seen I do not want to use creation mode. I was thinking --assemble mode, though manpages say devices with no UUID are skipped. And --build says it doesn't allow you to specify RAID-1 but it's the only mode that accepts --assume-clean so I'm a little confused.

My best idea right now is a 'mdadm --build /dev/md2 --level=1 --assume-clean -n 2 /dev/sda5 missing' with '--run' possibly being necessary to force things. If it comes up, then maybe a vgchange -a y will get things moving (?).
 
You can try "mdadm --assemble /dev/sda5", but if the UUIDs are gone, the whole superblock may be damaged. If your superblock is somehow gone, you can recreate it, provided you use the exact same options (which for RAID-1 is basically only the metadata version) plus --assume-clean.

LVM makes backups of its metadata to /etc/lvm/backup, but I never had to do a recovery with this.
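If you can still mount the old root filesystem read-only, those text backups are directly readable; for example (mount point and VG name are placeholders):
Code:
# Mount point and VG name are placeholders.
ls /mnt/oldroot/etc/lvm/backup /mnt/oldroot/etc/lvm/archive
less /mnt/oldroot/etc/lvm/backup/your_vg_name   # plain text: PV UUIDs, VG layout, LV extent maps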
 
I haven't had a chance to take another crack at this until the last couple of days.

Using mdadm --build /dev/mdX --assume-clean -l 1 -n 2 /dev/sdaX missing, it appears that all of the MD devices are running again. The UUIDs are still all zeros, though.

I have always been able to access my / filesystem, including the last running copy of /etc with the MD and LVM2 configs and backups. But even with the MD layer running again, I'm still unable to access any of my LVM2 data on /dev/md2. All of the 'scan' commands I know of say they check "all" physical volumes and find nothing, and pointing them at a specific device (where a command has such an option) also turns up nothing. If I dd the first couple of sectors of md2 I still see LVM metadata.
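For completeness, these are the sorts of scans I mean (the device path is just what it is on my box):
Code:
# The scans I've been trying; /dev/md2 is the MD device on my box.
pvscan
vgscan
lvscan
pvs /dev/md2      # pointing it at the device directly -- still nothing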

I copied the LVM hierarchy from my old /etc to the live CD I was running and tried a vgcfgrestore for the volume group I'm interested in. This actually started to do something, but it results in an error stating that it can't find the PV referenced in the config (which is a real UUID with a "hint" of /dev/md2). And the UUID in the PV section of the LVM config doesn't even look like it's the same length as the UUID shown by MD, let alone match it.

I changed the LVM config file to a UUID of all zeros like the running MD devices show. No go - it says the PV cannot be found.

I did some searching and there appears to be no realistic way of changing or re-generating the UUID on my MD mirrors.

So, what now? I've seen reference to recreating the VG with the exact same config (and maybe all of the volumes) and hoping for the best. I'm not sure that will work out since the extents in the VG have been mapped and used all over the place for different volumes in whatever way LVM2 saw fit at the time. Without more of the existing metadata I'm not sure it'll work, but I'm not a real expert in such things.
 
Please do a 'blkid' first and post the output here. "This does not work" and "that does not work" doesn't really help. Then do a 'pvscan /dev/md2' just to be sure. You can recreate the physical volume metadata using the metadata backup; the metadata backup should contain all the relevant block allocation maps.

When you talk about UUIDs, which ones do you mean? The UUID of the RAID device or the UUIDs of the physical LVM volumes?
 
I admit my "rescue" setup is difficult to paste output from - it's an old, no-network SFF PC running a live CD with my disk dangling from it. Next time I'm able to I'll work to copy some blkid output.

For UUIDs I guess the answer to your question is "both".

LVM2 expects to see the real UUID that the PV (/dev/md2) had when it was last running. Something strange happened to my MD devices (all of them): mdadm -D shows all of the MD UUIDs as all zeros, and mdadm --examine --scan shows the UUIDs of all of the member partitions as all zeros. But the data all seems to be intact.

vgcfgrestore says there are no UUIDs present which match the PV it's looking for (because the UUID of the MD seems to be all zeros instead of the real UUID I see in the backup file). I cannot change the UUID of the MD with mdadm to match the LVM file. Editing the LVM VG backup file so the PV has a UUID of all zeros still doesn't make LVM treat the PV as a match for what it's expecting. From what I saw, the UUIDs in the LVM backup and in the mdadm output aren't even the same length. Being all zeros I can't tell by looking whether one is hex and the other decimal or what, but zero is zero.
 
I still assume you have a valid backup of your block device.

So I suggest the following: look into the unmodified metadata backup and find the right physical volume UUID. Do a
Code:
pvcreate --uuid "insert_UUID_here" --restorefile insert_metadata_backup_filename_here insert_md_device_here
After that do a
Code:
vgcfgrestore --file insert_metadata_backup_filename_here insert_volume_group_name_here
and see if it does something.

You can use ssh to log into your recovery box; most live distributions come with an OpenSSH server or can install one.
A simple vgcfgrestore will not help if LVM cannot relate the backup entries to physical devices - which it can't if the UUIDs are lost.
 
I still assume you have a valid backup of your block device.

Yes.

So I suggest the following: look into the unmodified metadata backup and find the right physical volume UUID. Do a
Code:
pvcreate --uuid "insert_UUID_here" --restorefile insert_metadata_backup_filename_here insert_md_device_here
After that do a
Code:
vgcfgrestore --file insert_metadata_backup_filename_here insert_volume_group_name_here
and see if it does something.

You've just jogged my memory: the UUID in the LVM backup file would be the UUID of the PV created by pvcreate, not the UUID of the underlying device that pvcreate was run on (right?). That's why the UUID lengths don't match (duh).
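So the UUID I need to feed pvcreate is the one recorded under physical_volumes in the backup file, i.e. something like (path and VG name are placeholders):
Code:
# Path and VG name are placeholders; the PV UUID is in the physical_volumes section.
grep -A4 'physical_volumes' /mnt/oldroot/etc/lvm/backup/your_vg_name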

You can use ssh to log into your recovery box; most live distributions come with an OpenSSH server or can install one.

The issue wasn't a lack of ssh understanding. There's no network active in the box at the moment (fried onboard hardware, not worth going into), but if my new ideas don't fix this on the first try I'll go through the hassle of getting it on the LAN.

A simple vgcfgrestore will not help if LVM cannot relate the backup entries to physical devices - which it can't if the UUIDs are lost.

Gotcha. Until now I had forgotten one layer of indirection/abstraction in my thinking of MD+LVM. Hopefully I'll get a chance to try a few more things tonight.
 
There are a lot of UUIDs here. The MD array has one, but the member devices do not; they are identified by the array UUID plus a member number. Each LVM PV has one, each VG has one, and each LV also has one. Additionally, each filesystem has a UUID of its own.

It is still not clear what killed your metadata. You have to assume that the filesystem, if it is even recoverable, has at least some corrupt files.
 
There are a lot of UUIDs here. The MD array has one, but the member devices do not; they are identified by the array UUID plus a member number. Each LVM PV has one, each VG has one, and each LV also has one. Additionally, each filesystem has a UUID of its own.

Yes, indeed, and I forgot about one while thinking through the issue. :(

It is still not clear what killed your metadata. You have to assume that the filesystem, if it is even recoverable, has at least some corrupt files.

The two filesystems I've accessed that were straight mirrors without LVM have been fine, oddly. A normal fsck said they were already clean, and a forced fsck came up clean too. I find it puzzling that what should be a normal-case failure of a well-separated disk caused such strange and specifically limited damage.
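For the record, the checks were along these lines (the device name is an example; -n keeps it read-only):
Code:
# Device name is an example; -n answers "no" to everything so nothing gets written.
e2fsck -n /dev/sda2       # normal check -- reported clean
e2fsck -f -n /dev/sda2    # forced check -- also clean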

Once I get access to the data on the LVs I will also look into performing a recursive 'md5sum -c'. Some files will have changed since the last full backup but most should be exactly the same. Unfortunately I had no time to look at this last night.
 
Code:
pvcreate --uuid "insert_UUID_here" --restorefile insert_metadata_backup_filename_here insert_md_device_here
After that do a
Code:
vgcfgrestore --file insert_metadata_backup_filename_here insert_volume_group_name_here
and see if it does something.

As I suspected a few posts back, this was the missing layer in my previous recovery efforts.

The pvcreate command printed a warning that it detected an MD superblock on /dev/md2 and asked permission to overwrite it. I would have expected that if I had continued to work directly with the underlying member of /dev/md2 (/dev/sda5 in this case), but not after I restored the MD layer of encapsulation. I have a block-level backup, so I said "y".

The vgcfgrestore finally worked, and a "vgchange -a y" restored access to all of the volumes. I ran e2fsck and nothing other than a journal replay took place on any of them. They appear intact, but I have not done any md5sum work yet to determine whether it's as good as it looks.
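For anyone who finds this thread later, the sequence that finally worked boiled down to this (the UUID, backup file, and VG/LV names are placeholders for my real ones):
Code:
# Placeholders throughout -- substitute the UUID from your backup file and your own VG/LV names.
pvcreate --uuid "PV-UUID-from-the-backup-file" --restorefile /path/to/lvm_backup_file /dev/md2
vgcfgrestore --file /path/to/lvm_backup_file your_vg_name
vgchange -a y your_vg_name
e2fsck /dev/your_vg_name/home    # only a journal replay was needed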

I'm making tarballs of each filesystem on an external now to process/import on another system.

Thanks, all.
 
pvcreate detecting an MD superblock on an MD device could indicate that you recreated the MD device with the wrong superblock version; the versions put the superblock in different locations.
Nice to hear that this worked out for you. I have never had to do something like this on one of my own systems - and I hope it stays that way.
 
It could be the wrong version, I suppose; they were very old MD devices. At this point, with a live CD, I don't even need to start the MD layer: vgchange -a y autodetects everything and off it goes. Not exactly correct, but it appears functional.

There are some particularly important areas of the data I'd like to compare against the latest full backups I have on tape, using something like 'md5sum -c' functionality. There are some files I'd expect to be different/updated, and a lot that should not have changed. Unfortunately there's a pretty deep and complex directory tree, and md5sum only checks the current directory level without recursion.

Any tips on a reliable, recursive 'md5sum -c'-like workflow using another tool? I could probably script something to dump md5sum files through the tree and recurse, but I don't do stuff like this very often these days and I'd spend more man-hours building and testing the tool than I'd like. Besides, if I need it, someone with more effortless scripting talent has probably already built it.
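Worst case, what I have in mind is something like this with plain coreutils (the mount points are made up):
Code:
# Mount points are made up -- checksum the recovered tree, then verify against the reference copy.
cd /mnt/recovered && find . -type f -print0 | xargs -0 md5sum > /tmp/recovered.md5
cd /mnt/reference && md5sum -c /tmp/recovered.md5 | grep -v ': OK$'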
 
Thanks, guys - both look suitable. My liveCD already has cfv so that's the tie-breaker for now. Working on it...
 
Yep. CFV did the trick and I've verified everything. It all seems to have come off clean.
 