LVM recovery tale.

Over the weekend I had the worrying experience of losing my LVM settings and potentially all my data… a quick search on the web showed a confusing set of information, much of it for older versions of LVM and therefore rather suspect.

Well, I recovered all my data and it was really quite simple, so I’ve written up what I did in the hope that someone else, in a similar situation, will find it useful. It’s a scary thing, losing the whole hard disk and knowing that, in reality, its all there.

First the situation

I’ve got a small /boot partition as ext2, and a larger one for the root directory (2Gb, also ext2). The rest of the hard disks (nearly 120Gb) are assigned to a volume group called, descriptively, system… (which is SuSE’s idea of a default name).
More accurately, they were supposed to be assigned to it. At first I had just added 60Gb to the volume group, it was my first use of LVM and I was hedging my bets. After 6 months of trouble free operation I decided to add another 60Gb of disk, which I did 3 months ago.
Except that, although the physical and volume group managers all agreed that the volume group had 120Gb, the logical volume manager insisted that there was only 60Gb. I’d used Yast2 to create and add the volumes.
I tried every combination of commands I could think of to get the logical volume manager to recognise the additional space but it wouldn’t.
At the time, I was busy, so forgot about it, then last week I realised that I wanted to use the space so settled down to do something about it.

The problem

So, it seemed the best solution would be to remove the second partition that I had added (/dev/hdd1) from the physical volume manager and then add it back.
It wasn’t recognised so wouldn’t be missed, right?
pvremove /dev/hdd1 removed the label from /dev/hdd1 but also from /dev/hda7 (which was the original partition and full of data).
pvscan and pvs reported no physical volumes on the disk.
vgscan and vgs couldn’t find any volume groups.
lvscan and lvs were non-starters obviously.

The rather surreal thing was, the whole system kept on running quite nicely, X Server and KDE desktop and all, but I knew that as soon as I rebooted the system would be toast.

First I tried adding the partition back to the volume group system, but the system couldn’t find the ‘system’ group. I tried creating the physical volume again (pvcreate) but that told me that the volume already existed. It became clear that I would need to reboot and hope that the system sorted itself out, flushed the disks, resynced, whatever.

The solution

After rebooting the system wouldn’t come up, which is kind of what I had expected so I had to reboot from the SuSE Rescue disks. So now I had to think about how to recreate the physical volumes, volume group and logical volume and do it with the data intact. (I have daily backups but the thought of restoring the whole system, applications and data, was not too exciting, especially as I knew all the data was there and intact. With a ‘regular’ hard disk partition that had got lost I could scan the disk for potential disk partitions and restore them. But that wouldn’t work with LVM.

On a search through various sites, I found one that mentioned the importance of saving a copy of the volume group parameters to a file using vgcfgbackup. This file could then be used to restore the parameters later, assuming that the underlying physical structure hadn’t changed.
Well, the physical layout hadn’t changed but unfortunately I hadn’t created a backup of the volume group parameters (the ‘descriptor area’ to use the technical term) so that didn’t seem to hopeful.
I poked around in the /etc directory (I still had the ‘/‘ partition remember, as that was on its own ext2 partition) and noticed that there was a /etc/lvm/backup/ directory and a /etc/lvm/archive/ directory. Further investigation and I found that these are automatically created by LVM whenever changes are made to the system.

Unfortunately, all the messing around I had done had created a non-working version of the system file and the archive files didn’t seem to be recent enough. But, I remembered that I had a backup of the system files (going back 6 months in fact) and so I dug out a copy of the /etc/lvm/backup/sysem file and used that.

Here is what I did:
First find out the old UID’s of the partitions, this is in the /etc/lvm/backup/system file. They are quite long… make sure you get the UID for the physical volumes.
$pvcreate -u sdSD-2343-SD939-adIda2 /dev/hda6
$pvcreate -u dk33kd-929293nd-adfja298a /dev/hdd1
$vgcreate -v system /dev/hda7 /dev/hdd1
$vgcfgrestore -f /etc/lvm/backup/system system

and lo!, all data present and correct!

In fact, I just rebooted the system and was back where I had started with the additional benefit of an extra 60Gb of disk space, because now I had the extra partition properly included.

[Note: in the lines using pvcreate... above I could have used:
$ pvcreate –restorefile /etc/lvm/backup/system
to automatically find the ID’s but I hadn’t realized that at the time. Without the UID’s then the vgcfgrestore will not find the physical volumes that it needs to recreate the volume group.]

The lesson

Don’t panic!
Keep a safe copy of your /etc/lvm/ files!
Make sure that you have a Rescue disk that understands the LVM system!

Apart from the above disaster, which seems to have sorted itself out very easily, I have had no trouble with the LVM system. At first I was worried that if there was a failure it would lose everything. There is something very comforting about a simple ext2 (or FAT) partition in that I know it can just be hacked at the bit level and rebuilt. Something like LVM, which is logical volumes on top of volume groups on top of physical volumes is impossible to rebuild ‘by hand’ so I’m learning to trust technology a bit.