Cannot find root on LVM after update of lvm2

Hi all,
I am running openSUSE 13.1 and have all my partitions except /boot on LVM2, i.e. swap, root, /home and another large data partition.
Now, yesterday my system installed some automatic updates and I noticed lvm2 was among them. It took a very long time to complete, but eventually it finished without reporting any errors. After that the system continued to run fine.

However, when powering up the system again today, it fails to start up. I get the GRUB menu, but after that it just hangs when trying to start Linux.
So I started it again in recovery mode, and the last two lines are as follows:

  • Resume device /dev/system/swap not found (ignoring)
  • Waiting for device /dev/mapper/system-root to appear:…Could not find /dev/mapper/system-root.

So apparently the logical volumes are not recognized anymore. :frowning:
They are still there though, as I can see and access them fine when running Linux Mint from a USB stick like I am doing atm. The only thing I notice is that in the Thunar file manager the root partition has no name, it’s just called " 21 GB Volume" whereas the home partition is called “Home” like it should.

Now my (obvious) question is: can this be fixed and what steps do I have to take to do that?
I’m not a complete noob, but still fairly ‘green’ as far as Linux goes. :wink:

I appreciate any help.
Cheers,

Hylke

This is most likely due to a problem in the “initrd”.

I updated 13.1 yesterday, on an encrypted LVM. It still booted afterwards.

I suggest you check whether “/boot” is short on free space. That can cause “initrd” problems.

You probably have more than one kernel installed. Use the “Advanced” line in the grub2 menu to try a different kernel and see if that boots.
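
If you can still get to a shell (e.g. the rescue system with your boot partition mounted at /boot), something like this shows both points at a glance (just a sketch; adjust the paths to wherever you mounted things):

df -h /boot
ls -l /boot/vmlinuz-* /boot/initrd-*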

Dear openSUSE Devs & Auto-Update People!

After the 13.1 update on the evening of 2015-02-02, the identical error occurs on two of our systems.
LVM LVs are not found during boot, killing any chance of booting and logging in.
initrd-3.11.10-25-desktop of Feb 2, 19:05 CET is 25089585 bytes (and there is an initrd-3.11.10-21-desktop from the same day, 19:02 CET, with 25089666 bytes next to it).
On the other system, only the initrd-3.11.10-25-desktop from Feb 2 is present, here with a size of 24675977 bytes.
The boot partitions of both affected systems are quite oversized and only half full, so lack of space is not the cause of a broken initrd here.
Both systems are auto-updated, YaST-installed, pure openSUSE 13.1 x86_64 systems.
Both systems have everything outside of swap and /boot in logical volumes (ext4, plus some old data LVs still on ReiserFS; the system-relevant LVs are all ext).
Both systems have 3 LVM2 volume groups.
Both systems have physical volumes created directly on whole hard drives (e.g. on /dev/sdb rather than in a partition such as /dev/sdb1). However, systemvg (which contains rootlv) resides in a PV that is not a whole disk but the partition right after the boot partition on the same drive; this layout is identical on both systems.

Both systems are bricked (as of now), exactly after an auto-update on Sunday evening.
This is a real, repeatable thing.

Both systems still have completely intact PVs, VGs and LVs, as far as my rescue system can show me.
Regards to the openSUSE update team, keep up the good work! And perhaps do a bit more of it right now - I dream of a small opensuse-13.1-x86_64-3.11.10-25-REPAIR-netinstall.iso, so that I can come out of this update without corrupting any more configuration and data during manual analysis and repair…
Yours
lipinger

Hi!

Some additional information about the initrds on our two systems affected by the Feb 2 openSUSE 13.1 x86_64 update.
On both systems:

  • all 3.11 initrds are new, all modified around 19:00 on Feb 2
  • all the newly modified/created 3.11 initrds are significantly smaller than the previous initrds
  • all 3.11 initrds lack any lvm files
  • and, most important: on both systems, I really do not see anyone or anything touching the boot configuration except the automatic online update

I attach initrd information (size, mtime, lvm files contained) for both systems.
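
If you want to produce a similar listing yourself, something along these lines should work (a sketch only; it assumes the initrds are plain gzip-compressed cpio archives, as the old mkinitrd on 13.1 builds them):

cd /tmp/boot
for i in initrd-*; do
    echo "== $i"
    # list the archive contents, keep only lvm-related entries
    zcat "$i" | cpio -it --quiet | grep -i lvm
done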

Did this update brick only our two systems and the original poster's one, or are there more out there?

Cheers!
lipinger


System 1, currently:
Linux Rescue 3.11.6-4-default #1 SMP Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux

-rw-r--r-- 1 root root 6153366 Apr  2  2010 /tmp/boot/initrd-2.6.25.20-0.7-default
    config/lvm2.sh
    boot/62-lvm2.sh
    sbin/lvm
    etc/lvm
    etc/lvm/lvm.conf
    var/lock/lvm

-rw-r--r-- 1 root root 6188529 Apr  2  2010 /tmp/boot/initrd-2.6.25.20-0.7-debug
    config/lvm2.sh
    boot/62-lvm2.sh
    sbin/lvm
    etc/lvm
    etc/lvm/lvm.conf
    var/lock/lvm

-rw-r--r-- 1 root root 5820851 Apr  2  2010 /tmp/boot/initrd-2.6.25.20-0.7-xen
    config/lvm2.sh
    boot/62-lvm2.sh
    sbin/lvm
    etc/lvm
    etc/lvm/lvm.conf
    var/lock/lvm

-rw------- 1 root root 24675977 Feb  2 19:07 /tmp/boot/initrd-3.11.10-25-desktop
    no LVM...

-rw------- 1 root root 24676878 Feb  2 19:11 /tmp/boot/initrd-3.11.6-4-desktop
    no LVM...



System 2, currently:
Linux Rescue 3.11.6-4-default #1 SMP Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux

-rw-r--r-- 1 root root 33671195 Oct 23  2013 /tmp/boot/initrd-3.7.10-1.1-desktop
    usr/lib/udev/collect_lvm
    usr/lib/udev/rules.d/11-dm-lvm.rules
    boot/61-lvm2.sh
    sbin/lvm
    var/lock/lvm
    config/lvm2.sh
    etc/lvm
    etc/lvm/lvm.conf
    etc/sysconfig/lvm

-rw------- 1 root root 22833897 Feb  2 18:59 /tmp/boot/initrd-3.1.10-1.16-xen
    no LVM...

-rw------- 1 root root 25089666 Feb  2 19:02 /tmp/boot/initrd-3.11.10-21-desktop
    no LVM...

-rw------- 1 root root 25089585 Feb  2 19:05 /tmp/boot/initrd-3.11.10-25-desktop
    no LVM...


I have not heard of others.

My own (32-bit) 13.1 system, using an encrypted LVM, still boots.

I looked through recent messages on opensuse-bugs and don't see a related one. So I suggest filing a bug report on this. Mark it as urgent (or "critical", or whatever the Bugzilla term is).

Can you chroot into one of the broken boxes, add lvm to INITRD_MODULES="" in /etc/sysconfig/kernel, and run mkinitrd?

On 13.2 this would be slightly different.
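
Roughly like this from a rescue or live system - an untested sketch, with example device names you would have to adapt to your own layout:

vgchange -ay                      # activate the volume groups first
mount /dev/system/root /mnt       # your root LV
mount /dev/sdXY /mnt/boot         # your separate /boot partition (example name)
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash
# inside the chroot: add lvm to INITRD_MODULES in /etc/sysconfig/kernel, then:
mkinitrd
exit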

Thanks for the suggestions. Although it seemed unlikely to me, I checked the free space on /boot anyhow. However, it has 139.3 of 251.6 MB free, so no problems there.
I also checked the grub2 menu and indeed I have two options available: 3.11.10-25-desktop and 3.11.10-21-desktop. I tried both (in normal and recovery mode), but they give the same problem and error.

To me it's quite clear that this is a result of the lvm2 update; I can't believe it's just a coincidence. But maybe it only occurs on systems where root is on an LVM volume. Lipinger says he has that same setup too. @nrickert: do you also have root on LVM, or on a regular partition?
And like I said, there seems to be nothing wrong with the configuration itself. All partitions are shown and accessible when running from a live USB stick. So the problem is that the root volume just cannot be found/accessed during the boot process.

Most important to me now is whether this can be fixed, and how. If not, I will 'just' have to do a re-install (and upgrade to 13.2 at the same time).

@Lipinger: I feel sorry for you for having these problems too, but at the same time I’m happy that I’m not alone out there. :wink:

Yes, root, home and swap are all part of the encrypted LVM. Only “/boot” and “/shared” (a partition with multimedia files, iso, and other stuff for sharing on the home network) are outside the LVM.

Possibly the lvm support in “initrd” is being triggered by the use of encryption, but not by the use of LVM without encryption (just a wild guess). Or maybe there are other differences.

Note: my two boxes do not use encrypted LVs (there are some encrypted LVs on one of those boxes, but these are only data LVs, and they are only optionally mounted and not used during boot).
So this is not an encrypted-LV problem here.

Currently, I am trying to chroot into one of the boxes (and then the other) and re-run grub2-mkconfig.
"Trying", because there is a problem with that: I can mount my LVs, but I can't run pvscan, lvscan etc. - the same behaviour since the lvm2/lvmetad changes in openSUSE, as it was/is on many LVM2-using systems running a newer openSUSE.
grub2-mkconfig also has a problem with LVM in the chrooted system - this may be a problem of my imperfect chroot (having to mount /proc, /dev, /sys, …; I reuse the nice work from "Kleines Script für Chroot" on kanotix.com, a GNU/Linux live system based on Debian, optimized for HD install and high performance).

Generating grub.cfg ...
Found theme: /boot/grub2/themes/openSUSE/theme.txt
Found linux image: /boot/vmlinuz-3.11.10-25-desktop
Found initrd image: /boot/initrd-3.11.10-25-desktop
Found linux image: /boot/vmlinuz-3.11.10-21-desktop
Found initrd image: /boot/initrd-3.11.10-21-desktop
Found linux image: /boot/vmlinuz-3.7.10-1.1-desktop
Found initrd image: /boot/initrd-3.7.10-1.1-desktop
Found linux image: /boot/vmlinuz-3.1.10-1.16-xen
Found initrd image: /boot/initrd-3.1.10-1.16-xen
Found linux image: /boot/vmlinuz-3.1.10-1.16-xen
Found initrd image: /boot/initrd-3.1.10-1.16-xen
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
ERROR: opening path /mounts/instsys/sys/block
ERROR: failed to discover devices
  /dev/mapper/control: mknod failed: No such file or directory
  Failure to communicate with kernel device-mapper driver.
  Check that device-mapper is available in the kernel.
done

Nevertheless, on my (chrooted) system, the LVM-challenged grub2-mkconfig produces a grub.cfg quite similar to the crippled one I suspect the Feb 2 update produced.
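
For the pvscan/lvscan failures mentioned above, a workaround worth trying (a guess on my side - I have not verified that this lvm2 build honours it) is to disable lvmetad for a single command via --config:

pvscan --config 'global { use_lvmetad = 0 }'
vgscan --config 'global { use_lvmetad = 0 }'
vgchange -ay --config 'global { use_lvmetad = 0 }'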

Next steps are:

  1. get a chrooted run of grub2-mkconfig that produces no device-mapper errors.
  2. boot.
  3. repeat step 1 for mkinitrd (mkinitrd also complains about device mapper and LVM metadata).
  4. really boot.

After mounting, you will need some additional bind mounts before the “chroot”.

Assuming that you mounted at "/mnt":


# mount --bind /dev /mnt/dev
# mount --bind /proc /mnt/proc
# mount --bind /sys /mnt/sys

From your error messages, it looks as if you missed some of those.

Thanks for the hint… not only those, but also /mounts and the mounts below it, as /dev/mapper/ points to those locations.

What I found:
In "LVM Volumes not available after update" (Pacman & Package Upgrade Issues, Arch Linux Forums), a problem is described that looks similar to ours. It seems to be fine with kernels 3.15.x…

…and such symptoms already happened exactly one year ago, that time with other versions but similar symptoms -
cf. "boot problem after lvm2 update" (Install/Boot/Login, openSUSE Forums):

[QUOTE]Yesterday I installed the lvm2 update (version lvm2-2.02.98-0.28.5.1.x86_64). After I rebooted the system I had an issue with several filesystems that wouldn't mount. On one of my systems I have several volume groups. All the logical volumes in those other volume groups weren't available. On a system that has only one volume group I get this message when I use an lvm command:
"WARNING: Failed to connect to lvmetad: No such file or directory. Falling back to internal scanning."

This version is buggy. To solve the issue, revert back to the original lvm2-2.02.98-0.28.1.5.x86_64 from the openSUSE-13.1-Oss repository. Please let me know if that helped.
[/QUOTE]

Could the reasons be similar? My lvm2 is 2.02.98-0.28.30.1.x86_64.

Still trying to enable grub2-mkconfig and mkinitrd to see PVs, LVs and VGs again.

Yes, that’s possible.

I never had problems back then, but I do recall it causing trouble for others.

I just checked. The file “/etc/lvm/lvm.conf” has a recent date (probably from the troubling update). And it is back to “use_lvmetad = 1”.

Maybe try changing that “1” to a “0”, rebuild the “initrd” and see if that fixes the problem.
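
Inside a chroot of the affected system, that would look roughly like this (a sketch only; back up lvm.conf first and adjust the sed pattern if your file uses different spacing):

cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.bak
sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
mkinitrd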

Hi Hylke, don't panic, this problem is not a lethal one. I am writing this reply on one of my formerly "bricked" boxes. It was just a problem of a corrupt mkinitrd run concerning LVM volumes during the Feb 2 2015 openSUSE 13.1 x86_64 update. I ran a manual mkinitrd on the chrooted bricked system, got a working initrd, and the system is saved. Cheerio. If you have not saved your box already, just wait a few hours, during which I will generalize my chroot/repair script (and verify it by saving my bricked box II), then post it together with a short solution description. Hang on!
lipinger

Hi, Hylke!

These are more details about the problem that affected my machines, and perhaps also yours, after the openSUSE 13.1 x86_64 update on Feb 2 2015, around 19:00 CET.

Analysis:

I. Main problem

/boot/initrd (a soft link to /boot/initrd-3.11.10-25-desktop) did not contain the lvm2 feature and the lvm configuration.
This was true for all kernel 3.x initrds on both machines that had a modification date of around 19:00, Feb 2 2015.

The file differences between an unaffected initrd (cf. the remedy below) and the initrds from the update are:

 
Linux Rescue 3.11.6-4-default #1 SMP Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux 
 
-rw-r--r-- 1 root root 33671195 Oct 23  2013 /tmp/boot/initrd-3.7.10-1.1-desktop 
    usr/lib/udev/collect_lvm 
    usr/lib/udev/rules.d/11-dm-lvm.rules 
    boot/61-lvm2.sh 
    sbin/lvm 
    var/lock/lvm 
    config/lvm2.sh 
    etc/lvm 
    etc/lvm/lvm.conf 
    etc/sysconfig/lvm 
 
-rw------- 1 root root 22833897 Feb  2 18:59 /tmp/boot/initrd-3.1.10-1.16-xen 
    <no LVM...> 
 
-rw------- 1 root root 25089666 Feb  2 19:02 /tmp/boot/initrd-3.11.10-21-desktop 
    <no LVM...> 
 
-rw------- 1 root root 25089585 Feb  2 19:05 /tmp/boot/initrd-3.11.10-25-desktop 
    <no LVM...> 

The main remedy is to create an initrd for the desired GRUB boot option that explicitly contains the lvm2 feature, i.e.:

mkinitrd -k vmlinuz-3.11.10-25-desktop -i initrd-3.11.10-25-desktop -d /dev/systemvg/s2rootlv -f lvm2 -B

II. Side problems

To generate a working initrd, mkinitrd needs the original system to analyze. For that, I had to chroot into that system and prepare a nearly original environment. I include a recipe and my chroot script below.

Some problems occurred - part of them due to ignorance on my side, part of them due to the strange behaviour/layout of the openSUSE 13.1 configuration.

II.a mkinitrd changes grub.cfg

The openSUSE mkinitrd changes the GRUB(2) configuration. This is not what I expect from mkinitrd, and I do not like this side effect. But, OK: there is an option -B that suppresses the grub configuration change.

Solution: option -B for mkinitrd.

II.b (chrooted) grub2-mkconfig generates inadequate grub.cfg

On my chrooted boxes, I could not get grub2-mkconfig to generate a usable grub.cfg. Things related to LVM were not present in the generated grub.cfg; especially the strings expressing roots on LVM LVs were missing completely.
Changing

use_lvmetad = 1

to = 0 (or back) in the chroot-mounted /etc/lvm/lvm.conf was of no use.
As the grub.cfgs produced by the Feb 2 openSUSE update were not corrupt, and as grub2-mkconfig on the now-running boxes also works, this must have been a problem of my chroot setup.
But the combination with II.a above - mkinitrd changing and corrupting grub.cfg behind my back - did not improve my repair experience.

Solved by luck and backups of old grub.cfgs, and then by the -B option from II.a.

II.c mkinitrd needs /mounts/mp_* chrooted

Even with -B and -f lvm2, mkinitrd still did not produce a usable initrd. Besides many messages about failed LVM info lookups, the chrooted /dev/mapper was empty except for a control file pointing at non-existent (chroot-relative) /mounts/mp_* files.

The difference in the created initrds was rather subtle - the lvm2 feature was there, but three files were missing when $CHROOTDIR/mounts/* was not recursively mounted (the missing files are on the left of the diff):

 
640d639 
< boot/11-block.sh 
705d703 
< config/lvm2.sh 
707d704 
< config/block.sh 

Funny: only a big recursive bind mount (mount -R) did the job. I did not look deeply enough to see which mounts I was missing when using mount -B (--bind).

Lesson learned: to me, current openSUSE maintains a quite complex and error-prone device and mount structure for LVM devices.

The solution was

mount -R /mounts $CHROOTDIR/mounts

and the same again for every /mounts/mp_*.

III. Solution

I repaired my second box using the following steps.

III.a Copy the following script, chroot.bash, to a USB stick.

 
#!/bin/bash 
#   chroot.bash - try to solve a problem after opensuse-13.1 x86_64 update from Feb 2 2015 19:00 
#   lipi 2015 - reusing chroot.sh of unknown origin. twimc: Thanks! 
# 
#   call: ./chroot.bash <root dev name, e.g. systemvg/rootlv> 
# 
#   NOTE: sorry, no warranty - but it did the trick for me. 
#   NOTE: the only version of initrd that's re-generated is 
#         initrd-3.11.10-25-desktop. If your problem is related to another 
#         initrd (or another problem), this script is of no help. 
 
 
CHROOTDIR=/mnt/chroot 
 
# Make sure only root can run our script 
if [ $EUID -ne 0 ]; then
   echo "This script must be run as root" 1>&2 
   exit 1 
fi 
 
# Make sure that $1 is not empty 
if [ ! -n "$1" ]; then
echo -e "  You must give an argument which device you want to chroot in (without leading '/dev/') " 
echo -e "  Example '$0 systemvg/rootlv' will chroot the device /dev/systemvg/rootlv into /mnt/chroot." 
exit 1 
fi 
 
# Make sure that directory '/mnt/chroot' exists 
if [ ! -d $CHROOTDIR ]; then
 echo Error: No directory $CHROOTDIR existing. Creating it... !! >&2 
 mkdir -p $CHROOTDIR 
fi 
if [ -d $CHROOTDIR ]; then
  echo Created directory $CHROOTDIR successfully... >&2 
  echo Now chrooting into /dev/$1 ... >&2 
fi 
 
if [ -d $CHROOTDIR ]; then
mount -v /dev/$1 $CHROOTDIR 
mkdir -v -p $CHROOTDIR/mounts 
sysdirs=" 
    /dev 
    /mounts 
    /mounts/* 
    /proc  
    /sys  
" 
for x in $(ls -1d $sysdirs | sort) ; do 
  mount -v -R $x $CHROOTDIR$x 
done 
 
FSTAB="" 
 
awk '/^\// && /[ \t]\//' $CHROOTDIR/etc/fstab |
while read source target rest ; do  
    mount -v $source $CHROOTDIR$target 
    FSTAB="$FSTAB $target" 
done 
 
if grep -q 'use_lvmetad[ ]*=[ ]*1' $CHROOTDIR/etc/lvm/lvm.conf ; then
    echo -e "  /etc/lvm/lvm.conf: changing use_lvmetad := 0" >&2
    sed -i 's/\(use_lvmetad[ ]*=[ ]*\)1/\10/' $CHROOTDIR/etc/lvm/lvm.conf
else
    echo -e "  /etc/lvm/lvm.conf: use_lvmetad is not = 1" >&2
fi
 
VERSION=3.11.10-25-desktop 
 
echo -e "  WORKING SOLELY ON VERSION $VERSION !" >&2 
if [ ! -f $CHROOTDIR/boot/initrd-$VERSION-original ] ; then
    echo -e "  backup copy of $(ls -l $CHROOTDIR/boot/initrd-$VERSION)" >&2 
    cp -v $CHROOTDIR/boot/initrd-$VERSION $CHROOTDIR/boot/initrd-$VERSION-original 
fi 
 
# here we are on the old box! 
echo -e " mkinitrd:" >&2 
echo -e "  generating initrd -including feature lvm2- on chrooted box; no grub.cfg change." >&2 
    chroot $CHROOTDIR \ 
        /sbin/mkinitrd -k vmlinuz-$VERSION -i initrd-$VERSION -d /dev/$1 -f lvm2 -B 
echo -e "  bash shell on chrooted box" >&2 
echo -e "    if you want to look/repair on your system, do it now." >&2 
echo -e "  To leave this bash shell, type <enter> <ctrl>-d !" >&2 
chroot $CHROOTDIR /bin/bash 
echo === left chrooted box. 
 
echo -e "  unmounting everything from the chroot directory ..." >&2 
umount -v -R $CHROOTDIR 
 
echo -e "  deleting chroot directory $CHROOTDIR ..." >&2 
rmdir -v $CHROOTDIR 
if [ ! -d $CHROOTDIR ]; then
echo -e "$CHROOTDIR deleted successfully. Good bye !!" >&2 
fi 
fi 
 
echo -e '  now, "shutdown -h now" and restart without rescue system.' >&2 
echo -e '  Good luck.' >&2 

III.b Rescue system

Boot the system bricked by the Feb 2 openSUSE 13.1 update from an openSUSE-13.1-x86_64 install DVD into the rescue system, and log in as root.

III.c USB stick

Plug the USB stick into the bricked box running the rescue system.

Search for the stick using

fdisk -l

If there is too much output, do

fdisk -l | more

and page down with space/enter…
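
If lsblk happens to be included in the rescue image, it gives a more compact overview than paging through the fdisk output (just a convenience, not required):

lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT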

In the fdisk output, look for a section like the one below, showing a /dev/sd* device with the capacity of your stick. It probably has exactly one FAT/VFAT/NTFS partition. For me, this section looked like this:

 
...much text... 
 
Disk /dev/sde: 4009 MB, 4009754624 bytes, 7831552 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk label type: dos 
Disk identifier: 0x1bf0d4df 
 
   Device Boot      Start         End      Blocks   Id  System 
/dev/sde1            2048     7831551     3914752    b  W95 FAT32 
 
...much text... 

So I mounted that stick:

 
mkdir /tmp/usb 
mount /dev/sde1 /tmp/usb 
cd /tmp/usb 

Do the same for your usb stick partition.

III.d Chroot into your old system

  • root device
    Try to remember what your old root partition was. HINT: this is the volume
    that did not appear when you tried to boot…

For you, Hylke, the original poster, with the boot failing this way:

 
- Resume device /dev/system/swap not found (ignoring) 
- Waiting for device /dev/mapper/system-root to appear:............Could 
not find /dev/mapper/system-root. 

the root logical volume device would be /dev/system/root.

  • start chroot.bash:
    In /tmp/usb, execute ./chroot.bash with your root LV as argument; for you, Hylke, that would be:
./chroot.bash system/root
  • the script starts a bash shell. In this bash, have a look at the
    /boot/initrd-3.* files and at /boot/grub2/grub.cfg etc.

  • leave the script's shell with <ctrl>-d

IV. Unmount stick, reboot

cd / 
umount /tmp/usb

Remove usb stick.

shutdown -r now

This should do the trick for you.

Good luck!
lipinger

Hylke, please note:

my script sets

use_lvmetad=0

in /etc/lvm/lvm.conf.

I changed this on my boxes to avoid problems there with lvmetad.
Perhaps you want to change that back to use_lvmetad=1.

Sorry for not removing or commenting on that in my original post.

Regards
l

Did you report this on Bugzilla? Point them to this thread as well; it is a good description of the problem.

https://en.opensuse.org/Bugzilla

Somewhat interestingly, I had this problem not after the 3.11.10-25 update, but after the 3.11.10-29 update. I was able to use the script by lipinger to switch to a chrooted system and run mkinitrd by hand. This fixed things up nicely.

Thanks for the good work.
-A