“unable to mount root fs on”
My purpose in posting this is to help others (and maybe myself, if I do
something stupid again), and to discuss whether a bug or enhancement
request should be opened to keep others from being bitten.
System:
HP Envy something-or-another, new a few months ago, 16 GB RAM, probably
has an i7 in it.
1 TB drive, partitioned to use the UEFI Secure Boot extension, so it has
/dev/sda1 (efi I think), /dev/sda2 (/boot, around 150 MB), and /dev/sda3
(LVM with a single volume for /). The LVM volume group implements
encryption thanks to YaST making it a single checkbox (love it!).
What I did:
Yesterday I applied all patches available to the system. This is not my
system, but is one that I support (I will support (or try to) most Linux
systems that come my way, but this one is particularly important to me).
The patches applied properly with nothing more at the end than a note
about running processes using old files, as usual (use ‘zypper ps’ to see
which processes are using which files). I did not read all of the output,
not that I really think it would have helped. I then backed up the system
(100 GB to back up… took some time) and finally rebooted this morning.
During bootup the system dropped me nicely at the grub prompt… not
grub-rescue, or the grub menu, but just grub. After plenty of poking
around (learning Grub 2 in the process… mostly it has all just worked so
my familiarity with Grub (legacy) paid off a little, but I’m a newb here)
I noticed that the grub.cfg file (/boot/grub2/grub.cfg) was missing some
important sections… you know, those “menuentry” ones. Well no wonder
I’m at a grub prompt. Thankfully grub can ‘cat’ the files (‘set pager=1’
to add “more”-like paging) so I could see what to type as a
system-specific cheat sheet. The commands I typed were close to the
following (note, hold Shift during bootup until grub shows up in some
form, and then press Esc quickly, then press ‘c’ to get to a prompt if not
otherwise able to get there):
set root='hd0,gpt2'
linuxefi /vmlinuz-3.7.10-16-desktop root=/dev/mapper/sdavg-rootvol
initrdefi /initrd-3.7.10-16-desktop
This managed to get me a little bit further, but the kernel panicked when
it did not find a valid root filesystem. The message mentioned trying
ext2 and other filesystems, but as I had not been prompted for my usual
passphrase I knew it had not yet managed to get to the encryption part.
Maybe it was a driver problem, I thought, but I knew that openSUSE
helpfully keeps the old kernel around, so I tried it. Those commands,
after rebooting, looked like the following, give or take, going from memory:
set root='hd0,gpt2'
linuxefi /vmlinuz-3.7.10-1.1.1-desktop root=/dev/mapper/sdavg-rootvol
initrdefi /initrd-3.7.10-1.1.1-desktop
This allowed me to boot. While booted I was semi-intelligent and verified
that the most recently installed RPMs were all intact. The kernels were
not new (dated from June as I recall) so those should be fine, but just
for fun I verified those too (rpm -V <packageName>). No issues there that
I could see. It was at this point, when trying to fix my grub.cfg file, I
realized I had a space issue. Indeed, /boot was 100% used. Being very
unintelligent I decided to remove one of the two kernels (I only need one,
the other is just left there for safety, right?) and so, stupidly, I
removed the older kernel. Yes, the working one. I don’t know why I did
that, but it was stupid. Cleaned up the grub.cfg file, hoping that was
the real problem (even though grub.cfg had played no part in either the
failed boot of the new kernel or the successful boot of the old one, since
I had typed both manually at the grub prompt) and then held my breath and
rebooted. Luckily I had copied out the
/boot/grub* data first to my other machine, which was helpful later.
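In hindsight, a quick check along these lines would have stopped me from
removing the kernel I was running. This is only a sketch: safe_to_remove
is a name made up for illustration, and it assumes the kernel files in
/boot are named after the release string that ‘uname -r’ prints.

```shell
#!/bin/sh
# Sketch only: safe_to_remove is a hypothetical helper, not an openSUSE tool.
# It succeeds (exit 0) only when the candidate kernel release differs from
# the release that is currently running.
# usage: safe_to_remove CANDIDATE [RUNNING]
safe_to_remove() {
    candidate=$1
    # Default to the running kernel's release string if none is given.
    running=${2:-$(uname -r)}
    [ "$candidate" != "$running" ]
}

# Example (not run here):
#   df -h /boot              # how full is /boot?
#   ls /boot/vmlinuz-*       # which kernels are present?
#   safe_to_remove "<release>" || echo "that is the running kernel; keep it"
```

The second argument exists only so the comparison can be tried without
depending on whichever kernel happens to be running.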
Bootup happily failed with the kernel panic again, so the problem wasn’t
the grub.cfg file (duh). I now had space to do things, but I couldn’t get
to the encrypted volume to use the full set of commands. Also, with my
old kernel now removed, I was no longer able to boot properly. Lesson to
learn: don’t be stupid.
Using the 12.3 USB stick from the original install, I went into rescue
mode. I hope these steps will help others. I opened the encrypted
volume, hooked up to LVM, and then created a directory structure to use
chroot and mounted things there. I do not have all of the commands
written down, but this should be close (login as ‘root’; no password
necessary):
modprobe dm-mod #load the device-mapper kernel module (LVM and encryption sit on top of it)
mkdir /mnt/sdaroot #chroot jail base dir
cryptsetup luksOpen /dev/sda3 crsda1 #open the encrypted area
vgscan #scan for LVM volume groups
vgchange -a y #make all volume groups “active”… only had one
lvscan #find volumes; returned ‘sdaroot’ as the volume name
mount /dev/mapper/sdaroot /mnt/sdaroot #mount the LVM volume for / (use the name lvscan reports; the grub lines above used sdavg-rootvol)
mount --bind /proc /mnt/sdaroot/proc #I need /proc
mount --bind /sys /mnt/sdaroot/sys #and /sys
mount --bind /dev /mnt/sdaroot/dev #and /dev
mount /dev/sda2 /mnt/sdaroot/boot #and of course, the real /boot
chroot /mnt/sdaroot /bin/bash
I think that was everything, and at the end of that I was back in my
system at the command line with the regular, encrypted, filesystem able to
be modified. Finally realizing the /boot space issue may have affected
more than a stupid tiny config file (another ‘duh’ moment) I figured I’d
try the initrd, since that was the next most likely place for a failure
in /boot due to space. mkinitrd threw errors, including on its final run
(the earlier errors are what eventually led me to mount the fake
filesystems above, like /proc and /dev), but I was lucky and the initrd
appeared to be fine: I unpacked it into the /tmp directory to look at it,
saw the ‘dm*’ kernel modules, and got no errors when decompressing (gzip)
or extracting (cpio) it.
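For anyone wanting to repeat that initrd check, here is roughly what it
looks like as a sketch. It assumes the initrd is a gzip-compressed cpio
archive, as mine was; inspect_initrd is a name I made up, and the image
path in the example comment is a placeholder.

```shell
#!/bin/sh
# Sketch only: inspect_initrd is a hypothetical helper.
# It sanity-checks a gzip-compressed cpio initrd without unpacking it.
# usage: inspect_initrd IMAGE PATTERN
#   IMAGE    path to the initrd (placeholder: /boot/initrd-<version>)
#   PATTERN  text to look for in file names, e.g. 'dm-' for device-mapper
inspect_initrd() {
    image=$1
    pattern=$2
    # gzip -t tests the compressed data without writing anything out.
    gzip -t "$image" || { echo "gzip test failed: $image" >&2; return 1; }
    # cpio -it lists the archive members instead of extracting them.
    listing=$(gzip -dc "$image" | cpio -it 2>/dev/null) || {
        echo "cpio listing failed: $image" >&2
        return 1
    }
    # Print matching file names; exit status is non-zero if none match.
    printf '%s\n' "$listing" | grep -- "$pattern"
}

# Example (not run here):
#   inspect_initrd /boot/initrd-<version> 'dm-'
```

Listing instead of extracting means nothing gets written, which matters
when the whole problem is a lack of disk space.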
The error that persisted through that final mkinitrd run left me with a
completely worthless grub.cfg file, but I noticed it before rebooting and
copied the file back from the backup of /boot/grub* I had on my other
system. This backup included the grub.cfg.old file (thank you, grub
folks), which dated from before the patching process and worked nicely.
Rebooted and came up normally.
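Since the original symptom was a grub.cfg with no “menuentry” sections, a
check like this before rebooting seems worthwhile. It is a sketch only:
grub_cfg_has_entries is a made-up name, and it proves nothing beyond the
file being non-empty and containing at least one menuentry line.

```shell
#!/bin/sh
# Sketch only: grub_cfg_has_entries is a hypothetical helper.
# Succeeds if FILE is non-empty and has at least one line that begins
# (after optional indentation) with the word "menuentry".
# usage: grub_cfg_has_entries FILE
grub_cfg_has_entries() {
    file=$1
    # -s is false for a missing or zero-length file.
    [ -s "$file" ] || return 1
    grep -Eq '^[[:space:]]*menuentry[[:space:]]' "$file"
}

# Example (not run here):
#   grub_cfg_has_entries /boot/grub2/grub.cfg || echo "no menu entries!"
```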
How can this be improved? It’d be really nice if the post-install scripts
for the kernel, or pieces of grub2, would throw really big ugly nastygrams
if anything fails, particularly with initrd or the grub.cfg-generating
scripts. Maybe even better, fail to install with a pre-install check of
the space available within /boot. I’m used to running SLES systems which
don’t need as much in /boot (no backups, no EFI) and my 12.2 primary
laptop is only using 40 MB, so while I was a bit naive in making this
other laptop only have 150 MB for /boot, it seemed sufficient for my needs.
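To make the pre-install check idea concrete, here is a minimal sketch of
what I have in mind. It is not how the kernel or grub2 package scripts
work today: boot_has_space is a made-up name, and the threshold in the
example comment is an arbitrary number, not a measured requirement.

```shell
#!/bin/sh
# Sketch only: boot_has_space is a hypothetical helper, not part of any
# package script.
# Succeeds if the filesystem holding DIR has at least REQUIRED_KB free.
# usage: boot_has_space DIR REQUIRED_KB
boot_has_space() {
    dir=$1
    need_kb=$2
    # df -P prints one line per filesystem in the POSIX format and -k
    # reports 1024-byte blocks; column 4 of line 2 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    # Bail out with a distinct status if df gave us nothing usable.
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example (not run here; 30000 is an arbitrary threshold):
#   boot_has_space /boot 30000 || echo "not enough room in /boot"
```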
Thoughts welcome.
Good luck.