zypper dup breaks root/sudo and reboot fails at switching root

wolfi323 · February 1, 2018, 6:44pm

Yes.

I actually saw that bug report, but forgot about it…

Let’s hope you get a reply now.
Personally, I don't have another idea at the moment, except trying to run mkinitrd on the broken system.
But that may be difficult if you cannot boot the system in the first place.

I suppose it should be possible to boot from a LiveCD or similar and then switch to the installed system with chroot (after mounting the LVM of course) to do that. But I cannot really give further instructions for that either…

Mountainerd · February 1, 2018, 6:55pm

Actually, after the upgrade breaks the system, I can reboot into maintenance mode. Theoretically, I should be able to run mkinitrd from there.

Mountainerd · February 1, 2018, 7:19pm

Since 20180130 was available, I went ahead, removed the locks and upgraded. It broke the system again. I rebooted into maintenance mode and tried mkinitrd: no change. I'll hold out a little longer (days, maybe a week or so), but I don't want to keep those packages held back too long. I may end up simply downloading the latest snapshot and reinstalling.

gsgx · February 3, 2018, 10:05pm

I have had similar issues with Tumbleweed upgrades for a few days.
Similar setup: Dell Precision laptop (similar to the XPS 15), Kaby Lake CPU,
encrypted /boot (ext2)
encrypted / (ext4)
encrypted /home (ext4)
all within a LVM on a LUKS partition.

When I upgraded to 20180128 or 20180126 (I am not sure which), which came with the 4.14.15 kernel and a large batch of other updates (maybe 1.5 GB in total), I was no longer able to boot with the new kernel.
With .15 there is no prompt for the encrypted root partition, and it does not get mounted. After some delay I get a dracut emergency console.

The kept kernel 4.14.12 (4.14.12-1-default #1 SMP PREEMPT Fri Jan 5 18:15:55 UTC 2018 (3cf399e) x86_64)
works fine. But I had to repair GRUB and the initramfs by booting an ISO image into repair mode, then unlocking the encrypted partition, mounting /boot, /, /home … chroot … make …, the usual procedure when the system does not boot.
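A minimal sketch of that rescue procedure for an LVM-on-LUKS layout like the one above, run as root from the live system. All names here are assumptions for illustration, not values from this thread: the LUKS partition /dev/nvme0n1p2, the mapper name cr_lvm, the volume group "system" with logical volumes root, boot and home, and the helper functions run and rescue_chroot. With DRY_RUN set, the commands are only printed, so the sequence can be reviewed before anything is touched.

```shell
#!/bin/sh
# Hypothetical rescue sketch: LVM volume group inside a LUKS container.
# Device, mapper, VG and LV names are placeholders, not taken from the thread.
set -eu

# Print each command; execute it only when DRY_RUN is empty or unset.
run() {
    printf '+ %s\n' "$*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

rescue_chroot() {
    luks_dev=${LUKS_DEV:-/dev/nvme0n1p2}   # placeholder LUKS partition
    vg=${VG:-system}                        # placeholder volume group
    target=${TARGET:-/mnt}                  # where the installed system is mounted

    run cryptsetup luksOpen "$luks_dev" cr_lvm   # prompts for the passphrase
    run vgchange -ay "$vg"                       # activate the logical volumes
    run mount "/dev/$vg/root" "$target"
    run mount "/dev/$vg/boot" "$target/boot"
    run mount "/dev/$vg/home" "$target/home"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs"    # kernel interfaces needed inside the chroot
    done
    run chroot "$target" mkinitrd                # rebuild the initrd(s) of the installed system
}
```

For example, DRY_RUN=1 rescue_chroot only prints the nine commands, while LUKS_DEV=/dev/sda2 VG=myvg rescue_chroot would run them for a different (equally hypothetical) layout. Depending on the damage, GRUB may also need to be reconfigured inside the chroot, which this sketch does not cover.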

zypper lr -d -E
Repository priorities in effect:                                                                                   (See 'zypper lr -P' for details)
      98 (raised priority)  :  1 repository  
      99 (default priority) :  5 repositories
     102 (lowered priority) :  1 repository  

#  | Alias                          | Name                                                   | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                      | Service
---+--------------------------------+--------------------------------------------------------+---------+-----------+---------+----------+--------+--------------------------------------------------------------------------+--------
 1 | Education                      | Applications for education users (openSUSE_Tumbleweed) | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/repositories/Education/openSUSE_Tumbleweed/ |        
 2 | devel_gcc                      | devel_gcc                                              | Yes     | (r ) Yes  | Yes     |  102     | rpm-md | https://download.opensuse.org/repositories/devel:/gcc/openSUSE_Factory/  |        
 3 | download.nvidia.com-tumbleweed | nVidia Graphics Drivers                                | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | https://download.nvidia.com/opensuse/tumbleweed                          |        
 8 | openSUSE_Tumbleweed            | openSUSE_Tumbleweed_packman                            | Yes     | (r ) Yes  | Yes     |   98     | rpm-md | ftp://ftp.gwdg.de/pub/linux/misc/packman/suse/openSUSE_Tumbleweed/       |        
10 | repo-non-oss                   | openSUSE-Tumbleweed-Non-Oss                            | Yes     | (r ) Yes  | Yes     |   99     | yast2  | http://download.opensuse.org/tumbleweed/repo/non-oss/                    |        
11 | repo-oss                       | openSUSE-Tumbleweed-Oss                                | Yes     | (r ) Yes  | Yes     |   99     | yast2  | http://download.opensuse.org/tumbleweed/repo/oss/                        |        
13 | repo-update                    | openSUSE-Tumbleweed-Update                             | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/update/tumbleweed/

Further upgrades did not help; I have to stay with 4.14.12.

cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20180130 "

rpm -q ucode-intel
ucode-intel-20171117-4.1.x86_64

I have set up TW to keep several kernel versions. As 4.14.15 caused issues, I removed it.
After removing the 4.14.15 kernel, these kernel packages are left:

rpm -qa kernel-*
kernel-default-4.14.12-1.5.x86_64
kernel-default-devel-4.14.12-1.5.x86_64
kernel-docs-html-4.14.15-1.6.noarch
kernel-docs-4.14.15-1.6.noarch
kernel-syms-4.14.12-1.5.x86_64
kernel-devel-4.14.12-1.5.noarch
kernel-devel-4.14.6-1.8.noarch
kernel-devel-4.10.13-1.4.noarch
kernel-default-devel-4.10.13-1.4.x86_64
kernel-default-devel-4.14.6-1.8.x86_64
kernel-macros-4.14.15-1.6.noarch
kernel-source-4.14.12-1.5.noarch
kernel-source-4.10.13-1.4.noarch
kernel-default-4.14.6-1.8.x86_64
kernel-syms-4.14.6-1.8.x86_64
kernel-default-4.10.13-1.4.x86_64
kernel-default-base-4.14.15-1.6.x86_64
kernel-source-4.14.6-1.8.noarch
kernel-firmware-20180104-1.2.noarch
kernel-syms-4.10.13-1.2.x86_64

Removing kernel-default-base-4.14.15-1.6.x86_64 was not possible, though.

wolfi323 · February 4, 2018, 10:05am

Well, if it works fine for you with the older kernel, it seems to be caused by a change in the kernel, and that would sound unrelated to the problem discussed here…

Btw, the latest/current kernel is 4.15.0…

removing kernel-default-base-4.14.15-1.6.x86_64 was not possible thou.

Why? What happens when you try to remove it?

kernel-default-base is mostly intended for minimal VMs and is useless on normal systems, because it lacks most of the hardware drivers.
So you had better uninstall it.

My guess: you also have kernel-docs, kernel-docs-html and kernel-macros in version 4.14.15 installed; you'd probably need to remove them too, as they may require a 4.14.15 kernel.
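That guess can be checked mechanically. The shell function below is a hypothetical helper (orphaned_kernel_pkgs is not an existing tool): it reads an rpm -qa 'kernel-*' listing and prints every package whose kernel version has no matching kernel-default package installed. It is only a heuristic: it assumes the default flavor, compares the upstream version (4.14.15) but not the release (1.6), and skips kernel-firmware, whose version is a date.

```shell
#!/bin/sh
# Hypothetical helper: print kernel-* packages whose version has no kernel-default
# package of the same version. Input: one NAME-VERSION-RELEASE.ARCH per line.
orphaned_kernel_pkgs() {
    awk '
        NF == 0 { next }                  # ignore blank lines
        /^kernel-firmware-/ { next }      # version is a date, not a kernel version
        {
            pkg = $0
            sub(/\.[^.]+$/, "", pkg)      # drop the ".arch" suffix
            n = split(pkg, f, "-")        # name parts..., version, release
            ver = f[n - 1]
            name = f[1]
            for (i = 2; i <= n - 2; i++) name = name "-" f[i]
            line[NR] = $0
            vers[NR] = ver
            if (name == "kernel-default") has_default[ver] = 1
        }
        END {
            # Print in input order every package whose version lacks kernel-default.
            for (i = 1; i <= NR; i++)
                if ((i in vers) && !(vers[i] in has_default)) print line[i]
        }
    '
}
```

Usage: rpm -qa 'kernel-*' | orphaned_kernel_pkgs. Fed the listing above, it prints the four 4.14.15 packages (kernel-docs-html, kernel-docs, kernel-macros and kernel-default-base), which would be the candidates to review and then remove together with zypper rm.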

gsgx · February 4, 2018, 2:58pm

I am quite sure it is not a kernel issue alone; otherwise there would be hundreds of reports.
My guess is that in my case it results from a combination of the kernel, the new udev/systemd, an encrypted partition with LVM for /boot and root, and maybe more. That seems somewhat similar to the other cases. It is just a guess, but it might be worth drilling down into the details if more people have these problems.

When booting .12 I get the password prompt for the encrypted partition, but not when trying to boot .15. Is this purely a kernel issue? As I understand it, isn't the password prompt issued by some other process?

I am a bit wary of trying .0 kernels; I prefer to wait for a .3 release or so.

kernel-macros is needed for the -devel packages. I need the -devel packages to get the VMware Workstation modules compiled for every new kernel.
I did not find a way to keep kernel-macros etc. for each kept kernel version.
Could zypper be set up accordingly?

gsgx · February 4, 2018, 4:59pm

I tried to verify that.

It seems your hint about the base kernel was a good starting point for finding the root cause. In the dracut console I found that the /dev/nvme* devices are not available, probably because their modules are missing from the .15 base kernel, which is now the only .15 kernel left. It has no drivers for the NVMe disks where root etc. reside.
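A quick way to confirm that diagnosis for a given kernel release is to look for the NVMe host driver in its module tree. The function has_nvme_driver below is a hypothetical helper; it assumes the usual /lib/modules/RELEASE layout, where the driver is either an nvme.ko file (possibly compressed) or listed in modules.builtin when compiled into the kernel. MODULES_ROOT only exists so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel RELEASE ($1) ships the NVMe host driver.
# MODULES_ROOT defaults to /lib/modules and can be overridden for testing.
has_nvme_driver() {
    moddir=${MODULES_ROOT:-/lib/modules}/$1
    # A driver compiled into the kernel is listed in modules.builtin, with no .ko file.
    if grep -q '/nvme\.ko$' "$moddir/modules.builtin" 2>/dev/null; then
        return 0
    fi
    # Otherwise look for nvme.ko, nvme.ko.xz, ... anywhere in the module tree.
    find "$moddir" -name 'nvme.ko*' 2>/dev/null | grep -q .
}
```

For example: has_nvme_driver 4.14.15-1-default || echo "no NVMe driver". A module that exists on disk still has to end up in the initrd; dracut's lsinitrd /boot/initrd-4.14.15-1-default | grep nvme shows what the initrd actually contains.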

So with the current setup I can no longer pinpoint the possible causes of the boot problems I had at the beginning. Issue closed.

pylkko · February 8, 2018, 5:55pm

I have now had this problem twice, both times after an upgrade, and both times I managed to fix it. I fixed it by mounting the btrfs filesystem from a Fedora live USB: btrfs automatically performs recovery actions when mounting, and it reported quota inconsistencies and misplaced extents. So I'm guessing that the quota system, which the btrfs developers consider experimental, is acting up somehow. Why else would upgrading the system corrupt the btrfs subvolume, and why would mounting it fix it? Also, why can't this be done in the recovery shell offered by the initrd? Does it not support btrfs?