Boot problem with newer kernels, seems to have no NVMe support, stuck on kernel 6.0.12-1-default

samgam · July 20, 2023, 8:13am

Hi!

A few months ago I pinned the kernel version to the one above because of AMD Radeon driver issues. Tumbleweed still installs newer kernels, but the default boot kernel stays fixed at that version. When I try to boot a newer kernel now (I tried 6.4.3 today), the boot process gets stuck mounting a UUID of my encrypted boot partition, which does not seem to exist.

Any ideas?

qwert.zuiop · July 20, 2023, 11:31am

might be the same issue as here,

mrmazda · July 20, 2023, 12:13pm

What exactly do you mean by “boot process stuck”? Is boot process actually completing, just without working login manager or DE (e.g. Ctrl-Alt-F3 provides a shell prompt)? Which AMD GPU do you have? What error messages do you find in journal, dmesg, Xorg.0.log and .xsession-errors?

iamjiwjr · July 20, 2023, 12:49pm

I get a “shim SBAT data failed” message before I can even get to the initial menu. Major security violation. Then a quick shutdown. Is this similar? I’ve not been able to boot for over a week now. I have an Intel 11th gen CPU and an AMD GPU.

arvidjaar · July 20, 2023, 1:54pm

Did you ever boot Leap or other Linux distribution on this system?

nrickert · July 20, 2023, 2:19pm

That seems to be a different problem from the one that this topic is about.

Your easiest way around this SBAT problem is to disable secure-boot in your BIOS, and leave it disabled until Tumbleweed gets a newer “shim”. See, also, Bug 1209985.

iamjiwjr · July 20, 2023, 2:48pm

Please forgive me. I knew I was wrong. Thank you for not clobbering me. I’m uber frustrated. This goes on and on and on and no solution. I know about the 4 month old bug report. If secure boot wasn’t important it wouldn’t be included in the default setup. Disabling it is not a good solution.

I’m camping at Fedora. They took care of business. Sad state of affairs.

nrickert · July 20, 2023, 5:14pm

Understood.

If you want an alternative way of dealing with this, other than disabling secure-boot, you should start a separate thread. Let’s leave this topic to the problem described in the opening post.

samgam · July 26, 2023, 7:42am

Hi, vacation time…

No, kernel 6.0.12 still boots and has done so ever since.

Bootup log says…

Warning: dracut-initqueue: starting timeout scripts
Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
Warning: /lib/dracut/hooks/initqueue/finished/90-crypt.sh: "[ -e /dev/disk/by-id/dm-uuid-CRYPT-LUKS?-*70886...533c*-* ] || exit 1"
Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f4b9f4d82-520d-4821-9081-8102989d7327.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
    [ -e "/dev/disk/by-uuid/4b9f4d82-520d-4821-9081-8102989d7327" ]
fi"
Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-uuid\x2f550B-F516.sh: "[ -e "/dev/disk/by-uuid/550B-F516" ]"
Warning: dracut-initqueue: starting timeout scripts

many times finishing with…

Warning: dracut-initqueue: starting timeout scripts
Warning: Could not boot.
Starting Dracut Emergency Shell...

So it seems to me that the disks LUKS needs to decrypt in order to provide the unencrypted block devices do not exist. That is why I thought my NVMe drive has no driver in the newer kernel versions.

Information about my setup:

lsblk -fN
NAME    FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS                                TYPE MODEL                      SERIAL              REV TRAN   RQ-SIZE  MQ
nvme0n1                                                                                   disk SAMSUNG MZVL22T0HBLB-00B00 S677NF0RB03122 GXB7401Q nvme      1023  32

lsblk -f
NAME        FSTYPE      FSVER LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                         
├─sda1                                                                                      
├─sda2      vfat        FAT32 EFI       FD7D-C9DE                                           
└─sda3      zfs_member  5000  boot-pool 8231139060244674825                                 
nvme0n1                                                                                     
├─nvme0n1p1                                                                                 
├─nvme0n1p2 vfat        FAT32           550B-F516                            1016.7M     1% /boot/efi
├─nvme0n1p3 crypto_LUKS 1               70886408-93e5-4227-8a87-3a02e49e533c                
│ └─cr_root btrfs                       4b9f4d82-520d-4821-9081-8102989d7327   73.9G    22% /var
│                                                                                           /usr/local
│                                                                                           /srv
│                                                                                           /root
│                                                                                           /opt
│                                                                                           /boot/grub2/x86_64-efi
│                                                                                           /boot/grub2/i386-pc
│                                                                                           /.snapshots
│                                                                                           /
├─nvme0n1p4 zfs_member  5000  pool500g  1886119262578276924                                 
├─nvme0n1p5 ext4        1.0             d034b6dc-de7e-42cd-ac82-1b48007cff1a                
└─nvme0n1p6 crypto_LUKS 2               d8bf8d45-d642-4df7-801a-a34ec9eb8ee2                
  └─cr_home btrfs                       8f1f57de-6cc4-4463-a947-a17af3df228a  358.3G    20% /home

It’s a bit confusing, because the UUIDs it waits for belong to both the vfat partition and the cr_root partition.

Again, this happens when I choose for example kernel 6.4.3 at boot time, not with the 6.0.12-1-default. My actual kernel versions are:

# rpm -qa|grep kernel-default
kernel-default-6.0.12-1.1.x86_64
kernel-default-base-6.4.3-1.1.28.3.x86_64
kernel-default-base-6.3.9-1.1.27.9.x86_64
kernel-default-6.1.3-1.1.x86_64

arvidjaar · July 26, 2023, 8:00am

Did you try to check it using dracut emergency shell?

nrickert · July 26, 2023, 10:21am

samgam:

My actual kernel versions are:

# rpm -qa|grep kernel-default
kernel-default-6.0.12-1.1.x86_64
kernel-default-base-6.4.3-1.1.28.3.x86_64
kernel-default-base-6.3.9-1.1.27.9.x86_64
kernel-default-6.1.3-1.1.x86_64

That’s likely to be your problem.

You should be using “kernel-default” but not “kernel-default-base”.

As far as I know, “kernel-default-base” is a stripped down kernel that lacks some of the drivers that you probably need. Try installing the full kernel to replace “kernel-default-base”.
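
A quick way to spot this from the package list is sketched below. It assumes that the upstream version (the part before the first hyphen, e.g. 6.4.3) is enough to pair a kernel-default-base package with its full kernel-default counterpart, since the release suffixes differ between the two. The function name base_only_kernels is made up for illustration.

```shell
#!/bin/sh
# List kernel versions that are installed only as kernel-default-base.
# Reads package names (as printed by `rpm -qa`) on stdin.
base_only_kernels() {
    awk '
        /^kernel-default-base-[0-9]/ {
            v = $0
            sub(/^kernel-default-base-/, "", v)
            split(v, parts, "-")      # parts[1] is the upstream version, e.g. 6.4.3
            base[parts[1]] = 1
            next
        }
        /^kernel-default-[0-9]/ {
            v = $0
            sub(/^kernel-default-/, "", v)
            split(v, parts, "-")
            full[parts[1]] = 1
        }
        END {
            # Report versions that have a -base package but no full kernel.
            for (ver in base)
                if (!(ver in full))
                    print ver
        }
    ' | sort -V
}

# Usage on a live system (commented out so the sketch has no side effects):
# rpm -qa | base_only_kernels
```

Fed the package list quoted above, this should report 6.3.9 and 6.4.3, the two versions that have no full kernel-default package.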

mrmazda · July 26, 2023, 7:21pm

+1

# zypper info kernel-default-base | grep only
    This package contains only the base modules, required in all installs.
#

My interpretation is that kernel-default-base is intended essentially for VM use. The modules it omits are the hardware-support ones that only VM hosts need.
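
Whether a particular driver is part of an installed kernel can be checked from the working system by looking for its module file. This is a minimal sketch, assuming modules live under /lib/modules/<version> and are named <module>.ko with an optional compression suffix. kernel_has_module is a hypothetical helper, and a driver built into the kernel image itself would not be found this way. nvme is used only as an example; the thread does not establish which module was actually missing.

```shell
#!/bin/sh
# Succeeds if a module file named <module>.ko (optionally compressed,
# e.g. .ko.zst or .ko.xz) exists anywhere under the given modules directory.
kernel_has_module() {
    moddir=$1
    module=$2
    [ -n "$(find "$moddir" -type f -name "${module}.ko*" 2>/dev/null | head -n 1)" ]
}

# Example: check every installed kernel for the nvme driver
# (commented out so the sketch has no side effects):
# for d in /lib/modules/*; do
#     if kernel_has_module "$d" nvme; then
#         echo "$d: nvme present"
#     else
#         echo "$d: nvme MISSING"
#     fi
# done
```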

samgam · July 27, 2023, 6:22am

You are so right! I installed the newest version, kernel-default-6.4.4, and my system boots again. Many thanks for that.

It seems that after I pinned the kernel version to 6.0.12, the system silently switched to kernel-default-base for newer kernels without my noticing. The boot menu never indicated this either.

What I did a few months ago was (my user interface is German, so I will try to translate it correctly):
with 6.0.12 (my old version, which worked for me)
Open YaST → Install or remove software → search for kernel-default → right click on kernel-default → protect [*] → apply → that’s it.

What I did today:
The same first steps, but unprotect [*] kernel-default → it automatically installed the newest version, 6.4.4, alongside 6.0.12. Now it boots again.

I have to say, I assumed that protecting a package does not prevent newer versions from being installed alongside the protected one. The fact that newer versions showed up from time to time strengthened that assumption. I never looked closely enough to see that they were a different package (kernel-default-base), or maybe I ignored it because it could have been a meta package or similar.

So again many thanks for your help!
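
For reference, the same thing can probably be done from the command line. As far as I know, the YaST "protected" status is stored as a zypper package lock, so the lock should be visible with zypper locks and removable with zypper removelock. That mapping is an assumption and was not confirmed in this thread, and list_kernel_locks is a made-up helper name.

```shell
#!/bin/sh
# Print any zypper package locks that mention "kernel".
# Returns non-zero if zypper is unavailable or no such lock exists.
list_kernel_locks() {
    if ! command -v zypper >/dev/null 2>&1; then
        echo "zypper not found" >&2
        return 127
    fi
    zypper locks | grep -i kernel
}

# To lift the lock and get the full kernel back (run as root;
# left commented out on purpose):
# zypper removelock kernel-default
# zypper install kernel-default
```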