I am still running Tumbleweed release 20240624 because every attempt to upgrade the system has failed. After each upgrade, boot gets stuck at the systemd message “a start job is running for /dev/disk/by-uuid/fa5024…”. I have not been able to work out what went wrong, so my only option is to boot from an older btrfs snapshot and then run sudo transactional-update rollback (thankfully I read this blog post about transactional-update before I first installed Tumbleweed).
My computer has two disks: a SATA SSD for my home directory and an NVMe SSD for everything else. Both disks use full disk encryption. My setup looks like this:
$ lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda LVM2_member LVM2 001 veF68U-DZ7m-kyJQ-6sZc-YLA1-C1oR-YYsku2
└─main--data-data--lv crypto_LUKS 1 fa5024db-d236-46e0-b40e-af70897e1728
└─cr_main--data-data--lv btrfs 1ec87e06-f4d3-4416-9bc5-de0776f9e467 3T 16% /home
nvme0n1
├─nvme0n1p1 vfat FAT32 66AC-EA1A /boot/efi
├─nvme0n1p2 vfat FAT32 66AD-2D8F
├─nvme0n1p3 crypto_LUKS 1 78f303b1-7c27-4f30-8de7-294d988f1b7b
│ └─cr_root btrfs 5632b905-1c13-4fe6-ad6f-2829cea25893 359.4G 19% /usr/local
│ /srv
│ /opt
│ /boot/writable
│ /boot/grub2/x86_64-efi
│ /boot/grub2/i386-pc
│ /.snapshots
│ /var
│ /root
│ /
└─nvme0n1p4 crypto_LUKS 1 54f7c85a-c2e4-4320-807d-e5b3868b9445
└─cr_swap swap 1 f6eeeddf-9c4d-488f-a94f-0577bc1f9d76 [SWAP]
$ ls -l /dev/disk/by-uuid/fa5024db-d236-46e0-b40e-af70897e1728
lrwxrwxrwx 1 root root 10 Sep 7 20:14 /dev/disk/by-uuid/fa5024db-d236-46e0-b40e-af70897e1728 -> ../../dm-1
It appears to me that the start job is waiting for the device with UUID fa5024..., which holds LUKS-encrypted data. It appears to use device mapper, although I installed Tumbleweed on this system quite a few years ago and cannot recall why it uses device mapper.
$ sudo dmsetup info
[sudo] password for root:
Name: cr_main--data-data--lv
State: ACTIVE
Read Ahead: 1024
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 254, 2
Number of targets: 1
UUID: CRYPT-LUKS1-fa5024dbd23646e0b40eaf70897e1728-cr_main--data-data--lv
Name: cr_root
State: ACTIVE
Read Ahead: 1024
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 254, 0
Number of targets: 1
UUID: CRYPT-LUKS1-78f303b17c274f308de7294d988f1b7b-cr_root
Name: cr_swap
State: ACTIVE
Read Ahead: 1024
Tables present: LIVE
Open count: 2
Event number: 0
Major, minor: 254, 3
Number of targets: 1
UUID: CRYPT-LUKS1-54f7c85ac2e44320807de5b3868b9445-cr_swap
Name: main--data-data--lv
State: ACTIVE
Read Ahead: 1024
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 254, 1
Number of targets: 1
UUID: LVM-VmW33sTeAZFxqxGZ1GmApEMPQth7j0LmLDJJBwlfPDWS7C921JnpGaXnHeux4RVg
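The two outputs above can be tied together: the by-uuid link points at dm-1, and dmsetup reports 254, 1 as main--data-data--lv, so the LUKS container with UUID fa5024... sits on an LVM logical volume rather than directly on a partition. A minimal sketch of that lookup follows. The function name dm_name_for_uuid and its optional directory arguments are my own illustration, and it assumes the kernel exposes the mapping name under /sys/block/dm-N/dm/name.

```shell
#!/bin/sh
# dm_name_for_uuid is a hypothetical helper, not part of any package.
# Usage: dm_name_for_uuid UUID [BY_UUID_DIR] [SYS_BLOCK_DIR]
dm_name_for_uuid() {
    uuid=$1
    byuuid=${2:-/dev/disk/by-uuid}
    sysblock=${3:-/sys/block}

    # Fail early if no udev link exists for this UUID.
    if [ ! -e "$byuuid/$uuid" ]; then
        echo "no link for UUID $uuid in $byuuid" >&2
        return 1
    fi

    # Follow the symlink (e.g. ../../dm-1) and keep only the node name.
    target=$(readlink -f "$byuuid/$uuid") || return 1
    node=${target##*/}

    # Device-mapper nodes carry their mapping name in sysfs.
    if [ -r "$sysblock/$node/dm/name" ]; then
        echo "$node -> $(cat "$sysblock/$node/dm/name")"
    else
        echo "$node (not a device-mapper node)"
    fi
}
```

After sourcing the file, dm_name_for_uuid fa5024db-d236-46e0-b40e-af70897e1728 should report dm-1 together with the logical volume name if the layout matches the listing above.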
I believe the start job comes from the fact that the volume is listed in /etc/crypttab:
$ sudo cat /etc/crypttab
cr_root UUID=78f303b1-7c27-4f30-8de7-294d988f1b7b /.root.key x-initrd.attach
cr_swap UUID=54f7c85a-c2e4-4320-807d-e5b3868b9445 /.root.key x-initrd.attach
cr_main--data-data--lv UUID=fa5024db-d236-46e0-b40e-af70897e1728 /.root.key x-initrd.attach,keyfile-timeout=10s
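To make the relationship between these entries and the by-uuid links explicit, here is a rough sketch that walks a crypttab and reports whether each UUID currently resolves. The name check_crypttab and its two optional arguments are hypothetical. On the running system all three entries should show as present; the check is more telling from a shell inside the failing boot environment, if one can be reached.

```shell
#!/bin/sh
# check_crypttab is a hypothetical helper, not an existing tool.
# Usage: check_crypttab [CRYPTTAB_FILE] [BY_UUID_DIR]
check_crypttab() {
    tab=${1:-/etc/crypttab}
    dir=${2:-/dev/disk/by-uuid}

    # crypttab fields: name, device, key file, options.
    # Only the first two matter here; the rest is collected in _rest.
    while read -r name device _rest || [ -n "$name" ]; do
        # Skip blank lines and comments.
        case $name in ''|'#'*) continue ;; esac

        case $device in
            UUID=*) uuid=${device#UUID=} ;;
            *)      echo "$name: not referenced by UUID ($device)"
                    continue ;;
        esac

        if [ -e "$dir/$uuid" ]; then
            echo "$name: $uuid present"
        else
            echo "$name: $uuid MISSING"
        fi
    done < "$tab"
}
```

Reading /etc/crypttab normally needs root, so the function would have to be sourced and called from a root shell.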
I have attempted the following strategies to debug the system:
- Since I’m running Linux 6.9.5 and the new Tumbleweed releases that fail to boot use Linux 6.10.x, I tried to rule out a kernel problem by enabling multiversion for the kernel in /etc/zypp/zypp.conf. I managed to keep the 6.9.5 kernel while using the updated packages for the rest of the system. It still didn’t boot.
- I tried booting the new system with an edited GRUB boot entry, including appending systemd.unit=emergency.target to the kernel command line. That did not boot either.
- I also tried to get a shell by specifying init=/usr/bin/bash on the kernel command line, but that appeared to have no effect.
- I thought perhaps decryption using the keyfile had an issue, so I changed the crypttab to always prompt me for a password. That worked with the old Tumbleweed release but not with any new release.
- I looked into whether there’s a way to do zypper ref but refresh to an older version, so that I could upgrade to an intermediate version and do some bisecting. I didn’t find any way to update to a Tumbleweed snapshot that’s not the latest.
- Since I suspect the initrd to be the problem, I ran sudo lsinitrd /.snapshots/137/snapshot/boot/initrd (where 137 is a snapshot that didn’t boot) and looked at the output; I compared it with the old initrd for the 6.9.5 kernel and didn’t find anything suspicious.
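For reference, the initrd comparison in the last item could be narrowed down roughly as follows. This is only a sketch: compare_initrd is a made-up name, both image paths are supplied by the caller, and it assumes the crypttab is embedded in the image as etc/crypttab. It relies on two lsinitrd options, -m to list the dracut modules in an image and -f to print a single file from it.

```shell
#!/bin/sh
# compare_initrd is a hypothetical helper built on dracut's lsinitrd.
# Usage: compare_initrd OLD_INITRD NEW_INITRD
compare_initrd() {
    old=$1
    new=$2
    work=$(mktemp -d) || return 1

    # -m lists the dracut modules included in each image. Any header
    # lines that name the image itself are expected to differ.
    lsinitrd -m "$old" > "$work/old.mods"
    lsinitrd -m "$new" > "$work/new.mods"
    echo "== dracut modules =="
    diff -u "$work/old.mods" "$work/new.mods" && echo "(no differences)"

    # -f prints one file from the image; here the embedded crypttab
    # (assumed to live at etc/crypttab inside the initrd).
    lsinitrd -f etc/crypttab "$old" > "$work/old.crypttab"
    lsinitrd -f etc/crypttab "$new" > "$work/new.crypttab"
    echo "== etc/crypttab inside the image =="
    diff -u "$work/old.crypttab" "$work/new.crypttab" && echo "(no differences)"

    rm -rf "$work"
}
```

It would be called as root with the working initrd first and /.snapshots/137/snapshot/boot/initrd second. Lines prefixed with - in the module diff are modules present only in the working image.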
I’m out of ideas on how to troubleshoot this further. Please help.