Failed to boot Leap 42.2 after software updates: /dev/RAID1/SUSE42.2-Root does not exist

On July 29th, I installed updates (via Software Updates, found under Status & Notifications) as usual and eventually logged out. The next day, I wasn’t able to boot the system. I got messages something like these:

I/O error dev fd0

warning: dracut_initqueue timeout - starting timeout scripts

could not boot
/dev/RAID1/SUSE42.2-Root does not exist
might want to regenerate your initramfs

I tried to boot the previous kernel using the advanced boot options in GRUB, but had the same problem.

I have a system with 4 drives containing a software RAID 5, which holds an LVM volume group. I use primary partitions on the drives (sdb2 in the case of SuSE 42.2) for boot partitions and then have logical volumes for root, home and other partitions. So I installed another copy of SuSE 42.2 in new boot and root partitions. It was a pretty straightforward, out-of-the-box installation. After the first boot, I logged in as root and was prompted to install updates. After applying the updates, I rebooted and ran into the same problem.

Since SuSE 42.3 had just been released, I decided to try installing it. After the first boot, I logged in as root and was informed that there were 11 updates. I looked at the list and decided against installing opensuse-2017-847, a recommended update for systemd and dracut. After the other 10 updates were applied, I rebooted and had no problems. I then installed the remaining update, and when I rebooted I ran into the same problem again.

I’m now back to SuSE 42.3 without the offending recommended patch, but that isn’t a long-term solution. Are there any known issues with a logical-volume root partition on top of a software RAID?

I’d ask if

  • Any part of your system was upgraded from a previous version of openSUSE
  • You’re multi-booting multiple OSes or distros

If either of the above might be true, even in part, I suspect that the OS you’re attempting to boot is identified differently from what your GRUB configuration expects, and that it’s possibly defaulting to, or otherwise thinks it should be, booting from a floppy drive. rotfl!

Or maybe you made a configuration mistake. I notice that some openSUSE/SUSE documentation recommends identifying your RAID partitions with type 0xFD (Linux RAID autodetect).

That may also be why you’re being prompted to regenerate your initramfs: if you modified GRUB but didn’t do the last required step, which is to regenerate the initramfs, your changes could be just sitting there, not yet activated.
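For reference, and assuming a stock openSUSE setup (the commands below are the conventional ones, not something taken from this thread), regeneration usually looks like this:

```shell
# Sketch only: on openSUSE the initramfs is usually regenerated with the
# mkinitrd wrapper, or with dracut directly (run as root on the real system):
#   mkinitrd
#   dracut --force /boot/initrd-$(uname -r) $(uname -r)
# Both build the image for the kernel version reported by uname; as a sanity
# check, that version string should start with a digit:
uname -r | grep -c '^[0-9]'
```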

The problem appears to be related to the software RAID. I installed SuSE 42.3 from DVD with root (no separate /boot) on a primary partition (sdb4) and /home residing on a logical volume on the RAID. This SuSE 42.3 install booted fine. I applied all suggested updates except opensuse-2017-847 and rebooted with no problems. Then I applied the opensuse-2017-847 update, and when I rebooted, the /home directory wasn’t found, so I was prompted to fix things at the command line. I edited fstab to remove the /home entry. After that I was able to boot the system, but the RAID did not exist. I ran mdadm --assemble --scan; the RAID was found and initialized, but on the next boot there was again no RAID. This indicates to me that the opensuse-2017-847 update affected the software RAID initialization.
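When assembly works manually but not at boot, the usual suspicion is that the initramfs has no record of the array. A hedged sketch of the conventional fix (the ARRAY line and UUID below are made up for illustration, not taken from this system):

```shell
# On the affected system (as root, not run here):
#   mdadm --detail --scan >> /etc/mdadm.conf   # record the array
#   mkinitrd                                   # rebuild the initramfs so it sees it
# mdadm emits one ARRAY line per array; demonstrating the format against a
# throwaway file with a made-up UUID:
conf=$(mktemp)
echo 'ARRAY /dev/md0 metadata=1.0 UUID=3aaa0122:29827cfa:5331ad66:ca767371' > "$conf"
grep -c '^ARRAY' "$conf"     # one array recorded
rm -f "$conf"
```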

I did note that I got the following message at the console:

systemd-gpt-auto-generator: /dev/sdb: failed to determine partition table type: Input/output error.

I use DOS, not GPT, partition tables on my drives (this system has been around for a while :( ). Is it possible that this failure stops initialization of the RAID?
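Whether that generator failure matters can be narrowed down by confirming the label type. The blkid call below is standard util-linux; the image-file part is only an illustration of the MBR signature:

```shell
# On the real disk (not run here), the partition-table type can be read with:
#   blkid -o value -s PTTYPE /dev/sdb    # prints "dos" or "gpt"
# A DOS/MBR label carries the signature bytes 0x55 0xAA at offset 510; shown
# here against a throwaway 512-byte image file (octal \125\252 = 0x55 0xAA):
img=$(mktemp)
head -c 510 /dev/zero > "$img"       # zero-fill up to offset 510
printf '\125\252' >> "$img"          # append the two signature bytes
od -An -tx1 -j510 -N2 "$img" | tr -d ' \n'   # hex dump of offsets 510-511
rm -f "$img"
```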

The openSUSE 42.3 system was a fresh install, so there shouldn’t be an issue with previous versions. I am multi-booting, but in the case of the openSUSE 42.3 boot, it indicated it couldn’t find /dev/RAID1/SUSE42.3, which is where I had placed the latest install. As for the floppy message, I don’t have a floppy drive, and I updated my BIOS to indicate there isn’t one. ;) Everything boots fine until I apply the opensuse-2017-847 update, and then it’s lights out (see the prior post).

Since you can reproduce it and have determined the exact patch that causes the problem, you should open a bug report. There was a similar error recently, but it was related to specific hardware (NVMe drives). Are you using those by any chance?

Thanks, I’ll look into submitting a bug report, and no to the NVMe drives. I’m also looking into using btrfs snapshots to diagnose the problem.

I tried to reproduce it, but for me it works. I installed 42.3 in a VM with root on an LV on 4 drives in RAID5, with /boot on a separate partition on one drive. After installing all patches it boots just fine. So at the very least it is not a universal problem.

So I tried reverting 3 of the files modified by the update. This is the snapper diff of the changes I made:

--- /.snapshots/33/snapshot/usr/lib/dracut/modules.d/95udev-rules/module-setup.sh  2017-08-09 19:06:30.307530679 -0600
+++ /.snapshots/34/snapshot/usr/lib/dracut/modules.d/95udev-rules/module-setup.sh  2017-08-09 19:08:43.868745682 -0600
@@ -56,10 +56,6 @@
     # eudev rules
     inst_rules 80-drivers-modprobe.rules
 
-    # bsc#1040153
-    inst_rules 61-persistent-storage-compat.rules
-    inst_multiple -o ${udevdir}/compat-symlink-generation
-
     if dracut_module_included "systemd"; then
         inst_multiple -o ${systemdutildir}/network/*.link
         [[ $hostonly ]] && inst_multiple -H -o /etc/systemd/network/*.link
--- /.snapshots/33/snapshot/usr/lib/udev/rules.d/60-persistent-storage.rules  2017-07-19 10:33:31.000000000 -0600
+++ /.snapshots/34/snapshot/usr/lib/udev/rules.d/60-persistent-storage.rules  2017-08-09 19:20:56.855112253 -0600
@@ -44,8 +44,17 @@
 # SCSI devices
 KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="scsi"
 KERNEL=="cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $devnode", ENV{ID_BUS}="cciss"
-KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
-KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
+KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
+KERNEL=="sd*|sr*|cciss*|nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
+KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
+
+# scsi compat links for ATA devices
+KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d $devnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
+KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n"
+
+# scsi compat links for ATA devices (for compatibility with udev < 184)
+KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --truncated-serial --whitelisted --replace-whitespace -p0x80 -d$tempnode", RESULT=="?*", ENV{ID_SCSI_COMPAT_TRUNCATED}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}"
+KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT_TRUNCATED}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}-part%n"
 
 # FireWire
 KERNEL=="sd*[!0-9]|sr*", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}"
@@ -66,6 +75,11 @@
 ENV{DEVTYPE}=="disk", ENV{ID_PATH}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH}"
 ENV{DEVTYPE}=="partition", ENV{ID_PATH}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH}-part%n"
 
+# by-path (parent device path, compat version, only for ATA/NVMe/SAS bus)
+ENV{DEVTYPE}=="disk", ENV{ID_BUS}=="ata|nvme|scsi", DEVPATH!="*/virtual/*", IMPORT{program}="path_id_compat %p"
+ENV{DEVTYPE}=="disk", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}"
+ENV{DEVTYPE}=="partition", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}-part%n"
+
 # probe filesystem metadata of optical drives which have a media inserted
 KERNEL=="sr*", ENV{DISK_EJECT_REQUEST}!="?*", ENV{ID_CDROM_MEDIA_TRACK_COUNT_DATA}=="?*", ENV{ID_CDROM_MEDIA_SESSION_LAST_OFFSET}=="?*", \
   IMPORT{builtin}="blkid --offset=$env{ID_CDROM_MEDIA_SESSION_LAST_OFFSET}"
--- /.snapshots/33/snapshot/usr/lib/udev/rules.d/61-persistent-storage-compat.rules  2017-07-19 10:33:31.000000000 -0600
+++ /.snapshots/34/snapshot/usr/lib/udev/rules.d/61-persistent-storage-compat.rules  1969-12-31 17:00:00.000000000 -0700
@@ -1,71 +0,0 @@
-# Do not edit this file, it will be overwritten on update.
-
-# This file contains depecrated rules kept only for backward
I deleted that last file, so I haven’t included all of its removed lines here.

After rebooting, at least the RAID was activated. The /proc/mdstat file, which hadn’t been showing up, showed the RAID status. The following is a grep for raid in the output of the dmesg command:

[    2.967352] raid6: sse2x1   gen()  4801 MB/s
[    3.035334] raid6: sse2x1   xor()  4806 MB/s
[    3.103349] raid6: sse2x2   gen()  8126 MB/s
[    3.171334] raid6: sse2x2   xor()  8154 MB/s
[    3.239348] raid6: sse2x4   gen()  8822 MB/s
[    3.307343] raid6: sse2x4   xor()  3932 MB/s
[    3.307345] raid6: using algorithm sse2x4 gen() 8822 MB/s
[    3.307345] raid6: .... xor() 3932 MB/s, rmw enabled
[    3.307346] raid6: using intx1 recovery algorithm
[    8.673240] md/raid:md0: device sdc1 operational as raid disk 2
[    8.673247] md/raid:md0: device sdb1 operational as raid disk 1
[    8.673250] md/raid:md0: device sda6 operational as raid disk 0
[    8.673251] md/raid:md0: device sdd1 operational as raid disk 3
[    8.674295] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2

I was not seeing the last 5 lines prior to making the script changes. I’m not sure what to try next, but I won’t be able to play with this for a couple of days.
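For what it’s worth, once the array does come up, /proc/mdstat is the quickest health check: an all-"U" bracket means every member is active. A sketch parsing a sample in that format (the sample text is illustrative, not this system’s actual output):

```shell
# Sample /proc/mdstat content for a healthy 4-disk RAID5 (illustrative only):
mdstat='md0 : active raid5 sdc1[2] sdb1[1] sda6[0] sdd1[3]
      5860270080 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]'
# "[UUUU]" means all members up; an underscore in that bracket marks a
# failed or missing member. Extract just the status bracket:
printf '%s\n' "$mdstat" | grep -o '\[U*\]'
```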

Hiho

Unfortunately I’m not an extremely adept Linux user, so my snapshots have been purged by fiddling around too much :) but I just wanted to confirm that this is happening on my machine as well.

I have an older-model QNAP which has been running software RAID with btrfs for 2 years now, and after another maintenance/update cycle (with the same update the OP mentioned, opensuse-2017-847) my software RAID is no longer automatically mounted at boot. I’ve tried fiddling with "mkinitrd -r md", GRUB and some udev rules, but as mentioned, I barely know what I’d be doing there... In short, something no longer detects my RAID setup after installing this update.

Just here to confirm

cheers, hope you’ll find a solution.

In case anyone wants some logs, please state the appropriate commands.
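For that log request, a tolerant collector along these lines (the command list is a suggestion, not something prescribed in the thread) gathers the usual RAID diagnostics into one file and keeps going even when a command is unavailable:

```shell
# Append each command's output under a header; '|| true' keeps the loop going
# if a command fails or doesn't exist (e.g. journalctl on rescue media):
out=$(mktemp)
for cmd in 'cat /proc/mdstat' 'dmesg' 'journalctl -b -p warning'; do
    printf '===== %s =====\n' "$cmd" >> "$out"
    $cmd >> "$out" 2>&1 || true
done
grep -c '^=====' "$out"    # one header per command attempted
rm -f "$out"
```

On the affected box, `mdadm --detail /dev/md0` output would be worth adding to the list as well.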

Good to know I’m not the only one having this problem. I’ve submitted bug report #1054616 (software RAID not initialized at boot after openSUSE-2017-847 patch applied), so hopefully someone more knowledgeable can find a fix. Thx.