I’m experimenting with LEAP 15.2 to find the best way to get complete redundancy using 2 disks and software RAID. I have a fully redundant layout in my LEAP 42.3 installation and I’m trying to reproduce the same layout in 15.2. In theory two approaches are possible: efi on a raid1 partition, or two efi partitions that are manually synchronized whenever necessary. In LEAP 42.3 only the first approach was successful; the second one suffers from a bug that makes it useless (https://bugzilla.suse.com/show_bug.cgi?id=1059169)
Unfortunately I discovered that LEAP 15.2 suffers from the same problem I had with LEAP 42.3: when the partition mounted on /boot/efi is missing because of a failed disk, the bios can start the boot using the second efi partition (which I called /boot/efi2), but the system (dracut, I suppose) is not able to complete the boot and for some reason complains about the missing /boot/efi, even though it is no longer needed to complete the boot.
This is the disk layout:
When I simulate a failure on disk2 everything works fine; when I simulate a failure on disk1 dracut drops me into the emergency shell. Please note that both the efi and efi2 mount points have the nofail option in fstab.
I have used a system where the “/etc/fstab” entry for “/boot/efi” gave the “noauto” option, so that “/boot/efi” is never mounted. And that works fine unless there is a grub update that requires writing to that file system.
Presumably, when I had it set up that way there was no attempt by “dracut” to mount “/boot/efi”.
The issue is that grub points to a vmlinuz and initrd that are on the missing drive, and the one in /boot on the good drive also points to the missing drive.
The cure, if needed, is to boot an openSUSE recovery disk, mount the drive on /mnt, bind-mount /dev, /proc, /sys and /run under /mnt, chroot to /mnt and run mkinitrd to point it to the existing drive.
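A rough sketch of those rescue steps, written as a shell function that only prints the commands so they can be reviewed first. The device names (/dev/md1, /dev/md0), the optional separate /boot argument and the function name rescue_steps are placeholders of mine, not anything taken from this thread; adjust them to your own layout.

```shell
# Print (do not run) the rescue steps described above.
# $1 = root filesystem device (placeholder default: /dev/md1)
# $2 = mount point used inside the rescue system (default: /mnt)
# $3 = optional separate /boot device, mounted under $2/boot if given
rescue_steps() {
    root_dev=${1:-/dev/md1}
    target=${2:-/mnt}
    boot_dev=${3:-}

    echo "mount $root_dev $target"
    if [ -n "$boot_dev" ]; then
        echo "mount $boot_dev $target/boot"
    fi
    # Bind-mount the rescue system's virtual filesystems into the target.
    for fs in dev proc sys run; do
        echo "mount --bind /$fs $target/$fs"
    done
    # Rebuild the initrd from inside the installed system.
    echo "chroot $target mkinitrd"
}
```

Calling `rescue_steps /dev/md1 /mnt` prints six commands (one mount, four bind mounts, one chroot). Once they look right for your disks, run them as root from the rescue system.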
LVM for mirroring has the same issue. I’ve had to recover many LVM systems where the boot drive has failed. Doesn’t hurt to have supported 1000’s of SLES machines. You learn a lot when a main system is down and you have to get it back up. Luckily I had some test systems to practice on and prove the recovery. You have to undo it when you replace and rebuild the mirror - it now points to the alternate drive - not the boot drive.
Yes, dracut adds the /boot/efi device as a hard requirement in hostonly mode.
Please note that both efi and efi2 mount point have the nofail option in fstab.
The timeout happens inside dracut's own event loop, which is completely unrelated to the content of /etc/fstab.
Any ideas on how I can work around this issue?
Disable hostonly mode, creating a host-independent initrd. This will result in a much larger initrd and may uncover different bugs. You may also try the --no-hostonly-default-device option (or the equivalent conf.d setting). This will probably have less impact.
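In case it helps, here is a minimal sketch of the first suggestion as a dracut drop-in. `hostonly="no"` is the setting documented in dracut.conf(5); the file name 90-no-hostonly.conf and the helper name write_no_hostonly_conf are made up for illustration. For the second suggestion the sketch just passes the option on the command line (see the trailing comment) rather than guessing at the name of the conf.d variable.

```shell
# Write a dracut drop-in that disables hostonly mode.
# $1 = config directory (defaults to /etc/dracut.conf.d; needs root there).
# The file name 90-no-hostonly.conf is an arbitrary choice.
write_no_hostonly_conf() {
    conf_dir=${1:-/etc/dracut.conf.d}
    conf_file="$conf_dir/90-no-hostonly.conf"
    printf 'hostonly="no"\n' > "$conf_file" || return 1
    echo "$conf_file"
}

# After writing the drop-in, rebuild the initrd as root:
#   dracut --force
# Alternative that keeps hostonly mode but skips the implicit host devices:
#   dracut --force --no-hostonly-default-device
```

Keep a known-good initrd around before rebuilding, since either variant changes what ends up in the image.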
Did you try to actually read and understand what was written?
LLR1:/boot/efi/EFI/opensuse # cat grub.cfg
search --fs-uuid --set=root 75837315-86e5-4e19-af88-621f330d2be5
Yes, it has a hard-coded UUID. That is where it fails when the primary drive is missing, and that in itself gets you the “emergency menu”.
You cannot have a mirror set in which it is different on each half of the mirror.
This was the one flaw in Veritas (what LVM is based on). As far as I know that was never really solved.
I guess you could also use a recovery disk to change all the UUIDs of the secondary drive to those of the primary drive and get it to work that way. Then the missing drive is the secondary drive. That would require a file with the blkid output saved somewhere you can access it. I know that info is somewhere in /etc/lvm, but I don’t use LVM anymore so I cannot check.
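One way to check this on a running system is to pull the UUID out of grub.cfg and see whether a device with that UUID is currently visible. A small sketch only: the function names are mine, and it assumes the usual udev symlinks under /dev/disk/by-uuid.

```shell
# Print the UUID from the first "search ... --fs-uuid" line of a grub.cfg.
grub_cfg_uuid() {
    awk '$1 == "search" && /--fs-uuid/ { print $NF; exit }' "$1"
}

# Succeed if a filesystem with the given UUID is visible.
# $2 lets you point at another directory (default: udev's by-uuid links).
uuid_present() {
    [ -e "${2:-/dev/disk/by-uuid}/$1" ]
}

# Example use:
#   uuid=$(grub_cfg_uuid /boot/efi/EFI/opensuse/grub.cfg)
#   if uuid_present "$uuid"; then echo "found"; else echo "missing"; fi
```

Running the example with one disk pulled shows whether the filesystem that grub searches for is really the one that went away.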
Thank you very much for your help! Please note that in my layout /boot and /boot/efi are two different filesystems! /boot is on a soft raid1, so it is always available. Only /boot/efi and /boot/efi2 are two separate vfat partitions. The problem is not grub pointing to the boot partition, but dracut absolutely needing /boot/efi even though it is no longer necessary.
Thank you very much. When I was dealing with the same bug in LEAP 42.3, disabling hostonly mode did not work. With hostonly mode disabled the system could not complete the boot even without any failed disk (https://bugzilla.suse.com/show_bug.cgi?id=1059169). I can try the same approach again to see what happens with LEAP 15.2. I’ll let you know, of course.