Curiosity: improper RAID config, unexpected boot results

As a result of an improper RAID configuration, I have a boot-challenged system.
The root filesystem is on a software RAID0 volume.
I suspect that the core of the issue is that /boot is a directory, instead of a mount from a non-RAID partition.

After installation, the following two additional UEFI boot entries were present.


Boot0000* opensuse-secureboot HD(1,GPT,7667db86-1370-4fa7-ba7a-9177ac4bf7f3,0x800,0x80000)/File(\EFI\opensuse\shim.efi)
Boot0003* UEFI: ADATA SX900 PciRoot(0x0)/Pci(0x1f,0x2)/Sata(4,65535,0)/HD(1,GPT,7667db86-1370-4fa7-ba7a-9177ac4bf7f3,0x800,0x80000)AMBO

The first, seemingly created by efibootmgr, is not able to boot the system.
The second (created by the firmware?) DOES boot the system.
Any insight into why one works but not the other is appreciated.

Tumbleweed
Dell OptiPlex 3010


260 MiB  /dev/sda1  [FAT]    /boot/efi
50 GiB   /dev/md0   [Btrfs]  /
8 GiB    /dev/md1   [swap]   [swap]

Normally, those should be equivalent.

Can you post the output from:


ls -l /boot/efi/EFI/Boot
ls -l /boot/efi/EFI/opensuse


cyril@kodi12:~> ls -l /boot/efi/EFI/boot/
total 1536
-rwxr-xr-x 1 root root 1208968 Jun  5 18:42 bootx64.efi
-rwxr-xr-x 1 root root  358768 Jun  5 18:42 fallback.efi
cyril@kodi12:~> ls -l /boot/efi/EFI/opensuse/
total 3644
-rwxr-xr-x 1 root root      58 Jun  5 18:42 boot.csv
-rwxr-xr-x 1 root root   63728 Jun  1 13:15 fwupdx64.efi
-rwxr-xr-x 1 root root   10462 Jun  5 20:04 grub.cfg
-rwxr-xr-x 1 root root 1073504 Jun  5 18:42 grub.efi
-rwxr-xr-x 1 root root  197632 Jun  5 18:42 grubx64.efi
-rwxr-xr-x 1 root root 1158688 Jun  5 18:42 MokManager.efi
-rwxr-xr-x 1 root root 1208968 Jun  5 18:42 shim.efi

Thanks for providing that info.

This makes it all the more puzzling that the two boot methods behave differently.

When you use “Boot0000”, that runs “shim.efi” in the “opensuse” directory in your EFI partition.

When you use “Boot0003”, that runs “bootx64.efi” in the “boot” directory. But normally, “bootx64.efi” is the same as “shim.efi” – a copy of the same file, but with a different name. And that looks to be the case here.

The one difference – “bootx64.efi” should find the “fallback.efi” next to it and run that. The purpose of “fallback.efi” is to add back entry “Boot0000” if it is missing, and then to run it.
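A quick way to check whether the two really are byte-identical is `cmp`. A minimal sketch; the paths are the ones from your listing, and `same_file` is just a helper name I made up:

```shell
#!/bin/sh
# Report whether two files are byte-for-byte identical.
# "same_file" is an illustrative helper name, not a standard tool.
same_file() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Paths from your listing -- adjust if yours differ:
# same_file /boot/efi/EFI/boot/bootx64.efi /boot/efi/EFI/opensuse/shim.efi
```

If it prints “identical”, then Boot0003 is launching the same binary as Boot0000, just via the fallback path.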

Are you using secure-boot (is secure-boot enabled in your firmware)? (If you are not sure, the output of command “bootctl” should tell you).

nriclet, I appreciate your help.

Things are not as I originally described. I am not consistently dropped to a grub prompt when the ‘opensuse-secureboot’ option is chosen. There are times when all of the boot entries are capable of booting the system.
The things that seem consistent at this point:

  1. Each kernel/bootloader update is what breaks the boot process, resulting in a grub prompt upon reboot <- still needs verification.
  2. I am able to boot from the BIOS-generated entry (Dell boot menu).

Once at the grub prompt, the ‘ls’ command does list the root volume (md/system).
I thought that I was able to reproduce the issue consistently, but not quite yet, so I have some other tests ahead of me.

The one difference – “bootx64.efi” should find the “fallback.efi” next to it and run that. The purpose of “fallback.efi” is to add back entry “Boot0000” if it is missing, and then to run it.

This is interesting. Running bootx64.efi automatically runs fallback.efi, which will regenerate a missing Boot0000?
I am suddenly wondering if the entries that I originally posted were AFTER I had run the Boot0003 entry.
Further, this could account for the inconsistently working entry.

Homework for me:

  • Work out how to >>reliably<< break (& fix) the boot process.
  • capture ‘bootctl’ & ‘efibootmgr -v’ more frequently, to attempt to capture the “broken” state
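For the second item, a small snapshot script I can run after each update; the log path and the `snapshot` helper name are my own choices:

```shell
#!/bin/sh
# Append a timestamped dump of the boot state to a log file, so a
# "broken" state can be diffed against a known-good one.
# LOG path is an arbitrary choice.
LOG="${LOG:-${HOME:-/tmp}/bootstate.log}"

snapshot() {
    {
        echo "===== $(date) ====="
        for cmd in "$@"; do
            echo "--- $cmd ---"
            $cmd 2>&1 || true
        done
    } >> "$LOG"
}

snapshot "bootctl" "efibootmgr -v"
```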

cyril@kodi12:~> bootctl
systemd-boot not installed in ESP.
System:
     Firmware: n/a (n/a)
  Secure Boot: enabled
   Setup Mode: user

Current Boot Loader:
      Product: n/a
     Features: ✗ Boot counting
               ✗ Menu timeout control
               ✗ One-shot menu timeout control
               ✗ Default entry control
               ✗ One-shot entry control
          ESP: n/a
         File: └─n/a

Available Boot Loaders on ESP:
          ESP: /boot/efi (/dev/disk/by-partuuid/7667db86-1370-4fa7-ba7a-9177ac4bf7f3)
         File: └─/EFI/BOOT/bootx64.efi

Boot Loaders Listed in EFI Variables:
        Title: opensuse-secureboot
           ID: 0x0000
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/7667db86-1370-4fa7-ba7a-9177ac4bf7f3
         File: └─/EFI/opensuse/shim.efi

        Title: UEFI: openSUS3
           ID: 0x0001
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/7667db86-1370-4fa7-ba7a-9177ac4bf7f3
         File: └─/EFI/opensuse/shim.efi

Boot Loader Entries:
        $BOOT: /boot/efi (/dev/disk/by-partuuid/7667db86-1370-4fa7-ba7a-9177ac4bf7f3)

snippet of blkid:
/dev/sdc1: SEC_TYPE="msdos" UUID="6C08-1833" TYPE="vfat" PARTUUID="7667db86-1370-4fa7-ba7a-9177ac4bf7f3"

Separately, this all suggests that having /boot as a subdirectory of a RAID0 filesystem is not an issue at all.
The firmware loads a *.efi file from the ESP, which loads grub, which is easily capable of reading mdadm volumes.

Am I missing something? Is there other wisdom in keeping /boot on a non-RAID device?

-Cyril

It is looking increasingly as if this is a firmware (BIOS) problem. It looks as if your BIOS is inconsistent on whether it can access the RAID volume.

It is actually possible to copy the kernel, the “initrd” and the “grub.cfg” into your EFI partition – maybe make a new directory for that. And then modify (edit) “/boot/efi/EFI/opensuse/grub.cfg” so that it sources your copied “grub.cfg” rather than the original. The downside is that you will have to repeat this after every kernel update. But it might be an interesting experiment.
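Roughly along these lines; a sketch only, where the “local” directory name is my invention and the file names should be checked against what is actually in /boot:

```shell
#!/bin/sh
# Sketch: copy kernel, initrd and grub.cfg into a new directory on the
# EFI partition. ESP/SRC defaults and the "local" name are assumptions.
ESP="${ESP:-/boot/efi}"
SRC="${SRC:-/boot}"

copy_boot_to_esp() {
    mkdir -p "$ESP/EFI/local" &&
    cp "$SRC"/vmlinuz* "$SRC"/initrd* "$SRC/grub2/grub.cfg" "$ESP/EFI/local/"
}
```

Then edit “/boot/efi/EFI/opensuse/grub.cfg” so that it sources the copied grub.cfg from the ESP instead of the copy on the RAID volume, and repeat the copy after every kernel update.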

I was under the impression that the firmware only needs to access the EFI partition, which has the code to boot the raid volume.
Where in this process is the firmware accessing the RAID volume?

If this is true, then moving the boot contents to their own non-raid partition should solve the issue, no?

example of what should work:


cyril@kodi12:~> lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb         8:16   0  59.6G  0 disk  
├─sdb1      8:17   0   256M  0 part  /boot
├─sdb2      8:18   0     2G  0 part  
│ └─md0     9:0    0     4G  0 raid0 [SWAP]
├─sdb3      8:19   0    25G  0 part  
│ └─md127   9:127  0    50G  0 raid0 /home
└─sdb4      8:20   0  32.4G  0 part  
  └─md126   9:126  0  64.8G  0 raid0 /storage/local
sdc         8:32   0  59.6G  0 disk  
├─sdc1      8:33   0   256M  0 part  /boot/efi
├─sdc2      8:34   0     2G  0 part  
│ └─md0     9:0    0     4G  0 raid0 [SWAP]
├─sdc3      8:35   0    25G  0 part  
│ └─md127   9:127  0    50G  0 raid0 /home
└─sdc4      8:36   0  32.4G  0 part  
  └─md126   9:126  0  64.8G  0 raid0 /storage/local

The firmware directly accesses the EFI partition to load the boot code – in your case, grub.

But then grub has to read other parts of the disk. And it does this by making requests to the firmware. And that’s where things are going wrong.

No, the EFI partition does not have code to boot the raid volume. That code is in the kernels and “initrd”, and those are usually not in the EFI partition. If you can put everything needed into the EFI partition, then it should work.
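As an aside, when stuck at the grub prompt it is sometimes possible to boot by hand. A rough sketch, where the device and file names are guesses based on your earlier “(md/system)” output – use grub’s `ls` to check what is actually there:

```
grub> insmod mdraid1x
grub> ls (md/system)/                 # confirm grub can read the RAID volume
grub> set root=(md/system)
grub> linux /boot/vmlinuz root=/dev/md0
grub> initrd /boot/initrd
grub> boot
```

If that boots, it confirms grub itself can handle the mdadm volume, and the failure is earlier in the chain.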

If this is true, then moving the boot contents to their own non-raid partition should solve the issue, no?

Yes, that should also work.
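For reference, the corresponding /etc/fstab entries for such a layout might look something like this – the UUIDs are placeholders, so substitute the real values from `blkid`:

```
# UUIDs are placeholders -- use the values reported by blkid
UUID=XXXX-XXXX                            /boot/efi  vfat   utf8      0 2
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /boot      ext4   defaults  0 2
/dev/md0                                  /          btrfs  defaults  0 0
```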

I appreciate all of your insight. I have changed several things about the installation in an effort to provide the firmware a more standard path to the kernel (/boot on a non-RAID partition, /boot partition on the same disk as the ESP), and so far, bootloader updates haven’t broken the boot process.

side note:
Is there some documentation to let me know the minimum partition sizes for the expert partitioner? Aside from being told that the EFI and /boot partitions were too small, there was no guidance to at least tell me the minimum acceptable values. It’s the expert partitioner, sure, I’m supposed to know what I’m doing, but it still seems that the hard limits should be mentioned.

Thanks again nriclet.
-Cyril

About 300 MB is now recommended, but that is just a warning; in most cases a much smaller partition will work fine if it already exists. The absolute minimum depends on how many OSes you will have installed at once and on the geometry of the drive, i.e. the sector size: with 4K sectors the minimum FAT partition is 260 MiB; with 512-byte sectors it is 32 MB…

https://www.ctrl.blog/entry/esp-size-guide.html

The partitioner wants at least 256M for the EFI partition. I’m not sure what it wants for “/boot”.

My current practice is 500M for the EFI partition and 500M for “/boot”. That’s more than needed.

At one time, I used 100M for “/boot”. But that’s a bit tight with the current sizes of kernel and initrd. I would guess that 200M is probably sufficient in most cases. But disk space is cheap today, so why not go with 500M.