Can't boot from Raid backup disk

This gets complicated. Basically I have two disks, running on different IDE controllers for extra isolation, that are supposed to mirror each other. The goal is to have the system boot up from either disk should the other one fail. I spent days figuring out how grub2 can boot directly off a Raid/Lvm partition, and how UEFI booting works. With all of this fancy new stuff, booting from the second drive should be a snap. Except that it doesn’t work :frowning:

So, each drive has

  1. an EFI Boot partition formatted as FAT32
  2. an emergency maintenance partition with an openSUSE Linux system that I can boot up when I want to do tricky things to the Raid/Lvm system
  3. a swap partition
  4. a large Raid partition

The Raid partitions on the two disks are used to build a Raid1 array. The resulting /dev/md0 device is allocated to an LVM volume group, and from that volume group I allocate a root volume and a home volume where the main openSUSE system is to be installed.
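For anyone following along, a layout like the one described above could be assembled roughly as follows. This is only a sketch: the partition names (/dev/sda4, /dev/sdb4), the volume group name vg0, and the 40G root size are assumptions for illustration, not values taken from this system. Because these commands destroy data, the script only prints them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: device names, VG/LV names and sizes are assumptions.
# Commands are echoed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_layout() {
    # Mirror the fourth partition of each disk into /dev/md0.
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
    # Use the whole array as one LVM physical volume and volume group.
    run pvcreate /dev/md0
    run vgcreate vg0 /dev/md0
    # Carve out the root and home logical volumes.
    run lvcreate -L 40G -n root vg0
    run lvcreate -l 100%FREE -n home vg0
}

build_layout
```

Running it as-is just lists the five commands, which makes it easy to check the device names before doing anything for real.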

Astonishingly, after days and days of sweat and research, it all works. Linux is installed in the LVM partition with the EFI partition mounted as /boot/efi and everything boots up just fine.

So now, I figured there is no boot sector to worry about with EFI boots, so all I have to do is use dd to copy sda1 and sda2 to sdb1 and sdb2, and the system should be able to boot up nicely from the second disk when the first one fails. Except it doesn’t work. The system boots up just fine when I take disk2 offline. The Raid device goes into degraded mode as expected. When I bring it up with both drives, I have to re-add the missing partition to the Raid array and everything takes off swimmingly.
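The degraded check and the re-add step mentioned above might look something like this. It is a sketch under assumptions: /dev/md0 and /dev/sdb4 are placeholder names, and the fallback from --re-add to --add is one possible approach, not necessarily what was done here.

```shell
#!/bin/sh
# Sketch only: /dev/md0 and /dev/sdb4 are assumed names.

# Succeeds if the mdstat text on stdin shows an array with a missing
# member, i.e. a status field such as [U_] or [_U].
is_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Put the returning partition back into the array; fall back to a
# plain add (which triggers a resync) if the re-add is refused.
readd_member() {
    mdadm /dev/md0 --re-add "$1" || mdadm /dev/md0 --add "$1"
}

# Usage (as root):
#   is_degraded < /proc/mdstat && readd_member /dev/sdb4
```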

But when I disable the first drive and try to boot from the second, it gives me the infamous “no system found” message. The BIOS appears to know about the EFI partition; it identifies the two different boot directories I set up to boot the maintenance and main systems. But it doesn’t seem to find a functioning Grub2 installation.

So what silly obvious thing am I missing here???

Either you have BIOS or you have EFI. Please do not make it even more confusing :slight_smile:

To boot from the second ESP you need to add it to the list of boot entries. From your description it sounds like the firmware fails to find the normal boot entry, which refers to the first disk, and continues down the boot list. I presume that by default you have something like “try legacy boot from HDD” there, which eventually kicks in.

Please post output of “efibootmgr -v” as well as “ls -lR /path/to/ESP/on/second/disk”

all I have to do is use dd to copy sda1 and sda2 to sdb1 and sdb2

That’s a really bad idea. It creates filesystems with duplicate UUIDs, which will confuse everything: firmware, grub and kernel. You can copy the content of sda1 to sdb1 by mounting both and using any of the usual tools like “cp -a”, “find | cpio -p” or whatever. But do not duplicate the filesystem block by block. I would recreate the filesystems on sdb now to ensure the UUIDs are unique and then copy the content from sda.
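A minimal sketch of that file-level copy, assuming the first ESP is already mounted at /boot/efi and the second ESP is /dev/sdb1 (adjust both). The function names are made up for this example. mkfs.vfat gives the target a fresh volume ID, and cp -a then copies the files rather than the blocks.

```shell
#!/bin/sh
# Sketch only: the paths and device in the usage line are assumptions.

# Copy everything under one mounted ESP into another, file by file.
copy_esp_files() {
    cp -a "$1"/. "$2"/
}

# Recreate the target filesystem, then copy the files across.
# Must run as root; mkfs.vfat wipes the target partition.
clone_esp() {
    src_dir="$1"
    dst_dev="$2"
    dst_mnt="$(mktemp -d)" || return 1
    mkfs.vfat -F 32 "$dst_dev" &&
        mount "$dst_dev" "$dst_mnt" &&
        copy_esp_files "$src_dir" "$dst_mnt"
    status=$?
    umount "$dst_mnt" 2>/dev/null
    rmdir "$dst_mnt"
    return "$status"
}

# Usage (as root):
#   clone_esp /boot/efi /dev/sdb1
```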

corbin-goul:/home/ken # efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0002,0005
Boot0000* opensuse HD(1,800,100000,7c35010c-3aac-4223-b0cd-6e524dcce8c2)File(\EFI\opensuse\grubx64.efi)
Boot0002* maintenance HD(1,800,100000,7c35010c-3aac-4223-b0cd-6e524dcce8c2)File(\EFI\maintenance\grubx64.efi)
Boot0005* UEFI: Hard Drive ACPI(a0341d0,0)PCI(1c,3)PCI(0,0)PCI(0,0)Vendor(cf31fac5-c24e-11d2-85f3-00a0c93ec93b,80)HD(1,800,100000,7d5e31f1-2de5-4cb2-ab17-3dd79b82a9fb)AMBO

corbin-goul:/home/ken # ls -lR /boot/efi
/boot/efi:
total 8
drwxrwxr-x 4 root root 8192 Nov 27 18:18 EFI

/boot/efi/EFI:
total 16
drwxrwxr-x 2 root root 8192 Nov 28 00:25 maintenance
drwxrwxr-x 2 root root 8192 Nov 27 17:27 opensuse

/boot/efi/EFI/maintenance:
total 120
-rwxrwxr-x 1 root root 121344 Nov 28 00:42 grubx64.efi

/boot/efi/EFI/opensuse:
total 144
-rwxrwxr-x 1 root root 147456 Nov 28 10:10 grubx64.efi

That was the (probably misguided) idea. I thought that if the two drives looked identical, UUIDs and all, it wouldn’t matter which one we were actually booting from. I’ve gone back and reformatted both partitions and done normal file system copies. It does not seem to make any difference. I still cannot boot from the second drive when the first drive is offline.

Anything else I can try?

Please use tags “code” when posting any computer output.

Is “maintenance” supposed to be your second boot disk? Then it is obviously wrong; you need a boot entry that refers to the second disk, not to another directory on the same disk, which will be missing.

Boot0005* UEFI: Hard Drive      ACPI(a0341d0,0)PCI(1c,3)PCI(0,0)PCI(0,0)Vendor(cf31fac5-c24e-11d2-85f3-00a0c93ec93b,80)HD(1,800,100000,7d5e31f1-2de5-4cb2-ab17-3dd79b82a9fb)AMBO

Yes, this smells like legacy BIOS boot.

I’ve gone back and reformatted both partitions and done normal file system copies. It does not seem to make any difference. I still cannot boot from the second drive when the first drive is offline.

Anything else I can try?

You do not have any boot entry that refers to the second drive. Assuming /dev/sda is the first one and /dev/sdb the second, you need something like

efibootmgr -c -d /dev/sdb -p 1 -l '\EFI\opensuse\grubx64.efi' -w -L "maintenance"

Adjust to match your disk names and partition numbers.
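To check which partition each boot entry actually points at, something like the following may help. It is only a sketch: entry_guids is a made-up helper name, and the pattern simply looks for a GUID inside the HD(...) node, which fits the output format shown earlier in this thread but may need adjusting for other efibootmgr versions. The printed GUIDs can then be compared with the PARTUUID of each ESP.

```shell
#!/bin/sh
# Sketch only. Reads `efibootmgr -v` output on stdin and prints
# "<entry number> <partition GUID>" for every entry with an HD(...) node.
entry_guids() {
    sed -n 's/^Boot\([0-9A-Fa-f]\{4\}\).*HD([^)]*\([0-9A-Fa-f]\{8\}-[0-9A-Fa-f-]\{27\}\)[^)]*).*/\1 \2/p'
}

# Usage (as root):
#   efibootmgr -v | entry_guids
#   blkid -s PARTUUID -o value /dev/sda1   # GUID of the first ESP
#   blkid -s PARTUUID -o value /dev/sdb1   # GUID of the second ESP
```

After adding the new entry, it should show up in this list with the second ESP's GUID; if every entry still shows the first disk's GUID, nothing refers to the second drive yet.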