Broken grub after installing a 2nd Leap on the same machine

My laptop has two drives. I installed Leap 15.0 on one drive and it’s running well (I’ll call it Leap A on drive A). I intended to install another Leap 15.0 on the other drive (Leap B on drive B), but my last two attempts at installing Leap B both ended in a broken grub state.

The machine has UEFI secure boot enabled.

For the first attempt, the Leap B installation proposed mounting its /boot/EFI on drive A, and I didn’t change the proposal, although I would have liked them to be on separate drives. So it made drive B an encrypted LVM partition while its /boot/EFI was on drive A. But on booting for the first time, after I entered the luks password, it sent me to grub rescue.

For the 2nd attempt, I manually created its partitions, all on drive B, without touching drive A:
800MB as FAT for /boot/EFI
800MB as ext2 for /boot
the remaining 200+ GB as an encrypted LVM with logical volumes for /, /home, and swap
So this time, the installation didn’t touch drive A at all.
But on the first boot, it dropped me straight to grub rescue without even prompting for the luks password.

I tried booting into Leap A (using the BIOS boot order option) and ran the YaST bootloader tool to see whether it could probe for Leap B and add an entry for Leap B in A’s grub. Unfortunately, it could not detect the existence of Leap B on drive B.

I think I should be able to fix Leap B’s grub from Leap A, since I can read the /boot/EFI and /boot contents on drive B, but I really can’t make sense of those files.

Firstly: When you mention “/boot/EFI” do you really mean “/boot/efi”? Linux is case-sensitive.

Secondly, you can probably only have one UEFI bootable “opensuse”. Well, it is possible to have more than one, but tricky. So assume that you can only have one.

Thirdly: Your best bet is to use the boot menu from drive A.

Here’s how to do that:

1: Boot into that opensuse system.
2: Unlock the encrypted LVM. You can do that at a root command line with a command something like

cryptsetup luksOpen /dev/sdb2 cr_sdb2

Change “/dev/sdb2” there to the correct device for your encrypted partition.

After doing that, run (again, as root)

grub2-mkconfig -o /boot/grub2/grub.cfg

That should add your drive B system to the grub menu. However, you will have to repeat this whenever the system automatically updates the grub menu.

There is an alternative method, whereby you can manually add an entry to the menu. Here’s what I am doing:


### Entry to boot TW3
menuentry "configfile for TW3"  {
        set btrfs_relative_path="yes"
        cryptomount -u 7428e7b830da407ab4ec6b53ac372022
        search --fs-uuid --set=bootdir 0b34f9bd-2d71-408c-a104-617efe2ad70f
        configfile (${bootdir})/boot/grub2/grub.cfg
}

I have added that to the end of “/etc/grub.d/40_custom”. That would be on the system on your drive A.

I don’t know whether that “btrfs_relative_path” line is needed. I’m currently not using “btrfs” so it is not really needed for me. But it might be needed for you.

Then you need the UUID of the encrypted LVM partition; put that in the “cryptomount” line (replacing the string that I have there). Note that the “-” chars are removed from that UUID. You also need the UUID of the root file system (probably “/dev/system/root”) to replace the string that I used in the “search” line. You should be able to get both of those from the “grub.cfg” on your drive B system.
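For example, the dash-stripping can be done with “tr” (the dashed UUID below is just an illustration of what blkid would typically print; the result matches the dash-less form used in the “cryptomount” line of my menu entry):

```shell
# Strip the dashes from a blkid-style UUID to get the form that grub's
# cryptomount command expects (example UUID, for illustration only).
luks_uuid="7428e7b8-30da-407a-b4ec-6b53ac372022"
grub_uuid=$(printf '%s' "$luks_uuid" | tr -d '-')
echo "$grub_uuid"    # 7428e7b830da407ab4ec6b53ac372022
```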

To get the UUID without looking at that grub.cfg, use:


blkid /dev/sdb2
blkid /dev/system/root

Again, change “/dev/sdb2” to whatever is needed. And the second “blkid” should be run after that “cryptsetup” command mentioned above.

And another possible problem. If you also use an encrypted LVM for the system on disk A, then there will probably be a name conflict and my suggestions above might not work. If that’s your situation, you probably need to boot from rescue media to do part of this.

This is required if you intend to use snapper rollback, and it must match the /etc/default/grub setting (SUSE_BTRFS_SNAPSHOT_BOOTING) used to generate grub.cfg (as you load grub.cfg from a different installation, it is not necessarily the same).

Thanks a lot. It seems I have some progress with your suggestion.

Sorry, I should have written /boot/efi.

Secondly, you can probably only have one UEFI bootable “opensuse”. Well, it is possible to have more than one, but tricky. So assume that you can only have one.

Is that because of the UEFI secure boot key management thing?

Thirdly: Your best bet is to use the boot menu from drive A.

Here’s how to do that:

1: Boot into that opensuse system.
2: Unlock the encrypted LVM. You can do that at a root command line with a command something like

cryptsetup luksOpen /dev/sdb2 cr_sdb2

Change “/dev/sdb2” there to the correct device for your encrypted partition.

After doing that, run (again, as root)

grub2-mkconfig -o /boot/grub2/grub.cfg

That should add your drive B system to the grub menu. However, you will have to repeat this whenever the system automatically updates the grub menu.

So I didn’t try this method due to its limitation.

I have followed this advice. After adding the 40_custom entry in Leap A (not from rescue media, but from within the Leap A environment), I rebooted. Then I ran the YaST bootloader tool in Leap A to update grub, and rebooted again. Yes, my Leap A also uses a standard encrypted LVM setup, but I have not tried using rescue media yet.
Here’s the result: it successfully added the Leap B entry. However, after I chose Leap B and provided the luks password for the partition on drive B, I was back at the same grub again (Leap A’s grub, I believe), except that if I try to enter Leap B again, it does not ask for the luks password but sends me to the same grub, which feels like a loop.
So basically, after providing the luks password for the drive B partition the first time, I seem to be in a grub loop whenever I try to enter Leap B (I can’t be sure it’s a loop, because grub only blinks for a second here). I am not sure what you meant by a name conflict. BTW, I have already changed the name of Leap A by setting GRUB_DISTRIBUTOR=Leap_A in /etc/default/grub.
Do I need to use rescue media for this? If it’s just adding a 40_custom entry, I don’t see why using rescue media would make a difference.

If you have changed “GRUB_DISTRIBUTOR=Leap_A” then you should have a boot entry named “leap_a” in your UEFI boot menu. And that should boot system A. You can possibly get to that menu by hitting F12 during boot, but it might be a different key.

If you are able to do that, then your best solution will be to use the “opensuse” boot entry for system B and the “leap_a” boot entry for system A.

Can you post the output from:


efibootmgr -v
cat /boot/efi/EFI/opensuse/grub.cfg
cat /boot/efi/EFI/leap_a/grub.cfg

The main reason I mentioned using rescue media was that if you are using an encrypted LVM for both systems, then you will probably have “/dev/system/root” (as one example) on both systems. And that may cause a conflict when you attempt to open the second LVM on the first system.

Thanks, though I’m not quite sure what “match” means here.

On my own box, I originally installed Tumbleweed with “btrfs”, and looked at the generated “/boot/efi/EFI/opensuse/grub.cfg” to see how to use “configfile” to access that system from another grub menu.

On that system, I see: SUSE_BTRFS_SNAPSHOT_BOOTING="true"
(that’s in “/etc/default/grub”). And then I use: set btrfs_relative_path="yes"
as shown in my earlier reply. Perhaps by “match” you only mean that I need both of those for it to work with snapshots.

I’ve been doing the fix all along in system A, which is running fine; it’s just B that can’t boot. Also, the physical luks partition wasn’t named “system” but “Bluks”, just to distinguish the two when I manually created the partitions for Leap B.


**cat /boot/efi/EFI/leapA/grub.cfg**
set btrfs_relative_path="yes"
search --fs-uuid --set=root *uuid of drive A-/dev/sda2*
set prefix=(${root})/grub2
source "${prefix}/grub.cfg"
**cat /boot/efi/EFI/opensuse/grub.cfg**
set btrfs_relative_path="yes"
cryptomount -u *uuid-1*
search --fs-uuid --set=root *uuid-2*
set prefix=(${root})/boot/grub2
source "${prefix}/grub.cfg"
**cat /etc/grub.d/40_custom**
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
### Entry to boot leap B
menuentry "leap B"  {
        set btrfs_relative_path="yes"
        cryptomount -u *uuid of drive B part 3, a luks physical partition*
        search --fs-uuid --set=bootdir *uuid-3*
        configfile (${bootdir})/boot/grub2/grub.cfg
}

The following is the grub.cfg on drive B part 2, which was created during the installation of Leap B and mounted as /boot. I mounted it manually to see its content.

**cat /mnt/tmp/EFI/opensuse/grub.cfg**
set btrfs_relative_path="yes"
search --fs-uuid --set=root *uuid of drive B part 2 which was mounted as /boot for leap B during its installation*
set prefix=(${root})/grub2
source "${prefix}/grub.cfg"

uuid-1, 2, 3 are all unique UUIDs that don’t appear in the blkid output (I hadn’t luksOpened drive B part 3 when running blkid, only mounted drive B part 2, which was supposed to be /boot for Leap B).

Okay, that helps. It avoids name conflicts.

But now I’m confused over your setup.

**cat /boot/efi/EFI/opensuse/grub.cfg**
set btrfs_relative_path="yes"
cryptomount -u *uuid-1*
search --fs-uuid --set=root *uuid-2*
set prefix=(${root})/boot/grub2
source "${prefix}/grub.cfg"

This seems to be from when you first installed, with the EFI partition on disk A used to boot disk B.

According to this, disk B has “/boot” as part of the root file system, and not as a separate partition. And that is the recommended setup if using “btrfs”.

The following is the grub.cfg on drive B part 2, which was created during the installation of Leap B and mounted as /boot. I mounted it manually to see its content.

**cat /mnt/tmp/EFI/opensuse/grub.cfg**
set btrfs_relative_path="yes"
search --fs-uuid --set=root *uuid of drive B part 2 which was mounted as /boot for leap B during its installation*
set prefix=(${root})/grub2
source "${prefix}/grub.cfg"

According to this, you now have a separate “/boot” partition, presumably unencrypted. Is that your current setup?

That’s actually easier for booting. But, if you are using “btrfs”, then that setup won’t allow you to boot to a read-only snapshot, so you don’t get the full benefit of “btrfs”.

uuid-1, 2, 3 are all unique UUIDs that don’t appear in the blkid output

Yes, those won’t show up unless you have unlocked the encrypted luks partition.

I’ll wait for further clarification on your current setup.

Huh? If you have /boot as a separate partition, of course you need to look for that partition and read grub.cfg from it.

To clarify, let me repeat what I did in the OP.

I tried installing Leap B twice. Both times I used a /boot/efi, /boot, and encrypted LVM physical partition setup, similar to my Leap A setup. Both installations used btrfs for the root logical volume.

The first time, it defaulted to installing /boot/efi and /boot on drive A, where Leap A was installed. The current /boot/efi/EFI/opensuse might be a leftover of that first installation. But that first attempt failed: it led me to grub rescue after I provided the luks password.

So the second time I tried not to touch drive A and made the installation on drive B only. That, however, led me straight to grub rescue without any password prompt.

I just tried modifying the 40_custom entry so the boot uuid points to the /boot partition on drive B, but the result is the same (after using YaST bootloader to regenerate grub and rebooting). Upon choosing Leap B in grub, it prompts for the luks password, but then the same grub appears; this time, entering Leap B again does not prompt for the luks password, but gives the same static grub page (like a loop).

Here’s what should work (I think) in 40_custom:


set btrfs_relative_path="yes"
search --fs-uuid --set=root *uuid of drive B part 2 which was mounted as /boot for leap B during its installation*
set bootdir=(${root})/grub2
configfile "${bootdir}/grub.cfg" 

I actually took this directly from what you posted as “/mnt/tmp/EFI/opensuse/grub.cfg”, except:

  • I changed “prefix” to “bootdir” because “$prefix” has a special meaning to grub;
  • I changed “source” to “configfile”

You can do that copying and changing yourself, and that should get the right uuid in there.

Can you post the output from:

efibootmgr -v

It’s working! It’s a bit strange, but I actually like the result. Now Leap B has its entry in Leap A’s grub. When I choose to boot Leap B, I am welcomed by Leap B’s grub. So I need to go through two grub pages to boot Leap B.

Leap B will rarely be used, so I’m totally fine with it. I just updated both Leap A’s and Leap B’s grub (using YaST bootloader) and rebooted them. Now it’s OK if the machine boots from drive A, but if it boots from drive B it gets to the grub rescue page. So in the BIOS I disabled drive B as a boot option.

I just wonder: will anything break, and will btrfs snapper still work fine?

If not I’m also fine to reinstall leap B since I have not set the working environment up much yet.

Great. I’m glad to hear that.

Now Leap B has its entry in Leap A’s grub. When I choose to boot Leap B, I am welcomed by Leap B’s grub. So I need to go through two grub pages to boot Leap B.

Yes, but this can be an advantage.

If your Leap A had an entry to boot Leap B directly, without this extra menu page, that entry could get out of date. When there is a kernel update in Leap B, the direct entry will be wrong until you rebuild the boot menu on Leap A. But with this two-page way of doing it, you always get the updated boot entries for Leap B.

I just wonder: will anything break, and will btrfs snapper still work fine?

I think the problem you might have there is that btrfs snapper and rollback do not work well when you have a separate “/boot” partition, because the grub boot menu is not part of the snapshot. The double layer of grub menus won’t itself cause any problems.

It seems that having /boot inside the encrypted LVM partition makes you enter the luks password twice to boot the system, no? My Leap A also has /boot on a separate partition, and I did use snapper quite a few times when installing the nvidia driver; snapper did the rollback well for me. Do you mean that rolling back content in /boot would not work?

Do you have a suggestion for a better scheme if I want to reinstall Leap B?

Thanks again.

Normally, yes. And that’s what I do on the system I have set up that way.

However, there is an alternative if you don’t mind putting the encryption key in a file and in the “initrd”. I have a VM setup that way. I have to give the encryption key for grub, but the rest of the boot uses the key from the file and from the initrd.

Here’s a recent post by somebody using that method:

(I think I add root key to wrong partition when trying to avoid typing password twice. - Install/Boot/Login - openSUSE Forums)
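For reference, the keyfile approach in that linked post boils down to roughly the following (the device, UUID placeholder, and keyfile path are examples only, not taken from your system; see the linked thread for the exact steps):

```
# 1. As root, add a keyfile as an extra LUKS key (example device):
#      cryptsetup luksAddKey /dev/sdb3 /etc/cryptkey
# 2. Reference the keyfile in /etc/crypttab so the initrd can unlock the
#    volume without prompting a second time:
cr_sdb3  UUID=<luks-partition-uuid>  /etc/cryptkey  luks
# 3. Rebuild the initrd so it picks up the keyfile:
#      dracut -f
```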

My Leap A also has /boot on a separate partition, and I did use snapper quite a few times when installing the nvidia driver; snapper did the rollback well for me. Do you mean that rolling back content in /boot would not work?

Okay, so I guess rollback works the way you are doing it.

As I understand it, the problem is that you cannot boot to a read-only snapshot. So if your system fails to boot, you cannot boot an older snapshot and then rollback to that.

Note, however, that I’m not using “btrfs”, so this is outside my experience. I did briefly use it, and I did boot to a snapshot (just for testing). But I never tried a rollback.

The problem is that the content of /boot no longer matches the content of the root file system. It means that if you revert to a snapshot where your current kernel did not exist, you will still boot into this kernel (it is present in /boot), but the drivers for this kernel will be missing (they are located in /lib/modules). So the kernel will have rather limited functionality.

When /boot is part of /, reverting to a previous snapshot also reverts the content of /boot (and /boot/grub2/grub.cfg), so you boot into the kernels available in that snapshot.

There is no automatic process to clean up “stale” kernels in /boot, although it is certainly possible to do it manually once you are aware of them.

P.S. Sorry, I meant to answer your question earlier, but it turned out to be more involved and I could not find the time. :-) I’m still hoping to do it.

Could you give a more detailed example of how things will fail because of this snapshot rollback problem? What precautions do I need, and how do I fix it if it happens? I’ll take note, though I don’t use, and don’t plan to use, the snapshot rollback feature much. In case of a system failure I just hope to be able to get back to a working system, that’s all.

btrfs_relative_path changes how grub interprets pathnames on btrfs. Normally they are assumed to be absolute pathnames (i.e. relative to the filesystem root). If btrfs_relative_path is set, grub interprets them as relative to the default subvolume. If the pathnames used in a grub configuration file (grub.cfg in the first place) were created assuming one value of btrfs_relative_path, but grub is using a different value during execution, it either does not find the files or finds the wrong files.

Normally the user-space tools (grub2-install, grub2-mkconfig) read the value of SUSE_BTRFS_SNAPSHOT_BOOTING in /etc/default/grub; if it is “true”, they both generate paths relative to the default subvolume and make sure btrfs_relative_path is set during execution.

Now, when you read a grub configuration created under a different system, the value of SUSE_BTRFS_SNAPSHOT_BOOTING in this system may be different; that is what I meant by “match”.

After looking at it once more: openSUSE installation always sets SUSE_BTRFS_SNAPSHOT_BOOTING=true, even if snapshots were explicitly disabled during installation. So it should be relatively safe to always set btrfs_relative_path as well.
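To make that concrete, here is a grub.cfg sketch with a typical openSUSE subvolume layout (the exact subvolume path is an assumption for illustration):

```
# Assume the btrfs default subvolume is /@/.snapshots/1/snapshot (a typical
# openSUSE layout when snapshots are enabled).
#
# With btrfs_relative_path unset, grub treats the path as absolute within
# the whole volume, so it would look for:
#     /@/.snapshots/1/snapshot/boot/grub2/grub.cfg   (full path required)
#
# With btrfs_relative_path set, the same reference is resolved relative to
# the default subvolume, so this is enough:
set btrfs_relative_path="yes"
configfile (${bootdir})/boot/grub2/grub.cfg
```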

Say you installed kernel version 1.2.3. Now you have /boot/vmlinuz-1.2.3, /boot/initrd-1.2.3, and a /boot/grub2/grub.cfg that references /boot/vmlinuz-1.2.3 as the default kernel. You also have a /lib/modules/1.2.3 directory that contains most of the kernel modules (drivers) for this kernel.

Now you revert to a snapshot created before this kernel was installed. You still have /boot/vmlinuz-1.2.3 and the reference to it in grub.cfg (because /boot is not part of the snapshot), but you no longer have the /lib/modules/1.2.3 directory. So you will be able to load this kernel and probably mount root (the drivers for that are in the initrd), but that is basically all; it is very unlikely you will be able to do anything useful (even the drivers for your network won’t be present). In the worst case, if the snapshot is sufficiently old, none of the kernels in /boot will have a matching /lib/modules subdirectory and you won’t be able to boot at all.
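That mismatch can be spotted from a running system. As a small sketch (the stale_kernels helper is invented here for illustration, not a standard tool), this lists kernel versions that have a vmlinuz in a /boot-style directory but no matching modules directory:

```shell
#!/bin/sh
# List kernel versions that have a vmlinuz in $1 (e.g. /boot) but no
# matching modules directory in $2 (e.g. /lib/modules).
stale_kernels() {
    bootdir=$1
    moddir=$2
    for k in "$bootdir"/vmlinuz-*; do
        [ -e "$k" ] || continue            # glob matched nothing
        ver=${k##*/vmlinuz-}               # strip path and "vmlinuz-" prefix
        [ -d "$moddir/$ver" ] || echo "$ver"
    done
}

# Example with throwaway directories standing in for /boot and /lib/modules:
tmp=$(mktemp -d)
mkdir -p "$tmp/boot" "$tmp/modules/1.2.2"
touch "$tmp/boot/vmlinuz-1.2.2" "$tmp/boot/vmlinuz-1.2.3"
stale_kernels "$tmp/boot" "$tmp/modules"   # prints: 1.2.3
rm -rf "$tmp"
```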

Thank you. I see. So I guess I’ll try keeping a few more kernels just in case following this guide: SDB:Keep multiple kernel versions - openSUSE Wiki

Say one day I need to roll back with snapper (that is, to use an older kernel): I’ll be able to choose to boot an older kernel in grub, right? When you say I won’t be able to boot, you didn’t mean that grub wouldn’t appear either, just that I wouldn’t be able to boot the latest kernel, right?
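For reference, that wiki guide has you edit /etc/zypp/zypp.conf; a typical multiversion setup looks something like this (a sketch only; check the wiki page for the exact syntax):

```
# /etc/zypp/zypp.conf
# Allow multiple kernel packages to be installed in parallel:
multiversion = provides:multiversion(kernel)
# Keep the newest kernel, the previous one, and the currently running one:
multiversion.kernels = latest,latest-1,running
```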