How do I boot into a degraded LUKS-encrypted Btrfs RAID 1?

tl;dr: I have set up a LUKS-encrypted Btrfs RAID 1 (in a VM), but can’t get it to boot when I detach one (virtual) hard drive. How do I do that?

More details:

One of my PCs requires redundancy, so I decided to dive into Btrfs and its RAID 1 implementation (in a VM for now). I'm quite new to openSUSE and a self-taught Linux user, but generally I manage to solve most problems I come across with some GoogleFu and lots of patience. However, booting into an encrypted Btrfs RAID 1 system with one (of two) hard drives missing is proving to be too tough a nut for me to crack (and I have spent a looooot of time on this, searching this forum, Reddit, search engines, etc.).

Coming from Debian, I must say I love the openSUSE installer (as well as many other things). Using the expert partitioner during installation allowed me to set up a Btrfs RAID1 quite easily:

  • sda1 - 0.50GiB - not encrypted - EFI - mount point: /boot/efi
  • sda2 - 63.50GiB - encrypted - unformatted - unmounted (this becomes the RAID)
  • sdb1 - 0.50GiB - not encrypted - EFI - unmounted (this later becomes an EFI clone)
  • sdb2 - 63.50GiB - encrypted - unformatted - unmounted (this becomes the RAID)
  • to create the Btrfs RAID:
    – choose Btrfs on the left
    – “Add Btrfs…”
    – RAID Level “RAID1”
    – RAID Level for Metadata “RAID1”
    – add sda2 and sdb2
    – Mount Point /
    – Enable Snapshots

So far so good, the installer finishes, the system boots, encryption works, Btrfs seems to be a RAID1, perfect!

Now comes the tough part though: simulating a drive failure. I (virtually) detached sdb. The system starts, asks for the LUKS password for sda2, then shows the grub screen; after choosing the first option I briefly get:

 Booting `openSUSE Tumbleweed'

error: ../../grub-core/disk/cryptodisk.c:1519:no such cryptodisk found, perhaps a
needed disk or cryptodisk module is not loaded.
Loading Linux 6.8.6-1-default ...
Loading initial ramdisk ...

Press any key to continue...

The machine then works silently for a while, after which I get:

Warning: crypto LUKS UUID 9902[...] not found

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

Give root password for maintenance
(or press Control-D to continue):

When I press Control-D, I get:

[ 902.527613] dracut-initqueue[422]: Warning: Not all disks have been found.
[ 902.527689] dracut-initqueue[422]: Warning: You might want to regenerate your initramfs.

Then nothing else happens.

So after a restart, I typed in the root password to get to a basic shell, where I tried to mount the RAID1 as degraded using this command I found all over the interwebs (I adapted the file path to cr_root):
mount -o degraded /dev/mapper/cr_root /mnt

Unfortunately, I get:
mount: /mnt: mount point does not exist

So I created that directory (did I mention I’m feeeeeling my way forward here…):
mkdir /mnt

… and mounted again, this time without error. I took this as success and exited the shell. Unfortunately, I still get the same warning as above: Not all disks have been found.

After a restart, I tried a different approach and (after typing in the LUKS password) I pressed “e” in grub, then added degraded to the line that looked appropriate to me:
linux /boot/vmlinuz-6.8.6-1-default root=UUID=2586[...] ${extra_cmdline} splash=silent mitigations=auto quiet security=apparmor degraded

The system behaves a little differently (but with the same result): I have to type in the LUKS password a second time (this time into a text field in the middle of the screen) but ultimately, I end up with the same Warning: crypto LUKS UUID 9902[...] not found.

My next attempt was going to be to chroot into the system from a live system, but it’s hard to imagine that’s the only way to boot into a degraded Btrfs RAID. I’m just thinking there must be an easier way.

Can someone please help me?

Providing this file may be useful.

Yes, you are of course right. I thought the same while writing all this up. Sorry, should have done this immediately. But here it is now: https://paste.opensuse.org/pastes/575ea6d1bf2e

Edit: It looks like I can’t edit the original post? That’s a shame, I was going to add the link there. Or am I misunderstanding the interface here?

dracut knows nothing about btrfs RAID1 and simply has a fixed requirement for the devices that are part of the btrfs. Once you are past this, you get the next problem - systemd will not attempt to mount the filesystem until all devices are present. And only when you are past systemd does the degraded option become relevant. But the degraded option in the past resulted in a broken RAID1 (data written in this state was written with the single profile instead of RAID1); I do not know whether it has been fixed. So degraded as a default option was highly discouraged.

The problem of multi-device btrfs vs. systemd is really old, and it can be solved neither in btrfs nor in systemd with their current architecture. Apparently, so far nobody was motivated enough to design and implement a solution. You may have more luck with MD RAID1, which should handle the missing piece a bit better.
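For what it is worth, a minimal sketch of that MD layout (the stacking and device names are illustrative assumptions, not a tested recipe): build the RAID1 first, then put LUKS and Btrfs on top of the single md device, so the filesystem only ever sees one disk:

# Mirror the two partitions with MD first, then layer LUKS and btrfs
# on the single resulting device (device names illustrative):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 cr_root
mkfs.btrfs /dev/mapper/cr_root

With this stacking a missing disk is handled entirely by MD; btrfs never notices.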

Thank you, this is really helpful! It connects a few dots I stumbled upon but did not know how to put together.

From my understanding, the degraded option is really only a temporary measure until the damaged drive can be replaced. At that point one needs to rebalance and the RAID1 should continue functioning properly. I’m happy to be corrected about this if I’m wrong, of course.

So the question remains how to boot into an encrypted Btrfs RAID1 with one hard drive missing. Am I correct in assuming I need to chroot into the system from a live system and fix a few things? My GoogleFu and some AI are telling me that initramfs needs to be rebuilt and the GRUB configuration updated with the degraded flag?

That is a correct observation.

That is your personal opinion.

That is exactly what is to be avoided. @arvidjaar asked in the second post for something that is now in the first post. That would be confusing at the least (people are still waiting for the third post with your answer - why should they go back and re-read the first post?) and would make post #2 look ridiculous at worst.

Thanks for your explanations and thoughts. When I edit previous posts, I make clear that it’s an edit and in this case I would also make clear that it’s in response to a comment about missing information. I agree that when such clarifications are missing, things become unclear.

But it’s alright for me. Every forum, since the Forum Romanum, has its own rules and customs. I’m still new here, so I’m still learning the ones that apply here.

I have spent a good few hours experimenting with this now and would be thankful for some help, since I’m obviously very stuck. Here’s what I did:

  1. boot into the live environment
  2. decrypt the one remaining encrypted drive: sudo cryptsetup luksOpen /dev/sda2 cr_root
  3. mount the Btrfs filesystem as degraded: sudo mount -o degraded /dev/mapper/cr_root /mnt
  4. mount additional folders:
  • sudo mount --bind /dev /mnt/dev
  • sudo mount --bind /proc /mnt/proc
  • sudo mount --bind /sys /mnt/sys
  • sudo mount --bind /run /mnt/run
  • sudo mount --bind /var /mnt/var
  5. chroot into the system: sudo chroot /mnt
  6. rebuild initramfs: dracut -f --regenerate-all
  7. edit the GRUB configuration by adding the degraded flag:
  • nano /etc/default/grub
  • add degraded to the end of the appropriate line, like so: GRUB_CMDLINE_LINUX_DEFAULT=" degraded"
  8. update the GRUB configuration: grub2-mkconfig -o /boot/grub2/grub.cfg

Here my luck ended. I received this error message (I hope I wrote it down correctly):
grub2-mkconfig /usr/sbin/grub-probe: error: failed to get canonical path of '

I thought maybe I could now boot into grub and then append degraded by pressing “e”. But this results in the error message:
Initramfs unpacking failed: invalid magic at start of compressed archive

Then I found this article (SDB:BTRFS - openSUSE Wiki) and followed it as best I could to mount the various Btrfs subvolumes. But in the end, I could not chroot into the system, and instead received:
chroot: failed to run command '/bin/bash': No such file or directory

I obviously buggered up my system now. Could anybody help me along?

Not directly related, but it should be --rbind, not --bind.
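In case it helps, a sketch of the recursive variant (the loop and --make-rslave are additions of mine, not from the original steps):

# Same mounts as in step 4 above, but recursive (--rbind) so nested mounts
# like /dev/pts and /sys/firmware/efi/efivars come along; --make-rslave keeps
# unmounts inside the chroot from propagating back to the live system:
for d in dev proc sys run var; do
    sudo mount --rbind "/$d" "/mnt/$d"
    sudo mount --make-rslave "/mnt/$d"
done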

Educated guess - grub2-probe collects the devices that are part of the btrfs and fails to find the missing device.

It is still not clear what you are trying to do. The subject says “boot into a degraded btrfs”, but in reality you attempt to change the configuration in a degraded state. That is much more ambitious.

If your intention is to permanently ignore the missing device, then just remove this device from btrfs.

btrfs device remove ...
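Spelled out, that could look like this (a sketch; btrfs accepts the literal keyword missing for a device that is gone, but on a two-device RAID1 the kernel may refuse to go below two devices until the profiles are converted down first, hence the balance):

# Assumes the degraded filesystem is mounted at /mnt; -f is required
# because the conversion reduces redundancy:
btrfs balance start -f -dconvert=single -mconvert=dup /mnt
btrfs device remove missing /mnt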

It is not

tw:~ # btrfs filesystem usage -T /mnt
Overall:
    Device size:		   2.00GiB
    Device allocated:		 761.50MiB
    Device unallocated:		   1.26GiB
    Device missing:		   1.00GiB
    Device slack:		     0.00B
    Used:			 202.75MiB
    Free (estimated):		   1.29GiB	(min: 917.87MiB)
    Free (statfs, df):		 406.50MiB
    Data ratio:			      1.23
    Metadata ratio:		      2.00
    Global reserve:		   5.50MiB	(used: 0.00B)
    Multiple profiles:		       yes	(data)

            Data      Data      Metadata  System                            
Id Path     single    RAID1     RAID1     RAID1    Unallocated Total   Slack
-- -------- --------- --------- --------- -------- ----------- ------- -----
 1 missing          - 102.38MiB 102.38MiB  8.00MiB   811.25MiB 1.00GiB     -
 2 /dev/sdb 336.00MiB 102.38MiB 102.38MiB  8.00MiB   475.25MiB 1.00GiB     -
-- -------- --------- --------- --------- -------- ----------- ------- -----
   Total    336.00MiB 102.38MiB 102.38MiB  8.00MiB     1.26GiB 2.00GiB 0.00B
   Used     198.00MiB   2.00MiB 368.00KiB 16.00KiB                          
tw:~ # 

The problem is that, running degraded for a prolonged time, you create data with only a single copy, and this data will not be automatically converted to RAID1 once you add the missing device back. So you lose redundancy without any notification. And it is much worse: if you now add this missing device back, btrfs silently incorporates it, but its content is out of sync with the surviving piece, leading to

[   95.202650] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 22052864 have 0
[   95.219280] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 22052864 (dev /dev/sdb sector 43072)
[   95.225950] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 22056960 (dev /dev/sdb sector 43080)
[   95.231284] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 22061056 (dev /dev/sdb sector 43088)
[   95.242598] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 22065152 (dev /dev/sdb sector 43096)
[   95.245943] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 31326208 have 0
[   95.259339] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31326208 (dev /dev/sdb sector 61184)
[   95.265933] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31330304 (dev /dev/sdb sector 61192)
[   95.270363] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31334400 (dev /dev/sdb sector 61200)
[   95.282624] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31338496 (dev /dev/sdb sector 61208)
[   95.288428] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 31342592 have 0
[   95.292599] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31342592 (dev /dev/sdb sector 61216)
[   95.302751] [ T2350] BTRFS info (device sdc): read error corrected: ino 0 off 31346688 (dev /dev/sdb sector 61224)
[   95.319271] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 31260672 have 0
[   95.345957] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 31358976 have 0
[   95.375956] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 30932992 have 0
[   95.532649] [  T387] BTRFS error (device sdc): bad tree block start, mirror 1 want 31293440 have 0
tw:~ # 

Yes, btrfs managed to notice the mismatch, but it should not have allowed this device into the filesystem in the first place. And of course the single profile remained single; nothing changed.

            Data      Data      Metadata  System                            
Id Path     single    RAID1     RAID1     RAID1    Unallocated Total   Slack
-- -------- --------- --------- --------- -------- ----------- ------- -----
 1 /dev/sdb         - 102.38MiB 102.38MiB  8.00MiB   811.25MiB 1.00GiB     -
 2 /dev/sdc 336.00MiB 102.38MiB 102.38MiB  8.00MiB   475.25MiB 1.00GiB     -
-- -------- --------- --------- --------- -------- ----------- ------- -----
   Total    336.00MiB 102.38MiB 102.38MiB  8.00MiB     1.26GiB 2.00GiB 0.00B
   Used     198.00MiB   2.00MiB 368.00KiB 16.00KiB                          

So, sorry - btrfs multi-device is far from a replacement for a true RAID (be it hardware or software). For this reason I do not consider the problem of booting with a degraded system btrfs RAID1 to be of any importance. It is simply not there yet.

Wow thank you for taking the time to look into this more!

Oh no, that’s of course a problem. Let me try and state it more precisely: I want to figure out how to boot into an encrypted Btrfs RAID1 system where one of two hard drives is defective/offline, to be able to use the system for a few days until a replacement drive arrives, at which point that drive should be added to the RAID1 to achieve redundancy again.
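For the replacement step, what I had pictured is something like this (a sketch based on my reading of btrfs-replace(8); the devid and the device path are illustrative):

# Replace the missing device (devid 1 here, as listed by
# btrfs filesystem usage) with the new disk, then watch progress:
sudo btrfs replace start 1 /dev/sdc2 /
sudo btrfs replace status /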

All my attempts I have described above are the various approaches I have found online to achieve this. If they are misguided or too ambitious, then this is due to the fact that I don’t know better. I am thankful for all better advice.

From my understanding, a btrfs balance should fix these issues, shouldn’t it? In fact, from what I read I gathered that this is exactly the required procedure after the RAID was in a degraded state. You balance it to convert the single-profile data back to RAID1.
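Concretely, I had something like this in mind (a sketch; the soft filter is my assumption for touching only chunks that are not already RAID1):

# Convert chunks written in the single profile back to RAID1;
# "soft" skips chunks that already have the target profile:
sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /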

I read up on this as best I could before starting my own experiments and read pretty much everywhere that RAID1 is now considered stable and recommended (as opposed to RAID5 or 6), with the caveat that it’s a little unintuitive to boot into the system and that the system needs to be rebalanced after a drive replacement and/or degraded state. Remembering to balance in the rare case that it’s required - I can live with that. Normally I can also live with it being unintuitive, but in this case, as you can tell, I seem to have hit my limit. Which is very disappointing, but also challenging, which is why I have spent more time on it.

If you have any suggestions on what else I could try, I’d be very grateful for them.

Unattended boot is nearly impossible today. Manual boot: stop in initrd and remove the missing device from btrfs.
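Spelling that out (a sketch; rd.break is a standard dracut option, the device name is illustrative):

# Press "e" in grub and append to the linux line:
#   rd.break=pre-mount
# dracut then drops to a shell before mounting the root filesystem:
cryptsetup luksOpen /dev/sda2 cr_root
mount -o degraded /dev/mapper/cr_root /sysroot
# remove the missing device as shown earlier, then exit to continue booting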

Yes, as long as you are using the correct invocation. Even then it requires extra space to run, so it may fail where a traditional RAID is expected to succeed.

It may be stable from the btrfs development point of view, but integration into the operating system is beyond the scope of btrfs development. Pragmatic answer - do not reboot with a missing disk. Wait until the disk is ready for replacement, shut down, replace the disk, boot (and note that for a system device you may need to install the bootloader on this disk, which is already beyond the scope of btrfs). If you want the more comfortable handling known from a real RAID1 (unattended boot with a missing disk, automatic resilvering after disk replacement) - use a real RAID1. Or step in and implement this for btrfs.

I gave it a try. It just confirmed that it is unusable as a real-life solution.

  1. It starts with grub. grub is configured to access its /boot/grub2 on btrfs using one device (grub does not have any notation to reference a multi-device filesystem at all. It takes one device as reference and internally scans the others if needed, but the reference must exist). To mitigate it, grub2-install must configure the grub2 images differently on each disk; each image must reference the copy located on the same disk.
  2. grub stops in the menu because it fails to configure the second encrypted device, but continues after some timeout. This can be worked around by editing the menu commands - if you know the correct UUID to delete …
  3. Of course, dracut stops waiting for the non-existent device. This is the internal initqueue loop, which is statically configured when the initrd is created. After some time dracut enters the emergency shell, where it is possible to remove the hook script.
  4. But then you hit systemd dependencies - it waits for each device in /etc/crypttab. It can be worked around by passing the rd.luks.uuid= option on the kernel command line … if you know the correct UUID, of course.
  5. Next is btrfs. systemd will not attempt to mount btrfs unless all devices are present. This can be worked around by removing the corresponding rule from the btrfs udev rules … in advance. It may be possible to do in the initrd, but by default there is no editor included.
  6. Once you are past this, of course, next is btrfs itself, which will not mount a degraded filesystem without the -o degraded option. Could be worked around using rootflags on the kernel command line.
  7. Finally, after switching to the real root, you again hit systemd waiting for the second encrypted device. Fortunately, with a timeout (by default).

So, if you prepare the udev rules in advance:

--- /usr/lib/udev/rules.d/64-btrfs.rules	2024-06-06 16:18:30.000000000 +0300
+++ /etc/udev/rules.d/64-btrfs.rules	2024-06-16 21:41:32.843751221 +0300
@@ -9,7 +9,7 @@ ENV{SYSTEMD_READY}=="0", GOTO="btrfs_end
 IMPORT{builtin}="btrfs ready $devnode"
 
 # mark the device as not ready to be used by the system
-ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
+#ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
 
 # reconsider pending devices in case when multidevice volume awaits
 ENV{ID_BTRFS_READY}=="1", RUN+="/usr/bin/udevadm trigger -s block -p ID_BTRFS_READY=0"

and the kernel command line (it is easier to delete a UUID than to type the whole thing)

GRUB_CMDLINE_LINUX_DEFAULT="splash=silent video=1280x960 security=apparmor mitigations=off plymouth.enable=0 rd.luks.uuid=53d7f53e-3005-4caf-96ac-1469f8f0faab rd.luks.uuid=b1c90f03-9f4f-4db1-a055-38e522f326a9"

and of course add rootflags=degraded to the kernel command line - you will have exactly a 50% chance to boot, because grub2 will boot only if you happen to lose “the right” disk. You will still be dropped into the dracut emergency shell, where you can remove /var/lib/dracut/hooks/initqueue/finished/90-crypt.sh, exit, and after waiting another 90 seconds land in the booted system.

Given the grub problem I do not see any point in all this.

Again, wow, thanks for engaging with this so thoroughly! It clearly looks like my quest is doomed and I shall abandon it. So sad.

Maybe most people using a Btrfs RAID1 use it not for their boot drive but for additional drives - as far as I can follow your critique, a non-boot-drive RAID1 should not have any of these problems? Otherwise I don’t understand how the Internet can be so full of people writing about their Btrfs RAID1, including instructions on what to do in case of a drive failure.

I assume the issues you write about are not related to using encryption, but are general issues, right? Also, these issues should be the same on all distros, not just openSUSE?

Without encryption you obviously do not have extra problems with missing LUKS devices.

These issues are related to dracut and systemd. If an initrd/initramfs does not use them, it should not have precisely these issues.
