Random boot failures in MicroOS

Hi,

I have two machines running MicroOS, and a few months ago both started failing to boot at random. I ignored the issue, hoping an update would fix it, but it still happens, infrequently and unpredictably. When it does, boot stops with:

[FAILED] Failed to mount /.snapshots.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Relabel /.snapshots.
[DEPEND] Dependency failed for Mark autorelabel as done.
You are in emergency mode. After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, or "exit" to continue bootup.
Give root password for maintenance
(or press Control-D to continue):

I tried searching but haven’t found much. The closest match was this Reddit post, but that poster says an update resolved their issue, whereas mine persists. Because MicroOS is immutable and every update requires a reboot, I run into the problem fairly often. Each time, I have to reboot the machine physically, since I can’t SSH into it while it’s stuck in emergency mode (and one of them is a 20-minute drive away…).

Since the issue happens on both systems, I suspect it’s a problem with how I set them up, particularly the filesystems. I set them up identically so that one can serve as a backup if the other goes down. Here’s how I did it:

  • The root filesystem was created by the MicroOS installer on a dedicated SSD
  • /var/data is mounted from two large HDDs in a btrfs RAID-1 configuration, created with the following commands:
      mkfs.btrfs /dev/sda1
      mkfs.btrfs /dev/sdb1
      mount /dev/sda1 /var/data
      btrfs device add -f /dev/sdb1 /var/data
      btrfs balance start -dconvert=raid1 -mconvert=raid1 /var/data
  • In /etc/fstab I added: UUID=<uuid> /var/data btrfs defaults 0 2
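To confirm that the balance actually converted every chunk to RAID-1, `btrfs filesystem df /var/data` prints one line per block-group type with its profile (for example `Data, RAID1: total=..., used=...`). Below is a minimal sketch that reads that output and flags anything not mirrored. `check_raid1` is a hypothetical helper name, and the sketch assumes the usual `Type, PROFILE: total=..., used=...` line format. GlobalReserve is normally reported as `single`, so it is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `btrfs filesystem df` output on stdin and
# report every block-group type whose profile is not RAID1.
# Assumes lines shaped like "Data, RAID1: total=2.00GiB, used=1.00GiB".
check_raid1() {
  local bad=0 line kind profile
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    kind=${line%%,*}          # text before the first comma, e.g. "Data"
    profile=${line#*, }       # drop "Data, " -> "RAID1: total=..., used=..."
    profile=${profile%%:*}    # keep text before the colon, e.g. "RAID1"
    # GlobalReserve is a reservation, not a real chunk; it shows as "single"
    [ "$kind" = "GlobalReserve" ] && continue
    if [ "$profile" != "RAID1" ]; then
      echo "not mirrored: $kind ($profile)"
      bad=1
    fi
  done
  return "$bad"
}
```

Usage (as root): `btrfs filesystem df /var/data | check_raid1 && echo "all chunks are RAID1"`. The function only parses text, so it can be tried on saved output without touching the disks.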

(I followed this blog post for the RAID-1 setup.)

Any help would be appreciated. I can also post more info if needed. Besides this occasional boot issue, both have been running perfectly fine for many months.

Thanks!


Start by providing the full output of

journalctl -b --full --no-pager

from a boot where the problem happens, not a few selected lines.

These are the logs I dumped yesterday in emergency mode when it happened again:

Please show your /etc/fstab.
Does booting with the selinux=0 kernel parameter change anything?

Here’s my fstab:

UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff / btrfs ro 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /usr/local btrfs subvol=/@/usr/local 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /srv btrfs subvol=/@/srv 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /root btrfs subvol=/@/root,x-initrd.mount 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /opt btrfs subvol=/@/opt 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /home btrfs subvol=/@/home 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/writable btrfs subvol=/@/boot/writable 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/grub2/x86_64-efi btrfs subvol=/@/boot/grub2/x86_64-efi 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/grub2/i386-pc btrfs subvol=/@/boot/grub2/i386-pc 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /.snapshots btrfs subvol=/@/.snapshots 0 0
UUID=d2b30956-203c-488e-81ce-75b3ccec8eb1 /var btrfs defaults,x-initrd.mount 0 0
UUID=2790-B3E8 /boot/efi vfat utf8 0 2
UUID=aae9df7d-5d45-42a4-9598-df52eda70225 /var/data btrfs defaults 0 2
overlay /etc overlay defaults,lowerdir=/sysroot/var/lib/overlay/166/etc:/sysroot/etc,upperdir=/sysroot/var/lib/overlay/168/etc,workdir=/sysroot/var/lib/overlay/168/work-etc,x-systemd.requires-mounts-for=/var,x-systemd.requires-mounts-for=/sysroot/var,x-initrd.mount 0 0

Does booting with the selinux=0 kernel parameter change anything?

SELinux is already set to permissive on one of the systems, and both have the same issue. I’ll try adding that kernel parameter, but since the failure is random, it will be hard to be certain whether it helps.
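For a one-off test, the parameter can be added by pressing e at the GRUB menu and appending selinux=0 to the line starting with linux (or linuxefi). To keep it across reboots on a GRUB-based install, one approach is to append it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then regenerate the GRUB config, which on MicroOS I believe is done with `transactional-update grub.cfg` followed by a reboot. The sketch below covers only the file edit. `add_kernel_param` is a hypothetical helper, and newer installs that boot with systemd-boot manage the kernel command line differently, so check which bootloader you have first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in a GRUB defaults file (default /etc/default/grub), unless already present.
# Assumption: the parameter contains no '|', '&' or '\' (they would break sed).
add_kernel_param() {
  local param="$1"
  local file="${2:-/etc/default/grub}"
  local line value
  line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file") || return 1
  value=${line#*=}          # everything after the first '='
  value=${value//\"/}       # strip the surrounding double quotes
  # Pad with spaces so the parameter is matched as a whole word
  case " $value " in
    *" $param "*) return 0 ;;
  esac
  # Insert the parameter just before the closing double quote
  sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${param}\"|" "$file"
}
```

Usage (as root): `add_kernel_param selinux=0`, then `transactional-update grub.cfg` and reboot. Running it twice leaves the line unchanged.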

I can’t contribute a solution, but we hit the same issue while booting this morning. Instead of /.snapshots, the mount that failed for us was

/boot/grub2/i386-pc

We have also set up multiple filesystems, with /home on a separate one, on this computer (which is a 4-hour drive away).

We’ll post updates if we find anything that helps.

Hi,

We can’t be 100% sure of the root cause, but we now think the boot failed because a partition was corrupted by a power cut at the remote site. Our machine has no UPS and restarts itself when the power returns. Anyway, the hard restart seems to have solved the issue for now.

The following is just FYI.

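I reviewed the journal of each boot after the restarts with

journalctl -b <offset>

where <offset> counts back from the current boot: 0 is the current boot, -1 the previous one, -2 the one before that, and so on. (A positive number counts forward from the oldest boot stored in the journal instead.)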
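To check several earlier boots in one go, here is a minimal sketch that loops over negative boot offsets and prints only mount-related failures. `scan_boots` is a hypothetical name, and the grep pattern simply matches the messages quoted in this thread, so other kinds of failures will not show up. `journalctl --list-boots` shows which offsets are actually available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print mount-related failure lines for the last N boots.
# journalctl counts boots backwards with negative offsets (-1 = previous boot).
scan_boots() {
  local n="${1:-5}" i
  for ((i = 1; i <= n; i++)); do
    echo "=== boot -$i ==="
    # Errors for offsets that are not in the journal are silenced
    journalctl -b "-$i" --no-pager 2>/dev/null |
      grep -E 'Failed to mount|Dependency failed for|Mount process exited'
  done
  return 0
}
```

Usage: `scan_boots 10`, run as root or as a user in the systemd-journal group so the system journal is readable.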

In the journal of the previous boot (-1), I see the following error messages:

systemd[1]: Mounting /boot/writable…
mount[676]: mount: /boot/grub2/i386-pc: /dev/nvme0n1p2 already mounted on /.
mount[676]: dmesg(1) may have more information after failed mount system call.

systemd-fsck[679]: fsck.fat 4.2 (2021-01-31)
systemd-fsck[679]: There are differences between boot sector and its backup.
systemd-fsck[679]: This is mostly harmless. Differences: (offset:original/backup)
systemd-fsck[679]: 65:01/00
systemd-fsck[679]: Not automatically fixing this.
systemd-fsck[679]: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
systemd-fsck[679]: Automatically removing dirty bit.
systemd-fsck[679]: *** Filesystem was changed ***
systemd-fsck[679]: Writing changes.

kernel: thermal LNXTHERM:00: registered as thermal_zone0
kernel: ACPI: thermal: Thermal Zone [TZ00] (27 C)
kernel: EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
kernel: EDAC igen6: v2.5.1
kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)
systemd[1]: boot-grub2-i386\x2dpc.mount: Mount process exited, code=exited, status=32/n/a
systemd[1]: boot-grub2-i386\x2dpc.mount: Failed with result 'exit-code'.
systemd[1]: Failed to mount /boot/grub2/i386-pc.
systemd[1]: Dependency failed for Local File Systems.