Random boot failures in MicroOS

aramallo · July 5, 2024, 12:13am

Hi,

I have two machines running MicroOS and they both started exhibiting random failures to boot a few months ago. I ignored the issue all this time hoping it’d go away with an update, but it still happens. It happens randomly and infrequently:

[FAILED] Failed to mount /.snapshots.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Relabel /.snapshots.
[DEPEND] Dependency failed for Mark autorelabel as done.
You are in emergency mode. After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, or "exit" to continue bootup.
Give root password for maintenance
(or press Control-D to continue):

I tried searching, but haven’t found much. The closest was this reddit post, but the person there claims an update resolved their issue, whereas mine persists. In my case, since MicroOS is immutable and I need to reboot after each update, I end up encountering the problem somewhat frequently. In each case, I have to physically reboot the machine since I can’t ssh into it while it’s stuck in emergency mode (and one of them is a 20min drive away…)

Since the issue happens to both of my systems, it makes me think it’s a problem with how I set them up (the filesystem in particular). I set them up the same so I can have a backup in case one goes down. Here’s how I did it:

The root filesystem was created by the MicroOS installer on a dedicated SSD
/var/data is mounted from two large HDDs in a btrfs raid-1 configuration made with the following commands:

mkfs.btrfs /dev/sda1
mkfs.btrfs /dev/sdb1
mount /dev/sda1 /var/data
btrfs device add /dev/sdb1 /var/data -f
btrfs balance start -dconvert=raid1 -mconvert=raid1 /var/data

and to fstab I added: UUID=<uuid> /var/data btrfs defaults 0 2

(I got the instructions for the raid-1 from this blog post)
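
To double-check that the balance above really converted everything, the output of `btrfs filesystem df /var/data` can be scanned for profiles other than RAID1. This is only a minimal sketch: the function name `not_raid1` is made up for illustration, and it assumes the usual output shape of one `Type, PROFILE: total=..., used=...` line per block-group type.

```shell
# Hypothetical helper: reads `btrfs filesystem df` output on stdin and
# prints any Data or Metadata line whose profile is not RAID1.
# Exit status is 1 if such a line was found, 0 otherwise.
not_raid1() {
    awk -F'[,:]' '
        # Fields after splitting on "," and ":" are e.g. "Data", " RAID1", ...
        ($1 == "Data" || $1 == "Metadata") && $2 != " RAID1" {
            print $1 ":" $2
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage (needs root and the mounted filesystem):
#   btrfs filesystem df /var/data | not_raid1 && echo "data and metadata are RAID1"
```

It prints nothing when data and metadata are both RAID1. It does not look at the System or GlobalReserve lines.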

Any help would be appreciated. I can also post more info if needed. Besides this occasional boot issue, both have been running perfectly fine for many months.

Thanks!

arvidjaar · July 5, 2024, 4:19am

Start with providing full output of

journalctl -b --full --no-pager

when the problem happens, not some random lines.

aramallo · July 5, 2024, 3:50pm

These are the logs I dumped yesterday in emergency mode when it happened again:

arvidjaar · July 6, 2024, 4:09am

Show your /etc/fstab
Does booting with selinux=0 kernel parameter change anything?

aramallo · July 6, 2024, 8:19pm

Here’s fstab:

UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff / btrfs ro 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /usr/local btrfs subvol=/@/usr/local 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /srv btrfs subvol=/@/srv 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /root btrfs subvol=/@/root,x-initrd.mount 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /opt btrfs subvol=/@/opt 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /home btrfs subvol=/@/home 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/writable btrfs subvol=/@/boot/writable 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/grub2/x86_64-efi btrfs subvol=/@/boot/grub2/x86_64-efi 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /boot/grub2/i386-pc btrfs subvol=/@/boot/grub2/i386-pc 0 0
UUID=92b8cafd-0c5f-4417-af7a-4f757a343cff /.snapshots btrfs subvol=/@/.snapshots 0 0
UUID=d2b30956-203c-488e-81ce-75b3ccec8eb1 /var btrfs defaults,x-initrd.mount 0 0
UUID=2790-B3E8 /boot/efi vfat utf8 0 2
UUID=aae9df7d-5d45-42a4-9598-df52eda70225 /var/data btrfs defaults 0 2
overlay /etc overlay defaults,lowerdir=/sysroot/var/lib/overlay/166/etc:/sysroot/etc,upperdir=/sysroot/var/lib/overlay/168/etc,workdir=/sysroot/var/lib/overlay/168/work-etc,x-systemd.requires-mounts-for=/var,x-systemd.requires-mounts-for=/sysroot/var,x-initrd.mount 0 0
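
Most of the entries above are subvolumes of the same btrfs filesystem as /, including /.snapshots. A minimal sketch for counting fstab entries per source device, in case it helps when comparing machines (the function name `fstab_sources` is made up for illustration):

```shell
# Hypothetical helper: reads fstab text on stdin and prints
# "<count> <source>" for each distinct first field, most-used first.
# Blank lines and comment lines are skipped.
fstab_sources() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blanks and comments
        { n[$1]++ }                     # count entries per source
        END { for (s in n) print n[s], s }
    ' | sort -rn
}

# Usage:
#   fstab_sources < /etc/fstab
```

This only counts entries. It says nothing about why a mount fails.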

Does booting with selinux=0 kernel parameter change anything?

I already have selinux set to permissive on one of the systems, and they both have the same issue. I’ll try adding that kernel parameter to see if it helps, but since it happens randomly it’ll be hard to be certain.

sensnori · July 8, 2024, 1:00pm

I can't contribute anything to a solution, but we hit the same issue while booting this morning. Instead of /.snapshots, our error is for

/boot/grub2/i386-pc

We have also set up multiple file systems, with /home separated out, on this computer (which is a 4-hour drive away).

We will post updates if we find anything that helps.

sensnori · July 11, 2024, 10:32am

Hi,

We can't be 100% sure of the root cause, but we now think the boot failed because a partition was corrupted by a power cut at the remote site. Our machine (which has no UPS) restarts itself when power returns. In any case, a hard restart seems to have solved the issue for now.

Following is just FYI

I reviewed the journal for each boot after the restarts with

journalctl -b x

where x is an integer offset selecting an earlier boot (for example, -1 for the previous boot).

This is the section of the journal where I see the error message (one boot ago):

systemd[1]: Mounting /boot/writable...
mount[676]: mount: /boot/grub2/i386-pc: /dev/nvme0n1p2 already mounted on /.
mount[676]: dmesg(1) may have more information after failed mount system call.
…
systemd-fsck[679]: fsck.fat 4.2 (2021-01-31)
systemd-fsck[679]: There are differences between boot sector and its backup.
systemd-fsck[679]: This is mostly harmless. Differences: (offset:original/backup)
systemd-fsck[679]: 65:01/00
systemd-fsck[679]: Not automatically fixing this.
systemd-fsck[679]: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
systemd-fsck[679]: Automatically removing dirty bit.
systemd-fsck[679]: *** Filesystem was changed ***
systemd-fsck[679]: Writing changes.
…
kernel: thermal LNXTHERM:00: registered as thermal_zone0
kernel: ACPI: thermal: Thermal Zone [TZ00] (27 C)
kernel: EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
kernel: EDAC igen6 MC0: ADDR 0x7fffffffe0
kernel: EDAC igen6: v2.5.1
kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)
systemd[1]: boot-grub2-i386\x2dpc.mount: Mount process exited, code=exited, status=32/n/a
systemd[1]: boot-grub2-i386\x2dpc.mount: Failed with result 'exit-code'.
systemd[1]: Failed to mount /boot/grub2/i386-pc.
systemd[1]: Dependency failed for Local File Systems.

td78 · August 5, 2024, 7:56am

Running 3 identical MicroOS VMs, set to auto-update and auto-reboot weekly. 1 of the 3 encountered the problem and hung in the emergency shell. The relevant journalctl logs echo @aramallo's issue:

Aug 04 03:00:55 docker01b mount[1019]: mount: /.snapshots: /dev/sda2 already mounted on /.
Aug 04 03:00:55 docker01b mount[1019]:        dmesg(1) may have more information after failed mount system call.
Aug 04 03:00:55 docker01b systemd[1]: \x2esnapshots.mount: Mount process exited, code=exited, status=32/n/a
Aug 04 03:00:55 docker01b systemd[1]: \x2esnapshots.mount: Failed with result 'exit-code'.
Aug 04 03:00:55 docker01b systemd[1]: Failed to mount /.snapshots.
Aug 04 03:00:55 docker01b systemd[1]: Dependency failed for Local File Systems.
Aug 04 03:00:55 docker01b systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Aug 04 03:00:55 docker01b systemd[1]: local-fs.target: Triggering OnFailure= dependencies.

pestaa · August 6, 2024, 7:54am

Experienced the same with /boot/writable. Reboot solved it for me.

Aug 06 04:31:46 - systemd[1]: Mounting /boot/writable...
Aug 06 04:31:46 - systemd[1]: Mounting /home...
Aug 06 04:31:46 - systemd[1]: Mounting /opt...
Aug 06 04:31:46 - mount[857]: mount: /boot/writable: /dev/vda2 already mounted on /.
Aug 06 04:31:46 - mount[857]:        dmesg(1) may have more information after failed mount system call.
Aug 06 04:31:46 - systemd[1]: Mounting /srv...
Aug 06 04:31:46 - systemd[1]: Mounting /usr/local...
Aug 06 04:31:46 - systemd[1]: Finished Virtual Console Setup.
Aug 06 04:31:46 - systemd[1]: Mounted /.snapshots.
Aug 06 04:31:46 - systemd[1]: Mounted /boot/grub2/i386-pc.
Aug 06 04:31:46 - systemd[1]: Mounted /boot/grub2/x86_64-efi.
Aug 06 04:31:46 - systemd[1]: boot-writable.mount: Mount process exited, code=exited, status=32/n/a
Aug 06 04:31:46 - systemd[1]: boot-writable.mount: Failed with result 'exit-code'.
Aug 06 04:31:46 - systemd[1]: Failed to mount /boot/writable.
Aug 06 04:31:46 - systemd[1]: Dependency failed for Local File Systems.
Aug 06 04:31:46 - systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.

PietroMB · August 12, 2024, 8:15am

I am also having the same problem with both '/.snapshots' and '/boot/writable'. Initially, I thought it was due to the Proxmox node, but moving the machines did not resolve the issue.

Today's screenshots: 2 of the 15 VMs have this problem

Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module loop...
Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module fuse...
Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module efi_pstore...
Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module drm...
Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module dm_mod...
Aug 12 08:16:52 localhost systemd[1]: Starting Load Kernel Module configfs...
Aug 12 08:16:52 localhost systemd[1]: Apply Kernel Variables for 6.10.3-1-default from /boot was skipped because of an unmet condition check (ConditionPathExists=!/usr/lib/modules/6.10.3-1-default/sysctl.conf).
Aug 12 08:16:52 localhost systemd[1]: Load AppArmor profiles was skipped because of an unmet condition check (ConditionSecurity=apparmor).
Aug 12 08:16:52 localhost systemd[1]: Mounting /.snapshots...
Aug 12 08:16:52 localhost systemd[1]: Stopped target Emergency Mode.
Aug 12 08:16:52 localhost systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Aug 12 08:16:52 localhost systemd[1]: Stopped Emergency Shell.
Aug 12 08:16:52 localhost systemd[1]: emergency.service: Deactivated successfully.
Aug 12 08:16:52 localhost systemd[1]: Reloading finished in 165 ms.
Aug 12 08:16:52 localhost systemd[1]: Reloading...
Aug 12 08:16:52 localhost systemd[1]: Reload requested from client PID 896 ('systemd-sulogin') (unit emergency.service)...
Aug 12 07:29:20 localhost kernel: BTRFS info (device sda2): qgroup scan completed (inconsistency flag cleared)
Aug 12 07:29:12 localhost (plymouth)[838]: emergency.service: Unable to locate executable 'plymouth': No such file or directory
Aug 12 07:29:12 localhost systemd[1]: Startup finished in 933ms (kernel) + 2.740s (initrd) + 2.595s (userspace) = 6.270s.
Aug 12 07:29:12 localhost systemd[1]: Finished Write boot and shutdown times into wtmpdb.
Aug 12 07:29:12 localhost wtmpdb[894]: Error: cannot open dbus
Aug 12 07:29:11 localhost systemd[1]: Starting Write boot and shutdown times into wtmpdb...
Aug 12 07:29:11 localhost systemd[1]: Update is Completed was skipped because no trigger condition checks were met.
Aug 12 07:29:11 localhost systemd[1]: Save Transient machine-id to Disk was skipped because of an unmet condition check (ConditionPathIsMountPoint=/etc/machine-id).
Aug 12 07:29:11 localhost systemd[1]: Rebuild Journal Catalog was skipped because of an unmet condition check (ConditionNeedsUpdate=/var).
Aug 12 07:29:11 localhost systemd[1]: First Boot Complete was skipped because of an unmet condition check (ConditionFirstBoot=yes).
Aug 12 07:29:11 localhost systemd[1]: First Boot Wizard was skipped because of an unmet condition check (ConditionFirstBoot=yes).
Aug 12 07:29:11 localhost systemd[1]: Finished Create System Files and Directories.
Aug 12 07:29:11 localhost systemd[1]: Mounted /home/git-geoweb.
Aug 12 07:29:11 localhost kernel: evm: overlay not supported
Aug 12 07:29:11 localhost kernel: BTRFS info (device sdb1): using free-space-tree
Aug 12 07:29:11 localhost kernel: BTRFS info (device sdb1): using crc32c (crc32c-intel) checksum algorithm
Aug 12 07:29:11 localhost kernel: BTRFS info (device sdb1): first mount of filesystem 046fde70-af11-4860-901c-2a39a10a20ec
Aug 12 07:29:11 localhost kernel: BTRFS: device fsid 046fde70-af11-4860-901c-2a39a10a20ec devid 1 transid 485 /dev/sdb1 (8:17) scanned by mount (872)
Aug 12 07:29:11 localhost systemd[1]: Starting Create System Files and Directories...
Aug 12 07:29:11 localhost systemd[1]: Mounting /home/git-geoweb...
Aug 12 07:29:11 localhost systemd[1]: Finished Flush Journal to Persistent Storage.
Aug 12 07:29:11 localhost systemd[1]: Finished Load kdump kernel early on startup.
Aug 12 07:29:11 localhost systemd[1]: Finished Create missing directories from rpmdb.
Aug 12 07:29:11 localhost systemd[1]: Mounted /usr/local.
Aug 12 07:29:11 localhost systemd[1]: Mounted /srv.
Aug 12 07:29:11 localhost systemd[1]: Mounted /opt.
Aug 12 07:29:11 localhost systemd[1]: Mounted /home.
Aug 12 07:29:11 localhost systemd[1]: Mounted /boot/writable.
Aug 12 07:29:11 localhost systemd[1]: Mounted /boot/grub2/x86_64-efi.
Aug 12 07:29:11 localhost systemd[1]: Mounted /boot/grub2/i386-pc.
Aug 12 07:29:11 localhost create_dirs_from_rpmdb[835]: RPM cookie unchanged, not doing anything
Aug 12 07:29:11 localhost systemd-journald[666]: Received client request to flush runtime journal.
Aug 12 07:29:11 localhost systemd-journald[666]: System Journal (/var/log/journal/7ef5e6d4308944f989ebd1218f39f422) is 526.7M, max 2.4G, 1.9G free.
Aug 12 07:29:11 localhost systemd-journald[666]: Time spent on flushing to /var/log/journal/7ef5e6d4308944f989ebd1218f39f422 is 54.503ms for 1016 entries.
Aug 12 07:29:11 localhost systemd[1]: Set Up Additional Binary Formats was skipped because no trigger condition checks were met.
Aug 12 07:29:11 localhost systemd[1]: Reached target Emergency Mode.
Aug 12 07:29:11 localhost systemd[1]: Started Emergency Shell.
Aug 12 07:29:11 localhost systemd[1]: Reached target Socket Units.
Aug 12 07:29:11 localhost systemd[1]: Reached target Path Units.
Aug 12 07:29:11 localhost systemd[1]: Reached target Network.
Aug 12 07:29:11 localhost systemd[1]: Starting Flush Journal to Persistent Storage...
Aug 12 07:29:11 localhost systemd[1]: Starting Load kdump kernel early on startup...
Aug 12 07:29:11 localhost systemd[1]: Reached target Preparation for Network.
Aug 12 07:29:11 localhost systemd[1]: Reached target Login Prompts.
Aug 12 07:29:11 localhost systemd[1]: Starting Create missing directories from rpmdb...
Aug 12 07:29:11 localhost systemd[1]: Reached target System Time Synchronized.
Aug 12 07:29:11 localhost systemd[1]: Listening on System Extension Image Management.
Aug 12 07:29:11 localhost systemd[1]: Listening on Boot Entries Service Socket.
Aug 12 07:29:11 localhost systemd[1]: Reached target Timer Units.
Aug 12 07:29:11 localhost systemd[1]: Stopped Dispatch Password Requests to Console Directory Watch.
Aug 12 07:29:11 localhost systemd[1]: systemd-ask-password-console.path: Deactivated successfully.
Aug 12 07:29:11 localhost systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Aug 12 07:29:11 localhost systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Aug 12 07:29:11 localhost systemd[1]: Dependency failed for Local File Systems.
Aug 12 07:29:11 localhost systemd[1]: Failed to mount /.snapshots.
Aug 12 07:29:11 localhost systemd[1]: \x2esnapshots.mount: Failed with result 'exit-code'.
Aug 12 07:29:11 localhost systemd[1]: \x2esnapshots.mount: Mount process exited, code=exited, status=32/n/a
Aug 12 07:29:11 localhost mount[826]:        dmesg(1) may have more information after failed mount system call.
Aug 12 07:29:11 localhost mount[826]: mount: /.snapshots: /dev/sda2 already mounted on /.
Aug 12 07:29:11 localhost systemd[1]: Mounting /usr/local...
Aug 12 07:29:11 localhost systemd[1]: usr-local.mount: Directory /usr/local to mount over is not empty, mounting anyway.
Aug 12 07:29:11 localhost systemd[1]: Mounting /srv...
Aug 12 07:29:11 localhost systemd[1]: srv.mount: Directory /srv to mount over is not empty, mounting anyway.
Aug 12 07:29:11 localhost systemd[1]: Mounting /opt...
Aug 12 07:29:11 localhost systemd[1]: Mounting /home...
Aug 12 07:29:11 localhost systemd[1]: Mounting /boot/writable...
Aug 12 07:29:11 localhost systemd[1]: Mounting /boot/grub2/x86_64-efi...
Aug 12 07:29:11 localhost systemd[1]: Mounting /boot/grub2/i386-pc...
Aug 12 07:29:11 localhost systemd[1]: Mounting /.snapshots...
Aug 12 07:29:10 localhost systemd[1]: Finished Virtual Console Setup.
Aug 12 07:29:10 localhost systemd[1]: Finished Load/Save OS Random Seed.
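
The excerpt above is in newest-first order (the timestamps run from 08:16:52 back to 07:29:10), so the failed /.snapshots mount has to be read from the bottom up. A minimal sketch for flipping a saved excerpt into chronological order. The function name `oldest_first` is made up for illustration, and `tac` is assumed to come from GNU coreutils, with an awk fallback where it is missing.

```shell
# Hypothetical helper: reverses the line order of stdin, so a
# newest-first log excerpt becomes oldest-first.
oldest_first() {
    if command -v tac >/dev/null 2>&1; then
        tac
    else
        # Portable fallback: buffer all lines, then print them in reverse.
        awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
    fi
}

# Usage, with the excerpt saved to a file (the file name is just an example):
#   oldest_first < boot-excerpt.txt
```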

system · September 11, 2024, 8:16am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.