Btrfs subvolume /.snapshots not automounting correctly on one specific machine

So, I really like the Btrfs-based snapshotting functionality of openSUSE… but it’s broken on one of my Tumbleweed machines (and only one of them, which suggests a configuration issue) and I do not fully understand why.

I already narrowed it down to the /.snapshots Btrfs subvolume not being mounted by the time I log in on this machine. This is in spite of said subvolume being configured for automounting in /etc/fstab in exactly the same way as other subvolumes which do automount correctly, like /var:

hadrien@linux-2ak3:~> cat /etc/fstab 
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 / btrfs noatime 0 0
UUID=80c122ff-dbe7-4b06-b646-5fc0f8085bd6 /home                ext4       noatime,acl,user_xattr 1 2
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /boot/grub2/i386-pc btrfs noatime,subvol=@/boot/grub2/i386-pc 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /boot/grub2/x86_64-efi btrfs noatime,subvol=@/boot/grub2/x86_64-efi 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /opt btrfs noatime,subvol=@/opt 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /srv btrfs noatime,subvol=@/srv 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /tmp btrfs noatime,subvol=@/tmp 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /usr/local btrfs noatime,subvol=@/usr/local 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /var btrfs noatime,subvol=@/var 0 0
UUID=c7d482b7-d69a-49a9-a033-c04aa5b1a6f9 /.snapshots btrfs noatime,subvol=@/.snapshots 0 0

Further, manually running “sudo mount /.snapshots” after startup will work correctly, suggesting that there is nothing wrong with my fstab configuration or with the disk itself. And indeed, the journalctl logs do suggest that the partition is correctly automounted on startup…


[...]
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounted /.snapshots.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Started Apply Kernel Variables.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Starting udev Kernel Device Manager...
oct. 28 11:12:54 linux-2ak3 systemd[1]: Starting File System Check on /dev/disk/by-uuid/80c122ff-dbe7-4b06-b646-5fc0f8085bd6...
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /var...
oct. 28 11:12:54 linux-2ak3 systemd[1]: var.mount: Directory /var to mount over is not empty, mounting anyway.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /usr/local...
oct. 28 11:12:54 linux-2ak3 systemd[1]: usr-local.mount: Directory /usr/local to mount over is not empty, mounting anyway.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /tmp...
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /srv...
oct. 28 11:12:54 linux-2ak3 systemd[1]: srv.mount: Directory /srv to mount over is not empty, mounting anyway.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /opt...
oct. 28 11:12:54 linux-2ak3 systemd[1]: opt.mount: Directory /opt to mount over is not empty, mounting anyway.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /boot/grub2/x86_64-efi...
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /boot/grub2/i386-pc...
oct. 28 11:12:54 linux-2ak3 systemd[1]: boot-grub2-i386\x2dpc.mount: Directory /boot/grub2/i386-pc to mount over is not empty, mounting anyway.
oct. 28 11:12:54 linux-2ak3 systemd[1]: Mounting /.snapshots...
[...]

…however, it appears that these filesystems are unmounted shortly afterwards…


[...]
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /usr/local...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /srv...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /opt...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /home...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /boot/grub2/x86_64-efi...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /boot/grub2/i386-pc...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Unmounting /.snapshots...
oct. 28 11:13:17 linux-2ak3 systemd[1]: Stopped target Local File Systems.
oct. 28 11:13:17 linux-2ak3 systemd[1]: Started Machine Check Exception Logging Daemon.
oct. 28 11:13:17 linux-2ak3 systemd[1]: Started Initialize hardware monitoring sensors.
oct. 28 11:13:17 linux-2ak3 systemd[1]: /usr/lib/systemd/system/pcscd.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/pcscd/pcscd.comm → /run/pcscd/pcscd.comm; please update the unit file accordingly.
oct. 28 11:13:17 linux-2ak3 systemd[1]: /usr/lib/systemd/system/rpc-statd.service:15: PIDFile= references a path below legacy directory /var/run/, updating /var/run/rpc.statd.pid → /run/rpc.statd.pid; please update the unit file accordingly.
oct. 28 11:13:17 linux-2ak3 systemd[1]: /usr/lib/systemd/system/ntpd.service:15: PIDFile= references a path below legacy directory /var/run/, updating /var/run/ntp/ntpd.pid → /run/ntp/ntpd.pid; please update the unit file accordingly.
oct. 28 11:13:17 linux-2ak3 systemd[1]: /usr/lib/systemd/system/display-manager.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/displaymanager.pid → /run/displaymanager.pid; please update the unit file accordingly.
oct. 28 11:13:17 linux-2ak3 systemd[1]: /usr/lib/systemd/system/auditd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/auditd.pid → /run/auditd.pid; please update the unit file accordingly.
oct. 28 11:13:16 linux-2ak3 systemd[1]: Reloading.
[...]

…and that although some like /var are remounted afterwards, this is not the case for /.snapshots.

Here’s a full log for reference: https://gist.github.com/HadrienG2/a584e381f39e9866ab7e4a504ceca8a4 .

Can someone help me figure out what’s going on, and how I can fix it so that /.snapshots is properly mounted by the time I log in?

Hi and welcome to the Forum :slight_smile:
There have been a few threads about this lately (partitions not mounting). So is your system up to date via zypper dup?

https://forums.opensuse.org/showthread.php/537966-Partitions-are-unmounted-during-boot
https://forums.opensuse.org/showthread.php/537920-Yet-another-mount-at-boot-problem-sda1-not-mounting-at-boot
https://forums.opensuse.org/showthread.php/537083-srv-empty-after-booting-recovering-folders-after-reboot

Many thanks for the pointers! I thought about searching for “.snapshots”, but not about searching for more general mount problems. Silly me.

My system is up to date, and in particular I am running systemd 243-2.1, which IIUC should include the first round of patches mentioned in https://bugzilla.opensuse.org/show_bug.cgi?id=1137373 . But I did notice that said bug report is not closed yet, and that the corresponding upstream issue suggests that the fix was not good enough: https://github.com/systemd/systemd/issues/12953 . Hmmm… Let’s subscribe to those issues.

From the various threads you linked to, it seems that the default systemd log level is not verbose enough to diagnose the problem at hand, and that it’s better to get more detailed logs via the “systemd.log_level=debug” and “printk.devkmsg=on” boot flags. Therefore, here’s what that looks like on my side: https://gist.github.com/HadrienG2/deea624977bf9f77b505e150698a1a98 .
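In case it helps anyone reproducing this, here is a minimal sketch for checking that both boot flags actually reached the kernel command line. The helper name `has_boot_flag` is made up for illustration, and it assumes the running system exposes the command line in /proc/cmdline, as Linux normally does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a flag appears verbatim on the kernel
# command line. The second argument defaults to /proc/cmdline.
has_boot_flag() {
    flag="$1"
    cmdline="${2:-/proc/cmdline}"
    # Put each space-separated token on its own line, then look for an
    # exact, fixed-string match of the whole token.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$flag"
}

# Usage:
#   has_boot_flag systemd.log_level=debug && echo "systemd debug logging requested"
#   has_boot_flag printk.devkmsg=on && echo "kmsg rate limiting disabled"
```

If either check fails, the flags were probably not added to the boot entry that was actually used.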

I’m not sure how to make sense of those 6.7 MB of systemd prose, but random grepping through it does highlight some weird systemd state transitions (e.g. 10x “dev-sda6.device: Changed dead -> plugged” vs 1x “dev-sda6.device: Changed plugged -> dead” doesn’t seem right) that are strongly reminiscent of the other topics and bug reports which you linked to.
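For anyone who wants to repeat that kind of grepping on their own log, here is a rough sketch. It assumes the debug log was saved to a local file (the name boot.log below is just a placeholder) and that the state-change lines have the “<unit>: Changed <old> -> <new>” shape quoted above. The function name `count_transitions` is made up.

```shell
#!/bin/sh
# Hypothetical helper: tally the state transitions logged for one unit.
# Assumes lines such as "... dev-sda6.device: Changed dead -> plugged".
count_transitions() {
    unit="$1"   # e.g. dev-sda6.device
    log="$2"    # path to the saved journal output
    grep -F "${unit}: Changed " "$log" \
        | awk -F': Changed ' '
            { count[$NF]++ }    # $NF is the "old -> new" part
            END { for (t in count) print count[t] "\t" t }' \
        | sort -rn              # most frequent transition first
}

# Usage (file name is an assumption):
#   count_transitions dev-sda6.device boot.log
```

Each output line is a count, a tab, and the transition, so a lopsided pair like the one above stands out immediately.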

One major difference with respect to most of the topics you linked to, however, is that many people seem to observe this issue intermittently, whereas in my case the unmounting happens on every system boot (as far as I can tell at least).

While I was at it, I also did a btrfs scrub on that partition, just to be sure. But no errors were found.

From reading through those, my main conclusions so far are…

  • This looks like a bug in either systemd or the openSUSE-provided systemd units, most likely the former (see https://github.com/systemd/systemd/issues/12953 ).
  • The bug’s symptoms might be triggered by btrfsmaintenance, but I probably want to keep that around as it is a generally useful utility otherwise.
  • In addition to the three topics you linked to, affected people should probably follow the aforementioned systemd issue and https://bugzilla.opensuse.org/show_bug.cgi?id=1137373 .
  • It looks like I can’t really do anything more than adding a “sudo mount -a” to my profile (backed by an appropriate sudoers NOPASSWD rule) for now.
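For reference, the workaround in the last bullet could look roughly like the sketch below. The function name `ensure_mounted` and the `MOUNT_ALL` override variable are made up for illustration, and the sudoers rule shown in the comment is only an example: the user name and the path to mount would need adjusting to the actual system.

```shell
# Sketch for ~/.profile: run "mount -a" at login if a mount point is missing.
# Assumes a sudoers rule (edited with visudo) along the lines of:
#   hadrien ALL=(root) NOPASSWD: /usr/bin/mount -a
# The user name and the /usr/bin/mount path are assumptions.
ensure_mounted() {
    target="$1"
    # mountpoint -q exits 0 when the directory is already a mount point.
    if mountpoint -q "$target" 2>/dev/null; then
        return 0
    fi
    # sudo -n never prompts: it fails if the NOPASSWD rule is missing.
    # MOUNT_ALL is a hypothetical override, handy for dry runs.
    ${MOUNT_ALL:-sudo -n mount -a}
}

# In ~/.profile:
#   ensure_mounted /.snapshots
```

Checking the mount point first keeps the login path free of any sudo call on boots where everything was mounted correctly.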

Hi
Nice summary :slight_smile: I’m not using snapshots here and running btrfs/xfs on SSD/NVMe devices without issues… is your system rotating rust, SSD or NVMe?

Well, that’s the funny thing: the affected system uses a SATA HDD, but my two other SATA SSD Tumbleweed systems are unaffected. This is what initially led me to suspect a configuration problem, as opposed to a badly behaving update.

However, the systemd bug discussions mention a race condition as the origin of the issue. If that is indeed the problem that I am encountering, as it seems, speed differences between HDDs and SSDs could provide another explanation for why only this system is affected: the lower speed of HDDs (and that one is really a pig) can widen the window of time during which the problem can occur.

Has this been resolved? I am facing a similar problem, as reported here.

No fix as far as I know (I haven’t tried removing the mount -a workaround on the affected machine). The latest development that I am aware of is where Lennart proposed an alternate analysis of the issue in https://github.com/systemd/systemd/issues/13489#issuecomment-549299890 .

Can you please post your snapper log? It is located in /var/log/snapper.log

There you go: https://gist.github.com/HadrienG2/c4eec6906a706552743d3d26cb3516a5