BTRFS is mounted as read-only; snapper-cleanup.service failed: Please help me troubleshoot!

I have an issue that I don’t even know how to start addressing properly. In short:
On boot, BTRFS is seemingly mounted read-only. I found that out after running sudo zypper dup, which returns:
sudo: unable to open /var/lib/sudo/ts/USR: Read-Only file system
The target filesystem is mounted as read-only. Please make sure the target filesystem is writeable.
Then, after some googling and investigation, here’s the output of mount | grep ro,:

/dev/nvme0n1p2 on / type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=265,subvol=/@/.snapshots/1/snapshot)
ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
/dev/nvme0n1p2 on /.snapshots type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=264,subvol=/@/.snapshots)
/dev/nvme0n1p2 on /boot/grub2/i386-pc type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=263,subvol=/@/boot/grub2/i386-pc)
/dev/nvme0n1p2 on /boot/grub2/x86_64-efi type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=262,subvol=/@/boot/grub2/x86_64-efi)
/dev/nvme0n1p2 on /var type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=257,subvol=/@/var)
/dev/nvme0n1p2 on /usr/local type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=258,subvol=/@/usr/local)
/dev/nvme0n1p2 on /srv type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=259,subvol=/@/srv)
/dev/nvme0n1p2 on /opt type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=261,subvol=/@/opt)
/dev/nvme0n1p2 on /root type btrfs (ro,relatime,ssd,space_cache=v2,subvolid=260,subvol=/@/root)
ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)

and mount | grep error
/dev/nvme0n1p1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro)
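
A note on the two filters above: `grep ro,` and `grep error` match text anywhere on the line, so they also catch things like `errors=remount-ro` on a file system that is actually mounted `rw` (the vfat line). A minimal sketch of a stricter filter follows, assuming the standard /proc/mounts layout (device, mount point, type, options); the function name `list_ro_mounts` is made up for illustration.

```shell
# List mount points whose option list contains the exact flag "ro".
# Reads a file in /proc/mounts format: device mountpoint fstype options dump pass
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '{
        n = split($4, opts, ",")          # split the option string on commas
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") {        # exact match, so "remount-ro" is ignored
                print $2 " (" $3 ")"
                break
            }
    }' "$mounts_file"
}

# Example: list_ro_mounts | grep btrfs
```

Because only the exact option `ro` counts, a line such as the /boot/efi one is not reported.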
I’ve also noticed that htop shows systemd as degraded, and systemctl --failed returns:

  UNIT                    LOAD   ACTIVE SUB    DESCRIPTION                       
● snapper-cleanup.service loaded failed failed Daily Cleanup of Snapper Snapshots

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.

Any hints on what is going on here would be nice! It looks as though I might need to boot from a pendrive, chroot into my system and mount them, or something like that. I am still quite fresh on openSUSE (or Linux, for that matter). Also, a beginner-friendly guide or approach to dealing with BTRFS would be really nice. I think that filesystem is a selling point for openSUSE (i.e. snapper is!), but it seems that it is not straightforward (one good friend of mine still calls it some sort of experimental, abstruse fs…).
But first: Please help me troubleshoot this issue!

Before you reboot, run journalctl -b as root and post its full output to https://susepaste.org/

Hi! I have to say that I’ve been having this issue for about 2 weeks but haven’t had time to deal with it, so there’s been plenty of rebooting in between… Anyway, here’s the output on SUSE Paste.

I requested full output, not

Dec 17 15:18:33 localhost kernel: BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
...skipping...
Dec 17 13:18:35 localhost systemd[1]: Finished File System Check on /dev/disk/by-uuid/EADB-2930.

My bad! Here it should be in full: SUSE Paste

Is it read-only now? There are no errors in the log.

Does it affect only /var filesystem? Can you create files in other places (like /root)?

So as root I just tried touch test.txt to no avail on:

/boot
/var
/usr
/opt
/root
/bin
/mnt
/proc
/

The only directories where I could create a test-file were /tmp and /home.
That seems to correspond with the mount | grep ro, output from above. Those directories (btrfs subvolumes, right?) are all read-only…
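
The manual touch test above can be repeated in one go. The following is only a sketch: `probe_writable` is a made-up helper name, and it relies on `mktemp` accepting a full path template, which the common GNU and BSD versions do.

```shell
# Report whether a file can be created in each given directory.
# mktemp picks an unused name, so nothing existing is overwritten,
# and the probe file is removed again right away.
probe_writable() {
    for dir in "$@"; do
        if probe=$(mktemp "$dir/.rw-probe.XXXXXX" 2>/dev/null); then
            rm -f "$probe"
            echo "$dir: writable"
        else
            echo "$dir: NOT writable"
        fi
    done
}

# Example (run as root): probe_writable / /var /root /opt /srv /tmp /home
```

Note that /proc is a virtual file system, so failing to create a file there says nothing about btrfs. Directories such as /usr, /bin and /mnt are not listed in the fstab of this thread, so they simply share the state of the root mount.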

Yes, the question is where it comes from. dracut seems to have the correct flags:

Dec 17 15:18:33 localhost dracut-cmdline[230]: Using kernel command line parameters:  ... rootflags=rw,relatime,ssd,space_cache=v2,subvolid=265,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot ...

cat /etc/fstab would be interesting.

UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /                       btrfs  defaults                      0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /var                    btrfs  subvol=/@/var                 0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /usr/local              btrfs  subvol=/@/usr/local           0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /srv                    btrfs  subvol=/@/srv                 0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /root                   btrfs  subvol=/@/root                0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /opt                    btrfs  subvol=/@/opt                 0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /boot/grub2/x86_64-efi  btrfs  subvol=/@/boot/grub2/x86_64-efi  0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /boot/grub2/i386-pc     btrfs  subvol=/@/boot/grub2/i386-pc  0  0
UUID=2af261e5-d086-40fa-8b88-6dfbbc1d25ff  /.snapshots             btrfs  subvol=/@/.snapshots          0  0
UUID=b0d5cac2-5cc5-4198-95b6-a0f9048398e1  /home                   btrfs  defaults                      0  0
UUID=8eacce40-42b3-425c-b358-7a620583320c  swap                    swap   defaults                      0  0
UUID=EADB-2930                             /boot/efi               vfat   utf8                          0  2

LGTM…

In the meantime, I’ve tried:

$ sudo btrfs scrub start /
sudo: unable to open /var/lib/sudo/ts/USR: Read-only file system
[sudo] password for root: 
WARNING: failed to open the progress status socket at /var/lib/btrfs/scrub.progress.2af261e5-d086-40fa-8b88-6dfbbc1d25ff: Read-only file system. Progress cannot be queried
WARNING: failed to write the progress status file: Read-only file system. Status recording disabled
scrub started on /, fsid 2af261e5-d086-40fa-8b88-6dfbbc1d25ff (pid=24210)
[USR] @ [localhost] in [~] 
$ sudo btrfs scrub status /
sudo: unable to open /var/lib/sudo/ts/USR: Read-only file system
[sudo] password for root: 
UUID:             2af261e5-d086-40fa-8b88-6dfbbc1d25ff
Scrub started:    Thu Dec  1 07:57:55 2022
Status:           finished
Duration:         0:01:00
Total to scrub:   49.38GiB
Rate:             765.26MiB/s
Error summary:    no errors found

and also
sudo mount -o remount,rw /
returns:
mount: /: mount point not mounted or bad option.

Yes. In that case, break into a shell in the initrd after root is mounted and check whether it is already read-only. Add rd.break=mount or rd.break=pivot to the kernel command line (I am not sure whether mount stops before or after root is mounted); booting then stops in a shell. Keep in mind that this is the initrd, so only a limited set of commands is available. The real root should be mounted at /sysroot. If it is read-write there, the next step is to boot into runlevel 1 (simply add 1 to the kernel command line) and check again.
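
As a rough illustration of the check described above: once the boot has stopped in the initrd shell, the mount options of /sysroot can be read from /proc/mounts. The sketch below uses only shell built-ins, on the assumption that tools like awk may be missing in the initrd; the function name `mount_mode` is hypothetical.

```shell
# Print "ro" or "rw" for a mount point, based on a /proc/mounts style file.
# Returns non-zero if the mount point is not listed at all.
mount_mode() {
    target="$1"
    mounts_file="${2:-/proc/mounts}"
    mode=""
    while read -r _dev mountpoint _fstype options _rest; do
        [ "$mountpoint" = "$target" ] || continue
        # Wrap the options in commas so that only the exact flag "ro" matches.
        case ",$options," in
            *,ro,*) mode="ro" ;;
            *)      mode="rw" ;;
        esac
    done < "$mounts_file"
    [ -n "$mode" ] && echo "$mode"
}

# In the initrd shell:      mount_mode /sysroot
# On the running system:    mount_mode /
```

If several lines mention the same mount point, the last one wins, which corresponds to the mount that is currently on top.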

What do you mean by breaking into shell in initrd after root is mounted??
Sorry, we are at the point where I’m deep in “i-dont-know-whats-going-on-territory”… :smiley:

Don’t tinker! Get a sound assessment first!

With a default setup btrfs is virtually maintenance free. When I tried to break it I failed. However more skilled users do exist.

That’s bullstuff. Top causes of failure are:

  • shaky hardware
  • no unallocated space available

Run the following command and post its complete output:

6700k:~ # btrfs filesystem usage -T /
Overall:
    Device size:                  59.57GiB
    Device allocated:             21.05GiB
    Device unallocated:           38.52GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         12.67GiB
    Free (estimated):             46.53GiB      (min: 46.53GiB)
    Free (statfs, df):            46.53GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               46.52MiB      (used: 0.00B)
    Multiple profiles:                  no

             Data     Metadata  System                             
Id Path      single   single    single   Unallocated Total    Slack
-- --------- -------- --------- -------- ----------- -------- -----
 1 /dev/sda8 20.01GiB   1.01GiB 32.00MiB    38.52GiB 59.57GiB     -
-- --------- -------- --------- -------- ----------- -------- -----
   Total     20.01GiB   1.01GiB 32.00MiB    38.52GiB 59.57GiB 0.00B
   Used      12.00GiB 687.14MiB 16.00KiB                           
6700k:~ # 

Thanks for the encouragement! Here’s the output of sudo btrfs filesystem usage -T /

Overall:
    Device size:                  82.27GiB
    Device allocated:             52.07GiB
    Device unallocated:           30.20GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         49.38GiB
    Free (estimated):             32.49GiB      (min: 17.39GiB)
    Free (statfs, df):            32.49GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               53.50MiB      (used: 0.00B)
    Multiple profiles:                  no

                  Data     Metadata  System                             
Id Path           single   DUP       DUP      Unallocated Total    Slack
-- -------------- -------- --------- -------- ----------- -------- -----
 1 /dev/nvme0n1p2 50.01GiB   2.00GiB 64.00MiB    30.20GiB 82.27GiB     -
-- -------------- -------- --------- -------- ----------- -------- -----
   Total          50.01GiB   1.00GiB 32.00MiB    30.20GiB 82.27GiB 0.00B
   Used           47.72GiB 853.94MiB 16.00KiB 
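
For the "no unallocated space" cause mentioned earlier, the relevant figure is the share of the device that is still unallocated; in the output above that is 30.20GiB of 82.27GiB, or about 37%. A small sketch that computes this from the raw-byte variant of the same command (`-b`) follows; the helper name `unallocated_percent` is invented for this example.

```shell
# Read "btrfs filesystem usage -b" output on stdin and print the
# unallocated share of the device in percent.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
unallocated_percent() {
    LC_ALL=C awk '
        /Device size:/        { size = $3 }
        /Device unallocated:/ { unalloc = $3 }
        END {
            if (size <= 0) exit 1      # no usable "Device size" line was seen
            printf "%.1f\n", unalloc * 100 / size
        }'
}

# Example (as root): btrfs filesystem usage -b / | unallocated_percent
```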

Did you boot from a snapshot? That produces a read-only FS until you move to it.

No, I haven’t. I have tried it, though, in order to do a snapper rollback, but it didn’t work for me. I cannot remember the precise reasons, but it had to do with the same issue.

Great! You have lots of unallocated space. More items to check:

  • Show kernel messages (change “sda” to the appropriate device):
6700k:~ # journalctl -b0 _KERNEL_SUBSYSTEM=scsi -g sda
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] Write Protect is off
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] Preferred minimum I/O size 512 bytes
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] supports TCG Opal
Dec 17 15:55:49 6700k kernel: sd 2:0:0:0: [sda] Attached SCSI disk
Dec 17 18:52:02 6700k kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Dec 17 18:52:02 6700k kernel: sd 2:0:0:0: [sda] Stopping disk
Dec 17 18:52:02 6700k kernel: sd 2:0:0:0: [sda] Starting disk
6700k:~ # 
  • Show messages related to the file system:
6700k:~ # journalctl -b -g btrfs
Dec 17 15:55:49 6700k dracut-cmdline[262]: Using kernel command line parameters:  rd.driver.pre=btrfs root=UUID=9e9fb019-007e-497d-9ff0-bda6eb2b131e rootfstype=btrfs r>
Dec 17 15:55:49 6700k kernel: Btrfs loaded, crc32c=crc32c-intel, assert=on, zoned=yes, fsverity=yes
Dec 17 15:55:49 6700k kernel: BTRFS: device fsid 57bc5f32-9908-4fc8-9f38-5447da95bdc2 devid 1 transid 968 /dev/sdc5 scanned by systemd-udevd (376)
Dec 17 15:55:49 6700k kernel: BTRFS: device label leap154 devid 1 transid 4710 /dev/sda7 scanned by systemd-udevd (360)
Dec 17 15:55:49 6700k kernel: BTRFS: device label tumbleweed-b devid 1 transid 1587 /dev/sda4 scanned by systemd-udevd (380)
Dec 17 15:55:49 6700k kernel: BTRFS: device label backup devid 1 transid 691 /dev/sda5 scanned by systemd-udevd (395)
Dec 17 15:55:49 6700k kernel: BTRFS: device label tumbleweed devid 1 transid 48898 /dev/sda8 scanned by systemd-udevd (394)
Dec 17 15:55:50 6700k kernel: BTRFS info (device sda8): using crc32c (crc32c-intel) checksum algorithm
Dec 17 15:55:50 6700k kernel: BTRFS info (device sda8): disk space caching is enabled
Dec 17 15:55:50 6700k kernel: BTRFS info (device sda8): enabling ssd optimizations
Dec 17 15:55:53 6700k systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
Dec 17 15:55:53 6700k systemd[1]: Started Balance block groups on a btrfs filesystem.
Dec 17 15:55:53 6700k systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
6700k:~ # 

I am trying to run journalctl -b0 _KERNEL_SUBSYSTEM=scsi -g nvme0n1, but it returns -- No entries --.
When I run lsblk I see:

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0 232,9G  0 disk 
├─sda1        8:1    0  46,6G  0 part 
├─sda2        8:2    0   513M  0 part 
├─sda3        8:3    0     1K  0 part 
└─sda5        8:5    0 185,8G  0 part 
sdb           8:16   0 931,5G  0 disk 
├─sdb1        8:17   0   100M  0 part 
├─sdb2        8:18   0    16M  0 part 
├─sdb3        8:19   0 930,9G  0 part 
└─sdb4        8:20   0   537M  0 part 
sr0          11:0    1  1024M  0 rom  
nvme0n1     259:0    0 232,9G  0 disk 
├─nvme0n1p1 259:1    0   512M  0 part /boot/efi
├─nvme0n1p2 259:2    0  82,3G  0 part /root
│                                     /srv
│                                     /usr/local
│                                     /opt
│                                     /var
│                                     /boot/grub2/x86_64-efi
│                                     /boot/grub2/i386-pc
│                                     /.snapshots
│                                     /
├─nvme0n1p3 259:3    0 134,5G  0 part /home
└─nvme0n1p4 259:4    0  15,6G  0 part [SWAP]

Why am I not getting the desired output??
PS
Here is the output from journalctl -b -g btrfs:

Dec 17 21:02:13 localhost dracut-cmdline[228]: Using kernel command line parameters:  rd.driver.pre=btrfs resume=UUID=8eacce40-42b3->
Dec 17 21:02:13 localhost kernel: Btrfs loaded, crc32c=crc32c-intel, assert=on, zoned=yes, fsverity=yes
Dec 17 21:02:13 localhost kernel: BTRFS: device fsid 2af261e5-d086-40fa-8b88-6dfbbc1d25ff devid 1 transid 41981 /dev/nvme0n1p2 scann>
Dec 17 21:02:13 localhost kernel: BTRFS: device fsid b0d5cac2-5cc5-4198-95b6-a0f9048398e1 devid 1 transid 62249 /dev/nvme0n1p3 scann>
Dec 17 21:02:14 localhost kernel: BTRFS info (device nvme0n1p2): using crc32c (crc32c-intel) checksum algorithm
Dec 17 21:02:14 localhost kernel: BTRFS info (device nvme0n1p2): using free space tree
Dec 17 21:02:14 localhost kernel: BTRFS info (device nvme0n1p2): enabling ssd optimizations
Dec 17 21:02:14 localhost kernel: BTRFS info (device nvme0n1p2): start tree-log replay
Dec 17 19:02:15 localhost kernel: BTRFS info (device nvme0n1p3): using crc32c (crc32c-intel) checksum algorithm
Dec 17 19:02:15 localhost kernel: BTRFS info (device nvme0n1p3): using free space tree
Dec 17 19:02:15 localhost kernel: BTRFS info (device nvme0n1p3): enabling ssd optimizations
Dec 17 19:02:15 localhost kernel: BTRFS info (device nvme0n1p3): start tree-log replay
Dec 17 19:02:15 localhost kernel: BTRFS info (device nvme0n1p3): checking UUID tree

Thanks for the help!

It’s not scsi but nvme.

You have some extra lines I don’t see in my journal. Run the following check:

erlangen:~ # btrfs check --force /dev/nvme0n1p2
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/nvme0n1p2
UUID: 0e58bbe5-eff7-4884-bb5d-a0aac3d8a344
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 501530976256 bytes used, no error found
total csum bytes: 480640024
total tree bytes: 2443313152
total fs tree bytes: 1723138048
total extent tree bytes: 136265728
btree space waste bytes: 525388577
file data blocks allocated: 881895366656
 referenced 580847742976
erlangen:~ #

I only get

# journalctl -b0 _KERNEL_SUBSYSTEM=nvme -g nvme0
Dec 17 21:02:13 localhost kernel: nvme nvme0: pci function 0000:05:00.0
Dec 17 21:02:13 localhost kernel: nvme nvme0: 4/0/0 default/read/poll queues

Here’s output from the check:

# btrfs check --force /dev/nvme0n1p2
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/nvme0n1p2
UUID: 2af261e5-d086-40fa-8b88-6dfbbc1d25ff
[1/7] checking root items
[2/7] checking extents
tree backref 1049444352 parent 1049460736 not found in extent tree
backref 1049444352 parent 982351872 not referenced back 0x55c386ca2220
incorrect global backref count on 1049444352 found 4 wanted 3
backpointer mismatch on [1049444352 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
ERROR: bytenr ref not found for parent 982351872
ERROR: not enough memory: accounting for refs for qgroups
ERROR: failed to check quota groups
found 53325733888 bytes used, error(s) found
total csum bytes: 21678348
total tree bytes: 990363648
total fs tree bytes: 934133760
total extent tree bytes: 29425664
btree space waste bytes: 200538933
file data blocks allocated: 189721890816
 referenced 136314822656

That looks like something’s up!
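
One caveat on the check above: it was run with --force on a mounted file system. As far as the btrfs-check documentation goes, that is expected to work on a quiescent or read-only mount, but a run from rescue media with the file system unmounted is the more trustworthy confirmation. The sketch below is a hypothetical guard (`is_unmounted` is not part of btrfs-progs) that only starts the check if the device does not appear in /proc/mounts.

```shell
# Succeed only if the given device is not listed as a mount source.
# Reads a /proc/mounts style file; prints a hint on stderr otherwise.
is_unmounted() {
    dev="$1"
    mounts_file="${2:-/proc/mounts}"
    while read -r mount_source _rest; do
        if [ "$mount_source" = "$dev" ]; then
            echo "$dev is mounted; run the check from rescue media instead" >&2
            return 1
        fi
    done < "$mounts_file"
    return 0
}

# Example, from a rescue system (plain check, without --repair):
#   is_unmounted /dev/nvme0n1p2 && btrfs check /dev/nvme0n1p2
```

Without --repair, btrfs check only reads the file system, so this run does not change anything on disk.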