I noticed a warning during kernel compilation the last time I updated the system:
warning: %posttrans(dracut-059+suse.563.g4900899a-1.1.x86_64) scriptlet failed, exit status 1
This is from an update today; a different package was listed in the update mentioned above. Anyway, the three lines above it say:
dracut[I]: *** Creating image file '/boot/initrd-6.8.6-1-default' ***
cp: error writing '/boot/initrd-6.8.6-1-default': No space left on device
dracut[F]: Creation of /boot/initrd-6.8.6-1-default failed
This prompted me to lazily start a full btrfs balance before realizing it’s neither necessary nor would it solve the problem (a reflexive response, since it’s not the first time the system has run out of space due to… I’ll investigate more deeply some day).
Anyway, I started up YaST Software Management to maybe uninstall an old kernel while the full balance was still running. That resulted in the balance stopping due to a “Segmentation fault”, and now my filesystem is in read-only mode (discovered after trying to forcefully install the latest kernel).
Since the system is actively trying to protect itself from me, I figured I’d make this post before rebooting and maybe causing even more harm…
I found two separate SUSE Support pages offering this as a solution: mount -o remount,rw,skip_balance <mountpoint>
Not entirely sure what the mountpoint would be (guessing /), or if I should try anything else without someone smarter than me weighing in.
Yeah, I tried to restart, it hung, I forced it, and now it’s borked with 7 start jobs that will probably never finish. Oof. Guess I’ll go to bed. Typical that these things happen in the evening…
Edit: suddenly I’m in emergency mode… dmesg | grep btrfs
shows me “Filesystem corrupted” among many other things. Lots of item nn key itemoff itemsize entries in dmesg, plus: BTRFS critical (device dm-0: state EMA): unable to find ref byte nr <long number> parent <long number> root <long number> owner 0 offset 0 slot 76.
I can still see stuff with ls… Is there hope? System is apparently still read-only.
I booted a few more times yesterday… although it doesn’t appear to be doing any more damage. There are still 7 services that fail to start after 5 minutes before emergency mode kicks in (among them the mounts for /var, /opt, and /usr/local), and the filesystem is read-only.
I might only have the opportunity to make a rescue boot device in 6 or so hours, but I won’t boot into the system any more if that’s too risky.
Worth noting the SSD is encrypted (LVM), with the exception of /boot if I’m not mistaken. Basically the default partitioning with encryption from the installer.
I think I explicitly put /boot outside just to avoid having to type my password twice, which I had to do the last time I did something like that.
To be clear, only /boot was “full”, and only during kernel compilation. I’ve had btrfs balance free up lots of space before, which is why I stupidly threw myself at it… Should have just let it go undisturbed, but it also seems unlikely that merely opening Software Manager would be enough to cause a segmentation fault while balance was running in the background.
Anyway, I’ve got the LiveCD going. There are 3 crypto_LUKS partitions. Status on /dev/nvme0n1p2 was: size: 28318351 sectors. Otherwise identical output to nvme0n1p4. btrfs check gave:
Opening filesystem to check...
No valid Btrfs found on /dev/mapper/rootfs
ERROR: cannot open file system
The btrfs check on the root partition drowned the terminal in a whole lot of mismatches (I’ve kept one example) before ending with:
ref mismatch on [298791989248 212992] extent item 0, found 1
data extent[298791989248, 212992] referencer count mismatch (parent 272035201024) wanted 0 have 1
backpointer mismatch on [298791989248 212992]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
warning line 3966
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
ERROR: failed to add qgroup relation, member=811 parent=71776119061218091: No such file or directory
ERROR: loading qgroups from disk: -2
ERROR: failed to check quota groups
found 72810631168 bytes used, error(s) found
total csum bytes: 20625796
total tree bytes: 832585728
total fs tree bytes: 756973568
total extent tree bytes: 49184768
btree space waste bytes: 203844439
file data blocks allocated: 641373999104
referenced 98353680384
extent buffer leak: start 158878040064 len 16384
extent buffer leak: start 271702392832 len 16384
Status for nvme0n1p5 was identical to the others, except size: 1760497664 sectors. btrfs check gave the same as nvme0n1p2.
Definitely out of space in /boot if what KDE says is true (39.7 MiB free); that’s not enough for a new kernel/initramfs. The previous distros I’ve used that had a separate /boot reserved 1G of space for it, in addition to 256M for the ESP (/boot/efi).
That’s a dying btrfs FS. You should first save it as an image (use dd) to prevent further damage. Hope you have backups
Then try btrfs check --repair or attempt a mount using the rescue option.
If you’re able to mount it and don’t have backups, disable quota/qgroups (btrfs quota disable /) and make backing up your data the first priority before anything else.
For running balance without sufficient space, just add a drive (see btrfs device --help), perform the balance, and remove the drive afterwards.
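Roughly, the sequence would be something like this (device names are only examples; substitute your own decrypted root mapping and spare drive, I’ll write /dev/mapper/cr_root and /dev/sdX):
dd if=/dev/mapper/cr_root of=/backup/root.img bs=4M status=progress    # image the raw device first
mount -o ro,rescue=usebackuproot /dev/mapper/cr_root /mnt              # then try a read-only rescue mount and copy data off
btrfs quota disable /mnt                                               # if it later mounts read-write, drop the qgroups
btrfs check --repair /dev/mapper/cr_root                               # last resort: only on the unmounted device, only after imaging
For the balance-without-space case: btrfs device add /dev/sdX /mnt, then btrfs balance start /mnt, then btrfs device remove /dev/sdX /mnt.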
Haven’t been very good at backing up the root partition (usually won’t miss much)… but at least I have my /home (even though it’s fine, backup is slightly old though).
I assume something like dd if=/dev/nvme0n1p4 of=/some/flashdrive/file.partition will work? I’ll also backup the /home partition before doing anything. Something tells me it was only a matter of time until that btrfs partition would blow up. Isn’t the first time a hidden death clock is ticking despite setting up btrfs/snapper and then forgetting about it…
Please, you keep posting output without the command that produced it. Please always copy things completely (e.g. I doubt that the fdisk -l output is complete). Start with the line containing the prompt and the command, then all the output, and end with the line containing the new prompt (that last line signals that it is indeed all there is).
For example, in post #8, is that really lsblk -f?
I ask especially because you are now talking about a /home partition, but I cannot find anywhere in this thread that you have one. (Or did I miss something?)
Apropos your “possibly dying” NVME SSD drive – “Don’t Panic!”
Modern disk devices and modern filesystems are quite good at preserving data.
Even if a drive is failing, you can usually (99.99 % of the time) recover your data from it, provided that you discover the issue before the thing really becomes unusable.
Back up your “/home” and other user/application partitions such as “/srv” to an external drive or tape system.
Check the failing drive’s S.M.A.R.T. Health with –
# smartctl --all /dev/nvme0n1
Buy a new drive, install it, and re-install the O.S.
In the case of a multi-drive system, you only have to reinstall the O.S. if the failing drive is the one where the “/” partition is located.
Consider buying a disk enclosure for the “maybe dead” drive – simply re-initialise the thing as follows, place a complete, fresh filesystem on it – consider using XFS – and then use that device for backups.
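Something along these lines, assuming the old drive shows up as /dev/sdX once it is sitting in the enclosure (triple-check the device name before wiping anything):
wipefs --all /dev/sdX   # remove the old partition table and LUKS/filesystem signatures
mkfs.xfs -f /dev/sdX    # place a fresh XFS filesystem on the whole device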
Ah, apologies for mentioning /home. For me it’s quite obvious which partition is which from the size, so /dev/nvme0n1p5 is the XFS /home partition, and it seems to be fine. I’m currently using rsync to back it up just in case (it mounted without issues). Only the btrfs root partition (/dev/nvme0n1p4) is the one with any issue. I’ll dd that to a flash drive, which I hopefully can reformat from this LiveCD system. I asked about dd just in case I needed to format the flash drive in a particular way, or maybe not at all.
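For reference, the backup commands look roughly like this (the destination paths are placeholders for wherever the external drives end up mounted in this live session, and bs=4M is just my guess at keeping dd from crawling):
rsync -aAXH --info=progress2 /run/media/linux/<home-mount>/ /run/media/linux/Backup/home/
dd if=/dev/nvme0n1p4 of=/run/media/linux/<flash-drive>/root.partition bs=4M status=progress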
After dd of root I might try and mount it? Something tells me I should try that before btrfs check --repair, but as long as I’ve imaged it with dd I have my options… I’m fairly certain my NVME SSD drive itself should be fine, but I’ll do a check just in case after the dd.
btw, I’m fairly certain that the last time I listed btrfs qgroups there were a lot of empty entries. Maybe because I may have rm -rf’d a few numbered snapshots in /.snapshots that weren’t listed in snapper ls, lol.
Here’s the full output of lsblk -f and fdisk -l, respectively (I have a tendency to remove output that I’m fairly certain won’t be useful):
linux@localhost:~> lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
loop0 squashfs 4.0 0 100% /run/overlay/squashfs_container
loop1 ext4 1.0 b994ebb6-ab2c-494f-8c77-f7000591f636 1.1G 72% /run/overlay/rootfsbase
sda iso9660 Joliet Extension openSUSE_Tumbleweed_KDE_Live 2024-03-19-16-18-33-00
├─sda1 iso9660 Joliet Extension openSUSE_Tumbleweed_KDE_Live 2024-03-19-16-18-33-00 0 100% /run/overlay/live
├─sda2 vfat FAT16 BOOT EC31-7115
└─sda3 ext4 1.0 cow a1456023-7daf-44d2-9a5d-f4ce1d6a4f93 51.3G 2% /run/overlay/overlayfs
sdb iso9660 Joliet Extension Debian 12.2.0 amd64 n 2023-10-07-10-32-09-00
├─sdb1 iso9660 Joliet Extension Debian 12.2.0 amd64 n 2023-10-07-10-32-09-00 0 100% /run/media/linux/Debian 12.2.0 amd64 n
└─sdb2 vfat FAT16 Debian 12.2.0 amd64 n DEB0-0001
sdc ext4 1.0 Backup 7f0fef3b-b919-46b7-855d-a02534adf7c7 253.3G 81% /run/media/linux/Backup
nvme0n1
├─nvme0n1p1 vfat FAT32 6CC6-F5BF
├─nvme0n1p2 crypto_LUKS 1 fd8007f5-8019-412d-ab43-027fd9792041
├─nvme0n1p3 btrfs ed5ed943-82f1-41af-bd02-d3598e56f3e8
├─nvme0n1p4 crypto_LUKS 1 e6bcc38b-eb1f-49c4-851d-63393d3e3f63
└─nvme0n1p5 crypto_LUKS 1 cdda7931-1a9d-4c27-830b-cc65bc31f414
└─rootfs xfs a16a4d69-eedb-4d62-aa26-fb343b93c3a9 286.7G 66% /run/media/linux/a16a4d69-eedb-4d62-aa26-fb343b93c3a9
linux@localhost:~> fdisk -l
Absolute path to 'fdisk' is '/usr/sbin/fdisk', so running it may require superuser privileges (eg. root).
linux@localhost:~> sudo fdisk -l
Disk /dev/loop0: 859.19 MiB, 900923392 bytes, 1759616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/loop1: 4.95 GiB, 5314183168 bytes, 10379264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVLB1T0HBLR-000L2
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F94488F4-DF24-450E-A63C-80C17EC53BC6
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 1050623 1048576 512M EFI System
/dev/nvme0n1p2 1972086784 2000409230 28322447 13.5G Linux swap
/dev/nvme0n1p3 1050624 1869823 819200 400M Linux filesystem
/dev/nvme0n1p4 1869824 211585023 209715200 100G Linux filesystem
/dev/nvme0n1p5 211585024 1972086783 1760501760 839.5G Linux filesystem
Partition table entries are not in disk order.
Disk /dev/sda: 57.3 GiB, 61530439680 bytes, 120176640 sectors
Disk model: SanDisk 3.2Gen1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x9ef297b4
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 64 1952363 1952300 953.3M cd unknown
/dev/sda2 1952364 1993323 40960 20M ef EFI (FAT-12/16/32)
/dev/sda3 1994752 120176639 118181888 56.4G 83 Linux
Disk /dev/sdb: 59.75 GiB, 64160400896 bytes, 125313283 sectors
Disk model: Flash Drive
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x52bf7ba9
Device Boot Start End Sectors Size Id Type
/dev/sdb1 * 0 1286143 1286144 628M 0 Empty
/dev/sdb2 4476 23451 18976 9.3M ef EFI (FAT-12/16/32)
Disk /dev/sdc: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: TB003
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 2097152 bytes
Disk /dev/mapper/rootfs: 839.47 GiB, 901374803968 bytes, 1760497664 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
linux@localhost:~>
dd is nearing completion. Took a while at 14 MB/s, lol. In post #4 I mentioned an article; would the mount command from it work fine? mount -t btrfs -o recovery,ro /dev/<device_name> /<mount_point>
Since I need to decrypt it first, I guess I’ll need to give the name I gave it as the first argument? So: mount -t btrfs -o recovery,ro rootfs /mnt?
localhost:~ # dd if=/dev/nvme0n1p4 of=/run/media/linux/Future\ Windowns/root.partition status=progress
107360854528 bytes (107 GB, 100 GiB) copied, 7555 s, 14.2 MB/s
209715200+0 records in
209715200+0 records out
107374182400 bytes (107 GB, 100 GiB) copied, 7557.49 s, 14.2 MB/s
Tried to mount, but it’s possible I did it wrong:
localhost:~ # mount -t btrfs -o recovery,ro rootfs /mnt
mount: /mnt: special device rootfs does not exist.
dmesg(1) may have more information after failed mount system call.
localhost:~ # mount -t btrfs -o recovery,ro dm-0 /mnt
mount: /mnt: special device dm-0 does not exist.
dmesg(1) may have more information after failed mount system call.
localhost:~ # mount -t btrfs -o recovery,ro /dev/nvme0n1p4 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/nvme0n1p4, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.
dmesg:
[12181.190324] btrfs: Deprecated parameter 'recovery'
[12181.190330] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead
[12296.966484] btrfs: Deprecated parameter 'recovery'
[12296.966489] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead
[12332.302995] btrfs: Deprecated parameter 'recovery'
[12332.303001] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead
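So if I read that warning correctly, the attempt needs the decrypted mapper device and the non-deprecated option. Something like the following, assuming I open the root container under the name cr_root (the name is just my pick):
cryptsetup open /dev/nvme0n1p4 cr_root
mount -t btrfs -o ro,rescue=usebackuproot /dev/mapper/cr_root /mnt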