No space left on /boot for kernel to compile, and btrfs balance segmentation fault

Wrong.

> lsblk --fs

Will give you the correct answer.

  • The FAT32 or FAT16 “vfat” partition is “/boot” – for UEFI …

For some reason I wasn’t sharing fdisk -l (which I usually use):


Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1       2048    1050623    1048576   512M EFI System
/dev/nvme0n1p2 1972086784 2000409230   28322447  13.5G Linux swap
/dev/nvme0n1p3    1050624    1869823     819200   400M Linux filesystem
/dev/nvme0n1p4    1869824  211585023  209715200   100G Linux filesystem
/dev/nvme0n1p5  211585024 1972086783 1760501760 839.5G Linux filesystem

Definitely out of space in /boot if what KDE says is true (39.7 MiB free); that’s not enough for a new kernel/initramfs. The previous distros I’ve used that had a separate /boot reserved 1G of space for it, in addition to 256M for the ESP (/boot/efi).

That’s a dying btrfs FS. You should first save it as an image (use dd) to prevent further damage. Hope you have backups :crossed_fingers:
Then try btrfs check --repair or attempt a mount using the rescue option.
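
Roughly, something along these lines (device node and paths are placeholders, adjust to your setup):

dd if=/dev/sdXN of=/path/to/backup/root.img bs=4M conv=noerror,sync status=progress
mount -t btrfs -o ro,rescue=usebackuproot /dev/sdXN /mnt

Only reach for btrfs check --repair once that image exists; a read-only rescue mount is far less risky.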

If you’re able to mount it and don’t have backups, disable quota/qgroups (btrfs quota disable /) and make it a priority to backup data first before anything else.

For running balance without sufficient space, just add a drive (see btrfs device --help), perform balance and remove drive.
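
For example, with a hypothetical spare device /dev/sdY1 and the filesystem mounted at /mnt:

btrfs device add /dev/sdY1 /mnt
btrfs balance start -dusage=50 /mnt
btrfs device remove /dev/sdY1 /mnt

The remove step migrates everything back off the spare device before detaching it, so don’t unplug the drive until it finishes.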

I haven’t been very good at backing up the root partition (usually won’t miss much)… but at least I have my /home (it’s fine, though its backup is slightly old).

I assume something like dd if=/dev/nvme0n1p4 of=/some/flashdrive/file.partition will work? I’ll also back up the /home partition before doing anything. Something tells me it was only a matter of time until that btrfs partition blew up. It isn’t the first time a hidden death clock has been ticking after I set up btrfs/snapper and then forgot about it…

Please note that you keep posting output without the command you gave. Please always copy things completely (e.g. I doubt that fdisk -l is complete). Start with the line containing the prompt and the command, then all the output, and end with the line with the new prompt (that last line signals that it is indeed all there is).

E.g. in post #8, is that really lsblk -f ?

I ask especially because you are now talking about a /home partition, but I cannot find anywhere in this thread that you have one. (Or did I miss something?)

Yep, that should do it. Additionally use status=progress to keep track of it.
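
So roughly, using the same paths you mentioned (bs=4M is optional, it just speeds up the copy):

dd if=/dev/nvme0n1p4 of=/some/flashdrive/file.partition bs=4M status=progress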

May want to disable quota/qgroups in the new btrfs filesystem.
And backup the snapshots to another btrfs target using btrbk.
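
btrbk is essentially a wrapper around btrfs send/receive; done by hand, the equivalent is roughly (snapshot name and backup mount point are placeholders, and the target has to be btrfs):

btrfs subvolume snapshot -r / /.snapshots/manual-backup
btrfs send /.snapshots/manual-backup | btrfs receive /mnt/backup/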

Slow, steady, calm when dealing with filesystem issues.

> Slow, steady, calm when dealing with filesystem issues

Thanks to @karlmistelberger for the sticker :innocent:

The openSUSE default for the EFI partition (mount point /boot/efi) is to set it up with a partition size of 500M – here on Leap 15.5, but no Btrfs …

 > lsblk --fs /dev/sdb
NAME        FSTYPE FSVER LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb                                                                                      
├─sdb1      vfat   FAT16             E385-55AF                             494,6M     1% /boot/efi
├─sdb2      ext4   1.0   System_Root c59a64bf-b464-4ea2-bf3a-d3fd9dded03f   79,4G    21% /
└─sdb3                                                                                   
  └─cr_swap swap   1     cr_swap     408e51f6-d91d-4076-8c11-2c4027f06d55                [SWAP]
 >
 # LANG=C fdisk --list /dev/sdb
Disk /dev/sdb: 111.79 GiB, 120034123776 bytes, 234441648 sectors
Disk model: Intenso SSD Sata
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 782CEFBA-9790-458F-9159-F55B7969408E

Device         Start       End   Sectors   Size Type
/dev/sdb1       2048   1026047   1024000   500M EFI System
/dev/sdb2    1026048 230246399 229220352 109.3G Linux filesystem
/dev/sdb3  230246400 234441614   4195215     2G Linux swap
 #

@Serophis:

Apropos your “possibly dying” NVME SSD drive – “Don’t Panic!:sunglasses:

  • Modern disk devices and modern filesystems are quite good at preserving data.
    Even if a drive is failing, you can usually (99.99 % of the time) recover your data from that drive, provided that you discover the issue before the thing really becomes unusable.
  1. Backup your “/home” and other user/application partitions such as “/srv” to an external drive or tape system.
  2. Check the failing drive’s S.M.A.R.T. Health with –

# smartctl --all /dev/nvme0n1

  3. Buy a new drive, install it and re-install the O.S.
    In the case of a multi-drive system, you only have to reinstall the O.S. if the failing drive is the one where the “/” partition is located.
  4. Consider buying a disk enclosure for the “maybe dead” drive – simply re-initialise the thing as follows, place a complete, fresh filesystem on it – consider using XFS – and then use that device for backups.
 # dd if=/dev/urandom of=/dev/sd? iflag=fullblock bs=2G count=2 status=progress
 # dd if=/dev/zero of=/dev/sd? iflag=fullblock bs=2G count=2 status=progress

Ah, apologies for mentioning /home. For me it’s quite obvious which partition is what from the size, so /dev/nvme0n1p5 is the XFS /home partition, and it seems to be fine. I’m currently using rsync to back it up just in case (mounted without issues). Only the btrfs root partition ( /dev/nvme0n1p4) is the one with any issue. I’ll dd that to a flash drive I hopefully can reformat in this LiveCD system. Asked about dd just in case I needed to format the flash drive in a particular way, or maybe not at all.
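
In case it helps anyone later, the rsync call is nothing fancy, roughly this (the source path is a placeholder for wherever the live system mounted /home):

rsync -aHAX --info=progress2 /path/to/mounted/home/ /run/media/linux/Backup/home/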

After dd of root I might try and mount it? Something tells me I should try that before btrfs check --repair, but as long as I’ve imaged it with dd I have my options… I’m fairly certain my NVME SSD drive itself should be fine, but I’ll do a check just in case after the dd.

btw, I’m fairly certain the last time I listed btrfs qgroups, there were a lot of empty entries. Maybe because I may have rm -rf a few numbers in /.snapshots that weren’t listed in snapper ls, lol.

Here’s full output of lsblk -f and fdisk -l respectively (I have a tendency to remove output that I’m fairly certain won’t be useful):

linux@localhost:~> lsblk -f
NAME        FSTYPE      FSVER            LABEL                        UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
loop0       squashfs    4.0                                                                                      0   100% /run/overlay/squashfs_container
loop1       ext4        1.0                                           b994ebb6-ab2c-494f-8c77-f7000591f636    1.1G    72% /run/overlay/rootfsbase
sda         iso9660     Joliet Extension openSUSE_Tumbleweed_KDE_Live 2024-03-19-16-18-33-00                              
├─sda1      iso9660     Joliet Extension openSUSE_Tumbleweed_KDE_Live 2024-03-19-16-18-33-00                     0   100% /run/overlay/live
├─sda2      vfat        FAT16            BOOT                         EC31-7115                                           
└─sda3      ext4        1.0              cow                          a1456023-7daf-44d2-9a5d-f4ce1d6a4f93   51.3G     2% /run/overlay/overlayfs
sdb         iso9660     Joliet Extension Debian 12.2.0 amd64 n        2023-10-07-10-32-09-00                              
├─sdb1      iso9660     Joliet Extension Debian 12.2.0 amd64 n        2023-10-07-10-32-09-00                     0   100% /run/media/linux/Debian 12.2.0 amd64 n
└─sdb2      vfat        FAT16            Debian 12.2.0 amd64 n        DEB0-0001                                           
sdc         ext4        1.0              Backup                       7f0fef3b-b919-46b7-855d-a02534adf7c7  253.3G    81% /run/media/linux/Backup
nvme0n1                                                                                                                   
├─nvme0n1p1 vfat        FAT32                                         6CC6-F5BF                                           
├─nvme0n1p2 crypto_LUKS 1                                             fd8007f5-8019-412d-ab43-027fd9792041                
├─nvme0n1p3 btrfs                                                     ed5ed943-82f1-41af-bd02-d3598e56f3e8                
├─nvme0n1p4 crypto_LUKS 1                                             e6bcc38b-eb1f-49c4-851d-63393d3e3f63                
└─nvme0n1p5 crypto_LUKS 1                                             cdda7931-1a9d-4c27-830b-cc65bc31f414                
  └─rootfs  xfs                                                       a16a4d69-eedb-4d62-aa26-fb343b93c3a9  286.7G    66% /run/media/linux/a16a4d69-eedb-4d62-aa26-fb343b93c3a9
linux@localhost:~> fdisk -l
Absolute path to 'fdisk' is '/usr/sbin/fdisk', so running it may require superuser privileges (eg. root).
linux@localhost:~> sudo fdisk -l
Disk /dev/loop0: 859.19 MiB, 900923392 bytes, 1759616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/loop1: 4.95 GiB, 5314183168 bytes, 10379264 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVLB1T0HBLR-000L2              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F94488F4-DF24-450E-A63C-80C17EC53BC6

Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1       2048    1050623    1048576   512M EFI System
/dev/nvme0n1p2 1972086784 2000409230   28322447  13.5G Linux swap
/dev/nvme0n1p3    1050624    1869823     819200   400M Linux filesystem
/dev/nvme0n1p4    1869824  211585023  209715200   100G Linux filesystem
/dev/nvme0n1p5  211585024 1972086783 1760501760 839.5G Linux filesystem

Partition table entries are not in disk order.


Disk /dev/sda: 57.3 GiB, 61530439680 bytes, 120176640 sectors
Disk model:  SanDisk 3.2Gen1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x9ef297b4

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1  *         64   1952363   1952300 953.3M cd unknown
/dev/sda2       1952364   1993323     40960    20M ef EFI (FAT-12/16/32)
/dev/sda3       1994752 120176639 118181888  56.4G 83 Linux


Disk /dev/sdb: 59.75 GiB, 64160400896 bytes, 125313283 sectors
Disk model: Flash Drive     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x52bf7ba9

Device     Boot Start     End Sectors  Size Id Type
/dev/sdb1  *        0 1286143 1286144  628M  0 Empty
/dev/sdb2        4476   23451   18976  9.3M ef EFI (FAT-12/16/32)


Disk /dev/sdc: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: TB003           
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 2097152 bytes


Disk /dev/mapper/rootfs: 839.47 GiB, 901374803968 bytes, 1760497664 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
linux@localhost:~>

Yep, the default is around 500 MiB, but it’s using all of 8M on my machine with Secure Boot and Trusted Boot enabled.

Btw, I really doubt the OP’s SSD itself is dying; unlike HDDs, which fail a few sectors at a time and then snowball, SSDs usually fail all at once.

Sure, as long as you have the image backup of that partition it’s all good either way!

If you’re able to mount it and as soon as you backup anything important from it, please do check the journal logs for btrfs issues:

sudo journalctl --no-pager -p3 -g btrfs

dd is nearing completion. Took a while at 14 MB/s, lol. In post #4 I mentioned an article; would the mount command there work fine?
mount -t btrfs -o recovery,ro /dev/<device_name> /<mount_point>
Since I need to decrypt it first I guess I’ll need to give the name I gave it as first argument? So:
mount -t btrfs -o recovery,ro rootfs /mnt?

Finally!

localhost:~ # dd if=/dev/nvme0n1p4 of=/run/media/linux/Future\ Windowns/root.partition status=progress
107360854528 bytes (107 GB, 100 GiB) copied, 7555 s, 14.2 MB/s
209715200+0 records in
209715200+0 records out
107374182400 bytes (107 GB, 100 GiB) copied, 7557.49 s, 14.2 MB/s

Tried to mount, but it’s possible I did it wrong:


localhost:~ # mount -t btrfs -o recovery,ro rootfs /mnt
mount: /mnt: special device rootfs does not exist.
       dmesg(1) may have more information after failed mount system call.
localhost:~ # mount -t btrfs -o recovery,ro dm-0 /mnt
mount: /mnt: special device dm-0 does not exist.
       dmesg(1) may have more information after failed mount system call.
localhost:~ # mount -t btrfs -o recovery,ro /dev/nvme0n1p4 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/nvme0n1p4, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

dmesg:


[12181.190324] btrfs: Deprecated parameter 'recovery'
[12181.190330] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead
[12296.966484] btrfs: Deprecated parameter 'recovery'
[12296.966489] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead
[12332.302995] btrfs: Deprecated parameter 'recovery'
[12332.303001] BTRFS warning: 'recovery' is deprecated, use 'rescue=usebackuproot' instead

Too late to edit post… but I got it mounted with mount -t btrfs -o rescue=usebackuproot,ro /dev/mapper/rootfs /mnt/rootfs_content/!
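
(The piece missing from the article was opening the LUKS container first so that the /dev/mapper device actually exists; assuming p4 was unlocked with the mapping name rootfs, that step would have been roughly:

cryptsetup open /dev/nvme0n1p4 rootfs

and the recovery option from the article is just the old spelling of rescue=usebackuproot, as dmesg pointed out.)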

Some empty directories, /var among them (think I had a VM there).

Nonempty directories(/symlinks): /bin, /etc, /lib, /lib64, /sbin, /usr.

localhost:~ # journalctl --no-pager -p3 -g btrfs
-- Boot 60e43659dc2a4d9991f1e0808313ec0d --
-- Boot 77ee5af2d0bc4a7b925fc9ec048fbbe0 --
-- No entries --

Also forgot this:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

@Serophis Just a couple of observations on your issue:

  1. Have you tweaked zypper configuration to adjust the amount of kernels kept?
  2. Has the purge-kernels.service run recently? systemctl status purge-kernels.service

I think I’d remember if I tweaked that… Is there a configuration file I could check? Went ahead and booted into the system to take a look at purge-kernels.service. It’s enabled, but I didn’t see a history. Fairly certain it works as it should though. And while I was there I tried journalctl --no-pager -p3 -g btrfs again:

@Serophis run cat /etc/zypp/zypp.conf | grep multiversion.kernels

localhost:~ # cat /mnt/rootfs_content/etc/zypp/zypp.conf | grep multiversion.kernels
multiversion.kernels = latest,latest-1,running

@Serophis then I would surmise that the service has not run and has left some old kernels present… I think if you (as root) meet the condition via touch /boot/do_purge_kernels and run the service, it should clean things up…

touch /boot/do_purge_kernels worked fine, but systemctl start purge-kernels.service hangs, proba- huh. It never reached a conclusion and mounting failed again for /.snapshots, /usr/local, /opt, /srv, /var, and finally /root, before returning to emergency mode. Listing /boot I don’t see any difference.

@Serophis So can you show the output from zypper se -si kernel-default? I suspect that since the service can’t run, manual intervention is required… as in some zypper rm or a manual deletion in /boot.
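
For example, once the installed versions are visible, removal would look something like this (the version string here is only a placeholder):

zypper rm kernel-default-6.7.9-1.2

Just double-check the version against uname -r so the currently booted kernel isn’t removed.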

The latest kernel never even got me to the decryption screen btw, so I’ve been using a previous kernel to even be able to boot. I’ll edit this post once I have output (it’ll be a photo). Just got to wait for the 7 start jobs to fail again.

Edit: read-only filesystem, so I guess this is to be expected.

New edit: /boot still looks the same after reboot.

Last edit: I’m off to bed, so apologies for any late replies. And thanks so far!