BTRFS Crashes Machine

Howdy Geekos!

I have a new install of Tumbleweed – 20220103 that is giving me BTRFS errors after being on for some amount of time. I’m not sure if this is a software fix, a BTRFS fix, an SSD going bad, or what.
I am entirely ignorant of this sort of low-level issue.

If it belongs in another category, feel free to move it!

Also let me know if I can provide any other information.

The error is not present in ‘journalctl’ when I search for it, and I haven’t had much luck online either. It is preceded by a whole line of “@^@^@^@^@^@^” at the top of the screen.

BTRFS error (device dm-1): bdev /dev/mapper/cr_root errs: write 211, rd 13, flush 0, corrupt 0, gen 0

‘journalctl | grep btrfs’ shows some scrubs right before the crash (around 10:52). I have added a space gap around the line that looks the most confusing to me, and potentially similar to the error output above.


Jan 06 10:51:26 <HOSTNAME> systemd[1]: Stopped Balance block groups on a btrfs filesystem.
Jan 06 10:51:26 <HOSTNAME> systemd[1]: btrfs-defrag.timer: Deactivated successfully.
Jan 06 10:51:26 <HOSTNAME> systemd[1]: btrfs-scrub.timer: Deactivated successfully.
Jan 06 10:51:26 <HOSTNAME> systemd[1]: Stopped Scrub btrfs filesystem, verify block checksums.
Jan 06 10:51:26 <HOSTNAME> systemd[1]: btrfs-trim.timer: Deactivated successfully.
Jan 06 10:51:28 <HOSTNAME> systemd[1]: btrfsmaintenance-refresh.path: Deactivated successfully.
Jan 06 10:51:28 <HOSTNAME> systemd[1]: Stopped Watch /etc/sysconfig/btrfsmaintenance.
Jan  06 10:52:05 localhost dracut-cmdline[305]: Using kernel command line  parameters:  rd.driver.pre=btrfs  rd.luks.uuid=luks-eb23bd18-dd9a-412c-aa14-3530d13fb038  rd.luks.uuid=luks-32eb3bab-373c-46c6-9906-585d0276896e  root=/dev/mapper/cr_root rootfstype=btrfs  rootflags=rw,relatime,ssd,space_cache=v2,subvolid=266,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot    BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default  root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet  lsm=apparmor mitigations=auto
Jan 06 10:52:16 <HOSTNAME> systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
Jan 06 10:52:16 <HOSTNAME> systemd[1]: Started Balance block groups on a btrfs filesystem.
Jan 06 10:52:16 <HOSTNAME> systemd[1]: Started Scrub btrfs filesystem, verify block checksums.

Jan  06 12:54:09 localhost dracut-cmdline[305]: Using kernel command line  parameters:  rd.driver.pre=btrfs  rd.luks.uuid=luks-eb23bd18-dd9a-412c-aa14-3530d13fb038  rd.luks.uuid=luks-32eb3bab-373c-46c6-9906-585d0276896e  root=/dev/mapper/cr_root rootfstype=btrfs  rootflags=rw,relatime,ssd,space_cache=v2,subvolid=266,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot    BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default  root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet  lsm=apparmor mitigations=auto

Jan 06 12:54:17 <HOSTNAME> systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
Jan 06 12:54:17 <HOSTNAME> systemd[1]: Started Balance block groups on a btrfs filesystem.
Jan 06 12:54:17 <HOSTNAME> systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
Jan  06 12:56:34 <HOSTNAME> sudo[3975]:  <USER> : TTY=pts/1 ;  PWD=/home/<USER> ; USER=root ; COMMAND=/usr/sbin/btrfs scrub /
Jan  06 12:56:40 <HOSTNAME> sudo[3980]:  <USER> : TTY=pts/1 ;  PWD=/home/<USER> ; USER=root ; COMMAND=/usr/sbin/btrfs scrub start  /
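
A note on the search above: `grep btrfs` is case-sensitive, and kernel messages like the one on screen start with an upper-case `BTRFS`, so that may be one reason they do not show up. A minimal sketch of a case-insensitive filter follows. The helper name `btrfs_lines` is made up for illustration, and it is only an assumption that the previous boot's kernel log survived the freeze (it may not have, if the disk had already stopped accepting writes).

```shell
# Case-insensitive filter: matches "btrfs-scrub.timer" as well as "BTRFS error ...".
# (btrfs_lines is a hypothetical helper name, not part of any package.)
btrfs_lines() {
    grep -i 'btrfs'
}

# Typical use on the affected machine (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | btrfs_lines
```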


I used to have the kernel parameter “nvme_core.default_ps_max_latency_us=0” added at boot, but this (verified) results in incredibly slow shutdowns and restarts, taking around 5 minutes rather than the normal 10 seconds.

I can put up with slow shutdowns if it prevents system crashes, but I know there is something I am missing.

HARDWARE:

$ inxi -Fxxc


System:    Host: <HOSTNAME> Kernel: 5.15.12-1-default x86_64 bits: 64 compiler: gcc v: 11.2.1 
           Desktop: KDE Plasma 5.23.4 tk: Qt 5.15.2 wm: kwin_wayland dm: SDDM Distro: openSUSE Tumbleweed 20220103 
Machine:   Type: Laptop System: LENOVO product: 20UDCTO1WW v: ThinkPad T14 Gen 1 serial: <superuser required> Chassis: 
           type: 10 serial: <superuser required> 
           Mobo: LENOVO model: 20UDCTO1WW v: SDK0J40709 WIN serial: <superuser required> UEFI: LENOVO v: R1BET66W(1.35 ) 
           date: 07/30/2021 
Battery:   ID-1: BAT0 charge: 44.3 Wh (86.0%) condition: 51.5/50.5 Wh (101.9%) volts: 12.1 min: 11.6 model: LGC 5B10W139 
           serial: 3176 status: Discharging 
CPU:       Info: 8-Core model: AMD Ryzen 7 PRO 4750U with Radeon Graphics bits: 64 type: MT MCP arch: Zen 2 rev: 1 cache: 
           L2: 4 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 54299 
           Speed: 1391 MHz min/max: 1400/1700 MHz boost: enabled Core speeds (MHz): 1: 1391 2: 1329 3: 2127 4: 1650 5: 1319 
           6: 1319 7: 1441 8: 1540 9: 1986 10: 1398 11: 1388 12: 1340 13: 1386 14: 1389 15: 1936 16: 1672 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Renoir vendor: Lenovo driver: amdgpu v: kernel bus-ID: 07:00.0 
           chip-ID: 1002:1636 
           Device-2: Chicony Integrated Camera type: USB driver: uvcvideo bus-ID: 2-2:2 chip-ID: 04f2:b6d0 
           Display: wayland server: X.org 1.21.1.2 compositor: kwin_wayland driver: loaded: amdgpu,ati 
           unloaded: fbdev,modesetting,vesa resolution: <missing: xdpyinfo> 
           OpenGL: renderer: AMD RENOIR (DRM 3.42.0 5.15.12-1-default LLVM 13.0.0) v: 4.6 Mesa 21.3.1 direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Renoir Radeon High Definition Audio vendor: Lenovo driver: snd_hda_intel 
           v: kernel bus-ID: 07:00.1 chip-ID: 1002:1637 
           Device-2: Advanced Micro Devices [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Lenovo 
           driver: snd_rn_pci_acp3x v: kernel bus-ID: 07:00.5 chip-ID: 1022:15e2 
           Device-3: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: Lenovo driver: snd_hda_intel v: kernel 
           bus-ID: 07:00.6 chip-ID: 1022:15e3 
           Sound Server-1: ALSA v: k5.15.12-1-default running: yes 
           Sound Server-2: PulseAudio v: 15.0 running: yes 
           Sound Server-3: PipeWire v: 0.3.42 running: no 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Lenovo driver: r8169 v: kernel port: 3400 
           bus-ID: 02:00.0 chip-ID: 10ec:8168 
           IF: enp2s0f0 state: down mac: 00:2b:67:e7:a4:94 
           Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus-ID: 03:00.0 chip-ID: 8086:2723 
           IF: wlp3s0 state: up mac: 34:cf:f6:f4:8c:e7 
           Device-3: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Lenovo driver: r8169 v: kernel port: 2000 
           bus-ID: 05:00.0 chip-ID: 10ec:8168 
           IF: enp5s0 state: down mac: 00:2b:67:e7:a4:93 
           IF-ID-1: virbr0 state: down mac: 52:54:00:62:f7:c1 
Drives:    Local Storage: total: 931.51 GiB used: 334.48 GiB (35.9%) 
           ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS100T2B0C-00PXH0 size: 931.51 GiB speed: 31.6 Gb/s lanes: 4 
           serial: 2041B2808034 temp: 45.9 C 
Partition: ID-1: / size: 929.01 GiB used: 334.47 GiB (36.0%) fs: btrfs dev: /dev/dm-1 mapped: cr_root 
           ID-2: /boot/efi size: 511 MiB used: 5.1 MiB (1.0%) fs: vfat dev: /dev/nvme0n1p1 
           ID-3: /home size: 929.01 GiB used: 334.47 GiB (36.0%) fs: btrfs dev: /dev/dm-1 mapped: cr_root 
           ID-4: /opt size: 929.01 GiB used: 334.47 GiB (36.0%) fs: btrfs dev: /dev/dm-1 mapped: cr_root 
           ID-5: /var size: 929.01 GiB used: 334.47 GiB (36.0%) fs: btrfs dev: /dev/dm-1 mapped: cr_root 
Swap:      ID-1: swap-1 type: partition size: 2 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/dm-0 mapped: cr_swap 
Sensors:   System Temperatures: cpu: 45.0 C mobo: N/A gpu: amdgpu temp: 44.0 C 
           Fan Speeds (RPM): fan-1: 2900 
Info:      Processes: 363 Uptime: 0h 5m Memory: 30.57 GiB used: 2.52 GiB (8.2%) Init: systemd v: 249 runlevel: 5 
           target: graphical.target Compilers: gcc: 11.2.1 alt: 11 Packages: note: see --pkg flatpak: 13 Shell: Bash v: 5.1.12 
           running-in: yakuake inxi: 3.3.07

So where do you see this error?

BTRFS error (device dm-1): bdev /dev/mapper/cr_root errs: write 211, rd 13, flush 0, corrupt 0, gen 0

This is the persistent error summary for the btrfs device. These errors may have happened at any point in the past. Check /var/log/messages; I think openSUSE does not keep a persistent journal by default and forwards logs to rsyslog.
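
The same counters can be read back at any time with `btrfs device stats`, which prints one line per counter. Here is a small sketch for showing only the non-zero ones, assuming the usual `[device].counter  value` layout. The helper name `nonzero_counters` is hypothetical.

```shell
# Reads `btrfs device stats` output on stdin and keeps the lines
# whose last field (the counter value) is not 0.
# (nonzero_counters is a hypothetical helper name.)
nonzero_counters() {
    awk 'NF && $NF != 0'
}

# Typical use on the affected machine (needs root):
#   btrfs device stats / | nonzero_counters
```

Because the counters are persistent, `btrfs device stats -z /` can be used to reset them, after which any new non-zero value points at a fresh error rather than an old one.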

Do you observe any issues right now? The title of your post says “crash”, which implies your system suddenly stopped and had to be restarted, but you did not show any evidence of it.

The error shows up on the screen after the GUI freezes and disappears, and the machine has to be hard reset. I cannot access virtual TTYs or get any response from the machine.

Any work that was unsaved and open is entirely lost–no “recovery” option from LibreOffice, even, and no crash logs that I can find (heavy caveat, perhaps I am not looking in the right place).

(image link: https://imgur.com/a/HRVZ6Yy)

The error is not present in ‘journalctl’ when searched (i.e. for the string ‘BTRFS error (device dm-1) …’).

At this point, I don’t know enough to say what is even causing these errors, but any diagnostic help is very much appreciated!

https://i.imgur.com/1XurqTN.jpg

Perhaps failing drive…

I have thought that before, but with the kernel parameter ‘nvme_core.default <…> =0’ I never see this issue, just very (very) long shutdown times.

I had this issue before (https://forums.opensuse.org/showthread.php/549897-btrfs-Enters-Read-Only-mode-and-crashes-every-few-hours) and the ‘nvme_core’ parameter resolved it, but I wonder why the problem arises if I remove that (and why there is a conflict between shutdown and the system dropping to read-only).

Seeing the same issue again after a year has renewed my interest in trying to figure out what exactly might be going wrong.

EDIT: I have found on the Arch Wiki a description of power saving that apparently has issues with some NVME drives…the Wiki states that setting ‘nvme.core’ to 0 should no longer be necessary, but I am finding that it is for some reason.

And that setting the ‘nvme.core_latency’ to 0 has some impact on shutdown…
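
To see which value the running kernel actually picked up, the module parameter can be read back from sysfs. This is a minimal sketch, assuming the parameter is exposed at `/sys/module/nvme_core/parameters/default_ps_max_latency_us`. The function name `show_apst_latency` is made up, and the file argument only exists so the function can be tried on any file.

```shell
# Location where the kernel exposes the nvme_core module parameter at runtime.
param_file=/sys/module/nvme_core/parameters/default_ps_max_latency_us

# Prints the value stored in the given file, or "unavailable" if it cannot be read.
# (show_apst_latency is a hypothetical helper name.)
show_apst_latency() {
    if [ -r "$1" ]; then
        printf 'default_ps_max_latency_us=%s\n' "$(cat "$1")"
    else
        printf 'default_ps_max_latency_us=unavailable\n'
    fi
}

show_apst_latency "$param_file"

# The drive's power states and their entry/exit latencies can be listed
# with nvme-cli (needs root):
#   nvme id-ctrl /dev/nvme0 | grep '^ps '
```

A value of 0 is generally understood to switch the autonomous power state transitions (APST) off altogether, which matches the Arch Wiki description mentioned above.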

Hi
I don’t see that on Leap 15.3 with that option set on a WDC NVMe. The Intel motherboard won’t boot from NVMe, and that should be the only reason it’s needed… I have the EFI partition on a USB boot device.

Shutdown is instantaneous…


nvme list


Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     XXXXXXXXXXXX         WDC WDS250G1B0C-00S6U0                   1         250.06  GB / 250.06  GB      4 KiB +  0 B   201000WD




cat /etc/default/grub | grep GRUB_CMDLINE_LINUX_DEFAULT


GRUB_CMDLINE_LINUX_DEFAULT="splash=silent nvme_core.default_ps_max_latency_us=0 quiet mitigations=auto"

Does your system boot direct from the NVMe device?

Is your system really stable? Back in 2003 both of the disks attached to a desktop machine were gone. Removing the faulty DVD burner brought them back. Btrfs errors could be triggered by other faulty components. Does a live system run stable?

Host erlangen never experienced any problems related to btrfs. Journal of current boot is:

**erlangen:~ #** journalctl --no-pager -b  -g nvme
-- Journal begins at Wed 2021-12-15 08:27:08 CET, ends at Sun 2022-01-09 08:01:12 CET. -- 
Jan 07 18:13:12 erlangen kernel: nvme nvme0: pci function 0000:04:00.0 
Jan 07 18:13:12 erlangen kernel: **nvme nvme0: missing or invalid SUBNQN field.**
Jan 07 18:13:12 erlangen kernel: nvme nvme0: Shutdown timeout set to 8 seconds 
Jan 07 18:13:12 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues 
Jan 07 18:13:12 erlangen kernel:  nvme0n1: p1 p2 
Jan 07 18:13:12 erlangen kernel: BTRFS: device fsid 0e58bbe5-eff7-4884-bb5d-a0aac3d8a344 devid 1 transid 50073 /dev/nvme0n1p2 scanned by systemd-udevd (393) 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): flagging fs with big metadata feature 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): using free space tree 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): has skinny extents 
Jan 07 18:13:13 erlangen kernel: BTRFS info (device nvme0n1p2): enabling ssd optimizations 
Jan 07 18:13:13 erlangen kernel: BTRFS info (device nvme0n1p2): using free space tree 
Jan 07 18:13:14 erlangen systemd[1]: Condition check resulted in Auto-connect to subsystems on FC-NVME devices found during boot being skipped. 
Jan 08 04:36:23 erlangen kernel: nvme nvme0: Shutdown timeout set to 8 seconds 
Jan 08 04:36:23 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues 
Jan 08 17:22:41 erlangen kernel: nvme nvme0: Shutdown timeout set to 8 seconds 
Jan 08 17:22:41 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues 
Jan 08 22:30:39 erlangen kernel: nvme nvme0: Shutdown timeout set to 8 seconds 
Jan 08 22:30:39 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues 
Jan 09 05:56:05 erlangen kernel: nvme nvme0: Shutdown timeout set to 8 seconds 
Jan 09 05:56:05 erlangen kernel: nvme nvme0: 8/0/0 default/read/poll queues 
**erlangen:~ #**
**erlangen:~ #** journalctl --no-pager -b  -g btrfs 
-- Journal begins at Wed 2021-12-15 08:27:08 CET, ends at Sun 2022-01-09 08:01:12 CET. -- 
Jan 07 18:13:11 erlangen dracut-cmdline[255]: Using kernel command line parameters:  rd.driver.pre=btrfs root=UUID=0e58bbe5-eff7-4884-bb5d-a0aac3d8a344 rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache=v2,subvolid=266,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot   BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default root=UUID=0e58bbe5-eff7-4884-bb5d-a0aac3d8a344 quiet plymouth.enable=0 net.ifnames=0 mitigations=auto 
Jan 07 18:13:11 erlangen kernel: Btrfs loaded, crc32c=crc32c-intel, assert=on, zoned=yes, fsverity=yes 
Jan 07 18:13:12 erlangen kernel: BTRFS: device label Leap-15.3 devid 1 transid 3507 /dev/sdc6 scanned by systemd-udevd (368) 
Jan 07 18:13:12 erlangen kernel: BTRFS: device fsid 9e9fb019-007e-497d-9ff0-bda6eb2b131e devid 1 transid 6582 /dev/sdc7 scanned by systemd-udevd (360) 
Jan 07 18:13:12 erlangen kernel: BTRFS: device fsid 0e58bbe5-eff7-4884-bb5d-a0aac3d8a344 devid 1 transid 50073 /dev/nvme0n1p2 scanned by systemd-udevd (393) 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): flagging fs with big metadata feature 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): using free space tree 
Jan 07 18:13:12 erlangen kernel: BTRFS info (device nvme0n1p2): has skinny extents 
Jan 07 18:13:13 erlangen kernel: BTRFS info (device nvme0n1p2): enabling ssd optimizations 
Jan 07 18:13:13 erlangen kernel: BTRFS info (device nvme0n1p2): using free space tree 
Jan 07 18:13:14 erlangen systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance. 
Jan 07 18:13:14 erlangen systemd[1]: Started Balance block groups on a btrfs filesystem. 
Jan 07 18:13:14 erlangen systemd[1]: Started Scrub btrfs filesystem, verify block checksums. 
**erlangen:~ #**

On your machine watch for all lines not present in the above. Use susepaste if the output doesn’t fit the character limit.
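
Comparing two journals by eye is tedious because timestamps, hostnames and PIDs differ on every line. A rough sketch of one way to automate it, assuming both outputs were saved to files in the default short format: strip the prefix and the `[pid]`, then print only messages that do not occur in the reference. The helper names `normalize_journal` and `new_messages` are hypothetical. Device names (dm-1 vs nvme0n1p2) still differ, so expect some noise.

```shell
# Drops "-- Journal begins ..." markers, the "Mon DD HH:MM:SS host " prefix
# and the first "[pid]" so that only the unit name and message text remain.
normalize_journal() {
    grep -v '^-- ' "$1" |
        sed -E -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ //' -e 's/\[[0-9]+\]//'
}

# $1: reference journal file, $2: journal file from the machine under test.
# Prints normalized lines of $2 that have no exact match in $1.
new_messages() {
    ref_tmp=$(mktemp)
    normalize_journal "$1" > "$ref_tmp"
    normalize_journal "$2" | grep -vxF -f "$ref_tmp"
    rm -f "$ref_tmp"
}

# Typical use:
#   journalctl --no-pager -b -g nvme > mine.txt
#   new_messages reference.txt mine.txt
```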

Thank you for taking the time to help, firstly!!

Ah, perhaps booting from an NVME has some impact? I do boot directly from the NVME, and it appears to have quite a bit less space allocated under FORMAT than you do (512 B vs 4 KiB)…

Also note that this is an AMD laptop, although the x86_64 code might be the same as an Intel motherboard, I know nothing of that sort of thing.

I’ve played around with the ‘nvme_core.default’ setting it to various values from 4000 to 30000, although I can’t tell what this does (if anything). The good news is I don’t see system crashes. The bad news is shutdown is incredibly slow–my preferred problem of the two!


$ nvme list

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     2041B2808034         WDC WDS100T2B0C-00PXH0                   1           1.00  TB /   1.00  TB    512   B +  0 B   211070WD


$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX_DEFAULT

GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet lsm=apparmor nvme_core.default_ps_max_latency_us=4000 mitigations=auto"


Likewise, karlmistelberger, thank you for your time helping me!!

Oh yikes, I do see quite a few worrying errors, or at least they seem worrying to me. Again, I am largely ignorant of these things. But I am learning!

I have bolded the errors I see below. “device incomplete in udev” seems…bad. But I do not have LVM running on this machine, so I’m unsure why I am seeing an error from lvm?


$ journalctl --no-pager -b -g nvme

-- Journal begins at Wed 2022-01-05 17:58:09 EST, ends at Sun 2022-01-09 12:44:36 EST. --
Jan 09 12:34:10 <HOSTNAME> kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet lsm=apparmor nvme_core.default_ps_max_latency_us=4000 mitigations=auto
Jan 09 12:34:10 <HOSTNAME> kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet lsm=apparmor nvme_core.default_ps_max_latency_us=4000 mitigations=auto
Jan 09 12:34:10 <HOSTNAME> dracut-cmdline[305]: Using kernel command line parameters:  rd.driver.pre=btrfs rd.luks.uuid=luks-eb23bd18-dd9a-412c-aa14-3530d13fb038 rd.luks.uuid=luks-32eb3bab-373c-46c6-9906-585d0276896e root=/dev/mapper/cr_root rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache=v2,subvolid=266,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot   BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet lsm=apparmor nvme_core.default_ps_max_latency_us=4000 mitigations=auto
**Jan 09 12:34:11 <HOSTNAME> kernel: nvme 0000:01:00.0: platform quirk: setting simple suspend
Jan 09 12:34:11 <HOSTNAME> kernel: nvme nvme0: pci function 0000:01:00.0**

Jan 09 12:34:11 <HOSTNAME> kernel: nvme nvme0: allocated 32 MiB host memory buffer.
Jan 09 12:34:11 <HOSTNAME> kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jan 09 12:34:11 <HOSTNAME> kernel:  nvme0n1: p1 p2 p3

**Jan 09 12:34:18 <HOSTNAME> lvm[843]:   Udev database has incomplete information about device /dev/nvme0n1.
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   /dev/nvme0n1: Failed to get external handle [udev].
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   Udev database has incomplete information about device /dev/nvme0n1p1.
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   /dev/nvme0n1p1: Failed to get external handle [udev].
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   Udev database has incomplete information about device /dev/nvme0n1p2.
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   /dev/nvme0n1p2: Failed to get external handle [udev].
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   Udev database has incomplete information about device /dev/nvme0n1p3.
Jan 09 12:34:18 <HOSTNAME> lvm[843]:   /dev/nvme0n1p3: Failed to get external handle [udev].
Jan 09 12:34:18 <HOSTNAME> systemd-fsck[924]: /dev/nvme0n1p1: 12 files, 1297/130812 clusters
Jan 09 12:34:18 <HOSTNAME> systemd[1]: Condition check resulted in Auto-connect to subsystems on FC-NVME devices found during boot being skipped.**


Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/sda [USB NVMe Realtek], opened
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/sda [USB NVMe Realtek], WDC WDS100T2B0C, S/N:21282H800391, FW:211210WD, 1.00 TB
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/sda [USB NVMe Realtek], is SMART capable. Adding to "monitor" list.
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/sda [USB NVMe Realtek], state read from /var/lib/smartmontools/smartd.WDC_WDS100T2B0C-21282H800391.nvme.state
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/nvme0, opened
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/nvme0, WDC WDS100T2B0C-00PXH0, S/N:2041B2808034, FW:211070WD, 1.00 TB
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.WDC_WDS100T2B0C_00PXH0-2041B2808034.nvme.state
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/sda [USB NVMe Realtek], state written to /var/lib/smartmontools/smartd.WDC_WDS100T2B0C-21282H800391.nvme.state
Jan 09 12:34:18 <HOSTNAME> smartd[1122]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.WDC_WDS100T2B0C_00PXH0-2041B2808034.nvme.state

The only differences here are the kernel params with ‘nvme.core’ and the fact that my system is running with LUKS encryption.


$ journalctl --no-pager -b -g btrfs

-- Journal begins at Wed 2022-01-05 17:58:09 EST, ends at Sun 2022-01-09 12:45:27 EST. --
Jan 09 12:34:10 <HOSTNAME> dracut-cmdline[305]: Using kernel command line parameters:  rd.driver.pre=btrfs rd.luks.uuid=luks-eb23bd18-dd9a-412c-aa14-3530d13fb038 rd.luks.uuid=luks-32eb3bab-373c-46c6-9906-585d0276896e root=/dev/mapper/cr_root rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache=v2,subvolid=266,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot   BOOT_IMAGE=/boot/vmlinuz-5.15.12-1-default root=UUID=25172501-f833-4edd-a758-020c0fa501af splash=silent quiet lsm=apparmor nvme_core.default_ps_max_latency_us=4000 mitigations=auto
Jan 09 12:34:11 <HOSTNAME> kernel: Btrfs loaded, crc32c=crc32c-intel, assert=on, zoned=yes, fsverity=yes
Jan 09 12:34:17 <HOSTNAME> kernel: BTRFS: device fsid 25172501-f833-4edd-a758-020c0fa501af devid 1 transid 6402 /dev/dm-1 scanned by systemd-udevd (723)
Jan 09 12:34:17 <HOSTNAME> kernel: BTRFS info (device dm-1): flagging fs with big metadata feature
Jan 09 12:34:17 <HOSTNAME> kernel: BTRFS info (device dm-1): using free space tree
Jan 09 12:34:17 <HOSTNAME> kernel: BTRFS info (device dm-1): has skinny extents
Jan 09 12:34:17 <HOSTNAME> kernel: BTRFS info (device dm-1): enabling ssd optimizations
Jan 09 12:34:18 <HOSTNAME> kernel: BTRFS info (device dm-1): using free space tree
Jan 09 12:34:18 <HOSTNAME> systemd[1]: Started Watch /etc/sysconfig/btrfsmaintenance.
Jan 09 12:34:19 <HOSTNAME> systemd[1]: Started Balance block groups on a btrfs filesystem.
Jan 09 12:34:19 <HOSTNAME> systemd[1]: Started Scrub btrfs filesystem, verify block checksums.
Jan 09 12:34:19 <HOSTNAME> kernel: BTRFS info (device dm-1): devid 1 device path /dev/mapper/cr_root changed to /dev/dm-1 scanned by systemd-udevd (912)
Jan 09 12:34:19 <HOSTNAME> kernel: BTRFS info (device dm-1): devid 1 device path /dev/dm-1 changed to /dev/mapper/cr_root scanned by systemd-udevd (912)
Jan 09 12:45:07 <HOSTNAME> kernel: BTRFS info (device dm-1): qgroup scan completed (inconsistency flag cleared)

Hi
If you want to use 4K sectors, that means a complete re-install… what happens if you remove?

If I remove what particularly? The ‘nvme.core’ line, or another bit?

Everything is backed up, and I’m willing to reinstall or “break” this machine trying to fix this issue. :slight_smile:

Hi
The nvme one. Just reboot, press the ‘e’ key at the GRUB menu to edit the entry, arrow down to the linuxefi line and delete the parameter there (this only affects the current boot), then press F10 to boot and see how it goes.
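
Once a value has been tried from the GRUB editor, making it permanent on openSUSE usually means changing `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub and regenerating the config. Here is a hedged sketch of the edit step only. The function name `set_nvme_latency` is made up, it works on stdin/stdout so nothing is touched until the result has been reviewed, and it only changes a parameter that is already present on the line.

```shell
# $1: new latency in microseconds.
# Reads grub defaults on stdin and writes them to stdout with the
# nvme_core.default_ps_max_latency_us value replaced on the
# GRUB_CMDLINE_LINUX_DEFAULT line. (set_nvme_latency is a hypothetical helper name.)
set_nvme_latency() {
    sed -E "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/nvme_core\.default_ps_max_latency_us=[0-9]+/nvme_core.default_ps_max_latency_us=$1/"
}

# Typical use (review the result before copying it back as root):
#   set_nvme_latency 0 < /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
#   # after copying /tmp/grub.new to /etc/default/grub:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```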

To change the format, you need to boot from a rescue USB device so the nvme device is not mounted…

**FOR REFERENCE ONLY, THIS IS DESTRUCTIVE!!**


Check NVME

nvme id-ns -H /dev/nvmeXnY


Set to 4096 rather than 512


nvme format --lbaf=NUMBER /dev/nvmeXnY
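
Checking which LBA format is currently in use is not destructive, only the `nvme format` step is. A small sketch for pulling the active format out of the `nvme id-ns -H` output, assuming the human-readable lines look like `LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - ... (in use)`. The layout may differ between nvme-cli versions, and the helper name `lba_in_use` is hypothetical.

```shell
# Reads `nvme id-ns -H` output on stdin and prints "<format index> <data size in bytes>"
# for the LBA format marked "(in use)". (lba_in_use is a hypothetical helper name.)
lba_in_use() {
    awk '/^LBA Format/ && /in use/ {
        size = "unknown"
        for (i = 2; i < NF; i++)
            if ($(i - 1) == "Data" && $i == "Size:") size = $(i + 1)
        print $3, size
    }'
}

# Typical use (needs root, read-only):
#   nvme id-ns -H /dev/nvme0n1 | lba_in_use
```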

malcolmlewis,

If I remove the ‘nvme.core=0’ line, typically the machine boots fine, but at some point (within a couple hours) the GUI dies and I see the ‘BTRFS error’ on the screen (from my earlier post in this thread).

There are some Stack exchange posts on the issue, and other forums (including on Western Digital’s forum), but they seem concerned with startup–as in Linux would not boot without the GRUB parameter, whereas I find boot and userland are fine, but the system experiences crashes after a while.

I’ll try cloning the drive and seeing if it could be hardware, but I’m not sure how a GRUB config would change a hardware issue. Hence this curious and tricky problem… (WD forum thread: https://community.wd.com/t/linux-support-for-wd-black-nvme-2018/225446/11)

Nothing to worry here but lvm. Make sure it is disabled by running “systemctl disable --now lvm2-monitor.service lvm2-lvmpolld.socket”:

**erlangen:~ #** systemctl list-unit-files lvm* 
UNIT FILE             STATE    VENDOR PRESET
lvm2-lvmpolld.service static   -            
lvm2-monitor.service  **disabled** **enabled**
lvm2-pvscan@.service  static   -            
lvm2-lvmpolld.socket  **disabled** **enabled**

4 unit files listed. 
**erlangen:~ #** systemctl list-units lvm*      
  UNIT LOAD ACTIVE SUB DESCRIPTION
**0 loaded units listed.** Pass --all to see loaded but inactive units, too. 
To show all installed unit files use 'systemctl list-unit-files'. 
**erlangen:~ #**

Note: lvm is installed, but deactivated:

**erlangen:~ #** journalctl -g lvm2 
-- Journal begins at Wed 2021-12-15 08:27:08 CET, ends at Sun 2022-01-09 20:31:25 CET. -- 
**-- Boot 0649246d479f442b84ce423b79792aab --**
**-- Boot ca63465bf82b4a56bc3c2b1447f08a75 --**
**-- Boot 1812fef95504402d897a10d449778ced --**
**-- Boot 8f709d2700c54d5f906c110ad60706a7 --**
**-- Boot 6a6d9bafb74d46f4a8621995d122e49d --**
**-- Boot 08c401e4fad5451bbf656cd6a9cfe6ff --**
**-- Boot 8e72b8f2b5754a00993ef0a24000c6fb --**
**-- Boot 86ae4c0590c447dcafb81ab753c7d1c8 --**
**-- Boot 63f2559c364844a1989d150c53a90c0d --**
**-- Boot 2af3377281f049bb812bbf5f7a4fca54 --**
**-- Boot 76efbe153a694e2b8f85b457af196df0 --**
Jan 09 19:48:26 erlangen [RPM][14533]: **erase liblvm2cmd2_03-2.03.12-3.2.x86_64: success**
Jan 09 19:48:26 erlangen [RPM][14533]: **install liblvm2cmd2_03-2.03.12-3.3.x86_64: success**
Jan 09 19:48:26 erlangen [RPM][14533]: **erase liblvm2cmd2_03-2.03.12-3.2.x86_64: success**
Jan 09 19:48:26 erlangen [RPM][14533]: **install liblvm2cmd2_03-2.03.12-3.3.x86_64: success**
Jan 09 19:53:29 erlangen [RPM][18965]: **erase lvm2-2.03.12-3.2.x86_64: success**
Jan 09 19:53:29 erlangen [RPM][18965]: **install lvm2-2.03.12-3.3.x86_64: success**
Jan 09 19:53:30 erlangen [RPM][18965]: **erase lvm2-2.03.12-3.2.x86_64: success**
Jan 09 19:54:11 erlangen [RPM][21326]: **erase libbd_lvm2-2.26-2.1.x86_64: success**
Jan 09 19:54:12 erlangen [RPM][21326]: **install libbd_lvm2-2.26-2.2.x86_64: success**
Jan 09 19:54:12 erlangen [RPM][21326]: **erase libbd_lvm2-2.26-2.1.x86_64: success**
Jan 09 19:54:12 erlangen [RPM][21326]: **install libbd_lvm2-2.26-2.2.x86_64: success**
**-- Boot 25050f121f7049d0a3af17c30dfd481c --**
**erlangen:~ #**

Thank you for that! I’ve disabled it from spamming my logs now.

I’ll keep playing around with this kernel parameter, and report back what I find (if anything can be systematically determined).

Thank you so much!

lvm2 does much more than spamming the logs. It can cause delays on shutdown and more. That’s why I bothered. As a rule of thumb: if you don’t use it, disable it:

**erlangen:~ #** systemctl list-unit-files | grep enabled | grep disabled      
apache2.service                              enabled         **disabled**
avahi-daemon.service                         **disabled**        enabled 
bluetooth.service                            enabled         **disabled**
chronyd.service                              enabled         **disabled**
FR735.service                                enabled         **disabled**
GARMIN.service                               enabled         **disabled**
hd-idle.service                              enabled         **disabled**
lm_sensors.service                           enabled         **disabled**
lvm2-monitor.service                         **disabled**        enabled 
minidlna.service                             enabled         **disabled**
ModemManager.service                         **disabled**        enabled 
nscd.service                                 **disabled**        enabled 
smartd.service                               **disabled**        enabled 
systemd-networkd.service                     enabled         **disabled**
systemd-remount-fs.service                   enabled-runtime **disabled**
systemd-resolved.service                     enabled         **disabled**
lvm2-lvmpolld.socket                         **disabled**        enabled 
systemd-networkd.socket                      enabled         **disabled**
backup-home.timer                            enabled         **disabled**
btrfs-defrag.timer                           **disabled**        enabled 
btrfs-trim.timer                             **disabled**        enabled 
fetchmail.timer                              enabled         **disabled**
mdcheck_start.timer                          **disabled**        enabled 
packagekit-background.timer                  enabled         **disabled**
**erlangen:~ #**
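
A slightly stricter variant of the `grep enabled | grep disabled` pipeline above, as a sketch: it compares the STATE and VENDOR PRESET columns directly, so states such as `enabled-runtime` or `static` are not picked up by accident. The helper name `preset_overrides` is made up, and it assumes the plain three-column `systemctl list-unit-files` layout.

```shell
# Reads `systemctl list-unit-files` output on stdin and prints units whose
# state is exactly the opposite of the vendor preset.
# (preset_overrides is a hypothetical helper name.)
preset_overrides() {
    awk '($2 == "enabled" && $3 == "disabled") || ($2 == "disabled" && $3 == "enabled")'
}

# Typical use:
#   systemctl list-unit-files --no-pager | preset_overrides
```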