Journal corruption or btrfs corruption - can't decide which

Overview of our desktop machines:
a) each desktop machine has two NVME drives
b) on each NVME drive, there is a dedicated installation of Tumbleweed. (primary and secondary)
c) each TW installation (primary and secondary) has a separate partition for root and a separate partition for /home
… so that if one installation fails to boot, there is another to boot from (which is what happened yesterday)

Yesterday, after a zypper dup finished, I noticed errors when I ran the YaST GUI, so I decided to reboot, since I had just run zypper dup. I rebooted into the primary TW and, while waiting for the KDE login screen, I got the command line instead.

So I logged in and ran dmesg and noticed both journal errors and btrfs errors.

I did capture the output of dmesg to a file. I booted to the secondary TW on the other drive and did two backups of the primary /home to a 3rd hard drive and an external backup drive.

So, a few minutes ago, I booted up to the primary TW installation again and it repeated the journal errors and btrfs errors, and ended up at a command line.

Out of curiosity, I logged in as my regular user and ran startx, and I have a KDE Plasma desktop running as if nothing is wrong. I did some research on the journal errors and btrfs errors to find a solution, but found no definite answer.

So, my question to the experts … because I’m confused how to attempt to fix this:
====== Is it a corrupted journal problem or a corrupted btrfs problem?

Any suggestions on next steps?

I snipped out the end of dmesg output, where the major errors showed up, shown below

==============near the end of dmesg output =================
[   12.566777] BTRFS error (device nvme0n1p3): unable to find ref byte nr 1766604800 parent 1308884992 root 265  owner 12003347 offset 0

[   12.566784] ------------[ cut here ]------------
[   12.566785] BTRFS: Transaction aborted (error -2)
[   12.566806] WARNING: CPU: 8 PID: 116 at fs/btrfs/extent-tree.c:3074 __btrfs_free_extent+0xd2c/0x10f0 [btrfs]
[   12.566882] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib cmac algif_hash algif_skcipher af_alg nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security iscsi_ibft iscsi_boot_sysfs ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vboxnetadp(O) vboxnetflt(O) qrtr snd_seq snd_seq_device vboxdrv(O) bnep joydev nls_iso8859_1 nls_cp437 vfat fat rtw88_8822be rtw88_8822b rtw88_pci rtw88_core snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 snd_hda_intel btusb snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi btrtl intel_rapl_common snd_hda_codec eeepc_wmi btbcm asus_wmi edac_mce_amd btintel snd_hda_core battery btmtk ledtrig_audio snd_hwdep bluetooth kvm_amd sparse_keymap
[   12.566930]  libarc4 snd_pcm platform_profile xfs cfg80211 kvm igb ecdh_generic snd_timer asus_wmi_sensors irqbypass hid_logitech_hidpp mxm_wmi wmi_bmof pcspkr i2c_piix4 rfkill efi_pstore k10temp snd dca soundcore tiny_power_button gpio_amdpt gpio_generic button acpi_cpufreq fuse configfs dmi_sysfs ip_tables x_tables hid_logitech_dj hid_generic usbhid amdgpu crct10dif_pclmul xhci_pci crc32_pclmul xhci_pci_renesas polyval_clmulni polyval_generic gf128mul drm_ttm_helper ttm xhci_hcd ghash_clmulni_intel video iommu_v2 sha512_ssse3 drm_buddy gpu_sched aesni_intel drm_display_helper nvme crypto_simd cryptd usbcore cec ccp rc_core nvme_core sp5100_tco wmi btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_intel sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
[   12.566976] CPU: 8 PID: 116 Comm: kworker/u64:6 Tainted: G        W  O       6.2.12-1-default #1 openSUSE Tumbleweed cf30e455e68e00d1061d6e53870e1ba6290ba30f
[   12.566980] Hardware name: ASUS System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 4007 12/09/2020
[   12.566981] Workqueue: events_unbound btrfs_preempt_reclaim_metadata_space [btrfs]
[   12.567071] RIP: 0010:__btrfs_free_extent+0xd2c/0x10f0 [btrfs]
[   12.567144] Code: fe ff ff 44 89 e6 48 c7 c7 a0 f6 76 c0 e8 9c 97 e3 df 0f 0b e9 a2 fa ff ff be fe ff ff ff 48 c7 c7 a0 f6 76 c0 e8 84 97 e3 df <0f> 0b e9 dd fd ff ff 8b 94 24 a8 00 00 00 48 8b 7c 24 30 49 89 d8
[   12.567147] RSP: 0018:ffffb326805abbc0 EFLAGS: 00010282
[   12.567149] RAX: 0000000000000000 RBX: 00000000694c4000 RCX: 0000000000000027
[   12.567151] RDX: ffff9b9d7ec224c8 RSI: 0000000000000001 RDI: ffff9b9d7ec224c0
[   12.567152] RBP: ffff9b8ee1cca7e0 R08: 0000000000000000 R09: ffffb326805aba68
[   12.567154] R10: 0000000000000003 R11: ffff9b9d7e7fffe8 R12: 0000000000000000
[   12.567155] R13: 0000000000000000 R14: ffff9b8ec16b9478 R15: ffff9b8e96756930
[   12.567156] FS:  0000000000000000(0000) GS:ffff9b9d7ec00000(0000) knlGS:0000000000000000
[   12.567158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.567160] CR2: 00005651d010cb28 CR3: 0000000894e10000 CR4: 00000000003506e0
[   12.567162] Call Trace:
[   12.567163]  <TASK>
[   12.567165]  ? btrfs_block_rsv_release+0xb3/0x1c0 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567254]  __btrfs_run_delayed_refs+0x2c1/0x1210 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567330]  ? start_transaction+0x22b/0x5c0 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567410]  ? kmem_cache_alloc+0x166/0x380
[   12.567414]  ? join_transaction+0xf0/0x400 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567494]  btrfs_run_delayed_refs+0x55/0x200 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567568]  flush_space+0x1ca/0x610 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567657]  ? __switch_to_asm+0x3a/0x80
[   12.567661]  ? finish_task_switch.isra.0+0x94/0x2f0
[   12.567665]  ? btrfs_get_alloc_profile+0xbd/0x1a0 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567752]  btrfs_preempt_reclaim_metadata_space+0x93/0x1b0 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.567841]  process_one_work+0x20a/0x420
[   12.567845]  worker_thread+0x4a/0x3b0
[   12.567848]  ? __pfx_worker_thread+0x10/0x10
[   12.567851]  kthread+0xda/0x100
[   12.567855]  ? __pfx_kthread+0x10/0x10
[   12.567859]  ret_from_fork+0x2c/0x50
[   12.567864]  </TASK>
[   12.567865] ---[ end trace 0000000000000000 ]---

[   12.567867] BTRFS: error (device nvme0n1p3: state A) in __btrfs_free_extent:3074: errno=-2 No such entry
[   12.567871] BTRFS info (device nvme0n1p3: state EA): forced readonly
[   12.567873] BTRFS error (device nvme0n1p3: state EA): failed to run delayed ref for logical 1766604800 num_bytes 4096 type 184 action 2 ref_mod 1: -2
[   12.567879] BTRFS: error (device nvme0n1p3: state EA) in btrfs_run_delayed_refs:2151: errno=-2 No such entry
[   12.749706] ------------[ cut here ]------------

[   12.749709] WARNING: CPU: 4 PID: 602 at fs/btrfs/transaction.c:144 btrfs_put_transaction+0x127/0x130 [btrfs]
[   12.749772] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib cmac algif_hash algif_skcipher af_alg nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security iscsi_ibft iscsi_boot_sysfs ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vboxnetadp(O) vboxnetflt(O) qrtr snd_seq snd_seq_device vboxdrv(O) bnep joydev nls_iso8859_1 nls_cp437 vfat fat rtw88_8822be rtw88_8822b rtw88_pci rtw88_core snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 snd_hda_intel btusb snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi btrtl intel_rapl_common snd_hda_codec eeepc_wmi btbcm asus_wmi edac_mce_amd btintel snd_hda_core battery btmtk ledtrig_audio snd_hwdep bluetooth kvm_amd sparse_keymap
[   12.749813]  libarc4 snd_pcm platform_profile xfs cfg80211 kvm igb ecdh_generic snd_timer asus_wmi_sensors irqbypass hid_logitech_hidpp mxm_wmi wmi_bmof pcspkr i2c_piix4 rfkill efi_pstore k10temp snd dca soundcore tiny_power_button gpio_amdpt gpio_generic button acpi_cpufreq fuse configfs dmi_sysfs ip_tables x_tables hid_logitech_dj hid_generic usbhid amdgpu crct10dif_pclmul xhci_pci crc32_pclmul xhci_pci_renesas polyval_clmulni polyval_generic gf128mul drm_ttm_helper ttm xhci_hcd ghash_clmulni_intel video iommu_v2 sha512_ssse3 drm_buddy gpu_sched aesni_intel drm_display_helper nvme crypto_simd cryptd usbcore cec ccp rc_core nvme_core sp5100_tco wmi btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_intel sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
[   12.749851] CPU: 4 PID: 602 Comm: btrfs-transacti Tainted: G        W  O       6.2.12-1-default #1 openSUSE Tumbleweed cf30e455e68e00d1061d6e53870e1ba6290ba30f
[   12.749855] Hardware name: ASUS System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 4007 12/09/2020
[   12.749856] RIP: 0010:btrfs_put_transaction+0x127/0x130 [btrfs]
[   12.749913] Code: 48 8b bb a0 01 00 00 48 c7 c6 b3 b0 76 c0 e8 70 e7 0b 00 e9 67 ff ff ff 0f 0b e9 fb fe ff ff 0f 0b eb c1 0f 0b e9 35 ff ff ff <0f> 0b e9 3e ff ff ff 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[   12.749915] RSP: 0018:ffffb32680f3fe30 EFLAGS: 00010282
[   12.749917] RAX: ffff9b8eae28f980 RBX: ffff9b8e8acf5c00 RCX: 0000000000000000
[   12.749919] RDX: ffff9b8e8acf5c28 RSI: 0000000000000246 RDI: ffff9b8e8acf5c10
[   12.749920] RBP: ffff9b8ea1e1f000 R08: 0000000000000000 R09: ffffb32680f3fde8
[   12.749921] R10: 0000000010080000 R11: 0000000000000000 R12: ffff9b8e8acf5c00
[   12.749922] R13: ffff9b8ea1e1f428 R14: ffff9b8ea1e1f450 R15: ffff9b8e8acf5c28
[   12.749923] FS:  0000000000000000(0000) GS:ffff9b9d7eb00000(0000) knlGS:0000000000000000
[   12.749925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.749926] CR2: 00007f8e318ae8ae CR3: 0000000894e10000 CR4: 00000000003506e0
[   12.749927] Call Trace:
[   12.749930]  <TASK>

[   12.749932]  btrfs_cleanup_transaction.isra.0+0xb3/0x540 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.749991]  ? __pfx_autoremove_wake_function+0x10/0x10
[   12.749996]  transaction_kthread+0x154/0x1b0 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.750053]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs 5cf1c926e84f6974482b86f4a2b4538a9fab6b09]
[   12.750109]  kthread+0xda/0x100
[   12.750113]  ? __pfx_kthread+0x10/0x10
[   12.750116]  ret_from_fork+0x2c/0x50
[   12.750121]  </TASK>
[   12.750122] ---[ end trace 0000000000000000 ]---

[   13.633415] igb 0000:06:00.0 enp6s0: igb: enp6s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[   13.741509] IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready
[   13.758389] NET: Registered PF_PACKET protocol family
[   42.324281] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[   42.324304] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[   42.324427] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[   42.326662] systemd-journald[678]: Failed to write entry to /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal (30 items, 780 bytes) despite vacuuming, ignoring: Bad message
[   42.347100] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[   42.347114] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[   42.347212] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[   42.356512] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[   42.356612] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[   42.593376] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.595644] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.596128] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.598293] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.607818] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.609936] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.617373] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.619480] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.626549] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.628651] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.635872] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.638151] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.642955] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.645072] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.649856] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.651951] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.652187] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.654279] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.663790] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.665895] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.670988] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.673071] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.676585] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.678023] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.680865] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   42.682091] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[   49.925294] systemd-journald[678]: Failed to write entry to /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal (21 items, 580 bytes) despite vacuuming, ignoring: Bad message (Dropped 23 similar message(s))
[  110.919171] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  110.919416] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating. (Dropped 65 similar message(s))
[  110.919430] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system (Dropped 64 similar message(s))
[  110.919534] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system (Dropped 64 similar message(s))
[  110.921614] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  110.921635] systemd-journald[678]: Failed to write entry to /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal (29 items, 702 bytes) despite vacuuming, ignoring: Bad message (Dropped 42 similar message(s))
[  110.935811] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[  110.935823] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[  110.935917] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[  110.945027] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[  110.945039] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[  110.945132] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[  110.997316] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  110.998878] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  130.055468] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  130.057635] systemd-journald[678]: Failed to open user journal file, falling back to system journal: Read-only file system
[  130.057644] systemd-journald[678]: Failed to write entry to /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal (29 items, 728 bytes) despite vacuuming, ignoring: Bad message (Dropped 43 similar message(s))
[  172.113536] logitech-hidpp-device 0003:046D:1025.0008: HID++ 1.0 device connected.
[  172.117677] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating. (Dropped 42 similar message(s))
[  172.117696] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system (Dropped 42 similar message(s))
[  172.117808] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system (Dropped 42 similar message(s))
[  172.120061] systemd-journald[678]: Failed to write entry to /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal (13 items, 398 bytes) despite vacuuming, ignoring: Bad message
[  172.129602] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[  172.129614] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[  172.129709] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
[  172.131902] systemd-journald[678]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Journal file corrupted, rotating.
[  172.131913] systemd-journald[678]: Failed to rotate /var/log/journal/a9637f095381461d9ace3985c0ae5331/system.journal: Read-only file system
[  172.132007] systemd-journald[678]: Failed to open journal file '/var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal' for rotation: Read-only file system
================== end of dmesg output ===================

Show full dmesg, not “near the end of it”. Could be long, so upload to https://susepaste.org/

@arvidjaar … thanks for assistance!

Here’s the link:
https://paste.opensuse.org/pastes/094286cfbdc1

There are BTRFS errors. Systemd journal errors at the end are self-explanatory.

Which procedure was used to create the secondary installation?

I appreciate your reply … and correct - as I stated, I see the BTRFS errors and the journal errors.

But I guess what I’m hoping to decide is whether the journal errors are showing up because of the BTRFS errors. Or if it’s the journal issue causing BTRFS problems. I assume that would determine “what” needs to get fixed.

It seems you’re hinting that the BTRFS errors are the main cause? If yes, what do you suggest as a fix?

====
To answer your question about the installation procedure. I did it as most anyone would do a fresh installation, to a desktop or laptop. (Keep in mind, both of these TW installations have been running at least three years now on this machine. We have a second desktop that’s configured the exact same way.)

  1. Download the TW ISO install file and write the ISO to a thumb-drive.
  2. Boot up the machine, and ensure the BIOS is set to boot from the thumb-drive. Continue.
  3. The TW Installer boots up and runs (on the thumb-drive).
  4. Choose to install TW to the primary NVME drive (there are two separate NVME drives)
  5. During setup, choose to create four partitions: /boot/efi, swap, root (BTRFS), and /home (XFS)
  6. Continue and finish the installation (software, etc).
  7. After installation is complete, boot to the new TW installed on the primary NVME, ensure everything is good, update software, etc.
  8. Shutdown TW.
======
  a) Next, boot up the machine, with the thumb-drive still plugged in.
  b) Now, when the TW installer shows up onscreen, follow Steps 4-8 above, but NOW install to the secondary NVME drive, complete with the same four partitions.
  c) Ensure the Boot Loader contains an entry to boot the primary TW install. So now the secondary install's GRUB can boot either one.
  d) Restart the secondary TW install and boot into the primary TW install. Modify the Boot Loader (GRUB) to include the secondary TW install. And obviously, we can go into the BIOS and set whichever NVME drive we want to be primary.
==== ALL DONE !! ====
Screenshot of the YaST Partitioner, showing the setup. You will also notice there is a TW installation on a third (OCZ) drive that can be booted to.
====
[Screenshot: partitions-on-primary]

Line 2049 in your paste shows that after the btrfs failure the filesystem is forced read-only, so nothing can be written to it. There is no permanent issue with the journal itself; journald is only warning that it can’t operate on a read-only filesystem. SDDM also failed in this state, even though you could log in at the command line and start Plasma directly. You won’t be able to persist anything on this filesystem until it is fixed (no persistent journal, no system updates, etc.).
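
To confirm that ordering from a saved dmesg capture, a minimal sketch is to pull out the first "forced readonly" line and the first journald "Read-only file system" line and compare their timestamps. The function name `first_readonly_events` and the file name `dmesg-primary.txt` are hypothetical; the two search strings are taken from the log posted at the top of the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which came first in a saved dmesg capture,
# the btrfs read-only switch or journald's first read-only complaint.
first_readonly_events() {
    local dmesg_file="$1"   # e.g. dmesg-primary.txt (assumed file name)
    # -m1 stops at the first match, so each grep prints at most one line.
    grep -m1 'forced readonly' "$dmesg_file"
    grep -m1 'systemd-journald.*Read-only file system' "$dmesg_file"
}

# Usage: first_readonly_events dmesg-primary.txt
```

On the excerpt posted at the top of the thread, this would print the line stamped 12.567871 followed by the one stamped 42.324304, i.e. journald only starts complaining about 30 seconds after btrfs has gone read-only.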

This wiki entry can be helpful for your troubleshooting; read it carefully: SDB:BTRFS - openSUSE Wiki
Otherwise that would be above my paygrade.

Thanks @awerlang … I will read the Wiki article

I am booted into the secondary TW install I have on the secondary NVME drive

I just ran btrfs check on the root partition on the primary drive and this is the result of the check (the “not enough memory” error is strange):

ren :~ # btrfs check /dev/nvme1n1p3
Opening filesystem to check...
Checking filesystem on /dev/nvme1n1p3
UUID: 2b2def5c-620e-4317-91ba-8bf47ced7401
[1/7] checking root items
[2/7] checking extents
data extent[1766604800, 4096] referencer count mismatch (parent 1308884992) wanted 0 have 1
data extent[1766604800, 4096] bytenr mimsmatch, extent item bytenr 1766604800 file item bytenr 0
data extent[1766604800, 4096] referencer count mismatch (parent 7945814016) wanted 1 have 0
data extent[1766604800, 4096] referencer count mismatch (parent 8012922880) wanted 0 have 1
data extent[1766604800, 4096] bytenr mimsmatch, extent item bytenr 1766604800 file item bytenr 0
data extent[1766604800, 4096] referencer count mismatch (parent 1241776128) wanted 1 have 0
backpointer mismatch on [1766604800 4096]
data extent[1766858752, 4096] referencer count mismatch (root 257 owner 3698286 offset 3538944) wanted 0 have 1
data extent[1766858752, 4096] bytenr mimsmatch, extent item bytenr 1766858752 file item bytenr 0
data extent[1766858752, 4096] referencer count mismatch (root 257 owner 3697262 offset 3538944) wanted 1 have 0
backpointer mismatch on [1766858752 4096]
data extent[1767280640, 4096] referencer count mismatch (root 257 owner 3698286 offset 3047424) wanted 0 have 1
data extent[1767280640, 4096] bytenr mimsmatch, extent item bytenr 1767280640 file item bytenr 0
data extent[1767280640, 4096] referencer count mismatch (root 257 owner 3698282 offset 3047424) wanted 1 have 0
backpointer mismatch on [1767280640 4096]
data extent[1771085824, 8192] referencer count mismatch (parent 1321697280) wanted 0 have 1
data extent[1771085824, 8192] bytenr mimsmatch, extent item bytenr 1771085824 file item bytenr 0
data extent[1771085824, 8192] referencer count mismatch (parent 1321435136) wanted 1 have 0
backpointer mismatch on [1771085824 8192]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
ERROR: bytenr ref not found for parent 7945814016
ERROR: not enough memory: accounting for refs for qgroups
ERROR: failed to check quota groups
found 21178617856 bytes used, error(s) found
total csum bytes: 18968900
total tree bytes: 830472192
total fs tree bytes: 770867200
total extent tree bytes: 32800768
btree space waste bytes: 217723490
file data blocks allocated: 65441669120
 referenced 58874085376
ren :~ #

There are no hardware errors (at least, during this boot). The message “start tree-log replay” implies that the filesystem was not properly unmounted. The filesystem is corrupted, and the corruption probably happened earlier (it could be due to an unclean shutdown). If you are interested in keeping this filesystem, post a question to the Btrfs mailing list (see “Btrfs mailing list - btrfs Wiki” on kernel.org) and provide this dmesg output and the output from btrfs check.

btrfs check needs a lot of memory to hold metadata. You could try btrfs check --mode=lowmem ..., it may find additional problems. But seeing that this message was in the quota groups check, this is probably the least interesting problem (quotas can be rebuilt at any time).
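
A minimal sketch of that re-check, meant to be run from the other installation with the damaged filesystem unmounted. `--readonly` and `--mode=lowmem` are documented `btrfs check` options; the wrapper name `check_lowmem` and the default log file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: read-only btrfs check in low-memory mode,
# keeping a copy of the output for a later mailing-list report.
check_lowmem() {
    local dev="$1"                             # e.g. /dev/nvme1n1p3
    local log="${2:-btrfs-check-lowmem.txt}"   # assumed log file name
    # --readonly never writes to the device; 2>&1 captures the ERROR lines too.
    btrfs check --readonly --mode=lowmem "$dev" 2>&1 | tee "$log"
}

# Usage (as root, filesystem not mounted): check_lowmem /dev/nvme1n1p3
```

Low-memory mode trades speed for a smaller memory footprint, so it may take noticeably longer than the default mode.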

Yea, I checked for any drive errors also. As far as the “probably an unclean shutdown” - minutes before this BTRFS issue showed up, I had just completed a “zypper dup”, i.e., I did the “dup”, then restarted the system as usual. Upon boot, the BTRFS errors showed up.

I personally did not do any drive mounting or un-mounting.

BTW - when the BTRFS issue first showed up, I immediately ran a dmesg .
Then the second time I booted to that TW installation (the next day), I got the same BTRFS errors, ran another dmesg, and compared the two - they have exactly the same entries.

Yea, I’ll give the mailing-list submission option a couple of hours of thought / consideration. Your key phrase, “If you are interested in keeping this filesystem”, is what’s holding me off, because I’m also considering switching from BTRFS. I may submit to the mailing list and give it a day to see what the response is. I don’t want too many days to go by without a zypper dup :)

I might try that.

Interesting though. This machine has 64GB of RAM. This (and the other desktop and two TW laptops) are personal machines, not server-side business-oriented machines, so there’s no “hard-core” system usage, RAM-wise. What I’ll do shortly is re-run the btrfs check and monitor RAM usage. The btrfs check only took seconds to run. I’ve often considered bumping the RAM to 128GB on the two machines, but that would be way overkill :) !

Well, if metadata is corrupted, we cannot exclude that it simply miscalculates needed memory.


==== Almost-Concluding Thoughts ====

So, I may submit to the Btrfs mailing list and see what kind of response I get.

I’m still 50 / 50 on moving the root partition away from BTRFS to another filesystem that’s long-standing, trustworthy, and reliable, and that has proper tools for recovering from filesystem issues.

I’ve had one other instance where I’ve had a BTRFS / mounting read-only issue. It was probably a year+ ago. Unfortunately, I didn’t document what the issue was and the fix, but I was able to “fix” the problem, mostly out of frustration - probably a “luck of the draw”, as they say. I may have screenshots of it back in Google Photos, but I’m not gonna search now :)

Here’s the MAIN reason why I’m considering switching away from BTRFS: has anyone read the documentation related to the btrfs check --repair option?
==== It’s the first option under the category labeled Dangerous Options.

And even the SUSE SLES documentation has the statement:

WARNING : Using '--repair' can further damage a filesystem instead of helping if it can’t fix your particular issue.

Anyway, I’m prepared to switch the filesystem type, if I decide. What I suspect I will do before then is to run some of the btrfs options provided in the SUSE SLES documentation, i.e., probably a repair, knowing that it most likely will not work.

I’ll report back !

[Screenshot: btrfs-docs-snap]


Okay, I just now ran btrfs check two ways - here’s with lowmem set (too much to paste in here directly):

https://paste.opensuse.org/pastes/6824ccffe652

At a pre-corona openSUSE Conference, one of the (well respected) presenters mentioned during a session in the main hall in the Z-Bau that the Btrfs repair tools are reasonably reliable and that, if you absolutely have to use them, about 99 % of the time you’ll be spared a re-installation. (I can’t remember which conference it was and can’t find the session I thought was relevant: I checked the 2016, 2017 and 2019 conferences, and I wasn’t in Prague for the 2018 one.)

  • Bottom line: yes, the Btrfs developers are a little bit nervous about the repair tools, but for most systems they’re reasonably reliable.

Thanks for the positive encouragement !!

I am about to boot into that primary TW install and gather some info the BTRFS Mail-List requires for submitting problems - afterwards, I’ll boot back here to this secondary to do the submission and then run the other checks.

In the meantime, here’s the output from smartctl:

ren:~ # smartctl -A /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.12-1-default] (SUSE RPM)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    9,031,450 [4.62 TB]
Data Units Written:                 37,105,670 [18.9 TB]
Host Read Commands:                 80,316,765
Host Write Commands:                196,330,891
Controller Busy Time:               1,064
Power Cycles:                       635
Power On Hours:                     1,094
Unsafe Shutdowns:                   171
Media and Data Integrity Errors:    0
Error Information Log Entries:      1,851
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               37 Celsius
Temperature Sensor 2:               40 Celsius
ren :~ #
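
For anyone reading along, here is a small sketch for pulling out the counters in that output that are probably worth a second look (171 unsafe shutdowns and 1,851 error log entries, which may or may not matter). The function name `smart_summary` is made up for this example, and the device path in the usage comment is simply the one from the output above.

```shell
# Hypothetical helper: keep only the health counters worth a second look
# from `smartctl -A` output supplied on stdin.
smart_summary() {
    grep -E '^(Critical Warning|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):'
}

# Usage (as root), assuming the drive is /dev/nvme0n1:
#   smartctl -A /dev/nvme0n1 | smart_summary
#   smartctl -l error /dev/nvme0n1   # print the NVMe error information log itself
```

The second usage line only shows what the logged entries are; it does not change anything on the drive.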

@myswtest:

You’ll need to check how often “btrfs trim” has been executed on that drive –

  • From the SMART data it seems that, more than a little bit of data has been read and written on that drive …

Hmmmm. I’ve done some searching in the BTRFS docs for how often “btrfs trim” has been executed, but haven’t come up with anything.

I do see it’s an option when mounting the filesystem, and I’ve read it’s usually the default as of 6.x … mine is btrfs --version == btrfs-progs v6.1.3 … I should check the mount parameters for the TW installation.
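
A minimal sketch of that mount-parameter check, assuming `findmnt` from util-linux is available. The helper name `has_discard` is invented for this example. Note that the active mount options come from the kernel and `/etc/fstab`, not from the btrfs-progs version.

```shell
# Hypothetical helper: succeed if a comma-separated mount option string
# contains discard, discard=sync or discard=async (but not nodiscard).
has_discard() {
    case ",$1," in
        *,discard,*|*,discard=*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the running system:
#   opts=$(findmnt -no OPTIONS /)
#   if has_discard "$opts"; then echo "discard set: $opts"; else echo "no discard option: $opts"; fi
```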

You need to check “systemctl status btrfs-trim.timer” to see how often it should be executed on your system.

  • You can check the systemd Journal to see when in the past the Btrfs trim procedure has been executed.

Here - keep in mind, even though I’m booted into that TW install at the moment, root is mounted read-only (but as my regular user, I can run startx for a KDE session).

ren # systemctl status btrfs-trim.timer

btrfs-trim.timer - Discard unused blocks on a mounted filesystem
     Loaded: loaded (/usr/lib/systemd/system/btrfs-trim.timer; disabled; preset: enabled)
     Active: inactive (dead)
    Trigger: n/a
   Triggers: * btrfs-trim.service
       Docs: man:fstrim

[quote="dcurtisfra, post:17, topic:165861, full:true"]
You can check the systemd Journal to see when in the past the Btrfs trim procedure has been executed.
[/quote]

Here’s the output I get:

ren : # journalctl | grep trim > journalctl-trim.txt

Mar 28 21:53:23 ren fstrim[12850]: /boot/efi: 494.5 MiB (518529024 bytes) trimmed on /dev/nvme0n1p1
Mar 28 21:53:23 ren fstrim[12850]: /home: 211.7 GiB (227299934208 bytes) trimmed on /dev/nvme0n1p4
Mar 28 21:53:23 ren fstrim[12850]: /: 6.9 GiB (7375536128 bytes) trimmed on /dev/nvme0n1p3
Mar 28 21:53:23 ren systemd[1]: fstrim.service: Deactivated successfully.
Mar 28 21:53:23 ren systemd[1]: fstrim.service: Consumed 1.192s CPU time.
Mar 28 22:00:11 ren systemd[1]: fstrim.timer: Deactivated successfully.
Mar 29 22:28:58 ren systemd[1]: fstrim.timer: Deactivated successfully.
Mar 30 22:46:58 ren systemd[1]: fstrim.timer: Deactivated successfully.
Mar 31 13:32:18 ren systemd[1]: fstrim.timer: Deactivated successfully.
Mar 31 23:35:22 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 01 21:29:08 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 02 22:09:35 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 03 20:45:24 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 03 20:47:07 ren fstrim[4748]: /boot/efi: 494.5 MiB (518529024 bytes) trimmed on /dev/nvme1n1p1
Apr 03 20:47:07 ren fstrim[4748]: /home: 211.6 GiB (227230113792 bytes) trimmed on /dev/nvme1n1p4
Apr 03 20:47:07 ren fstrim[4748]: /: 11 GiB (11859058688 bytes) trimmed on /dev/nvme1n1p3
Apr 03 20:47:07 ren systemd[1]: fstrim.service: Deactivated successfully.
Apr 03 20:47:07 ren systemd[1]: fstrim.service: Consumed 1.350s CPU time.
Apr 03 20:47:57 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 04 11:58:58 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 04 22:41:30 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 05 21:32:45 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 06 20:48:09 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 07 20:26:42 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 10 21:06:27 ren systemd[1]: fstrim.service: Main process exited, code=killed, status=15/TERM
Apr 10 21:06:27 ren systemd[1]: fstrim.service: Failed with result 'signal'.
Apr 10 21:06:27 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 10 21:08:19 ren fstrim[7333]: /boot/efi: 494.5 MiB (518529024 bytes) trimmed on /dev/nvme1n1p1
Apr 10 21:08:19 ren fstrim[7333]: /home: 211.6 GiB (227181912064 bytes) trimmed on /dev/nvme1n1p4
Apr 10 21:08:19 ren fstrim[7333]: /: 12.5 GiB (13399752704 bytes) trimmed on /dev/nvme1n1p3
Apr 10 21:08:19 ren systemd[1]: fstrim.service: Deactivated successfully.
Apr 10 21:08:20 ren systemd[1]: fstrim.service: Consumed 1.699s CPU time.
Apr 10 21:47:40 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 11 21:38:22 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 12 18:57:29 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 13 23:13:03 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 14 22:25:35 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 15 23:17:21 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 16 21:21:06 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 17 10:59:56 ren fstrim[7283]: /boot/efi: 494.5 MiB (518529024 bytes) trimmed on /dev/nvme1n1p1
Apr 17 10:59:56 ren fstrim[7283]: /home: 211.4 GiB (226985738240 bytes) trimmed on /dev/nvme1n1p4
Apr 17 10:59:56 ren fstrim[7283]: /: 13 GiB (13934247936 bytes) trimmed on /dev/nvme1n1p3
Apr 17 10:59:56 ren systemd[1]: fstrim.service: Deactivated successfully.
Apr 17 10:59:56 ren systemd[1]: fstrim.service: Consumed 1.526s CPU time.
Apr 17 11:22:40 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 17 22:49:33 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 18 12:20:02 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 19 23:49:33 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 20 21:26:51 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 21 09:52:38 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 21 10:19:50 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 21 10:21:31 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 22 00:15:39 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 22 21:10:24 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 22 21:12:03 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 23 17:19:08 ren systemd[1]: fstrim.timer: Deactivated successfully.
Apr 23 21:42:45 ren systemd[1]: fstrim.timer: Deactivated successfully.
== end ==
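
Most of that output is timer noise, so here is a rough sketch that reduces it to the lines where something was actually trimmed. It assumes the default journalctl line format shown above, and `trim_runs` is just a name picked for this example.

```shell
# Hypothetical helper: reduce journal lines on stdin to one line per
# filesystem actually trimmed, printed as "month day time mountpoint device".
trim_runs() {
    awk '$5 ~ /^fstrim\[/ && / trimmed on / {
        mp = $6
        sub(/:$/, "", mp)          # drop the colon after the mount point
        print $1, $2, $3, mp, $NF  # $NF is the device at the end of the line
    }'
}

# Usage:
#   journalctl --no-pager -u fstrim.service | trim_runs
#   trim_runs < journalctl-trim.txt
```

Going by the log above, the trims ran on Mar 28, Apr 03, Apr 10 and Apr 17, so roughly weekly, which matches the usual weekly schedule of fstrim.timer. The Mar 28 run reports nvme0n1 while the later ones report nvme1n1, so either the device names swapped between boots or those entries come from a different boot setup; the log alone doesn’t say which.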

I am HAPPY to report that I ran the steps, one by one, found in here

repair broken btrfs
… which was suggested by @awerlang

I also toggled to this webpage to compare the steps (SLES documentation):

How to recover from BTRFS errors

I am now booted into the primary TW installation - root (BTRFS filesystem) is now mounted r-w, no more read-only mount ! The KDE Plasma login screen showed up as usual, I logged in and here I am :slight_smile:

I did run dmesg just after login to check the boot up results, and there are a couple of niggles:

[    9.597221] BTRFS warning (device nvme1n1p3): checksum verify failed on logical 18335219712 mirror 1 wanted 0x29262f58 found 0x914d513d level 1
[    9.597235] BTRFS warning (device nvme1n1p3): error accounting new delayed refs extent (err code: -5), quota inconsistent
...
... and ...
[   32.462569] systemd-journald[701]: /var/log/journal/a9637f095381461d9ace3985c0ae5331/user-1000.journal: Journal file corrupted, rotating.

I did redirect the output of the --repair run to a file, so we could see all the fixes, but unfortunately I then wanted to append more info to that file and used the redirect > instead of the append >>. Oh well.
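
For the record, the difference between the two operators in a throwaway sketch (it uses a temporary file and made-up text, nothing from the actual repair):

```shell
log=$(mktemp)                  # scratch file for the demonstration
echo "first run"  > "$log"     # > truncates the file, then writes
echo "second run" >> "$log"    # >> appends and keeps what is already there
cat "$log"                     # both lines are present at this point
echo "oops" > "$log"           # > again: the two earlier lines are gone
cat "$log"
```

In bash, `set -o noclobber` makes a plain > refuse to overwrite an existing file, which would have caught this.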

I am going to boot out of here, to the secondary and run a check, then I’ll boot back to this primary and run a zypper dup.

So, a BIG THANKS to @awerlang and @arvidjaar and @dcurtisfra !!

@myswtest:

Weird – here on Leap 15.4 –

 > grep -Ri 'btrfs-trim' /usr/lib/systemd/*
/usr/lib/systemd/system/btrfs-scrub.service:After=fstrim.service btrfs-trim.service
/usr/lib/systemd/system/btrfs-defrag.service:After=fstrim.service btrfs-trim.service btrfs-scrub.service
/usr/lib/systemd/system/btrfs-balance.service:After=fstrim.service btrfs-trim.service btrfs-scrub.service
/usr/lib/systemd/system/btrfs-trim.service:ExecStart=/usr/share/btrfsmaintenance/btrfs-trim.sh
/usr/lib/systemd/system-preset/95-default-SUSE.preset:enable btrfs-trim.timer
 > 
 > find /usr/lib/systemd/ -iname '*btrfs-trim*'
/usr/lib/systemd/system/btrfs-trim.timer
/usr/lib/systemd/system/btrfs-trim.service
 > 

AFAICS, the “btrfs-trim.timer” should be, by default, “enabled” and, AFAICS, there ain’t nothing – no link in a ???.wants/ directory – which is calling either the systemd service or, the systemd timer …
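
A small sketch for checking that directly. The grep and find above only look under /usr/lib/systemd, whereas the links created by `systemctl enable` normally live under /etc/systemd/system in a ???.wants/ directory. The helper name `unit_links` is made up for this example.

```shell
# Hypothetical helper: list enablement symlinks for a unit below a given
# systemd configuration directory (normally /etc/systemd/system).
unit_links() {
    find "$1" -type l -name "$2" 2>/dev/null
}

# Usage:
#   unit_links /etc/systemd/system btrfs-trim.timer
#   unit_links /etc/systemd/system fstrim.timer
#   systemctl is-enabled btrfs-trim.timer fstrim.timer
#   systemctl list-timers --all
```

No output from `unit_links` means no such link exists, which would fit the “disabled” state shown earlier for btrfs-trim.timer. The journal excerpt in this thread shows fstrim.service doing the trimming.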
