Tumbleweed freezing after upgrades since December 2023

Hi guys! Since last December I can’t upgrade my Tumbleweed due to freezes like the one in this image. I guess it may be related to the BTRFS driver, since if I roll back to a snapshot from the end of November, everything works fine. In the last 3 months I have tried to upgrade 3 times, but the same issue occurs each time. I can boot the system and use it for a while, but it suddenly freezes or reboots a few minutes or hours later.

This fourth time I took a manual snapshot and upgraded to 20240328, and the same symptom occurred.

Any enlightenment?

Looks like you’ve got a storage device failure in progress.

If you boot it from the snapshot that works, run:

dmesg

And see what the output is. My guess is that you’ll see that there are messages indicating read errors, write errors, or general I/O errors to the device nvme0n1p6.
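
To narrow the output down, something like the following may help. It is only a sketch: the function name `filter_storage_errors` and the keyword list are assumptions chosen for illustration, not an exhaustive set of kernel messages.

```shell
# Hypothetical helper: keep only kernel log lines that mention the NVMe
# driver or btrfs together with a word suggesting a failure, plus generic
# "I/O error" lines. The keyword list is an assumption, not exhaustive.
filter_storage_errors() {
    grep -i -E '(nvme|btrfs).*(error|fail|timeout)|i/o error'
}

# Usage (as root, on the running system):
#   dmesg -T | filter_storage_errors
```

An empty result does not prove the device is healthy; it only means none of these keywords appeared in the current kernel log.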


There’s no failure in my device, since it works pretty well in snapshot 20231106 (where I am right now). There are no errors in smartctl or dmesg. This is why I believe it is some driver update issue.

The screen shows “writeback errors”. Can you post log contents from dmesg so we can see for ourselves, along with the smartctl output?

Just because it works in a particular snapshot doesn’t mean there isn’t something going on, and the photo you posted clearly shows at least btrfs errors.

You may need to run a filesystem check if the hardware is clear of errors.
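
One possible way to do that, assuming the root filesystem is the btrfs volume on /dev/nvme0n1p6: a scrub verifies checksums on the mounted filesystem, `btrfs device stats` prints the per-device error counters, and `btrfs check --readonly` inspects the metadata but should only be run with the filesystem unmounted, for example from a rescue system. The helper name `nonzero_btrfs_stats` below is hypothetical; it just flags counters that are not zero.

```shell
# Hypothetical helper: read `btrfs device stats` output on stdin and print
# only the counters that are not zero. Exits non-zero if any were found.
nonzero_btrfs_stats() {
    awk 'NF >= 2 && $2 + 0 != 0 { print; bad = 1 } END { exit bad + 0 }'
}

# Usage (as root):
#   btrfs scrub start -B /        # verify checksums on the mounted filesystem
#   btrfs device stats / | nonzero_btrfs_stats
# Offline metadata check, from a rescue system with the filesystem unmounted:
#   btrfs check --readonly /dev/nvme0n1p6
```

If every counter is zero the helper prints nothing, which fits a software cause better than a failing device but does not rule out hardware on its own.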

There are no indications of hardware error. All those messages are btrfs related, not hardware related. There was at least one similar report, unfortunately without any follow-up:

https://lore.kernel.org/all/CAOCpoWcgZ3ZZi3LhZ7kR-zg+q8Th7n1DCvECdfCsYJ7ckQFL=w@mail.gmail.com/T/

This needs a bug report.

These sorts of btrfs errors have thrown me off course too, making me think the problem was hardware related.
The issue was with the Btrfs RAID1C3 feature; I have detailed it in this thread.

All I can say is if you’re certain the hardware is alright, then disable btrfs features one by one until it works. Easier said than done though :weary:

Yep! From "Hardware considerations" in the BTRFS documentation:

Hardware as the main source of filesystem corruptions

If you use unreliable hardware and don’t know about that, don’t blame the filesystem when it tells you.

Thank you all for your replies! Over the last few days I ran some tests and noticed:

  1. If I boot snapshot 336 of 2024-03-28 without rolling the system back to it, the system runs for hours without any problems and no errors are logged (see log below).
  2. When I decided to roll back to 336, the issue came back so quickly that I couldn’t even capture the log. See the picture taken.

vanitas2206l:~ # snapper list;cat /etc/os-release;smartctl /dev/nvme0 -a;uname -a;dmesg -T --follow|grep -i -E "(nvme|btrfs)"
   # | Type   | Pre # | Date                     | User | Used Space | Cleanup | Description                | Userdata
-----+--------+-------+--------------------------+------+------------+---------+----------------------------+--------------
  0  | single |       |                          | root |            |         | current                    |
323  | single |       | Fri Mar 29 18:24:38 2024 | root |   3.42 MiB |         | Before new upgrade attempt |
328  | pre    |       | Fri Mar 29 19:22:05 2024 | root |   4.17 MiB | number  | zypp(zypper)               | important=yes
329  | post   |   328 | Fri Mar 29 21:03:55 2024 | root |  19.83 MiB | number  |                            | important=yes
330  | pre    |       | Fri Mar 29 21:10:20 2024 | root |   2.14 MiB | number  | zypp(zypper)               | important=yes
331  | post   |   330 | Fri Mar 29 21:11:22 2024 | root |   1.11 MiB | number  |                            | important=yes
332  | pre    |       | Fri Mar 29 21:17:21 2024 | root | 912.00 KiB | number  | zypp(zypper)               | important=yes
333  | post   |   332 | Fri Mar 29 21:40:26 2024 | root |   2.67 MiB | number  |                            | important=yes
334  | single |       | Sat Mar 30 10:55:15 2024 | root |   2.91 MiB | number  | rollback backup of #322    | important=yes
335  | single |       | Sat Mar 30 10:55:28 2024 | root |  16.00 KiB | number  | writable copy of #331      |
336- | single |       | Mon Apr  1 20:22:51 2024 | root |  16.00 KiB | number  | rollback backup of #335    | important=yes
337+ | single |       | Mon Apr  1 20:22:52 2024 | root |  17.56 MiB |         | writable copy of #323      |
338  | pre    |       | Mon Apr  1 20:38:41 2024 | root |   7.55 MiB | number  | zypp(zypper)               | important=no
339  | post   |   338 | Mon Apr  1 20:38:55 2024 | root |   6.05 MiB | number  |                            | important=no
NAME="openSUSE Tumbleweed"
# VERSION="20240328"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20240328"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
# CPE 2.3 format, boo#1217921
CPE_NAME="cpe:2.3:o:opensuse:tumbleweed:20240328:*:*:*:*:*:*:*"
#CPE 2.2 format
#CPE_NAME="cpe:/o:opensuse:tumbleweed:20240328"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.1-1-default] (SUSE RPM)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Netac NVMe SSD 1TB
Serial Number:                      RN202210131TB467755
Firmware Version:                   3.S.F.9
PCI Vendor/Subsystem ID:            0x1f40
IEEE OUI Identifier:                0x000000
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            494e4e 4f47524954
Local Time is:                      Sat Apr  6 11:34:38 2024 -03
Firmware Updates (0x0e):            7 Slots
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     110 Celsius
Critical Comp. Temp. Threshold:     120 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        5       5
 1 +     3.30W       -        -    1  1  1  1       50     100
 2 +     2.80W       -        -    2  2  2  2       50     200
 3 -   0.1500W       -        -    3  3  3  3      500    7500
 4 -   0.0200W       -        -    4  4  4  4     2000   60000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        54 Celsius
Available Spare:                    100%
Available Spare Threshold:          25%
Percentage Used:                    2%
Data Units Read:                    52,501,184 [26.8 TB]
Data Units Written:                 27,956,636 [14.3 TB]
Host Read Commands:                 917,271,634
Host Write Commands:                661,635,846
Controller Busy Time:               62
Power Cycles:                       803
Power On Hours:                     3,921
Unsafe Shutdowns:                   30
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               54 Celsius
Temperature Sensor 2:               38 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-tests not supported

free(): invalid pointer
Aborted (core dumped)
Linux vanitas2206l 6.8.1-1-default #1 SMP PREEMPT_DYNAMIC Tue Mar 19 07:32:20 UTC 2024 (d922afa) x86_64 x86_64 x86_64 GNU/Linux
[Sat Apr  6 11:33:19 2024] Btrfs loaded, assert=on, zoned=yes, fsverity=yes
[Sat Apr  6 11:33:19 2024] nvme 0000:01:00.0: platform quirk: setting simple suspend
[Sat Apr  6 11:33:19 2024] nvme nvme0: pci function 0000:01:00.0
[Sat Apr  6 11:33:19 2024] nvme nvme0: 16/0/0 default/read/poll queues
[Sat Apr  6 11:33:19 2024] nvme nvme0: Ignoring bogus Namespace Identifiers
[Sat Apr  6 11:33:19 2024]  nvme0n1: p1 p2 p3 p4 p5 p6 p7
[Sat Apr  6 11:33:23 2024] BTRFS: device fsid 4b9679aa-e051-4991-852e-dd9ba553dfd0 devid 1 transid 523760 /dev/nvme0n1p6 scanned by mount (550)
[Sat Apr  6 11:33:23 2024] BTRFS info (device nvme0n1p6): first mount of filesystem 4b9679aa-e051-4991-852e-dd9ba553dfd0
[Sat Apr  6 11:33:23 2024] BTRFS info (device nvme0n1p6): using crc32c (crc32c-intel) checksum algorithm
[Sat Apr  6 11:33:23 2024] BTRFS info (device nvme0n1p6): using free-space-tree
[Sat Apr  6 11:33:24 2024] systemd[1]: Starting Load Kernel Module nvme_fabrics...
[Sat Apr  6 11:33:24 2024] systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
[Sat Apr  6 11:33:24 2024] systemd[1]: Finished Load Kernel Module nvme_fabrics.
[Sat Apr  6 11:33:24 2024] Adding 14266364k swap on /dev/nvme0n1p5.  Priority:-2 extents:1 across:14266364k SS
[Sat Apr  6 11:33:25 2024] XFS (nvme0n1p7): Mounting V5 Filesystem 624db481-9b1d-4375-9788-afddb6459a4a
[Sat Apr  6 11:33:25 2024] XFS (nvme0n1p7): Ending clean mount
[Sat Apr  6 11:34:38 2024] BTRFS info (device nvme0n1p6): qgroup scan completed (inconsistency flag cleared)
