Hi guys! Since last December I haven’t been able to upgrade my Tumbleweed installation because of freezes like the one in this image. I suspect something related to the Btrfs driver, since if I roll back to a snapshot from the end of November, everything works fine. Over the last 3 months I have tried to upgrade 3 times, and the same issue occurred each time. I can boot the system and use it for a while, but it suddenly freezes or reboots a few minutes or an hour later.
On this 4th attempt I took a manual snapshot, upgraded to 20240328, and the same symptom occurred.
It looks like you’ve got a storage device failure in progress.
If you boot it from the snapshot that works, run:
dmesg
and see what the output is. My guess is that you’ll see messages indicating read errors, write errors, or general I/O errors on the device nvme0n1p6.
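A full dmesg dump can be long, so here is a minimal Python sketch (not something posted in the thread) that filters saved kernel log lines for storage-related errors, optionally restricted to one device such as nvme0n1p6. The regex patterns, the function name find_storage_errors and the script name scan_dmesg.py are all assumptions for illustration; adjust the patterns to the messages you actually see.

```python
import re
import sys
from typing import Iterable, List, Optional

# Assumed patterns for storage-related kernel messages; not an exhaustive list.
ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"read error|write error", re.IGNORECASE),
    re.compile(r"writeback error", re.IGNORECASE),
    re.compile(r"BTRFS (error|critical)", re.IGNORECASE),
]


def find_storage_errors(lines: Iterable[str], device: Optional[str] = None) -> List[str]:
    """Return log lines matching any error pattern.

    If device is given, keep only lines containing that string. This is a
    plain substring test, so "nvme0n1" also matches "nvme0n1p6".
    """
    hits = []
    for line in lines:
        if not any(p.search(line) for p in ERROR_PATTERNS):
            continue
        if device is not None and device not in line:
            continue
        hits.append(line.rstrip("\n"))
    return hits


def main(argv: List[str]) -> None:
    # Hypothetical CLI: scan_dmesg.py LOGFILE [DEVICE]
    if len(argv) < 2:
        print("usage: scan_dmesg.py LOGFILE [DEVICE]", file=sys.stderr)
        return
    device = argv[2] if len(argv) > 2 else None
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        for line in find_storage_errors(f, device):
            print(line)


if __name__ == "__main__":
    main(sys.argv)
```

For example, save the log with sudo dmesg > dmesg.txt (or, if the journal is persistent, journalctl -k -b -1 > dmesg.txt to get the kernel messages from the previous boot, which helps after a freeze) and run python3 scan_dmesg.py dmesg.txt nvme0n1. An empty result does not prove the disk is healthy, so the smartctl output is still worth checking.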
There’s no failure in my device, since it works fine on snapshot 20231106 (which I’m running right now). Neither smartctl nor dmesg reports any errors. This is why I believe it is an issue with a driver update.
The screen shows “writeback errors”. Can you post the dmesg output, along with the smartctl output, so we can see for ourselves?
Just because it works in a particular snapshot doesn’t mean there isn’t something going on, and the photo you posted clearly shows at least btrfs errors.
You may need to run a filesystem check if the hardware is clear of errors.
There are no indications of hardware error. All those messages are btrfs related, not hardware related. There was at least one similar report, unfortunately without any follow-up:
These sorts of btrfs errors threw me off course too; I thought it was hardware related.
In my case the issue was with the Btrfs RAID1C3 feature; I have detailed it in this thread.
All I can say is: if you’re certain the hardware is all right, disable btrfs features one by one until it works. Easier said than done, though.
Thank you all for your replies! Over the last few days I ran some tests and noticed the following:
If I boot snapshot 336 (from 2024-03-28) without rolling back, the system runs for hours without any problems and no errors are logged (see log below).
When I then rolled back to 336, the issue came back so severely that I couldn’t even capture the log. See the picture I took.