Tumbleweed won't boot after update

Hi there,

I was fine at 20210220 when I decided it was time to update with the command “tumbleweed switch --install 20210320”.
During (the very long) update everything went fine. Time to reboot.
After the few initial text messages that are normal before the graphical mode kicks in, the screen went black.
Ctrl+Alt+Del and Ctrl+Alt+F1 did not work. I rebooted, this time changing the boot command by adding “3 nomodeset” at the end and removing “quiet” and “splash=silent”.
I added nomodeset because my GPU is a ****ing NVidia GTX-970 using the proprietary driver.
But this time it looks like that is not the problem.
Booting in pure text mode dropped me into emergency mode, showing BTRFS-related messages. Please see the screenshot: https://imgur.com/a/QQMQieQ
Booting again in GRUB, selecting “bootloader from a RO snapshot” followed by “snapper rollback” restored the system as it was before.
I’ve tried 20210319 and 20210317 all with the same behaviour.
Currently I am back where I was before the update: 5.10.16-1-default and KDE 5.20, Tumbleweed 20210220. Everything is fine!
So I suspect that BTRFS is not really the issue. If it were, why would it not manifest itself on the current setup?
I suspect it is kernel 5.11, or btrfs itself, which was updated in some snapshot since 20210220.
Is it possible to update everything else except the kernel?
If the problem is not present after such an update, then it is a kernel 5.11 issue, am I right?

Why don’t you do a

zypper dup

like most (all?) others?

Thank you for your curiosity,

Some reasons:
Because I can.
Because I am not like the (all) others.
Because tumbleweed-cli exists and Linux is about choices.
Because I want to control when I will update my system.
Because tumbleweed-cli will run a zypper dist-upgrade anyway.
Because I am happy to have managed to avoid all the hassle in openSUSE updates related to big changes in the past weeks.

My turn to ask: can you guarantee that if I were not running tumbleweed-cli I would never have had this problem?
Are you saying the reason I am having this issue is because I choose to use tumbleweed-cli ?

Sorry if my reply sounded rude, but it is not the first time (and not the last, I am sure) that someone has lost focus on the main problem to question why I am running tumbleweed-cli, and to tell me how good and perfect the QA is and that I must therefore stay with vanilla Tumbleweed or move to Leap.

zypper al kernel-def*
zypper dup

I always have kernels locked. When I want a new kernel, I ask for one specifically and choose to override the lock, which is what actually happens when “remove lock” is offered for a lock that includes a wildcard. It seems to me most NVidia driver users should be doing this, given how much NVidia driver trouble I see in forums and mailing lists.
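
For anyone following along, the lock-then-update sequence above can be sketched as a small shell helper. This is only a sketch: the `run` wrapper and the `DRY_RUN` variable are made up for illustration (commands are printed rather than executed unless `DRY_RUN=0`), while `addlock`, `locks`, `dup` and `removelock` are the long names of the regular zypper subcommands.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# run it for real when DRY_RUN=0 (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Lock the default kernel packages, show the lock list, then update the rest.
# The pattern is quoted so the shell does not expand the wildcard itself.
lock_kernel_and_update() {
    run zypper addlock 'kernel-def*'
    run zypper locks
    run zypper dup
}

# Drop the lock again once a newer kernel is wanted.
unlock_kernel() {
    run zypper removelock 'kernel-def*'
}
```

Calling `lock_kernel_and_update` with the default settings only prints the three commands, which makes it easy to review them before running anything as root.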

Did you try booting an older installed kernel before snapping back, or was 5.10.16 already gone?

Old kernels are available from home:/tiwai.

Hi
Could be glibc related, normally when this comes through it is prudent to move to that snapshot…

I keep a number of old kernels but have not felt the need to lock; I can just set booting to a specific kernel (did that some months back with a qemu issue).

I always use the hard way with nvidia drivers. It has not let me down yet, and it is easy enough to add any necessary patches to the run file. My qemu machines run nvidia, so I can test an update there if needed prior to updating my main system…

So do what it tells you to do and provide rdsosreport.txt.

Thank you, add-lock is new to me. I will try it and report back here afterwards.
Yes, I tried to boot the other kernel (5.10 is still there), but I got a black screen because the nvidia driver for that version is not in place.
Now I see I must add locks for the nvidia packages too.

I couldn’t ! The file is big. In emergency mode I had no way to transfer or copy it to another location.
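
In case it helps someone who ends up in the same shell: the dracut emergency message suggests saving the report to a USB stick or /boot. A rough sketch of that follows. The helper name `save_report` is made up, and the device name /dev/sdb1 in the comments is purely an assumption (check with lsblk or blkid first).

```shell
#!/bin/sh
# Hypothetical helper: copy the dracut report into a directory on a mounted device.
# $1 = report file, $2 = destination directory (already mounted and writable).
save_report() {
    src="$1"
    dest="$2"
    [ -f "$src" ] || { echo "no report at $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "no directory $dest" >&2; return 1; }
    # sync so the data is really on the stick before it is unplugged
    cp "$src" "$dest/" && sync
}

# In the emergency shell this might look like (device name is an assumption):
#   mkdir -p /mnt/usb
#   mount /dev/sdb1 /mnt/usb
#   save_report /run/initramfs/rdsosreport.txt /mnt/usb
#   umount /mnt/usb
```

The file is plain text, so once it is on the stick it can be attached to a bug report from any other machine.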

When you boot another kernel, I suppose the nvidia packages for that specific version must be present, am I right?
And I suppose that if I lock a specific kernel version, the nvidia packages will automatically not be updated either, is that right?

OK, things moved on.
Thank you so far. Locking the kernel worked, and I could confirm it is a kernel problem. And it is two different problems after all.
The system is now fully updated.
I’ve uninstalled the nvidia drivers and switched to nouveau to fix the first problem, the one with the proprietary nvidia driver.
With the nouveau driver updated I could even unlock the kernel and install 5.11.6-1-default.
So, no nvidia to make things worse than necessary.

Booting on 5.11.6-1-default:
I got the same screen saying there is a BTRFS problem and I was dropped into the emergency mode shell.
Please check the screenshot here: https://imgur.com/a/OUPC7kw

On grub menu Advanced option I select 5.10.16-1-default and boot:
I got the message:

[    6.534238] nouveau 0000:01:00.0: bus: MMIO write of B88881f4 FAULT at 18eb14 [ IBUS ]

Welcome to openSUSE Tumbleweed 20210328 Kernel 5.10.16-1-default (tty1).

enp3s0: 192.168.1.33 fe80::6424:37f8:c78e:d276

kimera login: miguel
Password:
Last login: Sun Mar 28 08:23:31 on tty1
Have a lot of fun... 

miguel $

So, on 5.10 no BTRFS error ! No graphical mode either (X11)
So I think it is safe to assume there is a problem in 5.11 related to BTRFS, right ?
Where should I post/report this bug ?

The second problem is related to the nvidia card, with both the nouveau and the proprietary driver, on both the 5.10 and 5.11 kernels.

The strange thing is that I do get a graphical display at two stages: the graphical boot screen (the one with a spinning wheel and the infinity symbol with “Tumbleweed” below it) and the graphical logout/poweroff/reboot screen.

Going back to the 20210220 snapshot, everything works as expected.

Did you check (after you had removed the proprietary NVIDIA driver) that nouveau is no longer blacklisted?

Regards

susejunky

NVidia eradication is apparently not a simple process. Was the following part of your uninstallation process?:

[zypper in -f libglvnd](https://forums.opensuse.org/showthread.php/551726-switching-from-nvidia-gt710-to-igpu-HD530-doesn-t-start-anymore?p=3015250#post3015250)

The BTRFS error was fixed.

I opened an issue in Bugzilla and got the following answer, including the solution that fixed the problem.

**Goldwyn Rodrigues** 2021-03-29 12:52:41 UTC
The filesystem is corrupt with respect to total_bytes of the device stored. It does not manifest in earlier kernels because the following patch was added later:

commit 3a160a933111241376799244e3587747af574b89
Author: Anand Jain <anand.jain@oracle.com>
Date: Tue Nov 3 13:49:42 2020 +0800

btrfs: drop never met disk total bytes check in verify_one_dev_extent

Comment 4 (https://bugzilla.suse.com/show_bug.cgi?id=1184077) **Goldwyn Rodrigues** 2021-03-29 13:26:11 UTC
Assuming the problem is on the root filesystem, could you try:

btrfs fi resize 685579636736 /

and check if it is able to boot in the newer kernel?
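
To see the kind of numbers that comment refers to, one can compare the size recorded in the superblock with the real size of the partition. The sketch below is only an illustration: `size_check` is a made-up helper, /dev/sda2 is a placeholder for the root device, and I am assuming the relevant field is the `dev_item.total_bytes` line printed by `btrfs inspect-internal dump-super`.

```shell
#!/bin/sh
# Hypothetical helper: compare the size btrfs has recorded with the real device size.
# $1 = recorded size in bytes, $2 = block device size in bytes.
size_check() {
    recorded="$1"
    actual="$2"
    if [ "$recorded" -gt "$actual" ]; then
        echo "recorded size exceeds device by $((recorded - actual)) bytes"
    else
        echo "recorded size fits on device"
    fi
}

# How the two numbers could be collected (as root; /dev/sda2 is an assumption):
#   btrfs inspect-internal dump-super /dev/sda2 | grep total_bytes
#   blockdev --getsize64 /dev/sda2
#   size_check <dev_item.total_bytes value> <blockdev output>
```

If the recorded size turns out to be larger than the device, a resize to a value that fits, like the one suggested in the comment above, would be the thing to try.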

Uhm… I am not sure of the proper way to test whether a package is blacklisted, but “zypper search nouveau” shows “i+” in the “S” column. I have seen blacklisted packages show an “l” (lowercase L) in that column…
The same goes for “zypper search nvidia”, so I suppose I do not have any blacklisted packages on my system, am I right?

No! It was not part of the uninstallation process!
Anyway:

**kimera:~ #** zypper search libglvnd
Loading repository data...
Reading installed packages...

S  | Name                 | Summary                                | Type
---+----------------------+----------------------------------------+--------
i+ | libglvnd             | The GL Vendor-Neutral Dispatch library | package
i+ | libglvnd-32bit       | The GL Vendor-Neutral Dispatch library | package
   | libglvnd-devel       | Development files for libglvnd         | package
   | libglvnd-devel-32bit | Development files for libglvnd         | package
**kimera:~ #**


All I did was disable the nvidia repo in YaST and proceed with the dist-upgrade.
As far as I understand, since there is a new kernel version, that new version will not get the nvidia driver, but the existing kernel/init ram disk will not be touched.
However, I do not know how to remove the nvidia driver from the current kernel/init ram disk.

Could you provide instructions, please?
I’ve found tons of posts about how to INSTALL the NVidia driver, but none about how to UNinstall it or that mention libglvnd…

My post #12 provides a link to a very recent instance where zypper in -f libglvnd solved an incomplete NVidia eradication problem. It’s my incomplete understanding about NVidia driver installation, since I never install it on any of my own hardware, that the installation instructions include uninstallation instructions that must be followed in order to completely eradicate the installation, a part of which may include tainting of some standard library, at least one of which is provided by libglvnd.

As to blacklisting, my knowledge is also incomplete. All I know about blacklisting nouveau is that it is configured somewhere in /etc/mod*, which takes effect in initrds, thus requiring that initrds be rebuilt as part of the process of eliminating NVidia’s proprietary drivers any time their installation included blacklisting.

Hi
I don’t see how a standard distribution library that can be installed by any user at any time, irrespective of hardware, causes an issue. The likely case is failure to remove the nouveau blacklist file AND to run mkinitrd to include it back in. It may also pay to check the dracut configuration to ensure nouveau is not in the omit drivers.
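
A rough way to check both places, sketched under assumptions: the helper name `find_nouveau_blacklist` is made up, and the directories in the usage comments (/etc/modprobe.d and /etc/dracut.conf.d) are the usual locations but may differ on a given install.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that blacklist or omit nouveau.
# $1 = directory to search, e.g. /etc/modprobe.d or /etc/dracut.conf.d
find_nouveau_blacklist() {
    dir="$1"
    # -r: recurse, -l: print only file names, -E: extended regex
    grep -rlE '^[[:space:]]*(blacklist[[:space:]]+nouveau|omit_drivers.*nouveau)' "$dir" 2>/dev/null
}

# Possible usage (as root):
#   find_nouveau_blacklist /etc/modprobe.d
#   find_nouveau_blacklist /etc/dracut.conf.d
#   # after removing the offending file(s), rebuild the initrd for the running kernel:
#   dracut -f
```

No output means nothing in that directory blacklists or omits nouveau, in which case the cause is probably elsewhere.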

I think [#67 in the other thread](https://forums.opensuse.org/showthread.php/551726-switching-from-nvidia-gt710-to-igpu-HD530-doesn-t-start-anymore?p=3015554#post3015554) makes it evident it can be, since force reinstalling libglvnd was OP’s ultimate solution. Possibly the only change from the forced re-installation was to (a) library symlink(s), leaving, as might NVidia driver installation, the original libglvnd libraries intact?

The likely case is failure to remove the nouveau blacklist file AND to run mkinitrd to include it back in. It may also pay to check the dracut configuration to ensure nouveau is not in the omit drivers.
As ample evidence in forums and mailing list archives shows, no doubt these are parts of NVidia taint eradication that are often stumbled over.

To close this thread: I was told to stick with the nvidia driver and re-install it (https://bugzilla.opensuse.org/show_bug.cgi?id=1182666).
After re-enabling the proprietary nvidia driver and blacklisting nouveau, I got a working system again.

As far as I understand, there was a problem with both nouveau and nvidia in the recent updates, and the nvidia problem has already been fixed.

Unfortunately there was another problem, related to pam-unix ( https://bugzilla.opensuse.org/show_bug.cgi?id=1184314 ), that I managed to fix after a few days.

Thank you very much.