Graphical boot fails after updating TW

Hello,

I’m having issues with TW amd64 freezing on boot after upgrading it (via “zypper dup”) from 20190315 to 20190415. The machine has an AMD GPU (Radeon RX 560) and kernel 5.0.7-default after upgrade. When I remove the “quiet” option from the kernel boot line in GRUB, I see that it always freezes immediately after “[OK] Started Locale Service”, and at this point it’s completely frozen (cursor stops blinking, doesn’t accept keyboard input) so I can’t switch to another virtual terminal with Ctrl+Alt+F2 and see what’s going on. I am able to boot into a shell by adding “3” to the kernel boot line in GRUB. Using that, I was able to capture the failed boot output from ~/.local/share/xorg/Xorg.0.log (attached here: https://pastebin.com/myFK6cVp). After booting into a shell, I tried “startx”, which yielded the following output (I had to copy this by hand so bear with me here):


xauth: File /home/[user]/.serverauth.2691 does not exist
... [other xstart messages here]
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
X: symbol lookup error: /usr/lib64/libEGL_mesa.so.0: Undefined symbol: gbm_format_get_name
xinit: giving up
xinit: unable to connect to X server: Connection refused

xinit also complained about “/usr/bin/Xorg is not setuid”, though I assume that’s a red herring.
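
A "symbol lookup error" like the one above may mean that libEGL_mesa and libgbm come from different Mesa builds, for example a stale or third-party libgbm left behind by a partial update. Below is a rough sketch for checking that from the text console. The libgbm path is an assumption for a 64-bit install, and exports_symbol is just a helper name made up for this example.

```shell
# Succeeds if the `nm -D --defined-only` listing on stdin contains symbol $1.
# (exports_symbol is a hypothetical helper, not an existing tool.)
exports_symbol() {
    grep -Eq "[[:space:]]$1(@|\$)"
}

gbm=/usr/lib64/libgbm.so.1        # assumed location of libgbm
egl=/usr/lib64/libEGL_mesa.so.0   # path taken from the error message

if command -v rpm >/dev/null 2>&1 && [ -e "$gbm" ] && [ -e "$egl" ]; then
    # Show which packages (and versions) own the two libraries.
    rpm -qf "$gbm" "$egl" || echo "at least one file is not owned by any package"
fi

if command -v nm >/dev/null 2>&1 && [ -e "$gbm" ]; then
    if nm -D --defined-only "$gbm" | exports_symbol gbm_format_get_name; then
        echo "libgbm exports gbm_format_get_name"
    else
        echo "libgbm lacks gbm_format_get_name (possibly older than libEGL_mesa)"
    fi
fi
```

If the two libraries turn out to belong to packages with different versions or from different repos, that mismatch would be the first thing to sort out.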

Options I’ve tried that have failed so far:

  • Removing “amdgpu.dc=1”
  • plymouth.enable=0
  • nomodeset
  • Using the installed alternative 5.0.2-default kernel (this is also the same kernel I was running fine with 20190315 so I doubt it’s the problem)
  • Booting in recovery mode (doesn’t matter whether I use 5.0.7-default or 5.0.2-default)

For what it’s worth, I had this same issue after trying to upgrade once earlier this month (I think to 20190327?) and that time I just reverted back to 20190315 via tumbleweed-cli and it worked normally again; I was hoping I wouldn’t have to do that this time around. I’m using KDE, SDDM, and the open-source AMDGPU drivers.

You may eventually have to roll back again.

First, according to your log you are indeed running an xorg X server and not Wayland… any reason why?

Aside from that,
I find your posted Xorg log interesting. It looks like the system worked through a list of possible video driver modules, all the way down to VESA as the last resort, and every one failed because the module was missing (likely not built). In my experience it’s pretty hard for a system not to run VESA, so my conclusion is that your display driver setup is thoroughly broken: something fundamental prevented the driver modules from being built.

Recommend you roll back so that you can do some troubleshooting with full capabilities.
Re-identify your GPU to make sure you have correct information, then look for updated display drivers or just different drivers, and only then try upgrading again.

HTH,
TSU

Does it change anything to remove both instances of amdgpu.dc=1 from cmdline? Xorg.0.log shows 2.

Booting with nomodeset is only useful for performing rescue procedures, so the Xorg.0.log you pastebin’d reports next to nothing of any use. SDB:Nomodeset:Work_Around_Graphic_Upgrade&_Installation_Obstacles explains nomodeset usage. Another Xorg.0.log without having used nomodeset might provide useful clues.

Startx except as root doesn’t work in an unmodified configuration any more. That’s why you see the connection refused and setuid messages when trying.

If you know you have any optional repos enabled, please show output from:

zypper lr -d

Do you see any clues in journal or dmesg? e.g.:

dmesg | grep failed

or

journalctl | grep failed

for a cursory look.
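
Since the frozen boot is no longer the current one by the time you reach a shell, it may help to look at the previous boot specifically. A minimal sketch, assuming the journal is stored persistently (otherwise there is no previous boot to show); failure_lines is only an illustrative helper name:

```shell
# Keep only lines that look like failures (case-insensitive).
# (failure_lines is a hypothetical helper name.)
failure_lines() {
    grep -iE 'fail|error|segfault' || true
}

if command -v journalctl >/dev/null 2>&1; then
    # -b -1 selects the previous boot; --no-pager prints straight to the terminal.
    journalctl -b -1 --no-pager 2>/dev/null | failure_lines | tail -n 40
fi
```

Instead of grepping, "journalctl -b -1 -p err" limits the output to messages of priority "err" and worse.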

Maybe it will correct itself upgrading to 20190417. :P

It still fails after “[OK] Started Locale Service”. For some reason all my Xorg.*.log files still contain “amdgpu.dc=1”, so I’m not sure whether it inserts amdgpu.dc=1 again without asking or whether the normal boot is so messed up that it can’t save a log. I wasn’t using nomodeset by default but I’ll try to get another boot log without nomodeset later today.

Startx except as root doesn’t work in an unmodified configuration any more. That’s why you see the connection refused and setuid messages when trying.

Just tried it with sudo and it still complains about gbm_format_get_name, the other startup messages are all the same as well.

If you know you have any optional repos enabled, please show output from:

zypper lr -d

https://pastebin.com/UfRAA2L4

Do you see any clues in journal or dmesg? e.g.:

dmesg | grep failed

or

journalctl | grep failed

for a cursory look.

Nothing came back from “dmesg | grep failed”, but here’s what I got from “journalctl | grep failed”: https://pastebin.com/xPG2SCB7

I haven’t tried Wayland on this system at all yet but perhaps I should, how would I configure it to switch to Wayland?

Re-identify your GPU to make sure you have correct information, then look for updated display drivers or just different drivers, and only then try upgrading again.

Forgive my ignorance, how would I go about doing this from the shell?

The xorg-repo is not refreshed…

So maybe do it and update:

zypper mr -f X11:Xorg
zypper dup

Every possible graphics module failed because its file doesn’t exist,
so I speculate that the problem can’t be fixed by anything you’d see in the journal, which would more likely show errors related to faulty functionality.

When the modules don’t exist at all, and in particular a very generic VESA module,
I’d guess that the problem is in the module creation pre-boot, not anything after the system starts running.

That’s why I suggested looking closer at the GPU driver. If it’s not provided by the kernel, then you need to look closely at what is currently installed and what else might be available.

Of course, my line of reasoning can be wrong, too but that’s my analysis.

TSU

Just did this along with a “zypper ref” to update to 20190418 and tried a graphical boot again; the issue still exists. Running “sudo startx” from the shell also runs into the same missing-symbol error as before.

I’m not entirely sure what I should be checking exactly, but here’s the output from a few commands I found online (all after updating to 20190418):


> find /dev -group video
/dev/fb0
/dev/dri/card0

> sudo lspci | grep -i vga
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] (rev cf)

> lsmod | grep -i "kms\|drm"
drm_kms_helper        204800  1 amdgpu
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
drm                   499712  5 gpu_sched,drm_kms_helper,amdgpu,ttm

> hwinfo --gfxcard

https://pastebin.com/mZjMZQ5b


> modinfo amdgpu

https://pastebin.com/eLmPyQrR

Update: I tried a graphical boot with my default boot options again (no “nomodeset”) after updating to 20190418 and it didn’t even create a new log in ~/.local/share/xorg; unless there’s another log somewhere else I should be checking, I guess the normal graphical boot is too messed up to save the Xorg log successfully?

The traditional location of Xorg.0.log is /var/log/. You may find it there and/or in ~/.local/share/xorg/, so you might have to check to see which is newest.
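
To pick the newer of the two logs and pull out just the problem lines, something like the following sketch might do. newest_log is a made-up helper name, and the two paths are the ones mentioned in this thread:

```shell
# Print whichever of the given files exists and was modified most recently.
# Prints an empty line if none of them exists.
# (newest_log is a hypothetical helper name.)
newest_log() {
    newest=
    for f in "$@"; do
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    printf '%s\n' "$newest"
}

log=$(newest_log /var/log/Xorg.0.log "$HOME/.local/share/xorg/Xorg.0.log")
if [ -n "$log" ]; then
    echo "Newest log: $log"
    # Xorg marks errors with (EE) and warnings with (WW).
    grep -E '\((EE|WW)\)' "$log" || echo "no (EE)/(WW) lines found"
fi
```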

Why is X11:Xorg configured and enabled? RX 560 (1002:67ff) isn’t so new that you should need it in TW. It’s at least a year old. If this can’t be made to work without special cmdline options it’s probably past time to file a bug report. It should be sufficient to have xf86-video-ati uninstalled for the amdgpu to function satisfactorily. Do you have something in /etc/X11/xorg.conf.d/ fighting against it? You could try creating /etc/X11/xorg.conf.d/20-amd.conf containing

Section "Device"
    Identifier "DefaultDevice"
    Driver     "amdgpu"
EndSection

to force its use. You could also try forcing the modesetting driver, same file, except substitute modesetting for amdgpu as driver name.

You can also try this driver from AMD

https://www.amd.com/en/support/kb/faq/gpu-639

TSU

I just tried uninstalling xf86-video-ati via zypper (after updating to 20190419 first) and rebooting; it made no difference. Adding /etc/X11/xorg.conf.d/20-amd.conf to force “amdgpu” and rebooting again also made no difference. When I forced “modesetting” instead, it got slightly farther in the boot process but still froze mid-boot; if I recall, that time it made it up to “[OK] Starting Control Daemon” or something similar before completely freezing again.

I looked in /var/log and it keeps creating a new, empty (0-byte) Xorg.0.log each time I try to do a graphical boot, but I did find an earlier Xorg log in there from April 17, when I tried to reboot for the first time after my initial update to 20190415: https://pastebin.com/PCe8uhUA

Looking around online I’m not sure if AMDGPU-PRO is only intended to work with SLES, but I’ll try those install instructions and see where it leads. Failing that, where would I make this bug report?

bugzilla.opensuse.org

Bug Help and openSUSE:Bug reporting FAQ - openSUSE Wiki have instructions on how to file a report. Be sure to include the URL of this thread.

Before you do, please confirm the problem isn’t your own configuration by trying to boot a live image.

I just tried the live image and it successfully booted a KDE session with the default boot options, here’s the Xorg.0.log: https://pastebin.com/LQAAEwxX

I guess this rules out a bug in the GPU drivers, though it’s a bit hard to say where to go next with this. Could I force-reinstall the existing GPU drivers?

I’d start by finding where amdgpu.dc=1 is coming from and get rid of it. It’s nowhere to be found in the new log.

Note that it’s running on the modesetting DDX.

I’m not sure why, but even after removing “amdgpu.dc=1” from the default boot options via YaST it’s still there in GRUB (along with “amdgpu.audio=1”). It seems that the full boot string is composed of multiple parts configured in different places, and YaST only lets me edit the last part. Where else would the boot options typically be configured?

/etc/default/grub is the only place I know of, which AFAIK is what YaST2 uses. If there is any question in your mind about its content, paste it here inside code tags.

Another test to try: rename /etc/X11/xorg_pci_ids and see if it’s still in /proc/cmdline and Xorg.0.log’s kernel cmdline after rebooting.
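
To narrow down where the option lives, a rough sketch along these lines could help. files_with_param is a hypothetical helper name, and the grub.cfg path assumes the usual GRUB2 layout (that file is normally readable only by root, so run this as root to include it):

```shell
# List which of the given files mention kernel parameter $1 outside of
# comment lines. Unreadable or missing files are skipped silently.
# (files_with_param is a hypothetical helper name.)
files_with_param() {
    param=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        awk -v p="$param" '!/^[[:space:]]*#/ && index($0, p) { found = 1 }
                           END { exit !found }' "$f" && printf '%s\n' "$f"
    done
    return 0
}

files_with_param amdgpu.dc /proc/cmdline /etc/default/grub /boot/grub2/grub.cfg

# If only grub.cfg still lists it, the menu may not have been regenerated
# after /etc/default/grub was edited. As root:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

In /etc/default/grub, both GRUB_CMDLINE_LINUX and GRUB_CMDLINE_LINUX_DEFAULT contribute to the final command line, which may be why the boot string appears to come from more than one place.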

I have a different problem with the same headline as OP.
I tried to update this morning and lost the graphical interface. The zypper dup was halted with an error on downloading the latest kernel. It said the file was not on the device. I tried all three options: abort, retry and ignore. It failed to update in each case.

I tried rebooting to see if that would help. Instead, the graphical reboot failed. I got an error message to the effect that there was no room on the device. Since this pc is dual boot, I booted to Windows. It booted normally and I checked the status of the disks. All had many gigs of available space. The kernel was around 35MB to download and expands to 500MB (or so) when installed. I never got to the install state, failing on retrieving. I have another problem that may be related. I run BOINC and it is stalled because it says it needs 500MB to run an analysis and is 40-50 MB short. But, once again, there are gigs of room on these disks.

When I booted back to TW, I tried both Plasma and Plasma Wayland. Still, it failed and I had to press the power button to get control of the pc. I have now booted into TW using the Icewm. It works, but seems slow.

To recap: zypper dup stalled at the update of the kernel (the file was about 163 of 235 updates), Plasma won’t run, Icewm does run. Do I need to do a re-install of TW?

Uh-oh. I may have stumbled onto the problem. I ran df and saw that my sda6 is full. It contains /boot/grub2/i386-pc and /boot/grub2/x86_64-efi and lots of /var files as well as /tmp. I don’t know how to copy the output of the terminal in icewm. The sda7 is /home and is only 9% used. Is there a way to “clean” that sda6? And will that help?
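
For anyone in the same spot, here is a small sketch for spotting nearly full filesystems, with some common cleanup candidates listed as comments. full_mounts is a made-up helper name and the 90% threshold is arbitrary; whether any of the cleanup commands actually frees enough space depends on what is filling the partition.

```shell
# Read `df -P` output on stdin and print the mount points that are at or
# above $1 percent used. (full_mounts is a hypothetical helper name.)
full_mounts() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6
    }'
}

df -P 2>/dev/null | full_mounts 90 || true

# Possible space hogs on a root partition that also holds /var and /tmp
# (run as root):
#   du -xh --max-depth=1 /var /tmp | sort -h   # largest directories last
#   zypper clean --all                         # drop cached packages and metadata
#   journalctl --vacuum-size=100M              # trim the systemd journal
#   snapper list                               # only if root is Btrfs with snapshots
```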

No luck unfortunately, but here is my /etc/default/grub after editing it to comment out a line that contained “amdgpu.dc=1 amdgpu.audio=1”: https://pastebin.com/REyaQ4Bh

Additionally, here is my last /var/log/Xorg.0.log after I edited /etc/default/grub, renamed /etc/X11/xorg_pci_ids and tried a graphical boot: https://pastebin.com/nCDnZRfz

That Xorg.0.log seems to have been cut short. Did you upload the whole file? It reports amdgpu.dc=1 absent from cmdline, but no EDID, display or input devices connected. I would change /etc/default/grub to contain GRUB_CMDLINE_LINUX="" instead of commenting it entirely away. Other than that I don’t know what to try except a live TW CD to see if the same problem occurs. Maybe try some other live media as well.