AMD RX 570 and kernel 5.3.18-lp152.41-default

On updating to kernel 5.3.18-lp152.41-default (from kernel 5.3.18-lp152.36-default) I experienced a "freeze" during system boot-up. Using debug mode, the last message was:

fb0: switching to amdgpudrmfb from EFI VGA

Something seems to be going wrong when switching graphics mode. Adding

nomodeset

to the kernel boot options allows the boot to run to completion, at the cost of not having an accelerated graphics driver!

The problem is perhaps the same as described on https://forum.level1techs.com/t/fb0-switching-to-amdgpudrmfb-from-efi-vga/152112 , some sort of problem with MMU and ATS?

I am currently booting on the previous kernel version (5.3.18-lp152.36-default) without any problem (except security…)

My hardware:
Asrock B550M
AMD Ryzen 5 3600
AMD RX 570 graphics card (MSI) - amdgpu-pro driver

Suggestions:

(1) Edit (as root) "/etc/zypp/zypp.conf".
Find the line:

multiversion.kernels = latest,latest-1,running

and change that to

multiversion.kernels = oldest,latest,latest-1,running

This is to make sure that your working kernel does not disappear before the problem is fixed.

(2) Continue booting to the kernel that works (5.3.18-lp152.36-default).

(3) Open a bug report:
https://en.opensuse.org/openSUSE:Submitting_bug_reports

The kernel team are usually pretty good at tracking down these problems.
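
The edit in step (1) can also be made from a terminal. The following is only a minimal sketch: the function name keep_oldest_kernel is made up for illustration, and it assumes the multiversion.kernels line is present and not commented out. It takes the file path as an argument so it can be tried on a copy first, and it leaves a .bak backup next to the file.

```shell
# Hypothetical helper (not part of zypper): prepend "oldest" to the
# multiversion.kernels setting in the file given as $1.
# Assumes an active (uncommented) multiversion.kernels line.
keep_oldest_kernel() {
    zypp_conf="$1"
    # Nothing to do if "oldest" is already listed.
    if grep -Eq '^[[:space:]]*multiversion\.kernels[[:space:]]*=.*oldest' "$zypp_conf"; then
        return 0
    fi
    # Keep a backup, then rewrite the original file in place.
    cp -p "$zypp_conf" "$zypp_conf.bak" || return 1
    sed -E 's/^([[:space:]]*multiversion\.kernels[[:space:]]*=[[:space:]]*)/\1oldest,/' \
        "$zypp_conf.bak" > "$zypp_conf"
}
```

After checking the result on a copy, the same call would be run as root with /etc/zypp/zypp.conf as the argument.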

I have also had this type of problem with both kernel updates to date on Leap 15.2. Kernel updates break the proprietary amdgpu-pro drivers for some reason: it appears the amdgpu kernel module is not rebuilt with a kernel update. The work-around is simple, basically uninstall and reinstall the amdgpu-pro drivers as follows:

  1. Make sure you have the amdgpu-pro archive for your video card and unpack it to the directory of your choice.
  2. Open a terminal and run amdgpu-uninstall (enter root password when asked).
  3. From the directory where you unpacked amdgpu-pro, run either amdgpu-install or amdgpu-pro-install as appropriate for your driver choice.

Your drivers are now restored. Your Plymouth boot screen may be broken. If so, make a backup copy of initrd then run sudo mkinitrd.

Done.

The amdgpu-pro install doc is here if you need it.

enjoy…

Oops! There is an error in step 2. It should be:

  2. Open a terminal and run amdgpu-uninstall if installed with amdgpu-install, or run amdgpu-pro-uninstall if installed with amdgpu-pro-install (enter root password when asked).

You may consider using a newer kernel or switching to Tumbleweed. Upgrading to the latest version fixed a freeze: https://forums.opensuse.org/showthread.php/544219-Amdgpu-Trouble

Which version of the AMDGPU All-Open and AMDGPU-Pro drivers are you using?

Version 20.30+ is needed for Leap 15.2.

Being inexperienced with amdgpu, I wonder what the benefit of using amdgpu-pro is:

“AMD provides a proprietary, binary userland driver called AMDGPU PRO, which works on top of the open-source AMDGPU kernel driver. From Radeon Software 18.50 vs Mesa 19 benchmarks article: When it comes to OpenGL games, the RadeonSI Gallium3D driver simply dominates the proprietary AMD OpenGL driver.”

https://wiki.archlinux.org/index.php/AMDGPU#AMDGPU_PRO

localhost:~ # journalctl -b --grep amdgpu
-- Logs begin at Mon 2020-07-13 17:21:20 CEST, end at Wed 2020-09-16 08:30:47 CEST. --
Sep 16 08:18:56 localhost kernel: [drm] amdgpu kernel modesetting enabled.
Sep 16 08:18:56 localhost kernel: amdgpu: Topology: Add APU node [0x0:0x0]
Sep 16 08:18:56 localhost kernel: fb0: switching to amdgpudrmfb from EFI VGA
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: vgaarb: deactivate vga console
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: enabling device (0006 -> 0007)
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Sep 16 08:18:56 localhost kernel: amdgpu: ATOM BIOS: 113-PICASSO-115
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Sep 16 08:18:56 localhost kernel: [drm] amdgpu: 2048M of VRAM memory ready
Sep 16 08:18:56 localhost kernel: [drm] amdgpu: 3072M of GTT memory ready.
Sep 16 08:18:56 localhost kernel: amdgpu: hwmgr_sw_init smu backed is smu10_smu
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Sep 16 08:18:56 localhost kernel: amdgpu: Topology: Add APU node [0x15d8:0x1002]
Sep 16 08:18:56 localhost kernel: amdgpu 0000:06:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 11
Sep 16 08:18:56 localhost kernel: fbcon: amdgpudrmfb (fb0) is primary device
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: fb0: amdgpudrmfb frame buffer device
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Sep 16 08:18:57 localhost kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Sep 16 08:18:57 localhost kernel: [drm] Initialized amdgpu 3.38.0 20150101 for 0000:06:00.0 on minor 0
Sep 16 08:18:58 localhost kernel: snd_hda_intel 0000:06:00.1: bound 0000:06:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
localhost:~ # 

AMDGPU-Pro is needed for OpenCL.
You may install Mesa 3D plus only the OpenCL part of AMDGPU-Pro.
For the AMD RDNA architecture, AMDGPU-Pro is the only available driver for OpenCL.
With the ACO shader compiler, Mesa 3D may outperform AMDGPU-Pro for Vulkan: https://www.phoronix.com/scan.php?page=article&item=mesa-201aco-amd&num=1

To OP: https://forums.opensuse.org/showthread.php/544534-Since-5-3-18-lp152-41-Unable-to-boot-fb0-switchting-to-amdgpudrmfb-from-EFI-VGA?p=2965192#post2965192

You are talking about https://9to5linux.com/amd-radeon-software-for-linux-20-30-released-with-support-for-ubuntu-20-04-1-lts ? From that and https://en.wikipedia.org/wiki/Mesa_(computer_graphics) I conclude that I need not be concerned as long as AMDGPU All-Open works well.

Thanks, I took this advice!

I also found kernel 5.3.18-lp152.44-default didn’t work.

I created a bugzilla report https://bugzilla.opensuse.org/show_bug.cgi?id=1177256

My version of amdgpu-pro is 20.30+

amdgpu-pro-20.30-1109584.x86_64
amdgpu-pro-core-20.30-1109584.noarch

As mentioned, I need the pro drivers because I want to use the OpenCL drivers.
In my case, I want to use DaVinci Resolve, which works very well with the OpenCL drivers on openSUSE!

Better than on a Windows machine with a (very old laptop) NVIDIA chipset.

David

Although I submitted this to the bugzilla, I’ll reproduce the most likely relevant section of the /var/log/messages here…

2020-10-03T16:11:09.966215+10:00 localhost kernel: [    3.851924] mc: Failed to load firmware "amdgpu/polaris10_mc.bin"
2020-10-03T16:11:09.966216+10:00 localhost kernel: [    3.851963] [drm:gmc_v8_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
2020-10-03T16:11:09.966217+10:00 localhost kernel: [    3.851998] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
2020-10-03T16:11:09.966217+10:00 localhost kernel: [    3.852000] amdgpu 0000:0c:00.0: amdgpu_device_ip_init failed
2020-10-03T16:11:09.966218+10:00 localhost kernel: [    3.852002] amdgpu 0000:0c:00.0: Fatal error during GPU init
2020-10-03T16:11:09.966218+10:00 localhost kernel: [    3.852003] [drm] amdgpu: finishing device.

Using ‘locate’ to find polaris10_mc.bin results in:

~> locate polaris10_mc.bin
/lib/firmware/5.3.18-lp152.36-default/amdgpu/polaris10_mc.bin
/lib/firmware/amdgpu/polaris10_mc.bin
/usr/src/amdgpu-5.6.5.24-1109584/firmware/amdgpu/polaris10_mc.bin
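
For anyone wanting to check their own log for the same symptom, the firmware name can be pulled out of the "Failed to load firmware" lines with a small filter. This is just a sketch; missing_firmware is a made-up name, and it only recognises the message format shown in the log excerpt above.

```shell
# Hypothetical helper: print each firmware file that a log reports as
# having failed to load, one name per line, without duplicates.
# Only matches lines of the form:  ... Failed to load firmware "NAME"
missing_firmware() {
    sed -n 's/.*Failed to load firmware "\([^"]*\)".*/\1/p' "$1" | sort -u
}
```

Run against /var/log/messages (as root), this would print amdgpu/polaris10_mc.bin for the excerpt above, which can then be looked up with locate.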



This bug seems to be a duplicate of

https://bugzilla.opensuse.org/show_bug.cgi?id=1077848
https://bugzilla.redhat.com/show_bug.cgi?id=1716138

and also mentioned in

https://forums.opensuse.org/showthread.php/529291-AMDGPU-broken-after-Kernel-Upgrade

The file /etc/dracut.conf.d/amdgpu.conf contained:

add_drivers+=" amdgpu "
fw_dir+="/lib/firmware/5.3.18-lp152.36-default"

Removing /etc/dracut.conf.d/amdgpu.conf and then executing 'mkinitrd' appears to fix the problem.
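
To see whether a system has the same kind of leftover setting, the dracut config directory can be scanned for fw_dir lines that pin a versioned firmware directory belonging to a different kernel. A rough sketch follows; stale_fw_dir_confs is a made-up name, and the check assumes the /lib/firmware/<kernel release> layout seen in the file above.

```shell
# Hypothetical helper: print dracut config files whose fw_dir setting
# names a versioned /lib/firmware/<release> directory, but not the one
# for the kernel release passed as $2.
#   $1 = config directory, e.g. /etc/dracut.conf.d
#   $2 = kernel release,   e.g. "$(uname -r)"
stale_fw_dir_confs() {
    dracut_dir="$1"
    kernel_rel="$2"
    # grep -l lists only the files that contain a matching line.
    grep -l -E '^[[:space:]]*fw_dir\+?=.*/lib/firmware/[0-9]' "$dracut_dir"/*.conf 2>/dev/null |
    while IFS= read -r conf_file; do
        # Report the file unless one of its fw_dir lines mentions this kernel.
        grep -E '^[[:space:]]*fw_dir' "$conf_file" |
            grep -Fq "/lib/firmware/$kernel_rel" || printf '%s\n' "$conf_file"
    done
}
```

Calling it as stale_fw_dir_confs /etc/dracut.conf.d "$(uname -r)" only lists candidates. Whether to remove a listed file is still a manual decision, followed by mkinitrd as described above.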

Radeon™ Software for Linux® 20.40 is available.

Also, you may use AMD ROCm to get OpenCL with the RX 570.

Thanks! As mentioned in the bugzilla, version 20.40 fixes the problem!

I see various reports about whether ROCm OpenCL works with DaVinci Resolve.
It’ll be a future option to explore if amdgpu-pro doesn’t work!