amd driver not loaded during startup

Hi,

The amdgpu driver won’t load at boot time. The error is:

    3.054338] [drm] amdgpu kernel modesetting enabled.
    3.109109] amdgpu 0000:01:00.0: Direct firmware load for amdgpu/topaz_mc.bin failed with error -2
    3.109110] cik_mc: Failed to load firmware "amdgpu/topaz_mc.bin"
    3.109161] [drm:gmc_v7_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
    3.109179] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <gmc_v7_0> failed -2
    3.109180] amdgpu 0000:01:00.0: amdgpu_init failed
    3.109181] amdgpu 0000:01:00.0: Fatal error during GPU init
    3.109183] [drm] amdgpu: finishing device.
    3.252124] amdgpu: probe of 0000:01:00.0 failed with error -2
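
For reference, error -2 is -ENOENT: at that point in boot the kernel could not find the firmware file. A small sketch for pulling the affected file names out of a kernel log; the helper name `failed_firmware` is made up for illustration, and it assumes the message format shown above.

```shell
#!/bin/sh
# failed_firmware: read kernel log text on stdin and print, for every
# "Direct firmware load for X failed with error N" message, the firmware
# path X followed by the error code N.
# (Hypothetical helper name; only the message format comes from the log above.)
failed_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error \(-[0-9]*\).*/\1 \2/p'
}

# Typical use (reading the kernel log may need root):
#   dmesg | failed_firmware
```

Fed the boot log above, this should print a single line for `amdgpu/topaz_mc.bin`.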

However, once boot has finished, if I run

rmmod amdgpu && insmod /lib/modules/4.14.11-2.gc36893f-default/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko

the driver loads successfully:

 2474.117342] [drm] amdgpu kernel modesetting enabled.
 2474.117378] vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
 2474.117661] ATPX version 1, functions 0x00000033
 2474.117866] ATPX Hybrid Graphics
 2474.118424] [drm] initializing kernel modesetting (TOPAZ 0x1002:0x6900 0x1028:0x0767 0xC3).
 2474.118569] [drm] register mmio base: 0xD0200000
 2474.118571] [drm] register mmio size: 262144
 2474.118597] [drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
 2474.118599] [drm] probing mlw for device 8086:9d10 = 1724843
 2474.118629] vga_switcheroo: enabled
 2474.150152] ATOM BIOS: BR46858.006
 2474.150165] [drm] GPU post is not needed
 2474.150166] [drm] Changing default dispclk from 0Mhz to 600Mhz
 2474.150273] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 4-bit
 2474.151245] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
 2474.151246] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
 2474.151253] [drm] Detected VRAM RAM=4096M, BAR=256M
 2474.151254] [drm] RAM width 64bits GDDR5
 2474.151289] [TTM] Zone  kernel: Available graphics memory: 8122176 kiB
 2474.151290] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
 2474.151290] [TTM] Initializing pool allocator
 2474.151292] [TTM] Initializing DMA pool allocator
 2474.151351] [drm] amdgpu: 4096M of VRAM memory ready
 2474.151352] [drm] amdgpu: 4096M of GTT memory ready.
 2474.151360] [drm] GART: num cpu pages 65536, num gpu pages 65536
 2474.152053] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
 2474.152083] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
 2474.152084] [drm] Driver supports precise vblank timestamp query.
 2474.152124] amdgpu 0000:01:00.0: amdgpu: using MSI.
 2474.152138] [drm] amdgpu: irq initialized.
 2474.395735] amdgpu: [powerplay] amdgpu: powerplay sw initialized
 2474.399108] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0xffffaf3e82135080
 2474.399342] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0xffffaf3e82135100
 2474.399572] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0xffffaf3e82135180
 2474.399778] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0xffffaf3e82135200
 2474.399933] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0xffffaf3e82135280
 2474.400119] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000400300, cpu addr 0xffffaf3e82135300
 2474.400235] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000000400380, cpu addr 0xffffaf3e82135380
 2474.400327] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000000400400, cpu addr 0xffffaf3e82135400
 2474.400490] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000000400480, cpu addr 0xffffaf3e82135480
 2474.400548] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x0000000000400520, cpu addr 0xffffaf3e82135520
 2474.402510] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000000004005a0, cpu addr 0xffffaf3e821355a0
 2474.402670] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x0000000000400620, cpu addr 0xffffaf3e82135620
 2474.421145] amdgpu: [powerplay] can't get the mac of 5
 2474.426791] [drm] ring test on 0 succeeded in 18 usecs
 2474.427673] [drm] ring test on 9 succeeded in 8 usecs
 2474.427681] [drm] ring test on 1 succeeded in 2 usecs
 2474.427736] [drm] ring test on 2 succeeded in 22 usecs
 2474.427761] [drm] ring test on 3 succeeded in 10 usecs
 2474.427790] [drm] ring test on 4 succeeded in 11 usecs
 2474.427818] [drm] ring test on 5 succeeded in 11 usecs
 2474.427846] [drm] ring test on 6 succeeded in 11 usecs
 2474.427873] [drm] ring test on 7 succeeded in 10 usecs
 2474.427901] [drm] ring test on 8 succeeded in 11 usecs
 2474.427947] [drm] ring test on 10 succeeded in 7 usecs
 2474.427954] [drm] ring test on 11 succeeded in 6 usecs
 2474.428167] [drm] ib test on ring 0 succeeded
 2474.428327] [drm] ib test on ring 1 succeeded
 2474.428452] [drm] ib test on ring 2 succeeded
 2474.428487] [drm] ib test on ring 3 succeeded
 2474.428533] [drm] ib test on ring 4 succeeded
 2474.428579] [drm] ib test on ring 5 succeeded
 2474.428626] [drm] ib test on ring 6 succeeded
 2474.428669] [drm] ib test on ring 7 succeeded
 2474.428715] [drm] ib test on ring 8 succeeded
 2474.428736] [drm] ib test on ring 9 succeeded
 2474.428759] [drm] ib test on ring 10 succeeded
 2474.428780] [drm] ib test on ring 11 succeeded
 2474.430380] amdgpu 0000:01:00.0: kfd not supported on this ASIC
 2474.430383] [drm] Initialized amdgpu 3.19.0 20150101 for 0000:01:00.0 on minor 0
 2479.826611] amdgpu: [powerplay] VI should always have 2 performance levels
 2479.879729] amdgpu 0000:01:00.0: GPU pci config reset

I am running kernel 4.14.11-2.gc36893f-default.

Any ideas how to make amdgpu load at boot time?

I suppose the firmware is missing from the initrd (on boot the driver is probably loaded before the root filesystem is mounted).
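
One way to test this guess is to look inside the initrd. dracut ships `lsinitrd`, which lists the contents of an image. A rough sketch, where the helper name `initrd_listing_has` is invented here and the initrd path assumes the usual `/boot/initrd-<kernel version>` naming:

```shell
#!/bin/sh
# initrd_listing_has NAME: succeed if the file listing on stdin mentions
# NAME anywhere (plain substring match, no regex).
# (Hypothetical helper; the listing is expected to come from lsinitrd.)
initrd_listing_has() {
    grep -qF -- "$1"
}

# Typical use, assuming the initrd for the running kernel lives at
# /boot/initrd-$(uname -r):
#   lsinitrd "/boot/initrd-$(uname -r)" | initrd_listing_has amdgpu/topaz_mc.bin \
#       && echo "firmware is in the initrd" \
#       || echo "firmware is NOT in the initrd"
```

If the file is reported as not in the initrd while amdgpu.ko is, that would fit the failure at boot and the success after a manual reload.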

Try running “sudo mkinitrd” and see if it helps.

Or try disabling the boot splash with the “plymouth.enable=0” boot option. The graphics driver (amdgpu) should then be loaded later, hopefully once / is already mounted and available.

Plymouth does not load anything; the driver is loaded by udev in response to hardware enumeration. It is possible that leaving plymouth out of the initrd would also leave out the GPU drivers, but once the driver is in the initrd, it is too late.

Thanks for the hint about mkinitrd. It is throwing errors:

dracut: *** Including module: drm ***
dracut: Possible missing firmware "i915/bxt_dmc_ver1_07.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "i915/skl_dmc_ver1_26.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "i915/kbl_dmc_ver1_01.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "i915/kbl_guc_ver9_14.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "i915/bxt_guc_ver8_7.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "i915/skl_guc_ver6_1.bin" for kernel module "i915.ko"
dracut: Possible missing firmware "amdgpu/polaris11_smc_sk.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_smc_sk.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_k_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_k_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_smc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "radeon/hawaii_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "radeon/bonaire_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_mc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_rlc.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_mec2.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_mec.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_me.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_pfp.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_ce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/topaz_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_sdma1.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_sdma.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_uvd.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris11_vce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/polaris10_vce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/stoney_vce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/fiji_vce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/carrizo_vce.bin" for kernel module "amdgpu.ko"
dracut: Possible missing firmware "amdgpu/tonga_vce.bin" for kernel module "amdgpu.ko"

I checked /var/log/YaST2/mkinitrd.log and found that this error has been happening since kernel 4.14.8-1.g674981b-default.

mkinitrd creates 3 initrds (all of them complain about missing firmware):

Creating initrd: /boot/initrd-4.14.11-1.g58fec0f-default
/boot/initrd-4.14.11-2.gc36893f-default
/boot/initrd-4.4.103-36-default

I removed the 2 older versions (via YaST), but I still get the same error from mkinitrd.

The files obviously exist in /lib/firmware (e.g. the one I need):

-rw-r--r-- 1 root root 32100 Jan  4 16:06 /lib/firmware/amdgpu/topaz_mc.bin
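
To check all of the files dracut complains about in one go rather than one by one, something along these lines might help. It is only a sketch: the helper name `check_fw_warnings` is made up, and it assumes the warning format shown above and uncompressed firmware under /lib/firmware.

```shell
#!/bin/sh
# check_fw_warnings [FWDIR]: read dracut output on stdin and, for each
# 'Possible missing firmware "X"' warning, report whether X exists
# under FWDIR (default: /lib/firmware).
# (Hypothetical helper name; assumes firmware files are stored uncompressed.)
check_fw_warnings() {
    fwdir=${1:-/lib/firmware}
    sed -n 's/.*Possible missing firmware "\([^"]*\)".*/\1/p' |
    while IFS= read -r fw; do
        if [ -e "$fwdir/$fw" ]; then
            echo "present $fw"
        else
            echo "absent $fw"
        fi
    done
}

# Typical use:
#   sudo mkinitrd 2>&1 | check_fw_warnings
```

If most of the files come back as present, the firmware package itself is fine and the problem is more likely in how dracut is being told to look for it.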

I found the following files in /etc/dracut.conf.d/:

-rw-r--r-- 1 root root 100 Jan  1 22:44 amdgpu-4.14.0-rc4-1.g879f297-default.conf
-rw-r--r-- 1 root root  96 Dec 20 23:22 amdgpu-4.14.8-1.g674981b-default.conf
-rw-r--r-- 1 root root 100 Oct 13 17:57 amdgpu-pro-4.14.0-rc4-1.g879f297-default.conf


I removed them, and now mkinitrd builds without errors.
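
For anyone who would rather inspect such files before deleting them: a sketch that flags amdgpu*.conf files named after kernels that are no longer installed. The helper name `stale_amdgpu_confs` is invented, and it assumes the kernel version is embedded in the file name as in the listing above. I did not record what the files contained, so treat the output as candidates to look at, not a diagnosis.

```shell
#!/bin/sh
# stale_amdgpu_confs [CONFDIR] [MODDIR]: print every amdgpu-*.conf file in
# CONFDIR whose name embeds a kernel version that has no matching
# directory under MODDIR.
# (Hypothetical helper; assumes names like amdgpu-<kver>.conf or
# amdgpu-pro-<kver>.conf.)
stale_amdgpu_confs() {
    confdir=${1:-/etc/dracut.conf.d}
    moddir=${2:-/lib/modules}
    for f in "$confdir"/amdgpu-*.conf; do
        [ -e "$f" ] || continue        # glob matched nothing
        kver=$(basename "$f" .conf)    # drop directory and .conf suffix
        kver=${kver#amdgpu-pro-}       # drop whichever prefix is present
        kver=${kver#amdgpu-}
        [ -d "$moddir/$kver" ] || echo "$f"
    done
}

# Typical use:
#   stale_amdgpu_confs
#   cat /etc/dracut.conf.d/amdgpu-*.conf    # see what they set before removing
```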

After a reboot, the amdgpu driver loads at boot time:

    3.357611] [drm] amdgpu kernel modesetting enabled.
    3.370320] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
    3.370323] AMD IOMMUv2 functionality not available on this system
    3.426224] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
    3.426225] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
    3.430326] [drm] amdgpu: 4096M of VRAM memory ready
    3.430327] [drm] amdgpu: 4096M of GTT memory ready.
    3.431273] amdgpu 0000:01:00.0: amdgpu: using MSI.
    3.431290] [drm] amdgpu: irq initialized.
    3.675876] amdgpu: [powerplay] amdgpu: powerplay sw initialized
    3.676101] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0xffffbf40c2009080
    3.676139] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0xffffbf40c2009100
    3.676207] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0xffffbf40c2009180
    3.676238] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0xffffbf40c2009200
    3.676258] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0xffffbf40c2009280
    3.676279] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000400300, cpu addr 0xffffbf40c2009300
    3.676298] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000000400380, cpu addr 0xffffbf40c2009380
    3.676319] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000000400400, cpu addr 0xffffbf40c2009400
    3.676341] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000000400480, cpu addr 0xffffbf40c2009480
    3.676358] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x0000000000400520, cpu addr 0xffffbf40c2009520
    3.676698] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x00000000004005a0, cpu addr 0xffffbf40c20095a0
    3.676726] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x0000000000400620, cpu addr 0xffffbf40c2009620
    3.688914] amdgpu: [powerplay] can't get the mac of 5
    3.698751] amdgpu 0000:01:00.0: kfd not supported on this ASIC
    3.698755] [drm] Initialized amdgpu 3.19.0 20150101 for 0000:01:00.0 on minor 1
   10.976204] amdgpu: [powerplay] VI should always have 2 performance levels
   11.021611] amdgpu 0000:01:00.0: GPU pci config reset
   37.261153] amdgpu: [powerplay] can't get the mac of 5
   43.815630] amdgpu: [powerplay] VI should always have 2 performance levels
   43.878213] amdgpu 0000:01:00.0: GPU pci config reset