QEMU/KVM GPU Passthrough: Attaching PCI host devices causes guest to not boot

(Note: The forum isn’t letting me hide my logs with spoilers, or post images, to keep things neat, so forgive me if this post sprawls a bit. I didn’t want it to be formatted that way.)

Problem
The virtual machine works just fine without my PCI host devices added, but when I add them, I get this error when using ramfb for video (the other video options don’t give me any output at all):

BdsDxe: failed to load Boot003 “Windows Boot Manager” from HD(2,GPT,BC43C180-BD6D-4F6F-B7C1-09D1734A9AD9,0xFA000,0x31800)/\EFI\Microsoft\Boot\bootmgfw.efi: Not found
BdsDxe: No bootable option or device was found.
BdsDxe: Press any key to enter the Boot Manager Menu.

Pressing keys does nothing and the VM’s CPU usage drops to 0%. Trying to boot from a disc also doesn’t work and gives the same error.

Hardware Specs

  • OpenSUSE Leap 15.5
  • MSI TRX40 Pro WiFi
  • AMD Threadripper 3970X
  • 128 GB of RAM
  • 3x Ampere-generation Nvidia RTX A4000s (the one I’m trying to pass through is a Dell-branded card, if that’s important, and it’s not the primary GPU)

VM Configuration
Using QEMU/KVM with Virt Manager.

<domain type="kvm">
  <name>win10-test</name>
  <uuid>9a49a45c-5e74-4d17-b1a7-47370768fef7</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">33554432</memory>
  <currentMemory unit="KiB">33554432</currentMemory>
  <vcpu placement="static">8</vcpu>
  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-7.1">hvm</type>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode="custom">
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="8" threads="1"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" discard="unmap"/>
      <source file="/home/stdlogin/Documents/vmstuff/qemu-vms/win10-test.qcow2"/>
      <target dev="sda" bus="sata"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="block" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source dev="/dev/sr0"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x18"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x19"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
    </controller>
    <controller type="pci" index="11" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="11" port="0x1a"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x2"/>
    </controller>
    <controller type="pci" index="12" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="12" port="0x1b"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x3"/>
    </controller>
    <controller type="pci" index="13" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="13" port="0x1c"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x4"/>
    </controller>
    <controller type="pci" index="14" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="14" port="0x1d"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x5"/>
    </controller>
    <controller type="pci" index="15" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="15" port="0x1e"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x6"/>
    </controller>
    <controller type="pci" index="16" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:a5:92:c4"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </interface>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <channel type="spicevmc">
      <target type="virtio" name="com.redhat.spice.0"/>
      <address type="virtio-serial" controller="0" bus="0" port="1"/>
    </channel>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
      <image compression="off"/>
    </graphics>
    <sound model="ac97">
      <address type="pci" domain="0x0000" bus="0x10" slot="0x01" function="0x0"/>
    </sound>
    <audio id="1" type="spice"/>
    <video>
      <model type="ramfb" heads="1" primary="yes"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
      </source>
      <rom file="/home/stdlogin/Documents/vmstuff/vbios/dell-rtx-a4000.bin"/>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x02" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="1"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>

Steps taken

  • Verified that IOMMU was working
    sudo dmesg | grep -i -e DMAR -e IOMMU:
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.14.21-150500.55.83-default root=UUID=77ea720f-272e-430d-9a1a-51518361e51f splash=silent preempt=full mitigations=auto quiet security=apparmor iommu=pt amd_iommu=on rd.driver.pre=vfio-pci
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.14.21-150500.55.83-default root=UUID=77ea720f-272e-430d-9a1a-51518361e51f splash=silent preempt=full mitigations=auto quiet security=apparmor iommu=pt amd_iommu=on rd.driver.pre=vfio-pci
[    0.593746] iommu: Default domain type: Passthrough (set via kernel command line)
[    0.633089] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    0.633119] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    0.633145] pci 0000:20:00.2: AMD-Vi: IOMMU performance counters supported
[    0.633164] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.633226] pci 0000:00:01.0: Adding to iommu group 0
[    0.633255] pci 0000:00:01.1: Adding to iommu group 1
[    0.633281] pci 0000:00:01.3: Adding to iommu group 2
[    0.633321] pci 0000:00:02.0: Adding to iommu group 3
[    0.633362] pci 0000:00:03.0: Adding to iommu group 4
[    0.633404] pci 0000:00:04.0: Adding to iommu group 5
[    0.633445] pci 0000:00:05.0: Adding to iommu group 6
[    0.633486] pci 0000:00:07.0: Adding to iommu group 7
[    0.633513] pci 0000:00:07.1: Adding to iommu group 8
[    0.633556] pci 0000:00:08.0: Adding to iommu group 9
[    0.633582] pci 0000:00:08.1: Adding to iommu group 10
[    0.633640] pci 0000:00:14.0: Adding to iommu group 11
[    0.633665] pci 0000:00:14.3: Adding to iommu group 11
[    0.633825] pci 0000:00:18.0: Adding to iommu group 12
[    0.633850] pci 0000:00:18.1: Adding to iommu group 12
[    0.633876] pci 0000:00:18.2: Adding to iommu group 12
[    0.633901] pci 0000:00:18.3: Adding to iommu group 12
[    0.633926] pci 0000:00:18.4: Adding to iommu group 12
[    0.633951] pci 0000:00:18.5: Adding to iommu group 12
[    0.633976] pci 0000:00:18.6: Adding to iommu group 12
[    0.634001] pci 0000:00:18.7: Adding to iommu group 12
[    0.634027] pci 0000:01:00.0: Adding to iommu group 13
[    0.634085] pci 0000:02:00.0: Adding to iommu group 14
[    0.634112] pci 0000:02:00.1: Adding to iommu group 14
[    0.634138] pci 0000:03:00.0: Adding to iommu group 15
[    0.634168] pci 0000:04:00.0: Adding to iommu group 16
[    0.634195] pci 0000:04:00.3: Adding to iommu group 17
[    0.634236] pci 0000:20:01.0: Adding to iommu group 18
[    0.634277] pci 0000:20:02.0: Adding to iommu group 19
[    0.634320] pci 0000:20:03.0: Adding to iommu group 20
[    0.634345] pci 0000:20:03.1: Adding to iommu group 21
[    0.634386] pci 0000:20:04.0: Adding to iommu group 22
[    0.634429] pci 0000:20:05.0: Adding to iommu group 23
[    0.634471] pci 0000:20:07.0: Adding to iommu group 24
[    0.634497] pci 0000:20:07.1: Adding to iommu group 25
[    0.634540] pci 0000:20:08.0: Adding to iommu group 26
[    0.634565] pci 0000:20:08.1: Adding to iommu group 27
[    0.634624] pci 0000:21:00.0: Adding to iommu group 28
[    0.634733] pci 0000:21:00.1: Adding to iommu group 28
[    0.634758] pci 0000:22:00.0: Adding to iommu group 29
[    0.634785] pci 0000:23:00.0: Adding to iommu group 30
[    0.634812] pci 0000:23:00.1: Adding to iommu group 31
[    0.634839] pci 0000:23:00.3: Adding to iommu group 32
[    0.634865] pci 0000:23:00.4: Adding to iommu group 33
[    0.634907] pci 0000:40:01.0: Adding to iommu group 34
[    0.634933] pci 0000:40:01.1: Adding to iommu group 35
[    0.634975] pci 0000:40:02.0: Adding to iommu group 36
[    0.635016] pci 0000:40:03.0: Adding to iommu group 37
[    0.635042] pci 0000:40:03.1: Adding to iommu group 38
[    0.635083] pci 0000:40:04.0: Adding to iommu group 39
[    0.635125] pci 0000:40:05.0: Adding to iommu group 40
[    0.635166] pci 0000:40:07.0: Adding to iommu group 41
[    0.635191] pci 0000:40:07.1: Adding to iommu group 42
[    0.635233] pci 0000:40:08.0: Adding to iommu group 43
[    0.635259] pci 0000:40:08.1: Adding to iommu group 44
[    0.635285] pci 0000:41:00.0: Adding to iommu group 45
[    0.635334] pci 0000:42:02.0: Adding to iommu group 46
[    0.635383] pci 0000:42:04.0: Adding to iommu group 47
[    0.635432] pci 0000:42:05.0: Adding to iommu group 48
[    0.635480] pci 0000:42:06.0: Adding to iommu group 49
[    0.635522] pci 0000:42:08.0: Adding to iommu group 50
[    0.635564] pci 0000:42:09.0: Adding to iommu group 51
[    0.635607] pci 0000:42:0a.0: Adding to iommu group 52
[    0.635659] pci 0000:43:00.0: Adding to iommu group 53
[    0.635707] pci 0000:44:00.0: Adding to iommu group 54
[    0.635781] pci 0000:45:00.0: Adding to iommu group 55
[    0.635830] pci 0000:46:00.0: Adding to iommu group 56
[    0.635836] pci 0000:47:00.0: Adding to iommu group 50
[    0.635841] pci 0000:47:00.1: Adding to iommu group 50
[    0.635847] pci 0000:47:00.3: Adding to iommu group 50
[    0.635853] pci 0000:48:00.0: Adding to iommu group 51
[    0.635858] pci 0000:49:00.0: Adding to iommu group 52
[    0.635916] pci 0000:4a:00.0: Adding to iommu group 57
[    0.635954] pci 0000:4a:00.1: Adding to iommu group 57
[    0.635980] pci 0000:4b:00.0: Adding to iommu group 58
[    0.636005] pci 0000:4c:00.0: Adding to iommu group 59
[    0.636047] pci 0000:60:01.0: Adding to iommu group 60
[    0.636089] pci 0000:60:02.0: Adding to iommu group 61
[    0.636129] pci 0000:60:03.0: Adding to iommu group 62
[    0.636170] pci 0000:60:04.0: Adding to iommu group 63
[    0.636211] pci 0000:60:05.0: Adding to iommu group 64
[    0.636255] pci 0000:60:07.0: Adding to iommu group 65
[    0.636280] pci 0000:60:07.1: Adding to iommu group 66
[    0.636322] pci 0000:60:08.0: Adding to iommu group 67
[    0.636347] pci 0000:60:08.1: Adding to iommu group 68
[    0.636373] pci 0000:61:00.0: Adding to iommu group 69
[    0.636398] pci 0000:62:00.0: Adding to iommu group 70
[    0.637128] pci 0000:60:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.637140] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.637150] pci 0000:20:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.637159] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.639386] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.639408] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    0.639430] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    0.639452] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
  • Ensured that VFIO was loaded before Nvidia drivers
    /etc/modprobe.d/vfio.conf:
    softdep nvidia pre: vfio-pci
    /etc/modprobe.d/nvidia-default.conf:
blacklist nouveau
options nvidia-drm modeset=1
  • Told KVM to ignore unhandled Model-Specific Register (MSR) accesses for Windows guests
    /etc/modprobe.d/kvm.conf:
options kvm ignore_msrs=1
options kvm report_ignored_msrs=0
  • Added the VFIO and KVM kernel modules to the initramfs
    /etc/dracut.conf.d/vfio-gpu-passthru.conf:
    add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd kvm kvm_amd "
  • Made and installed a custom dracut module that binds only the desired GPU to VFIO by PCI bus address (this ensures the others aren’t grabbed accidentally, since they’re all the same model of GPU)
    /usr/lib/dracut/modules.d/40vfio-gpu-passthru/module-setup.sh:
#!/bin/bash

check() {
	return 0
}

install() {
	inst_hook cmdline 20 "$moddir/vfio-pci-override.sh"
}

/usr/lib/dracut/modules.d/40vfio-gpu-passthru/vfio-pci-override.sh:

#!/bin/sh
# this script was found on the Arch Wiki, I didn't make it myself.
# (the shebang has to stay on the first line)

DEVS="0000:02:00.0"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
	for DEV in $DEVS; do
		for IOMMUDEV in $(ls /sys/bus/pci/devices/$DEV/iommu_group/devices) ; do
			echo "vfio-pci" > /sys/bus/pci/devices/$IOMMUDEV/driver_override
		done
	done
fi

# disabled the modprobe here because vfio-pci should be enabled somewhere else on boot
# modprobe -i vfio-pci
  • Verified that my card was taken by vfio-pci
    sudo lspci -v -s "0000:02:00.0":
02:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Dell Device 14ad
        Flags: fast devsel, IRQ 212
        Memory at e6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e800000000 (64-bit, prefetchable) [size=32G]
        Memory at f000000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [size=128]
        Expansion ROM at e7000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Capabilities: [bb0] #15
        Capabilities: [c1c] #26
        Capabilities: [d00] #27
        Capabilities: [e00] #25
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia
  • Made a Windows VM, verified it used the Q35 chipset and UEFI firmware, deleted the tablet device, and passed through the host’s CPU configuration
  • Made a fresh AlmaLinux VM and tried passing the GPU through that, just in case it was a Windows issue, and hit the exact same problem.
  • Dumped and passed through a vBIOS ROM to see if that would fix the problem, even though I shouldn’t have to do that with a card this new (I was desperate at this point; a rough sketch of the dump procedure follows below).
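For anyone curious, a vBIOS dump of this sort can be done through sysfs, roughly like this (paths from my setup; treat it as a sketch, since the card usually needs to be idle or unbound from its driver while the ROM is read, and the rom file only becomes readable after the enable write):

sudo -i
echo 1 > /sys/bus/pci/devices/0000:02:00.0/rom
cat /sys/bus/pci/devices/0000:02:00.0/rom > /home/stdlogin/Documents/vmstuff/vbios/dell-rtx-a4000.bin
echo 0 > /sys/bus/pci/devices/0000:02:00.0/rom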

Additional logs
journalctl:

Oct 21 11:53:27 MyPC libvirtd[2755]: Domain id=8 name='win10-test' uuid=9a49a45c-5e74-4d17-b1a7-47370768fef7 is tainted: cdrom-passthrough
Oct 21 11:53:27 MyPC systemd-machined[1952]: New machine qemu-8-win10-test.
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring file `/etc/passwd` (1)
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring directory `/etc` (2)
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring file `/etc/group` (3)
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring directory `/etc` (2)
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring file `/etc/resolv.conf` (5)
Oct 21 11:53:27 MyPC nscd[1946]: 1946 monitoring directory `/etc` (2)
Oct 21 11:53:27 MyPC systemd[1]: Started Virtual Machine qemu-8-win10-test.
Oct 21 11:53:29 MyPC avahi-daemon[1910]: Joining mDNS multicast group on interface vnet7.IPv6 with address fe80::fc54:ff:fea5:92c4.
Oct 21 11:53:29 MyPC avahi-daemon[1910]: New relevant interface vnet7.IPv6 for mDNS.
Oct 21 11:53:29 MyPC avahi-daemon[1910]: Registering new address record for fe80::fc54:ff:fea5:92c4 on vnet7.*.
Oct 21 11:53:29 MyPC kded5[3149]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC akonadi_sendlater_agent[3576]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC akonadi_maildispatcher_agent[3566]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC akonadi_followupreminder_agent[3558]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC akonadi_mailmerge_agent[3568]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC akonadi_notes_agent[3575]: "Object does not exist at path “/org/freedesktop/NetworkManager/ActiveConnection/1”"
Oct 21 11:53:29 MyPC kernel: virbr0: port 1(vnet7) entered learning state
Oct 21 11:53:31 MyPC kernel: virbr0: port 1(vnet7) entered forwarding state
Oct 21 11:53:31 MyPC kernel: virbr0: topology change detected, propagating
Oct 21 11:53:31 MyPC NetworkManager[2290]: <info>  [1729526011.7166] device (virbr0): carrier: link connected
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Oct 21 11:53:32 MyPC kernel: vfio-pci 0000:02:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Oct 21 11:53:37 MyPC systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Oct 21 11:53:38 MyPC sudo[62608]: stdlogin : TTY=pts/1 ; PWD=/home/stdlogin ; USER=root ; COMMAND=/usr/bin/journalctl
Oct 21 11:53:38 MyPC sudo[62608]: pam_kwallet5(sudo:setcred): pam_kwallet5: pam_sm_setcred
Oct 21 11:53:38 MyPC sudo[62608]: pam_unix(sudo:session): session opened for user root by stdlogin(uid=1000)
Oct 21 11:53:38 MyPC sudo[62608]: pam_kwallet5(sudo:session): pam_kwallet5: pam_sm_open_session
Oct 21 11:53:38 MyPC sudo[62608]: pam_kwallet5(sudo:session): pam_kwallet5: not a graphical session, skipping. Use force_run parameter to ignore this.

/etc/default/grub:

# If you change this file, run 'grub2-mkconfig -o /boot/grub2/grub.cfg' afterwards to update
# /boot/grub2/grub.cfg.

# Uncomment to set your own custom distributor. If you leave it unset or empty, the default
# policy is to determine the value from /etc/os-release
GRUB_DISTRIBUTOR=
GRUB_DEFAULT=saved
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=8
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent preempt=full mitigations=auto quiet security=apparmor iommu=pt amd_iommu=on rd.driver.pre=vfio-pci"
GRUB_CMDLINE_LINUX=""

# Uncomment to automatically save last booted menu entry in GRUB2 environment

# variable `saved_entry'
# GRUB_SAVEDEFAULT="true"
#Uncomment to enable BadRAM filtering, modify to suit your needs

# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
# GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
#Uncomment to disable graphical terminal (grub-pc only)

GRUB_TERMINAL="gfxterm"
# The resolution used on graphical terminal
#note that you can use only modes which your graphic card supports via VBE

# you can see them in real GRUB with the command `vbeinfo'
GRUB_GFXMODE="auto"
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
# GRUB_DISABLE_LINUX_UUID=true
#Uncomment to disable generation of recovery mode menu entries

# GRUB_DISABLE_RECOVERY="true"
#Uncomment to get a beep at grub start

# GRUB_INIT_TUNE="480 440 1"
GRUB_BACKGROUND=
GRUB_THEME=/boot/grub2/themes/openSUSE/theme.txt
SUSE_BTRFS_SNAPSHOT_BOOTING="true"
GRUB_USE_LINUXEFI="true"
GRUB_DISABLE_OS_PROBER="false"
GRUB_ENABLE_CRYPTODISK="y"
GRUB_CMDLINE_XEN_DEFAULT="vga=gfx-1024x768x16"

dmesg isn’t giving anything related to this so I didn’t post it.

@AncientRegret Hi and welcome to the Forum :smile:
So I’m on Tumbleweed but this also worked on a Leap 15.5 system way back…

I define my passthrough device via /etc/modprobe.d/11-vfio_device.conf; in this case it’s a Quadro K620:

alias pci:v000010DEd000013BBsv0000103Csd00001098bc03sc00i00 vfio-pci
alias pci:v000010DEd00000FBCsv0000103Csd00001098bc04sc03i00 vfio-pci
options vfio-pci ids=10de:13bb:103c:1098,10de:0fbc:103c:1098
options vfio-pci disable_vga=1

I would suggest you also pass through the other item in group 14 (audio?). For the alias lines, check for example cat /sys/bus/pci/devices/0000:02:00.0/modalias and cat /sys/bus/pci/devices/0000:02:00.1/modalias
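For an RTX A4000 the equivalent of my K620 entries would look something like the lines below; this is only a sketch built from the A4000’s numeric IDs (10de:24b0 for the GPU, 10de:228b for its audio function, Dell subsystem 1028:14ad), so verify the exact strings against your own modalias output:

alias pci:v000010DEd000024B0sv00001028sd000014ADbc03sc00i00 vfio-pci
alias pci:v000010DEd0000228Bsv00001028sd000014ADbc04sc03i00 vfio-pci
options vfio-pci ids=10de:24b0:1028:14ad,10de:228b:1028:14ad
options vfio-pci disable_vga=1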

I don’t use a vfio-pci-override.sh script, and I also don’t use that vfio.conf.

After adding/modifying files in modprobe.d, make sure you rebuild the initrd. On Leap 15.5 I think mkinitrd is still present, or better yet use dracut -f --regenerate-all
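For example (the second command is only a sanity check that modprobe actually sees the new options; adjust the grep to taste):

sudo dracut -f --regenerate-all
sudo modprobe -c | grep -E "^(options|softdep).*(vfio|kvm)"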

Edit: I also don’t use rd.driver.pre=vfio-pci in the grub options.

Hi, Malcolm. Thank you for responding!

The unusual way I bound my GPU and audio device with the vfio-pci-override script was to ensure that only the desired GPU gets bound to vfio-pci. I have multiple of the same kind of card, so I had to tweak my setup a bit differently from most guides: binding by vendor/device ID could grab the others as well (I had a bit of a headache trying to get dracut to load the kernel modules for GPU passthrough a few days ago, and I suspect that was why).

I tried adding some things to my vfio.conf file, inspired by some of the ways you did it, and unfortunately my problem persists. But during my tinkering I realized I may have neglected to post a very important log. Most logs never showed anything pertaining to this problem, but checking libvirtd.service with systemctl gives me this:

internal error: child reported (status=125): unable to open /dev/sr0: No medium found
Oct 21 18:53:13 MyPC libvirtd[6185]: unable to open /dev/sr0: No medium found
Oct 21 18:53:13 MyPC libvirtd[6185]: Unable to remove disk metadata on vm win10-test from /dev/sr0 (disk target sdb)
Oct 21 18:53:14 MyPC libvirtd[6185]: Domain id=4 name='win10-test' uuid=9a49a45c-5e74-4d17-b1a7-47370768fef7 is tainted: cdrom-passthrough

Hopefully, this could be the clue I’ve spent all afternoon looking for. Have you had to deal with an error like this before?

@AncientRegret As long as they are all different PCI IDs and Subsystem IDs?

/sbin/lspci -nnk |grep -EA3 "VGA|Display|3D"

That’s why the aliases and the full IDs…

Also, you’re running an almost-EOL release… What you should now be able to do on Leap 15.6, with this bug fix, is look at what @Jniko is doing with ethernet passthrough:

Ref: https://forums.opensuse.org/t/problem-with-nic-intel-i350-kvm-passthrough-leap-15-6/177548
Ref: https://bugzilla.suse.com/show_bug.cgi?id=1229019

@AncientRegret Hi, I had the same issue a while back when I tried to pass through a GPU, specifically a Quadro P4000, where the Nvidia audio device needed to be passed through as well.

I would recommend checking the IOMMU groups of the Nvidia cards and passing through whatever is inside, not just the video function but the whole group.

Use this script to easily identify what each IOMMU group has inside:

#!/bin/bash
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
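If you save it as, say, iommu_groups.sh, you can jump straight to the group that holds the card; group 14 here is taken from the dmesg output earlier in this thread:

bash iommu_groups.sh | grep -A 5 "IOMMU Group 14:"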

Best regards,
Jim

@AncientRegret Forgot to mention something.

As @malcolmlewis said previously, check these threads:

https://bugzilla.suse.com/show_bug.cgi?id=1229019

Especially the doc above ^^^, it worked like a charm for me with the P4000…

Regards.

Thank you guys for the help, but this bug is proving to be very stubborn. I still can’t seem to boot into the system with this gpu passed through.

I tried to update to 15.6 and redo my gpu passthrough setup to see if that fixes anything. My new, leaner setup is as follows:

  1. A copy of Grub with the following parameters:
/etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="splash=silent preempt=full mitigations=auto quiet security=apparmor iommu=pt amd_iommu=on"

  2. A dracut module recycled from my last attempt that binds my desired GPU to vfio-pci (all other methods of doing this, including ones linked in this thread, lock up my computer on boot)
/usr/lib/dracut/modules.d/40vfio-gpu-passthru/module-setup.sh
#!/bin/bash

check() {
	return 0
}

install() {
	inst_hook cmdline 20 "$moddir/vfio-pci-override.sh"
}
/usr/lib/dracut/modules.d/40vfio-gpu-passthru/vfio-pci-override.sh
#!/bin/sh

DEVS="0000:02:00.0"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
	for DEV in $DEVS; do
		for IOMMUDEV in $(ls /sys/bus/pci/devices/$DEV/iommu_group/devices) ; do
			echo "vfio-pci" > /sys/bus/pci/devices/$IOMMUDEV/driver_override
		done
	done
fi

modprobe -i vfio-pci
  3. Two modprobe.d configuration files, which ensure vfio-pci loads before my Nvidia drivers and make KVM ignore MSRs:
/etc/modprobe.d/vfio.conf
softdep nvidia pre: vfio-pci
options vfio-pci disable_vga=1
/etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
options kvm report_ignored_msrs=0
vfio confirmed working
(base) stdlogin@MyPC:~> sudo dmesg | grep -i vfio
[    2.045856] VFIO - User Level meta-driver version: 0.3
[    2.055037] vfio-pci 0000:02:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    3.174099] NVRM: GPU 0000:02:00.0 is already bound to vfio-pci.
[   87.725988] vfio-pci 0000:02:00.1: enabling device (0000 -> 0002)

(base) stdlogin@MyPC:~> sudo lspci -vnn -s "0000:02:00.0"
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104GL [RTX A4000] [10de:24b0] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Dell Device [1028:14ad]
        Flags: fast devsel, IRQ 220
        Memory at e6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e800000000 (64-bit, prefetchable) [size=32G]
        Memory at f000000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [size=128]
        Expansion ROM at e7000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Capabilities: [bb0] #15
        Capabilities: [c1c] #26
        Capabilities: [d00] #27
        Capabilities: [e00] #25
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia

(base) stdlogin@MyPC:~> sudo lspci -vnn -s "0000:02:00.1"
02:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
        Subsystem: Dell Device [1028:14ad]
        Flags: fast devsel, IRQ 221
        Memory at e7080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [160] #25
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
  4. A bash script that just copies/installs all these files to their intended locations and rebuilds Grub and my initramfs. (When my computer kept locking up I had to roll back snapshots all the time, which meant remaking all these files, so at some point I just kept them in a folder in my home directory and used this script to re-install them after tweaking, to save time. Shoutout to snapper and btrfs for saving my hide constantly throughout this whole ordeal; if I were still using Debian, I don’t think I’d have gotten this far. A small sanity check follows after the script.)
passthru_install.sh
#!/bin/bash

# grub
echo "Attempting to copy Grub config..."
cp grub /etc/default/grub

# the actual vfio gpu bind script
echo "Attempting to copy dracut module to /usr/lib/dracut/modules.d/40vfio-gpu-passthru..."
cp -r 40vfio-gpu-passthru /usr/lib/dracut/modules.d

# modprobe
echo "Attempting to copy vfio.conf to /etc/modprobe.d/vfio.conf..."
cp vfio.conf /etc/modprobe.d/vfio.conf
echo "Attempting to copy kvm.conf to /etc/modprobe.d/kvm.conf..."
cp kvm.conf /etc/modprobe.d/kvm.conf

# regenerate boot stuff
echo "Attempting to regenerate Grub..."
grub2-mkconfig -o /boot/grub2/grub.cfg
echo "Attempting to regenerate initramfs..."
dracut -f --regenerate-all
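One sanity check worth doing after the script runs is to confirm the hook script and the vfio modules actually landed in the new initramfs (lsinitrd ships with dracut and defaults to the running kernel’s image):

sudo lsinitrd | grep -i vfio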

I did all this, even remade my vm and reinstalled my virtualization tools, just to have the same problem by the end of it all:


It only does this when the host pci devices are added. It boots normally when I remove them.

One thing I tried was disabling secure boot. I might have disabled it in the BIOS, or at least that’s what it says when I go into the BIOS, but when I look at my virtual machine’s settings there still seem to be settings that enable it, and part of me wonders if that’s what is messing with the machine:


Virt-manager WILL NOT let me change this. It keeps complaining about not having an EFI firmware that’s compatible with fastboot being disabled. Am I doing this wrong? Is this a red herring?
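For the record, the only XML-level approach I’ve found for this (untested so far) is asking libvirt’s firmware auto-selection for a build without secure boot, which apparently also means removing the existing <loader>/<nvram> lines so it can re-pick the firmware and regenerate the NVRAM:

  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-8.2">hvm</type>
    <firmware>
      <feature enabled="no" name="enrolled-keys"/>
      <feature enabled="no" name="secure-boot"/>
    </firmware>
    <bootmenu enable="yes"/>
  </os>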

The strangest part about all of this is that I’m not seeing any errors in the other logs. journalctl, dmesg, and even virtlogd turn up nothing. I only know I have a boot problem because switching the video to ramfb lets me see the boot message in the VM window; if I switch the video device to anything else, I get a black screen, and I don’t even get output from the GPU itself. The closest problem I could find is people getting error code 43, but they can at least boot the machine to see it, and I can’t even get to that stage. It’s making me wonder if this is actually a hardware issue, whether it’s the GPU itself or maybe the motherboard, I really don’t know. I’m almost positive the A4000 can be passed through.
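For anyone else digging through logs on a similar setup, the per-domain QEMU log is also worth a look; on a default system libvirtd it lives under /var/log/libvirt/qemu/:

sudo tail -n 50 /var/log/libvirt/qemu/win10-test.log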
Here are the machine settings, just in case there’s a clue in there:

win10-test
<domain type="kvm">
  <name>win10-test</name>
  <uuid>fbaa9c70-0208-4d4f-a746-711261524ac4</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">33554432</memory>
  <currentMemory unit="KiB">33554432</currentMemory>
  <vcpu placement="static">8</vcpu>
  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-8.2">hvm</type>
    <firmware>
      <feature enabled="yes" name="enrolled-keys"/>
      <feature enabled="yes" name="secure-boot"/>
    </firmware>
    <loader readonly="yes" type="pflash">/usr/share/qemu/ovmf-x86_64-ms-code.bin</loader>
    <nvram template="/usr/share/qemu/ovmf-x86_64-ms-vars.bin">/var/lib/libvirt/qemu/nvram/win10-test_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode="custom">
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="8" threads="1"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" discard="unmap"/>
      <source file="/home/stdlogin/Documents/vmstuff/qemu-vms/win10-test.qcow2"/>
      <target dev="sda" bus="sata"/>
      <boot order="1"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/stdlogin/Documents/vmstuff/discs and stuff/A - System things/drivers/virtio-win-0.1.262.iso"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <boot order="2"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x18"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x19"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
    </controller>
    <controller type="pci" index="11" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="11" port="0x1a"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x2"/>
    </controller>
    <controller type="pci" index="12" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="12" port="0x1b"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x3"/>
    </controller>
    <controller type="pci" index="13" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="13" port="0x1c"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x4"/>
    </controller>
    <controller type="pci" index="14" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="14" port="0x1d"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x5"/>
    </controller>
    <controller type="pci" index="15" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="15" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </controller>
    <controller type="pci" index="16" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <serial type="pty">
      <target type="isa-serial" port="0">
        <model name="isa-serial"/>
      </target>
    </serial>
    <console type="pty">
      <target type="serial" port="0"/>
    </console>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
      <image compression="off"/>
      <gl enable="no"/>
    </graphics>
    <sound model="ac97">
      <address type="pci" domain="0x0000" bus="0x10" slot="0x01" function="0x0"/>
    </sound>
    <audio id="1" type="none"/>
    <video>
      <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
      <address type="pci" domain="0x0000" bus="0x10" slot="0x02" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <driver name="vfio"/>
      <source>
        <address domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x02" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </hostdev>
    <watchdog model="itco" action="reset"/>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>

@AncientRegret I have the following snippet on Tumbleweed for Windows 11;

  <os firmware='efi'>
    <type arch='x86_64' machine='pc-q35-8.1'>hvm</type>
    <firmware>
      <feature enabled='yes' name='enrolled-keys'/>
      <feature enabled='yes' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/qemu/ovmf-x86_64-smm-ms-code.bin</loader>
    <nvram template='/usr/share/qemu/ovmf-x86_64-smm-ms-vars.bin'>/var/lib/libvirt/qemu/nvram/Windows_11_Pro_VARS.fd</nvram>

So you’re adding the virtio ISO for the install?

Is it selecting the correct QEMU device in the boot menu?

Alright, so I think I’ve made some progress. To get a VM that boots with my GPU passed through, I had to make a new VM WITH BIOS (SeaBIOS) AND THE Q35 CHIPSET. All of the guides I followed explicitly said I needed Q35+UEFI, but for whatever reason the RTX A4000 really hates that.

I installed Windows 10 on a Q35+BIOS system and, while the preview window is black and white and runs at like 5 fps (won’t matter once I get the drivers working), the PC installed, booted, and ran perfectly fine, which is definitely farther than I got before. The Nvidia drivers even installed correctly!

Now, I’m not out of the woods just yet. I have (hopefully) one last problem: the GPU is complaining about not getting enough resources (error code 12). This makes sense to me, as the A4000 is a 16 GB card and I had to enable 4G decoding on my host just to boot it up with all the cards when I first built it. The guest probably just doesn’t have that enabled.

The only issue is that I can’t seem to get into a BIOS screen to enable that. I could do it with the other UEFI vm, but not this one. Is there some XML I could put into my virt manager config to enable 4G decoding manually, or something?
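The closest thing I’ve turned up so far (untested) is an OVMF knob rather than a SeaBIOS one: the firmware’s 64-bit MMIO window can apparently be enlarged through fw_cfg, which in libvirt XML means declaring the qemu namespace on the <domain> element and passing the option through, something like this (the value is in MiB, so 65536 would be a 64 GiB window):

<domain type="kvm" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
  ...
  <qemu:commandline>
    <qemu:arg value="-fw_cfg"/>
    <qemu:arg value="name=opt/ovmf/X-PciMmio64Mb,string=65536"/>
  </qemu:commandline>
</domain>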

@AncientRegret as in pressing F2 on boot to get to the BIOS screen, or no options? I suspect it needs to be a UEFI booting vm.

I’ve reached another breakthrough and finally fixed the problem. I couldn’t figure out the memory problem on the SeaBIOS machine, but I went back to the original VM and tested more. My thinking was: if the Q35+SeaBIOS machine could boot at all with the GPU, then maybe the problem with the first VM had something to do with TianoCore specifically. I looked up “gpu crashes TianoCore” and sure enough I found this forum thread:

A user named bastl posted this:

Try to start the VM with only 1 core. This issue often happens on fresh Windows VMs on first setup. After installing the OS with only 1 core and first boot to desktop you should be able to give the VM the rest of the cores you like and it should boot up.

So maybe it was never about TianoCore or the GPU at all: I kept giving my VMs 8 cores with one thread each. I figured that since my Threadripper has 32 cores it could handle a VM with that many, and maybe it can, but the guest CERTAINLY didn’t like that during setup. I switched the CPU config to 1 (1 socket, 1 core, 1 thread) on the original VM, the one using UEFI, and sure enough it found the bootloader just fine! I then switched the display back to virtio and there I was, installing Nvidia drivers on a perfectly working system. I get output from the GPU now, so I can confirm I finally have a working GPU passthrough setup!
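For anyone comparing against the XML above, the only change was the CPU section, which after the edit looks like this (per the post I quoted, the core count can be raised again once the guest has made it through its first boot):

  <vcpu placement="static">1</vcpu>
  ...
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="1" threads="1"/>
  </cpu>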

Took me nearly two weeks to figure this out but I’m glad I got it working in the end lol, and hopefully this thread saves someone else’s time. Just remember to be careful about how many cores you give a vm if it starts giving you trouble like that.

Just a quick update because I can’t seem to edit my previous post: BE SURE TO DISABLE RESIZABLE BAR for the VM to work if you’re having the same problem as in this thread. I was troubleshooting a weird microstutter issue on my guest and tried re-enabling Resizable BAR to see if that would help at all, and it just reproduced the “no bootable option” problem instead. Turning it back off fixed it, so just something to keep in mind.
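If you want to confirm what the card itself is advertising, a reasonably recent lspci decodes the Resizable BAR capability (older versions only show it as the bare “#15” entry visible in the dumps above):

sudo lspci -vv -s 0000:02:00.0 | grep -i -A 3 "resizable"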
