Problem with Intel I350 NIC KVM passthrough on Leap 15.6

Hi,

After the latest kernel upgrade I have a serious problem: my NIC can no longer be passed through to a KVM guest.

The message I get in virt-manager is this:

Error starting domain: internal error: QEMU unexpectedly closed the monitor (vm='win10'): qxl_send_events: spice-server bug: guest stopped, ignoring
2024-08-08T01:45:51.824474Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0b:00.0","id":"hostdev0","bus":"pci.2","addr":"0x0"}: vfio 0000:0b:00.0: group 83 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
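
For anyone reading along: the message asks that every device in the IOMMU group be bound to a vfio driver. A minimal Python sketch, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups, that lists the members of one group with the driver each is bound to. The function names are made up for illustration, and whether a particular non-vfio driver really makes the group non-viable depends on the kernel, so treat the result as a hint only.

```python
import os


def group_members(group, sysfs_root="/sys"):
    """Return {pci_address: driver_name_or_None} for one IOMMU group."""
    devices_dir = os.path.join(
        sysfs_root, "kernel", "iommu_groups", str(group), "devices"
    )
    members = {}
    for addr in sorted(os.listdir(devices_dir)):
        # Each device directory has a "driver" symlink only while it is bound.
        driver_link = os.path.join(devices_dir, addr, "driver")
        if os.path.islink(driver_link):
            members[addr] = os.path.basename(os.readlink(driver_link))
        else:
            members[addr] = None
    return members


def blocking_drivers(members):
    """Devices bound to something other than vfio-pci (unbound ones are skipped)."""
    return {a: d for a, d in members.items() if d not in (None, "vfio-pci")}


if __name__ == "__main__":
    group = 83  # group number taken from the error message above
    path = os.path.join("/sys/kernel/iommu_groups", str(group), "devices")
    if os.path.isdir(path):
        members = group_members(group)
        print("members:", members)
        print("not on vfio-pci:", blocking_drivers(members))
    else:
        print(f"IOMMU group {group} not present on this machine")
```

If a second function of the same card shows up here still bound to its normal network driver, that would match the "not viable" message.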

The kernel is:

Linux 192.168.88.179 6.4.0-150600.23.17-default #1 SMP PREEMPT_DYNAMIC Tue Jul 30 06:37:32 UTC 2024 (9c450d7) x86_64 x86_64 x86_64 GNU/Linux

Everything worked normally before, but after the kernel update passthrough stopped working.

P.S.

In lspci the device is enumerated properly, but in NetworkManager it shows up as an Intel chipset root port.

Yes, I have checked /etc/default/grub: the IOMMU for Intel is enabled and configured properly.
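
A setting in /etc/default/grub only takes effect once the bootloader configuration has been regenerated and the machine rebooted, so it can be worth checking what the running kernel actually received. A small Python sketch, assuming the usual intel_iommu / amd_iommu / iommu parameter names; the helper name is made up for illustration.

```python
def iommu_params(cmdline):
    """Pick IOMMU-related parameters out of a kernel command line string."""
    wanted = ("intel_iommu", "amd_iommu", "iommu")
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in wanted:
            found[key] = value
    return found


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line of the currently running kernel.
        with open("/proc/cmdline") as fh:
            print(iommu_params(fh.read()))
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
```

An empty result would mean the running kernel was booted without any of these parameters, whatever the grub defaults file says.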

Best regards,
Jim

To add something more:

If I revert to the previous kernel version, everything works perfectly without any issues.

The kernel that works correctly is:

6.4.0-150600.23.14-default #1 SMP PREEMPT_DYNAMIC Wed Jul 3 00:26:09 UTC 2024 (95fb0f8) x86_64 x86_64 x86_64 GNU/Linux

Regards.

This also happens on my backup server.

Something is really broken in the new kernel. I tried to pass through an Intel I-217 NIC and hit the same problem.

A little help would be appreciated.

Regards.

@Jniko Hi and welcome to the Forum :smile:
So is there a particular reason to use vfio-pci for the network card rather than just using it on the host as a bridge device via wicked (or NetworkManager)?

So have you configured this card (it is a separate device, not part of the motherboard?) for passthrough, with an appropriate alias file and loading the vfio modules with dracut?

@malcolmlewis Hi,

Yes, I need passthrough for low-latency applications such as PTP and multicast.

Everything is configured properly and I never had a problem on Leap 15.5 with the 5.14.y kernels. I upgraded to Leap 15.6 with the 6.4.y kernels and everything went smoothly and worked perfectly.

The error described earlier started when I upgraded the kernel to the latest,

6.4.0-150600.23.17-default.

So something must be broken… two different servers suddenly failing in the same way right after the kernel upgrade cannot be a coincidence. :slight_smile:

The Intel I350-T2 is a PCIe card and the Intel I-217 is built into the motherboard. Both behave the same way.

After downgrading to the previous kernel everything works perfectly.

The kernel that works correctly is:

6.4.0-150600.23.14-default

Regards.

@Jniko then it’s a regression for sure, I would submit a bug report and see what the kernel folks have to say… openSUSE:Submitting bug reports - openSUSE Wiki

@malcolmlewis forgot to mention this.

This card is not isolated or configured with dracut. When the KVM VMs are powered off, the card is not used; it just stays connected to the network as an ordinary host NIC.

When I power on a KVM VM, the card is detached from the host cleanly and passed through to the assigned VM. When I power off the VM, the card is reassigned to the host correctly, without a single issue. (That was the behavior up to 6.4.0-150600.23.14-default, which worked correctly.)

The VMs are not in use 24/7.

Regards.

@Jniko, then I would consider binding to vfio-pci, it’s very easy. I also suspect if needed they can be unbound.
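
For reference, one common way to bind a PCI device to vfio-pci by hand goes through the driver_override and drivers_probe files in sysfs. The Python sketch below only builds the list of writes and does not touch the system; it assumes the vfio-pci module is already loaded, and the function name is made up for illustration.

```python
import os


def vfio_bind_plan(addr, sysfs_root="/sys"):
    """Return the ordered (path, value) sysfs writes; nothing is written here."""
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", addr)
    plan = []
    # Step 1: release the device from its current driver, if it has one.
    if os.path.exists(os.path.join(dev, "driver")):
        plan.append((os.path.join(dev, "driver", "unbind"), addr))
    # Step 2: tell the PCI core that only vfio-pci may claim this device.
    plan.append((os.path.join(dev, "driver_override"), "vfio-pci"))
    # Step 3: ask the PCI core to probe the device again.
    plan.append((os.path.join(sysfs_root, "bus", "pci", "drivers_probe"), addr))
    return plan


if __name__ == "__main__":
    # Address taken from the error message earlier in the thread.
    for path, value in vfio_bind_plan("0000:0b:00.0"):
        print(f"echo {value} > {path}")
```

Printing the plan first makes it easy to review before running anything as root; applying it would have to be repeated for every function that shares the IOMMU group.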

@malcolmlewis That is what I am using, vfio-pci:

0b:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Ethernet Server Adapter I350-T2
Kernel driver in use: vfio-pci

As you say, binding and unbinding is easy and works perfectly. With the latest kernel something is really broken.

:slight_smile:

@Jniko then in your bug report, describe what you're doing on the working kernel, then repeat with the failing kernel and attach the journal logs with the error(s).

@malcolmlewis All done.

BTW this is the error from journal logs:

Aug 09 05:42:53 virtqemud[2793]: Unable to read from monitor: Connection reset by peer
Aug 09 05:42:53 virtqemud[2793]: internal error: QEMU unexpectedly closed the monitor (vm='win10'): qxl_send_events: spice-s>
2024-08-09T02:42:53.163376Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0>
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

I have already filed the bug with the logs, thanks.

:slight_smile:

@Jniko Care to post the bug report number here?

@malcolmlewis Sure 1229019

:slight_smile:

Still no fix for this issue. I hope it gets resolved soon… :pensive:

@malcolmlewis do you perhaps have any more information about a fix for this issue?

I have tried the quad-port Intel I-350 and see the same symptom.

If the whole IOMMU group of the card is passed through to one VM, it works fine.
If you try to attach only one port of the card, the symptom above appears.
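
To see which ports end up sharing a group on a given kernel, the iommu_group link of each PCI function can be read from sysfs. A minimal Python sketch, assuming the standard /sys/bus/pci/devices layout; the function names are made up for illustration. Running it on both the working and the failing kernel and comparing the output would show whether the grouping itself changed.

```python
import os
from collections import defaultdict


def functions_by_group(sysfs_root="/sys"):
    """Map IOMMU group number (as a string) to the PCI functions inside it."""
    groups = defaultdict(list)
    base = os.path.join(sysfs_root, "bus", "pci", "devices")
    for addr in sorted(os.listdir(base)):
        # "iommu_group" is a symlink whose last path component is the group number.
        link = os.path.join(base, addr, "iommu_group")
        if os.path.islink(link):
            groups[os.path.basename(os.readlink(link))].append(addr)
    return dict(groups)


def group_mates(addr, groups):
    """Other PCI functions that sit in the same IOMMU group as addr."""
    for members in groups.values():
        if addr in members:
            return [m for m in members if m != addr]
    return []


if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        groups = functions_by_group()
        # Address taken from the error message earlier in the thread.
        print(group_mates("0000:0b:00.0", groups))
    else:
        print("no PCI sysfs tree on this machine")
```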

Any news would be appreciated.

Best regards.

@Jniko I only use the network ports as bridges, no passthrough :slightly_frowning_face:

The same thing also happens with the latest kernel update, 6.4.0-150600.23.22-default.

Is there a changelog, or some way to get this fixed? Now the remaining 11 servers have the same problem: the IOMMU group cannot be split.

If anyone has any news or a fix, please post.

Best regards.

@Jniko Either via YaST Software Management: search for kernel-default and there should be a changelog tab, or run rpm -q kernel-default --changelog | less to see the current changes…
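
The kernel changelog is long, so filtering it for the relevant subsystems can save some scrolling. A small Python sketch around the rpm command above; the keyword list is only a guess at what might be relevant, and the helper name is made up for illustration.

```python
import subprocess


def matching_entries(changelog, keywords=("vfio", "iommu")):
    """Return changelog lines mentioning any keyword, case-insensitively."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line.strip()
        for line in changelog.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["rpm", "-q", "kernel-default", "--changelog"],
            capture_output=True, text=True, check=False,
        )
        for entry in matching_entries(result.stdout):
            print(entry)
    except OSError as exc:
        # rpm is not installed or could not be started.
        print(f"could not run rpm: {exc}")
```

No hits does not prove the subsystem was untouched, since changelog entries do not always name it.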

@malcolmlewis Thank you for your response.

Sadly, nothing there refers to this subject. I see on Bugzilla that other users have reported this issue…

I hope you can alert the right people about this problem…

Best regards,
Jim

@Jniko Hi, I noticed the bug report. So you're up to date with everything on the system(s)? There was a new kernel… I'm not sure about the KVM parts, as I don't use it on Leap 15.6.