CPU: AMD Ryzen 7 3700X 8-Core Processor
Memory: 32 GB
GPU1 Host System : AMD Radeon RX 480 Graphics - Monitor 1 and 2
GPU2 dedicated for Win11 Guest: AMD Radeon RX 480 Graphics - Monitor 3 and 4
The same problem occurs when running the same Win11 guest in a host window, without the dedicated GPU 2.
A zypper dup update resulted in a change in the KVM Qemu system. I can’t say when this happened because I haven’t used the Windows guest for a while.
What I do know is that I have a snapshot from September in which everything works, while with the updates of the last four weeks the guest and the host keep hanging.
It’s not that the guest uses up all the resources. On the contrary, when the guest drops from about 200% CPU to 10% CPU (qemu-system-x86), the whole system starts to hang. Even mouse movement and keystrokes stall.
I have switched back and forth between the snapshots several times; it is definitely not the guest but the host, i.e. the Tumbleweed update.
Now I have both snapshots, but I can’t narrow down what the problem might be. I can’t report a bug either because I don’t know what could be wrong.
Can anyone help me narrow it down?
I cannot help, but I can report the same issue without the complicated setup. I’ve checked the journal logs but nothing seems to have broken.
So it looks like a bug.
The second GPU is completely inactive if you start the Windows guest in window mode, i.e. without GPU passthrough.
Therefore, we have the same setup in this case and should consider where to report the bug. Kernel or KVM/QEMU?
@etron770 I’m not seeing it here with Windows 11 Pro. I do have an Nvidia K620 passed through, but I only connect externally over SPICE or via console, with no monitor connected. I do have qemu-ovmf-x86_64 202402-1.1 locked for a Rancher Desktop bug.
qemu-ovmf-x86_64 version 202408-1.1 is working with the September 2024 rollback.
@etron770 maybe AMD related then, I’m on Intel Xeon and Nvidia…
@etron770 I have 6 cores/12 threads and 32GB of ram allocated to the Windows vm.
free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.4Gi        69Gi       114Mi        53Gi       121Gi
Swap:             0B          0B          0B
virsh start Windows_11_Pro
Domain 'Windows_11_Pro' started
free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi        36Gi        36Gi       124Mi        53Gi        88Gi
Swap:             0B          0B          0B
But what is the difference between the September versions of KVM/QEMU/kernel and the current versions?
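One way to approach that question is to compare the installed package lists of the two snapshots. The following is only a sketch: it assumes that `rpm -qa` was saved to a file once while booted into the September snapshot and once in the current system. The file names and the function name `pkgs_changed` are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the package lines that appear only in the second list.
# Assumes both files were created with something like:
#   rpm -qa > pkgs-september.txt    (booted into the working snapshot)
#   rpm -qa > pkgs-current.txt      (booted into the current system)
# The file names are hypothetical.
pkgs_changed() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    # comm needs sorted input
    sort "$1" > "$old_sorted"
    sort "$2" > "$new_sorted"
    # -13 suppresses lines unique to the old list and lines common to both
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example: only show kernel, qemu and libvirt packages that changed
# pkgs_changed pkgs-september.txt pkgs-current.txt | grep -E '^(kernel|qemu|libvirt)'
```

The output is only a list of candidates; it does not say which of the changed packages causes the hang.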
The wait states show that something is being written again and again, wherever that is:
When I start Windows without swap, the complete system hangs:
If I still manage to switch on the swap, the swap fills up and I can work (slowly) again.
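The wait states can also be put into numbers instead of being read from a screenshot. This is a minimal sketch that compares two copies of /proc/stat and prints the iowait share of the CPU time between them. The function name `iowait_between` and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: iowait percentage between two saved copies of /proc/stat.
# On the aggregate "cpu" line the values are:
#   user nice system idle iowait irq softirq steal guest guest_nice
# Only the first eight are summed, because guest time is already
# included in user and nice.
iowait_between() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (NR == FNR) { t0 = total; w0 = $6 }      # first file
        else if (total > t0) printf "%.1f\n", 100 * ($6 - w0) / (total - t0)
    }' "$1" "$2"
}

# Example (file names are hypothetical):
# cat /proc/stat > stat1; sleep 5; cat /proc/stat > stat2
# iowait_between stat1 stat2
```

A high value while qemu-system-x86 itself is nearly idle would fit the description that the host is waiting on I/O rather than on the guest.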
@etron770 sure it’s not something wrong with the Windows vm?
If the buff/cache is dropped before Windows is started, the problem is temporarily solved:
echo 3 > /proc/sys/vm/drop_caches
then start the VM
but …
The system starts very quickly, but after a while the swap starts to fill up again and the wait states increase.
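The workaround above could be wrapped in a small helper so that both steps always run together. This is only a sketch and has to be run as root. The function name `drop_caches_and_start` is made up, and the optional second argument exists only so the function can be tried against a scratch file first. A `sync` is added before the drop because dirty pages cannot be dropped until they have been written back.

```shell
#!/bin/sh
# Sketch: write back dirty pages, drop page cache plus dentries/inodes,
# then start the libvirt domain. Must be run as root.
#   $1  libvirt domain name, e.g. Windows_11_Pro
#   $2  file to write to (optional, defaults to the real drop_caches file)
drop_caches_and_start() {
    domain=$1
    drop_file=${2:-/proc/sys/vm/drop_caches}
    sync
    echo 3 > "$drop_file" || return 1
    virsh start "$domain"
}

# Example:
# drop_caches_and_start Windows_11_Pro
```

As described above, this only helps temporarily, so it is a workaround and not a fix.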
Do you think it’s a bug, and if so, where should it be reported?
It is completely absurd that the buff/cache continues to grow after being dropped, even before Windows starts, and that the system pushes everything out to swap:
19 GB buff/cache and 12 GB swap:
@etron770 Probably against qemu…
free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.0Gi       121Gi       140Mi       1.4Gi       121Gi
Swap:             0B          0B          0B
virsh start Windows_11_Pro
Domain 'Windows_11_Pro' started
free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi        36Gi        89Gi       142Mi       1.6Gi        89Gi
Swap:             0B          0B          0B
virsh shutdown Windows_11_Pro
Domain 'Windows_11_Pro' is being shutdown
virsh list
 Id   Name             State
--------------------------------
 1    Windows_11_Pro   running
virsh list
 Id   Name   State
--------------------
free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.1Gi        80Gi       128Mi        41Gi       121Gi
Swap:             0B          0B          0B
But in your setup the buff/cache is smaller than the free memory.
In my case, the buff/cache grows and the free memory shrinks.
This happens even when no VM has been started, i.e. without qemu running.
               total        used        free      shared  buff/cache   available
Mem:            31Gi       8,6Gi       2,9Gi       351Mi        20Gi        22Gi
Swap:          147Gi       787Mi       147Gi
Something is wrong with my system …
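To see over time whether the buff/cache grows and the swap fills up even without a VM, the values could be logged at intervals. This is a minimal sketch that reads /proc/meminfo. The function name `mem_summary` is made up, and it adds up Buffers, Cached and SReclaimable, which as far as I know is what free reports as buff/cache.

```shell
#!/bin/sh
# Sketch: one-line summary (in KiB) from input in /proc/meminfo format.
# cache_kib = Buffers + Cached + SReclaimable (assumed to match the
# buff/cache column of free).
mem_summary() {
    awk '
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "SReclaimable:" { srecl = $2 }
        $1 == "SwapTotal:"    { swap_total = $2 }
        $1 == "SwapFree:"     { swap_free = $2 }
        END {
            printf "free_kib=%d cache_kib=%d swap_used_kib=%d\n",
                   free, buffers + cached + srecl, swap_total - swap_free
        }
    '
}

# Example: log one line per minute
# while true; do date; mem_summary < /proc/meminfo; sleep 60; done
```

Running this once with the working kernel and once with the current one would give two logs that can be attached to a bug report.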
@etron770 something is leaking somewhere then…
The installation on which the Windows guest runs without problems and the swap does not fill up uses kernel 6.10.9-1-default.
The only difference is a zypper dup to the latest distribution.
From that point on, the swap fills up.
kernel 6.10.9-1-default memory:
free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        21Gi       341Mi       127Mi        10Gi        10Gi
Swap:          157Gi       6,8Mi       157Gi
Probably solved
For historical reasons, the swap was too small in relation to the RAM. I had neglected to adjust this over the years.
The new release then changed something, so that the swap limit was apparently exceeded.
I have increased the swap and so far the system is running smoothly.
I don’t understand why the buff/cache was not reduced. But it looks as if this was already the case in September with the release at that time.
I had looked everywhere except at the memory and the swap.
Partially solved
Only partially, though:
As root, I need to run echo 3 > /proc/sys/vm/drop_caches
before starting the Windows guest, but I’m not sure whether it always works.
The previous solution was only sometimes successful.
I booted with the 6.10.9-1-default kernel.
As you can see from the screenshot, there are no more wait states, the swap remains empty and the buff/cache does not increase.
qemu-system-x86 gets the CPU time it needs.
Even if I start a second virtual machine that requires more RAM than is available in the buff/cache, the system is slower, but usable.
If I start a second virtual machine with RAM below the free available memory, there is no noticeable difference.
Now that the system has been running for an hour without change, after multiple virtual machine starts and stops, I am marking the thread as resolved.
The rest would have to be solved by the kernel developers, am I right?
cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20241104"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20241104"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
# CPE 2.3 format, boo#1217921
CPE_NAME="cpe:2.3:o:opensuse:tumbleweed:20241104:*:*:*:*:*:*:*"
#CPE 2.2 format
#CPE_NAME="cpe:/o:opensuse:tumbleweed:20241104"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"