When writing to disk, my computer eventually starts to stutter and audio starts to crackle.
I can easily reproduce this by copying files over from my NAS via 10 Gbit/s network or even by downloading an update from Steam that writes a lot to disk.
Looking at a process monitor during the copying process reveals that the system’s “free” memory is turning into “cached” memory - something I think is pretty normal, right? However, once the system runs out of “free” memory (with many gigabytes still “available” because most of it is just “cached”), it starts to stutter. Video playback stutters, audio crackles (including for example voice chat, so not just data streamed from disk) etc.
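For anyone wanting to watch the same "free" vs. "cached" vs. "available" numbers without a GUI monitor, here is a minimal sketch that reads them from /proc/meminfo. The function name `meminfo_summary` is made up for illustration; the field names (`MemFree`, `Cached`, `MemAvailable`) are the ones the kernel reports.

```shell
#!/bin/sh
# Summarize MemFree, Cached and MemAvailable (in MiB) from meminfo-style input.
# $1 is an optional file path, defaulting to /proc/meminfo.
# meminfo_summary is a hypothetical helper name, not an existing tool.
meminfo_summary() {
    awk '
        $1 == "MemFree:"      { free   = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "MemAvailable:" { avail  = $2 }
        END {
            # /proc/meminfo reports kB; divide by 1024 to get MiB.
            printf "free=%d MiB cached=%d MiB available=%d MiB\n",
                   free / 1024, cached / 1024, avail / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Print one sample if the file is readable on this system.
if [ -r /proc/meminfo ]; then
    meminfo_summary
fi
```

Running it in a loop (for example with `watch`) during a copy should show "free" dropping while "cached" grows and "available" stays high.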
I noticed that a process called kswapd0 hogs an entire CPU core once the system runs out of “free” memory (100% CPU usage on one core), but it’s a system with 16 physical cores/32 hardware threads (Ryzen 9 7950X3D), so I feel there should be plenty of CPU power to play back audio/video smoothly. I’m confused why kswapd0 would be running in this case, though. I tried completely disabling swap, but the same thing happens (kswapd0 still runs).
I suspected btrfs at first, but the same thing happens when writing to an ext4 drive.
One of the drives is a Samsung 990 PRO, and the other two are 980 PROs, so throughput should be decent.
I’m on Tumbleweed, kernel 6.15.5, KDE 6.4.2 Wayland.
I think this started happening with kernel 6.15. I don’t really want to revert to an older kernel permanently as 6.15 brought significant stability improvements for my Radeon 9070 XT, but I’ll update this post once I’ve tested 6.14.
Welcome to the openSUSE Forums!
You could start by showing details about your system, for instance by pasting here, between preformatted text tags (the </> button above the editing area), the output of:
inxi -CDjmaz
To an extent, what you witness is normal. Video, audio files are on disk and if you are overloading the disk channel, or the 3 disks share the same channel, something might be queued beyond the length of audio or video buffers.
More thoughts after we better understand your system (computer output data please, not long narrative).
To an extent, what you witness is normal. Video, audio files are on disk and if you are overloading the disk channel, or the 3 disks share the same channel, something might be queued beyond the length of audio or video buffers.
I’d understand that if I was playing video from disk, but as I said this also happens with live streams in a web browser and even real-time voice chat for example. It’s like the entire system stutters, including the mouse cursor skipping a few frames here and there.
I’m no system tuning specialist, but with those specs I would rule out CPU or RAM problems.
A general note, current kernels have a better memory management with some swap space enabled, so I don’t think that disabling swap entirely does any better. There might be options to tune the swap system though.
SSD disks are slower at writing, so if the problem only shows when massively writing to disk maybe your network is far faster than your disks (when writing); that might explain why you are filling up 64 GB of RAM and then possibly clogging a system bus.
A clever use of the 3 disks might also help:
are they on different buses / controllers?
are data subject to heavy writing on a device different from the system and/or swap one?
is the network controller on a different bus?
To monitor what is happening on the sound channel you may watch pw-top , especially the ERR column, when stuttering happens. Watching top or the system monitor of your choice at the same time might also help understand where the clogging happens.
Just my opinion from personal experience: even a small swap is better than no swap. It provides virtual memory when the system’s physical RAM is full (which yours shows getting close to full).
If you don’t have a dedicated partition for swap, then you should create a swap file for a test (using a disk that is least accessed).
A quick look at specs shows for the 980 PRO a max random write (4 KB, QD1) at 60000 IOPS and 80000 IOPS for the 990 PRO. That translates to something between 2 and 2.6 Gb/s in ideal conditions.
That means that your network at 10 Gb/s is nominally about 4 times faster than the fastest of your disks.
So, at least in theory, your network is able to fill up the RAM 4 times faster than your disk is able to drain it.
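The arithmetic behind those figures can be checked with a small sketch. It assumes "4 KB" means 4096 bytes and uses decimal gigabits; `iops_to_gbit` is a made-up helper name.

```shell
#!/bin/sh
# Convert an IOPS figure at a given block size (bytes) to Gbit/s (decimal).
# iops_to_gbit is a hypothetical helper; 4096 bytes per 4 KB block is an assumption.
iops_to_gbit() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs * 8 / 1e9 }'
}

iops_to_gbit 60000 4096   # 980 PRO figure quoted above, prints 1.97
iops_to_gbit 80000 4096   # 990 PRO figure quoted above, prints 2.62
```

With 2.62 Gb/s for the faster disk, 10 Gb/s works out to a bit under 4 times as much.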
In principle a “sequential write” could be much faster but it seldom applies, like in cases where you are just making a direct copy of one disk image to another disk.
That means that if the NAS is fast enough and no special attempts were taken at optimizing file structure and copy strategy, data read over the network fill up the RAM, disks cannot write as fast and when “free” RAM reaches zero the system chokes.
I’m not sure, but I think that the default settings prioritize network and disk over listening to music and watching videos. If you feel that listening and watching have a higher priority than copying data, there should be some system tunables to restrain the network driver when free RAM is running out.
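I can't say for certain which tunables would apply here. As an assumption on my part, the kernel's writeback limits under /proc/sys/vm (`dirty_ratio`, `dirty_bytes` and their `dirty_background_*` counterparts) are the usual place to look, although they throttle writers rather than the network driver. A read-only sketch to see what is currently set; `show_dirty_knobs` is a made-up name:

```shell
#!/bin/sh
# Print name=value for the writeback-related knobs found in a sysctl directory.
# $1 is an optional directory, defaulting to /proc/sys/vm.
# show_dirty_knobs is a hypothetical helper; it only reads, it changes nothing.
show_dirty_knobs() {
    dir="${1:-/proc/sys/vm}"
    for knob in dirty_background_ratio dirty_background_bytes dirty_ratio dirty_bytes; do
        if [ -r "$dir/$knob" ]; then
            echo "vm.$knob=$(cat "$dir/$knob")"
        fi
    done
}

show_dirty_knobs
```

Whether changing any of these helps with this particular stutter is untested here.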
I use large files to reproduce the issue, so writes should mostly be sequential. According to reviews the 990 PRO can sustain close to 2 GByte/s sequential write throughput once SLC cache is exhausted. Throughput copying from my NAS is around 1 GByte/s with the 10 Gbit/s connection.
I tried creating a swap partition (32 GB), but it barely gets any use when copying (only 3 MB used even when the system starts stuttering), so it doesn’t seem to help.
kswapd0 is the kernel’s background page-reclaim thread. If free memory runs low, it frees pages by dropping page cache or by moving pages of less active processes to swap … the side effect is severe lag on more active processes.
You’ve since disabled swap (not a good idea), but I would have suggested executing (before disabling swap):
# echo vm.swappiness=0 | tee -a /etc/sysctl.conf
Or maybe set to “10”.
(the default is 60).
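Note that appending to /etc/sysctl.conf alone does not change the running system. A small sketch for checking the value in effect, with the runtime commands left as comments since they need root; `current_swappiness` is a made-up helper name:

```shell
#!/bin/sh
# Print the swappiness value currently in effect.
# $1 is an optional path, defaulting to the procfs knob.
# current_swappiness is a hypothetical helper name.
current_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness is currently $(current_swappiness)"
fi

# To change it for the running system only (as root, lost on reboot):
#   sysctl -w vm.swappiness=10
# To load the values from /etc/sysctl.conf without rebooting (as root):
#   sysctl -p
```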
So, dumb question from me:
Are you playing media files WHILE copying huge files?
Yeah, but not from the same drive. I either stream something off of YouTube for example or use voice chat, or do anything really, and the stutter happens. So it’s not (just) processes accessing the same disk that is being written to that stutter, the entire system stutters.
To be clear, no swap setting is expected to solve the problem at hand, but disabling swap completely is even worse. Use of swap, even if marginal, is a symptom, not the cause of the problem; when you get to using swap the system chokes, full-stop.
But without a swap, the OOM-killer kills inactive processes without warning, which might be even worse than temporary stuttering.
That said, apparently your system is unable to copy files at 1GB/s and still be responsive at the desktop level. Servers optimize file transfer, desktops optimize responsiveness, you are looking for the best of both worlds?
You may check whether RAM, network and the disk being written to share some PCIe lanes and, if so, whether it is possible to rearrange them so that each has exclusive lanes.
If simultaneous file copy and video playing is only occasional, maybe trying to optimize is not worth the effort. If it is common for that system, you must choose which one has the most value to you and maybe a tuning expert can offer better insight.
It wasn’t always happening, that’s the thing. I’ve been using Tumbleweed for over a year now and it used to work until recently.
I just checked if I could reproduce the issue with kernel-longterm (currently at 6.12.36), and so far I wasn’t able to provoke a single stutter - kswapd0 never uses more than a fraction of a single CPU core. I’d use 6.12, but 6.13-6.15 introduced a lot of stability improvements and bugfixes for my GPU (9070 XT), where before it could crash or freeze the system in certain situations.
Maybe I have to bisect the issue comparing 6.12 with 6.15 or 6.14. I experience this issue frequently enough that it annoys me, because it’s not just when playing back video (although there it’s the most obvious), rather it happens in general with larger sequential writes to disk causing stutters everywhere.
Thank you guys for all the input so far!
And about the PCIe lanes: 2 of the 3 SSDs (the system SSD 990, and one of the 980s) are connected directly to the CPU, with the third being connected via the chipset. The PCIe slot of the 10 GbE card is also connected via the chipset, so there’s some bandwidth sharing between these two devices (with more things like USB and whatnot also being connected via the chipset obviously). But the stutters happen no matter what SSD is being written to.
Doesn’t matter … drive access is drive access, no matter if one drive or four. The interrupts still happen.
It’s not as if you have two completely separate physical processors running on their own motherboard and with their own Linux OS, each dedicated to two independent drives.
With this commit reverted on top of kernel 6.15.6, kswapd0 uses a lot less CPU (~10% of a single core, not 100%) and stuttering doesn’t occur.
As this change is btrfs specific and stuttering can (if rarely) also trigger when Steam updates a game on an ext4 disk, I’m not sure whether I just removed a symptom or the actual cause of the issue (or if it’s two separate issues). I’ll do further testing.