observing excessive swapping when copying large files

robert_spitzenpfeil · December 24, 2019, 7:17pm

CPU: i7-7700T
RAM: 16GB
Storage: SSD + NVME

I’ve seen this behaviour on two other machines running Tumbleweed as well. One i5-8265U with 8GB + NVME and a very old Core2 Duo with just 2 GB of RAM + SSD.

I observe the following:

When copying large files, the buffer / cache keeps expanding until the machine runs out of physical RAM and starts swapping. This comes with the expected speed penalty and lack of responsiveness; iostat shows above 90% iowait for kswapd0.
This problem goes away when all swap space is disabled: transfer speed is high and the system is responsive.
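
The cache growth described above can be watched while the copy runs. A minimal sketch, assuming a Linux system with the usual /proc/meminfo fields; the helper name cache_snapshot is hypothetical and the optional file argument exists only so the helper can be tried against a saved copy:

```shell
#!/bin/sh
# Print the buffer, page cache, dirty-page and free-swap figures.
# Optional argument: a file in /proc/meminfo format (defaults to the live one).
cache_snapshot() {
    awk '/^(Buffers|Cached|Dirty|SwapFree):/ { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

# Usage idea: sample every 2 seconds in a second terminal during the copy.
# while sleep 2; do cache_snapshot; echo; done
```

If Cached keeps climbing while SwapFree starts to fall, that matches the pattern reported here.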

I have a fundamental problem with this. I don’t see the point of increasing buffers / cache so much that the system starts swapping, just for copying a large file (say a VM image or the like).
I could be wrong, but I don’t think I’ve observed this behaviour in recent years. It doesn’t make any sense to me.

Puzzled.

mrmazda · December 24, 2019, 9:33pm

Just how large is “large”? Seconds ago I finished zypper dup on an E8400 Core2Duo with 4G RAM, SSD and swap size 2G. Copying from where to where? Copying how?

robert_spitzenpfeil · December 25, 2019, 3:15pm

File size is typically 2x RAM, say 40GB.

Copying from folder A to B on same device with “cp”.

Zypper dup is not critical, just a bunch of relatively small packages. Many, but relatively small. I have no issues there.

karlmistelberger · December 25, 2019, 4:58pm

You may need to adjust swappiness. A snappy machine with 8GB RAM became dead slow and unresponsive when rsyncing /home.
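
A minimal sketch of how the current swappiness can be inspected and a lower value tried. The value 10 is only an illustrative assumption, not a figure from this thread, and the helper names are hypothetical:

```shell
#!/bin/sh
# Print the current vm.swappiness value.
# Optional argument: a file to read instead of the live kernel setting.
show_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Print (do not run) the command that changes the value until the next reboot.
# The default of 10 is an assumption for illustration; run the result as root.
suggest_swappiness() {
    echo "sysctl -w vm.swappiness=${1:-10}"
}

[ -r /proc/sys/vm/swappiness ] && show_swappiness
suggest_swappiness 10
```

The second helper only prints the command, so nothing changes until it is run by hand.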

Screw data. Prioritize code

mrmazda · December 25, 2019, 7:34pm

I wrote about zypper to suggest I might be able to try to closely replicate your observation, but my system mentioned has no filesystem on it anywhere near as big as 40GB, much less any file that size.

malcolmlewis · December 25, 2019, 7:50pm

Hi
So are you copying SSD -> NVMe? What speed is the NVMe running at? For example:


inxi -Dxxz

Drives:    Local Storage: total: 1.36 TiB used: 166.96 GiB (11.9%) 
           ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS250G1B0C-00S6U0 size: 232.89 GiB speed: 15.8 Gb/s lanes: 2 
           serial: <filter> 
           ID-2: /dev/sda vendor: Western Digital model: WDS250G2B0B-00YS70 size: 232.89 GiB speed: 6.0 Gb/s serial: <filter>

What scheduler is in use? For example:


cat /sys/block/nvme0n1/queue/scheduler
[mq-deadline] kyber bfq none

cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

What file systems are you copying from and to?

robert_spitzenpfeil · December 28, 2019, 10:58pm

Everything is Ext4.

Currently I run tests on my old laptop (2G RAM) like so:

dd if=/dev/sda of=/dev/null status=progress

This performs as expected for such an old machine (1.5 Gb/s SATA-I). At some point it starts hitting swap and the speed drops to about 50%, as the same device now carries 2x the I/O.
Swap space usage is very low, just a few tens of MB, but the activity is constant. I run these tests with no other applications open, except a terminal on X.
I really don’t see what should be swapped out here. I’ve played with vm.swappiness, but that just shifts the point at which swapping starts.
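
To see how constant the swap activity really is during such a test, the kernel's cumulative swap counters can be sampled. A minimal sketch, assuming a Linux /proc/vmstat with the pswpin and pswpout fields; the helper name swap_counters is hypothetical:

```shell
#!/bin/sh
# Print the cumulative number of pages swapped in and out since boot.
# Optional argument: a file in /proc/vmstat format (defaults to the live one).
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1, $2 }' "${1:-/proc/vmstat}"
}

# Usage idea: sample once per second while dd runs in another terminal.
# while sleep 1; do swap_counters | tr '\n' ' '; echo; done
```

Counters that keep rising between samples indicate ongoing swap traffic, even if total swap usage stays at a few tens of MB. `vmstat 1` shows the same activity in its si and so columns.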

I think this behaviour is pathological; nothing should be swapped out to increase buffers / cache, ever.

The HW in my main laptop is similar to your inxi results.

Drives: Local Storage: total: 698.65 GiB used: 401.71 GiB (57.5%)
ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 960 EVO 250GB size: 232.89 GiB speed: 31.6 Gb/s lanes: 4
serial: <filter>
ID-2: /dev/sda vendor: Western Digital model: WDS500G1B0A-00H9H0 size: 465.76 GiB speed: 6.0 Gb/s serial: <filter>

Scheduler settings are identical.

malcolmlewis · December 28, 2019, 11:25pm

Hi
I would surmise that it might be a dd issue/bug; I see the same here with your command.

karlmistelberger · December 29, 2019, 5:49am

Same behavior here: I started dd if=/dev/sdc of=/dev/null bs=4M with no swap active. Everything was fine until I turned swap on, at which point the machine froze immediately. swapoff then ran for minutes before finally succeeding. I think it is a kernel bug.

robert_spitzenpfeil · December 29, 2019, 3:35pm

Thanks for testing.

Reported: https://bugzilla.opensuse.org/show_bug.cgi?id=1159882

gelotress · December 30, 2019, 3:23am

Try:


echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes

To revert:


echo 0 > /proc/sys/vm/dirty_background_bytes
echo 0 > /proc/sys/vm/dirty_bytes

If that helps, the cause might be an outstanding kernel bug.
As it stands, this appears to be a two-part problem, and the commands above are just half of the fix.
Essentially:

Too many dirty pages in memory waiting for writeback
Databloat

The settings above will reset to their default values on the next boot.

For more details:
https://unix.stackexchange.com/questions/107703/why-is-my-pc-freezing-while-im-copying-a-file-to-a-pendrive/107722#107722
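
Since the echo commands above do not survive a reboot, the same limits could be kept in a sysctl configuration file. A minimal sketch, assuming a distribution that reads /etc/sysctl.d/; the helper name dirty_limits_conf and the file name mentioned below are hypothetical:

```shell
#!/bin/sh
# Print sysctl.d-style settings for the two dirty-page limits.
# Arguments: background limit and hard limit, both in MiB.
dirty_limits_conf() {
    bg_mib=$1
    hard_mib=$2
    echo "vm.dirty_background_bytes = $((bg_mib * 1024 * 1024))"
    echo "vm.dirty_bytes = $((hard_mib * 1024 * 1024))"
}

# Same values as in the post above: 16 MiB background, 48 MiB hard limit.
dirty_limits_conf 16 48
```

As root, the output could be redirected to a file such as /etc/sysctl.d/90-dirty-limits.conf and loaded with `sysctl --system`. Deleting the file and rebooting restores the defaults.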

robert_spitzenpfeil · December 31, 2019, 1:56am

On the old laptop with 2GB RAM, nothing I do to these settings really changes anything: at one point or another it starts swapping a few tens of MB (nonsensically, to me), and that kills performance.

mrmazda · December 31, 2019, 5:14am

What do you do with the Core2Duo PC besides copy large files? Most old Core2Duos with only 2G RAM have 4 RAM slots with 2 unused. You could improve behavior if necessary for more extensive activity by adding a pair of 2G sticks rather cheaply. Maybe you don’t need swap enabled. You could run swapoff prior to copy, then swapon after. Or, just leave the swap partition disabled, and let the OS use a swap file when it thinks it needs swap. The bug report should get attention soon after the new year if not before.
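
The swapoff-before, swapon-after idea can be wrapped in a small helper. A minimal sketch, assuming root privileges and enough free RAM for swapoff to succeed; the names run, copy_without_swap and the DRY_RUN variable are hypothetical:

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (safe preview).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy SRC to DST with all swap disabled, then re-enable swap.
# swapoff can take a while if swap is currently in use.
copy_without_swap() {
    run swapoff -a
    run cp -- "$1" "$2"
    status=$?
    run swapon -a    # re-enable swap even if the copy failed
    return "$status"
}

# Preview: DRY_RUN=1 copy_without_swap big.img /backup/big.img
```

With DRY_RUN set, the helper only prints the three commands, which makes it easy to check before running it for real as root.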

robert_spitzenpfeil · December 31, 2019, 5:50pm

That laptop has been demoted to a YouTube & Netflix viewer, because some time ago something bad happened to the graphics performance of i915 + GM945.

I use it to test this issue, as it is most apparent on it, but not limited to it. My i7 laptop shows similar behaviour when copying VM images, they’re just a whole lot larger.
It is still not acceptable that copying stuff can slow GUI response to thick molasses.

The old laptop has 2 SO-DIMM slots, but the BIOS (recent version) only supports up to 3GB of RAM; I tried upgrading and it doesn’t work. The device is one step away from being trashed, but it still “works” and the DVD drive is region free.

I’m quite annoyed that I have to “suffer” from this on Linux, whereas on W10 (office PC) nothing of that sort happens, with a similar CPU and the same amount of RAM.

mrmazda · December 31, 2019, 7:33pm

GM945, Intel Gen3 graphics, only had hardware acceleration for MPEG2. Intel added it for VC-1 and AVC halfway through the Gen4 graphics series the next year.

robert_spitzenpfeil · December 31, 2019, 8:10pm

I know it’s old as heck.

I used to run Leap 15.0 on it and it was snappy; the GUI was always fast. With 15.1 something changed. I see the same on about 13 old PCs at work: they ran 15.0, switched to TW and BOOM. Same graphics badness on recent Debian, Ubuntu …

But that is a completely different issue.

mrmazda · January 1, 2020, 1:42am

I have 2 i945Gs (desktops, not GM945 laptops). Both run P4s (single core) @ 3.0 GHz. They’ve always seemed like toads compared to other Intel hardware, so I never noticed any change in later distro releases.

robert_spitzenpfeil · January 4, 2020, 10:48am

Interesting read related to this issue

https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1149881-fedora-32-looking-at-using-earlyoom-by-default-to-better-deal-with-low-memory-situations/

robert_spitzenpfeil · January 4, 2020, 10:51am

Interesting read related to this issue. Some posters there are affected as well.

https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1149881-fedora-32-looking-at-using-earlyoom-by-default-to-better-deal-with-low-memory-situations/page6

unix111 · January 4, 2020, 3:05pm

There was an interesting discussion on the LKML (Linux Kernel Mailing List) last summer:
https://lkml.org/lkml/2019/8/4/15
(Subject: »Let’s talk about the elephant in the room - the Linux kernel’s inability to gracefully handle low memory pressure«)

This type of out-of-memory problem continues to be a hard nut to crack.