plasmashell freezes under IO load [EXT4 on LVM on LUKS]

I’ve recently installed Leap 15.2 and saw that the UI freezes in various scenarios:

  • while creating new Docker images and containers
  • while importing VirtualBox OVA files
  • while copying or moving large files (more than a few GB)
  • and in various other cases like downloading large files

After a while, I managed to reproduce this issue consistently using dd. If I add oflag=direct to the dd command in order to bypass the page cache, the UI no longer freezes, but I have no clue what’s happening.

A 2nd OS (dual-boot of Leap 15.1) on the same machine using a similar partition setup doesn’t have this issue and I don’t see this issue on a different device with Leap 15.2 and the same EXT4 on LVM on LUKS setup.

I’ve seen a similar issue reported on this thread, but it’s happening because of a btrfs cronjob, which I don’t u
https://forums.opensuse.org/showthread.php/547748-Why-does-UI-freeze-when-only-one-CPU-core-maxes-out-(BTRFS-issues)

Here are some details about the setup in question:


$ uname -a
Linux ja 5.3.18-lp152.57-default #1 SMP Fri Dec 4 07:27:58 UTC 2020 (7be5551) x86_64 x86_64 x86_64 GNU/Linux

$ inxi -zFm
System:    Kernel: 5.3.18-lp152.57-default x86_64 bits: 64 Desktop: KDE Plasma 5.18.6 Distro: openSUSE Leap 15.2
Machine:   Type: Desktop Mobo: ASUSTeK model: P8Z77-V LX2 v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 2303
           date: 12/05/2013
Memory:    RAM: total: 14.58 GiB used: 1.18 GiB (8.1%)
           Array-1: capacity: 32 GiB slots: 4 EC: None
           Device-1: ChannelA-DIMM0 size: 4 GiB speed: 2133 MT/s
           Device-2: ChannelA-DIMM1 size: 4 GiB speed: 2133 MT/s
           Device-3: ChannelB-DIMM0 size: 4 GiB speed: 2133 MT/s
           Device-4: ChannelB-DIMM1 size: 4 GiB speed: 2133 MT/s
CPU:       Topology: Quad Core model: Intel Core i5-3570 bits: 64 type: MCP L2 cache: 6144 KiB
           Speed: 1991 MHz min/max: 1600/4000 MHz Core speeds (MHz): 1: 3372 2: 1950 3: 1749 4: 2415
Graphics:  Device-1: Intel Xeon E3-1200 v2/3rd Gen Core processor Graphics driver: i915 v: kernel
           Display: x11 server: X.Org 1.20.3 driver: modesetting unloaded: fbdev,vesa resolution: 1: 1920x1080~60Hz
           2: 1920x1080~60Hz
           OpenGL: renderer: Mesa DRI Intel Ivybridge Desktop v: 4.2 Mesa 19.3.4
Audio:     Device-1: Intel 7 Series/C216 Family High Definition Audio driver: snd_hda_intel
           Device-2: Microdia Camera type: USB driver: snd-usb-audio,uvcvideo
           Device-3: C-Media type: USB driver: hid-generic,snd-usb-audio,usbhid
           Sound Server: ALSA v: k5.3.18-lp152.57-default
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:    Local Storage: total: 476.94 GiB used: 187.55 GiB (39.3%)
           ID-1: /dev/sda vendor: Samsung model: SSD 860 PRO 512GB size: 476.94 GiB
Partition: ID-1: / size: 62.50 GiB used: 16.63 GiB (26.6%) fs: ext4 dev: /dev/dm-3
           ID-2: /boot size: 975.9 MiB used: 48.5 MiB (5.0%) fs: ext4 dev: /dev/dm-0
           ID-3: /home size: 124.99 GiB used: 103.25 GiB (82.6%) fs: ext4 dev: /dev/dm-5
Swap:      ID-1: swap-1 type: partition size: 16.00 GiB used: 28.5 MiB (0.2%) dev: /dev/dm-2
Sensors:   System Temperatures: cpu: 29.8 C mobo: 27.8 C
Fan Speeds (RPM): cpu: 0
Info:      Processes: 207 Uptime: 1h 10m Shell: bash inxi: 3.1.00

I’ve been using an LVM on LUKS with EXT4 setup for quite some time now, and I haven’t had this kind of problem until now.


# lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                         8:0    0  477G  0 disk
├─sda1                      8:1    0  952M  0 part  /boot/efi
├─sda2                      8:2    0    6G  0 part
│ ├─XBoot-NEW             254:0    0    1G  0 lvm   /boot
│ └─XBoot-OLD             254:7    0    1G  0 lvm
└─sda3                      8:3    0  470G  0 part
  └─cr-auto-1             254:1    0  470G  0 crypt
    ├─XCrypt-NEWSwap      254:2    0   16G  0 lvm   [SWAP]
    ├─XCrypt-NEW          254:3    0   64G  0 lvm   /
    ├─XCrypt-OLD          254:4    0   64G  0 lvm
    ├─XCrypt-Home         254:5    0  128G  0 lvm   /home
    ├─XCrypt-Workspace    254:6    0  128G  0 lvm   /workspace
    └─XCrypt-OLDSwap      254:8    0   16G  0 lvm

The UI freezes when I boot into the “NEW” OS, but it works just fine on the “OLD” setup.


# cryptsetup --version
cryptsetup 2.0.6

# cryptsetup -v status cr-auto-1
/dev/mapper/cr-auto-1 is active and is in use.
type:    LUKS1
cipher:  aes-xts-plain64
keysize: 256 bits
key location: dm-crypt
device:  /dev/sda3
sector size:  512
offset:  4096 sectors
size:    985663489 sectors
mode:    read/write
Command successful.
 

I saw that after the UI freezes, the plasmashell process sits at 99.99% in iotop’s IO column (the share of time the process spends waiting on I/O). I also sometimes see X or KWin with a high value there.


# Konsole tab#1
$ iotop -b > iotop.log

# Konsole tab#2
$ dd bs=1M count=4096 status=progress if=/dev/urandom of=TESTFILE.bin
# ^^ NOTE: There's no oflag=direct here.
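
For comparison, the buffered and direct variants of this write test can be wrapped in one helper so that the two runs differ only in oflag=direct. This is a minimal sketch: the write_test name and its arguments are made up for illustration, and status=none and iflag=fullblock assume GNU dd (the command above uses status=progress instead).

```shell
# Hypothetical helper (not from the original post): write COUNT MiB of random
# data to FILE, either through the page cache or with direct I/O.
# Usage: write_test FILE COUNT [buffered|direct]
write_test() (
    file="$1"
    count="$2"
    mode="${3:-buffered}"

    extra=""
    if [ "$mode" = "direct" ]; then
        # Bypass the page cache for the output file.
        extra="oflag=direct"
    fi

    # iflag=fullblock keeps a short read from /dev/urandom from shrinking a block.
    # $extra is intentionally unquoted so that an empty value adds no argument.
    dd bs=1M count="$count" iflag=fullblock status=none \
        if=/dev/urandom of="$file" $extra
)
```

With the numbers from this post, write_test TESTFILE.bin 4096 corresponds to the buffered command above, and write_test TESTFILE.bin 4096 direct to the variant that does not freeze the UI.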

Then I grep the log file for entries with a two-digit percentage:


$ grep -E " [0-9]{2}\.[0-9]+\ \%" iotop.log 
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 13.47 % [dmcrypt_write/2]
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.69 % [dmcrypt_write/2]
3497 be/3 root        0.00 B/s    0.00 B/s  0.00 % 44.67 % [jbd2/dm-6-8]
9654 be/4 root        0.00 B/s   19.02 K/s  0.00 % 41.71 % python3 /usr/sbin/iotop -b
3504 be/3 root        0.00 B/s  965.71 K/s  0.00 % 34.64 % [jbd2/dm-5-8]
4788 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 32.65 % kwin_x11 -session 1028c1d320b210000160868798600000027440001_1609355473_983437
4795 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 19.67 % plasmashell
4788 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 99.99 % kwin_x11 -session 1028c1d320b210000160868798600000027440001_1609355473_983437
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 97.02 % [dmcrypt_write/2]
3497 be/3 root        0.00 B/s   10.80 K/s  0.00 % 92.48 % [jbd2/dm-6-8]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 59.04 % [kworker/u8:2-kcryptd/254:1]
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [dmcrypt_write/2]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 10.41 % [kworker/u8:1+kcryptd/254:1]
4788 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 99.99 % kwin_x11 -session 1028c1d320b210000160868798600000027440001_1609355473_983437
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [dmcrypt_write/2]
7794 be/4 mainuser    0.00 B/s    3.62 K/s  0.00 % 68.55 % konsole
7715 be/4 root        0.00 B/s    0.00 B/s  0.00 % 26.45 % [kworker/u8:7+kcryptd/254:1]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 26.07 % [kworker/u8:1+kcryptd/254:1]
8105 be/4 root        0.00 B/s    0.00 B/s  0.00 % 19.86 % [kworker/u8:3+kcryptd/254:1]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 15.53 % [kworker/u8:2+kcryptd/254:1]
3005 be/3 root        0.00 B/s    4.92 K/s  0.00 % 91.65 % [jbd2/dm-3-8]
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 90.32 % [dmcrypt_write/2]
3497 be/3 root        0.00 B/s 1372.81 B/s  0.00 % 89.49 % [jbd2/dm-6-8]
3504 be/3 root        0.00 B/s  142.11 K/s  0.00 % 88.32 % [jbd2/dm-5-8]
9654 be/4 root        0.00 B/s    4.02 K/s  0.00 % 87.68 % python3 /usr/sbin/iotop -b
7794 be/4 mainuser    0.00 B/s   16.09 K/s  0.00 % 78.93 % konsole
4360 be/4 root        0.00 B/s  457.60 B/s  0.00 % 78.93 % X -nolisten tcp -auth /run/sddm/{1c00d56e-65c7-4cbe-b82a-cd306a904278} -background none -noreset -displayfd 17 -seat seat0 vt7 [InputThread]
9658 be/4 mainuser    0.00 B/s   68.60 M/s  0.00 % 76.39 % dd bs=1M count=4096 status=progress if=/dev/urandom of=TESTFILE.bin
9518 be/4 root        0.00 B/s    0.00 B/s  0.00 % 57.80 % [kworker/u8:4-i915]
7715 be/4 root        0.00 B/s    0.00 B/s  0.00 % 18.63 % [kworker/u8:7-events_unbound]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 18.30 % [kworker/u8:1-events_unbound]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 18.15 % [kworker/u8:2-kcryptd/254:1]
8105 be/4 root        0.00 B/s    0.00 B/s  0.00 % 18.02 % [kworker/u8:3-kcryptd/254:1]
4795 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 11.97 % plasmashell
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 83.95 % [dmcrypt_write/2]
4788 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 47.61 % kwin_x11 -session 1028c1d320b210000160868798600000027440001_1609355473_983437
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 19.31 % [kworker/u8:2+kcryptd/254:1]
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 98.41 % [dmcrypt_write/2]
9518 be/4 root        0.00 B/s    0.00 B/s  0.00 % 26.28 % [kworker/u8:4+kcryptd/254:1]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 24.17 % [kworker/u8:1+kcryptd/254:1]
7715 be/4 root        0.00 B/s    0.00 B/s  0.00 % 24.01 % [kworker/u8:7+kcryptd/254:1]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 20.96 % [kworker/u8:2+kcryptd/254:1]
4795 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 99.99 % plasmashell
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [dmcrypt_write/2]
3504 be/3 root        0.00 B/s    0.00 B/s  0.00 % 61.41 % [jbd2/dm-5-8]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 25.08 % [kworker/u8:2+kcryptd/254:1]
7715 be/4 root        0.00 B/s    0.00 B/s  0.00 % 24.58 % [kworker/u8:7+kcryptd/254:1]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 22.98 % [kworker/u8:1+kcryptd/254:1]
9518 be/4 root        0.00 B/s    0.00 B/s  0.00 % 21.89 % [kworker/u8:4+kcryptd/254:1]
2910 be/4 root        0.00 B/s    0.00 B/s  0.00 % 75.81 % [dmcrypt_write/2]
3504 be/3 root        0.00 B/s  193.30 K/s  0.00 % 75.77 % [jbd2/dm-5-8]
3497 be/3 root        0.00 B/s 1950.09 B/s  0.00 % 73.83 % [jbd2/dm-6-8]
9654 be/4 root        0.00 B/s    8.57 K/s  0.00 % 73.77 % python3 /usr/sbin/iotop -b
3005 be/3 root        0.00 B/s    0.00 B/s  0.00 % 68.00 % [jbd2/dm-3-8]
4788 be/4 mainuser    0.00 B/s    0.00 B/s  0.00 % 65.06 % kwin_x11 -session 1028c1d320b210000160868798600000027440001_1609355473_983437
7794 be/4 mainuser    0.00 B/s 1759.65 K/s  0.00 % 23.49 % konsole
7715 be/4 root        0.00 B/s    0.00 B/s  0.00 % 15.11 % [kworker/u8:7-flush-254:6]
6320 be/4 root        0.00 B/s    0.00 B/s  0.00 % 14.47 % [kworker/u8:1-kcryptd/254:1]
9518 be/4 root        0.00 B/s    0.00 B/s  0.00 % 14.33 % [kworker/u8:4-events_unbound]
9259 be/4 root        0.00 B/s    0.00 B/s  0.00 % 13.67 % [kworker/u8:2-i915]

While creating that log, I’ve tried to stop most irrelevant processes (user applications, all sorts of background processes like file indexing, etc). dmesg doesn’t show anything while this is happening and nothing that might be relevant on startup. And in journalctl, there’s nothing before or after the freeze.

Thank you!

**@averothe:**

  • Which GUI?
  • Is “elevator=???” present in the Kernel Command Line?

 > cat /proc/cmdline
 > find -H /sys/block/* -iname scheduler -print0 -printf ': ' -exec cat '{}' \;

This is a well-known issue that has been around for as long as I can remember. A large volume of writes fills the RAM cache and evicts its current contents, which include executable programs and the data those programs are working with. So when a program needs to run, it must first be read back from disk; this is slow, and it also competes with the ongoing write activity.

When you use the “direct” flag, dd runs as fast as the device allows and does not consume additional memory.
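
One way to check whether this explanation applies is to watch how much dirty (not yet written) data sits in the cache while the dd test runs. The sketch below reads the Dirty and Writeback counters from /proc/meminfo and the vm.dirty_background_ratio and vm.dirty_ratio thresholds from /proc/sys/vm; the function names are made up for illustration.

```shell
# Hypothetical helper: print the Dirty and Writeback counters (in kB) from
# meminfo-formatted text on stdin.
dirty_kb() (
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2 }'
)

# Hypothetical helper: show the writeback thresholds and the current counters.
show_writeback_state() (
    printf 'dirty_background_ratio: %s\n' "$(cat /proc/sys/vm/dirty_background_ratio)"
    printf 'dirty_ratio: %s\n' "$(cat /proc/sys/vm/dirty_ratio)"
    dirty_kb < /proc/meminfo
)
```

Calling show_writeback_state once per second in a second terminal during the buffered dd run should show Dirty growing; with oflag=direct it would be expected to stay roughly flat, since the writes bypass the cache.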

That would have to be the default KDE desktop installation:

# zypper info plasma5-desktop
Loading repository data...
Reading installed packages...


Information for package plasma5-desktop:
----------------------------------------
Repository     : openSUSE-Leap-15.2-1
Name           : plasma5-desktop
Version        : 5.18.5-lp152.2.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 13.5 MiB
Installed      : Yes
Status         : up-to-date
Source package : plasma5-desktop-5.18.5-lp152.2.1.src
Summary        : The KDE Plasma Workspace Components
Description    :
This package contains the basic packages for a Plasma workspace.

I don’t see any “elevator=” parameter on the kernel command line:


# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.3.18-lp152.57-default  root=/dev/mapper/XCrypt-NEW splash=silent resume=/dev/XCrypt/NEWSwap  quiet mitigations=auto

# find -H /sys/block/* -iname scheduler -print0 -printf ': ' -exec cat '{}' \;
/sys/block/dm-0/queue/scheduler: none
/sys/block/dm-1/queue/scheduler: none
/sys/block/dm-2/queue/scheduler: none
/sys/block/dm-3/queue/scheduler: none
/sys/block/dm-4/queue/scheduler: none
/sys/block/dm-5/queue/scheduler: none
/sys/block/dm-6/queue/scheduler: none
/sys/block/dm-7/queue/scheduler: none
/sys/block/dm-8/queue/scheduler: none
/sys/block/sda/queue/scheduler: [mq-deadline] kyber bfq none
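
In this sysfs output the active scheduler is the name in square brackets, and a line without brackets (such as “none” for the dm devices) is the only choice available. A small sketch for extracting just the active scheduler per device; the function names are made up for illustration.

```shell
# Hypothetical helper: given the contents of a queue/scheduler file, print the
# active scheduler (the bracketed name, or the whole line if nothing is bracketed).
active_scheduler() (
    line="$1"
    case "$line" in
        *\[*\]*)
            rest=${line#*\[}                 # drop everything up to the first "["
            printf '%s\n' "${rest%%\]*}"     # drop everything from the first "]"
            ;;
        *)
            printf '%s\n' "$line"
            ;;
    esac
)

# Hypothetical helper: print "<sysfs path>: <active scheduler>" for each block device.
list_schedulers() (
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(active_scheduler "$(cat "$f")")"
    done
)
```

Since bfq is listed as available for sda above, one experiment would be to write bfq into /sys/block/sda/queue/scheduler as root and repeat the dd test; whether that changes the freezes on this machine is not something this thread confirms.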

Interesting. In the inxi output from above, you can see that there’s plenty of RAM to go around, so in this case, it shouldn’t be a problem. And there’s also swap, which was practically empty: 28 MiB used out of 16 GiB.

But I’ve never encountered this kind of issue before, even though I’m almost always out of RAM: I usually run 2-3 browsers, plus 2 Electron apps and a couple of Java apps, so there are plenty of times when there isn’t much RAM to go around.

I used to see this kind of behavior with my previous desktop. I haven’t seen it with my current desktop.

My own conclusion is that it is, at least in part, hardware dependent.

I’ve kept that system and I’ve tried a few more installations on different partitions, and after some time, I realized that it was an X access control issue.

It turns out I had changed the hostname in the “YaST Network - Network Settings” UI, and I had probably done something else at the time, which for some reason left the OS in an inconsistent state. Restarting the OS didn’t solve the problems, even though they were most probably caused by xhost (the server access control program for X).

To properly solve the issue, I had to use hostnamectl set-hostname to reset the hostname, and I then deleted ~/.Xauthority. After a restart, the issue went away.
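
Because the fix involved the hostname and ~/.Xauthority, one possible sanity check is whether the X authority file holds a cookie for the current hostname. The sketch below assumes the usual xauth list output, where the first field looks like hostname/unix:N or hostname:N; the helper name is made up, and a missing cookie is not confirmed as the cause of the freezes in this thread.

```shell
# Hypothetical helper: read "xauth list" output on stdin and succeed if any
# entry's display name starts with HOST/unix: or HOST:.
# Usage: xauth list | xauth_has_host "$(hostname)"
xauth_has_host() (
    host="$1"
    awk -v h="$host" '
        index($1, h "/unix:") == 1 || index($1, h ":") == 1 { found = 1 }
        END { exit found ? 0 : 1 }
    '
)
```

For example, xauth list | xauth_has_host "$(hostname)" || echo "no cookie for the current hostname" prints a warning when the file only contains entries for an old hostname.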

Also, I think that the issue I encountered might be related to the thread “TUMBLEWEED Possible Yast issue affecting slow applications startup, hostname configuration”.

P.S. I’ve also updated /etc/hosts to include a 127.0.0.1 entry for the new hostname, but this might not have had any effect.

Yes, indeed, issues such as this can be caused by low-level I/O – “dd” is not the only culprit – other applications can also cause this if they’re attempting to “improve I/O throughput” …

  • Back to “dd” – the “man” page is fairly terse – the “info” content provides more information:
 ‘direct’           Use direct I/O for data, avoiding the buffer cache.  Note that the kernel may impose restrictions on read or write buffer sizes.  For example, with an ext4 destination file system and a Linux-based kernel, using ‘oflag=direct’ will cause writes to fail with ‘EINVAL’ if the output buffer size is not a multiple of 512.
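
The restriction quoted above is easy to check before running dd. A minimal sketch, assuming the 512-byte multiple from the quoted text (other devices or file systems may need a different alignment, which can be passed as the second argument); the function name is made up for illustration.

```shell
# Hypothetical helper: succeed if a block size in bytes is a positive multiple
# of the required alignment (512 by default, per the quoted info text).
# Usage: bs_ok_for_direct BYTES [ALIGNMENT]
bs_ok_for_direct() (
    bytes="$1"
    align="${2:-512}"
    [ "$bytes" -gt 0 ] && [ $((bytes % align)) -eq 0 ]
)
```

The bs=1M used earlier in this thread is 1048576 bytes, which is 2048 × 512, so it satisfies the quoted restriction.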

Reading from a file and writing to another file “as quickly as possible” can often produce unwanted side-effects if the write buffer isn’t being managed properly –

  • Typical systems usually read faster than they can write, and modern hardware often has various caching queues implemented to improve I/O performance …

Bottom line – if a system suddenly becomes unresponsive (freezes), search for I/O-intensive processes and consider tuning their run-time priorities …
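
As a sketch of what tuning run-time priorities could look like, the wrapper below starts a command with the lowest CPU priority (nice 19) and, where ionice works, the idle I/O class. The low_prio name is made up for illustration. Two caveats: I/O classes are only honoured by some schedulers (CFQ and BFQ, for example), and buffered writes are flushed later by kernel threads, so this may not help with the freeze described in this thread.

```shell
# Hypothetical wrapper: run a command with low CPU priority and, if ionice is
# available and usable, with the idle I/O scheduling class.
# Usage: low_prio COMMAND [ARGS...]
low_prio() (
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        exec ionice -c 3 nice -n 19 "$@"
    fi
    # Fall back to CPU niceness only.
    exec nice -n 19 "$@"
)
```

For example: low_prio dd bs=1M count=4096 if=/dev/urandom of=TESTFILE.bin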
BTW – on real-time systems, this behaviour is “normal” – the human user is usually a “2nd class citizen” – the processes executing the “job to be done” have absolute priority, to meet the requirements for timely completion of the compute task(s) …