Hardware sensors are randomly freezing

I’ve been experiencing a bizarre problem since upgrading my motherboard. At random times, the hardware sensors appear to freeze and no longer deliver updates for several seconds at a time. I can see this in KSysGuard where I have a tab with a few charts showing CPU and GPU temperatures updated every second. Every now and then, the updates simply stop and the charts freeze in place. If I go to the default Process Table tab I still see activity there, but everything else (including the System Load tab, with CPU and RAM / SWAP history) stops. Switching between tabs or restarting the system monitor doesn’t solve it. But if I wait somewhere over 10 seconds, they start working again. This never happened before and I’d like to know if the issue is at least known and being worked on.

Hi
Open a terminal session and watch the output from:


watch -n1 sensors

Does this output freeze as well? If not, then the issue is likely DE related.
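
If the stalls are too short or too rare to catch by eye, the same polling loop can timestamp itself. Below is a rough sketch, not a tested diagnostic: report_gaps is a made-up helper name and the 3-second threshold used further down is an arbitrary assumption. It reads one epoch timestamp per line and prints a line whenever two consecutive readings are further apart than the threshold.

```shell
#!/bin/sh
# report_gaps THRESHOLD
# Hypothetical helper: reads epoch seconds (one per line) on stdin and
# reports every pair of consecutive readings more than THRESHOLD seconds apart.
report_gaps() {
    threshold=$1
    prev=
    while read -r now; do
        if [ -n "$prev" ] && [ $(( now - prev )) -gt "$threshold" ]; then
            echo "stall: $(( now - prev ))s gap ending at $now"
        fi
        prev=$now
    done
}
```

Fed from a live loop it would look something like: while sleep 1; do sensors >/dev/null; date +%s; done | report_gaps 3. If the loop itself stalls, the gap shows up in the output; if nothing is printed while the desktop widgets freeze, that points away from the sensors themselves.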

Interesting, the freezes don’t seem to occur in the console. It must be KDE related then. I’ll bring this up on their forum as well, but if anyone here knows what it might be let me know please.

Hi
One other check to make: create a test user, log in as that user and configure the sensor tool. Does the issue reproduce there?

Here’s a screenshot showing the oddity of the issue: I quickly restarted KSysGuard upon noticing the issue, then went to the System Load tab. Upon doing that I found not only the charts frozen, but the values didn’t load at all and showed nothing (not even 0). I had to wait for roughly 5 seconds before it unfroze and the numbers appeared.

What’s even worse is that if I shut down KSysGuard while this is happening, the CPU History tab is permanently cleared and no cores ever appear in it any more. I have to go to ~/.local/share/ksysguard/ and delete the file SystemLoad2.sgrd to make the system monitor generate a new one in which case the entries return.

https://i.imgur.com/bc8KA1o.png
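
For anyone hitting the same cleared CPU History tab, the reset described above can be done without losing the old file. This is only a sketch: reset_sysload_tab is a made-up name, and it assumes KSysGuard is closed when it runs. It moves SystemLoad2.sgrd aside instead of deleting it, so the monitor should generate a fresh one on the next start and the old layout stays around as a .bak copy.

```shell
#!/bin/sh
# reset_sysload_tab [DIR]
# Hypothetical helper: renames SystemLoad2.sgrd to SystemLoad2.sgrd.bak.
# DIR defaults to the KSysGuard data directory mentioned in the post.
reset_sysload_tab() {
    dir=${1:-"$HOME/.local/share/ksysguard"}
    file="$dir/SystemLoad2.sgrd"
    if [ -f "$file" ]; then
        mv -- "$file" "$file.bak"
        echo "moved $file to $file.bak"
    else
        echo "nothing to do: $file not found"
        return 1
    fi
}
```

Calling reset_sysload_tab with no argument acts on ~/.local/share/ksysguard/; passing a directory is mainly useful for trying it out on a copy first.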

I just noticed one more thing: This also affects the system monitor widgets placed on the desktop. Ever since this issue started, I’ve been noticing that plasmoids such as CPU Load Monitor, Memory Status, Hard Disk Space Usage, Hard Disk IO Monitor, Network Monitor show fixed and incorrect values for several seconds at a time. For instance: The network monitor may say that I’m downloading at precisely 50 KB/s for 5 seconds in a row, although I’m clearly transferring nothing during this time… exact same thing with HDD I/O usage. For the first few days I thought some program started networking or using the HDD in an unusual way; It just now clicked that this is the same problem, after I noticed two different system monitors freezing and unfreezing this way at the exact same time.

https://i.imgur.com/VkCLyqQ.png

I just discovered another important clue related to this issue. It’s not just the sensors that are freezing: Other processes are too. I’ve been noticing this since I switched motherboards, but was convinced it was an entirely unrelated problem. I just now spotted that new applications will not start up while the sensors are frozen, but will start the moment they unfreeze.

Some practical examples: If I open a new tab in Firefox which requires opening a new Web process, Firefox will freeze during the 5 second period that sensors don’t update. Or if I write “sudo zypper dup” in the console to do an update, my cursor moves to the next line when pressing enter, but nothing happens if the sensors are frozen at that time… the line asking me to input my password appears the moment they unfreeze. It also appears I can’t close certain applications during a freeze: If I try closing Dolphin in such a moment, the window becomes gray and I’m asked if I want to terminate the unresponsive process… however it disappears and Dolphin closes normally at unfreeze.

So it seems something is blocking both the hardware sensors and some processes starting up or shutting down, though it seems not to affect processes that are already running. What could cause such strange behavior? Traditionally those things used to happen due to the disk I/O scheduler causing processes to go into disk sleep mode, but nothing seems to be using the hard drive while the problem occurs nor does KSysGuard show affected processes as being in “disk sleep”. I already disabled SWAP with “swapoff -a” and it’s not related to it.

I wonder if this might be related to a Ryzen specific issue I managed to find in search. Several threads suggest it might have something to do with the CPU going into idle power mode:

https://community.amd.com/thread/244175
https://bbs.archlinux.org/viewtopic.php?id=245608
https://forum.manjaro.org/t/amd-ryzen-problems-and-fixes/55533
https://forum.level1techs.com/t/random-freezes-on-ryzen-in-linux-even-if-linux-is-in-vm/138913

They suggest trying a few Kernel parameters, as well as changing some settings in the UEFI. If no other ideas come up I might do that and see if it changes anything.

idle=nomwait processor.max_cstate=5 rcu_nocbs=0-11
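
After adding parameters like these to the boot loader configuration and rebooting, it is worth confirming that the running kernel actually received them. The sketch below is an illustration only: missing_params is a made-up helper, and the parameter list is simply the one quoted above (the rcu_nocbs range is a CPU list, so 0-11 presumably matches a 12-thread CPU). It prints every requested parameter that is absent from a given command line.

```shell
#!/bin/sh
# missing_params CMDLINE PARAM...
# Hypothetical helper: prints each PARAM that does not appear as a whole
# word in CMDLINE (normally the contents of /proc/cmdline).
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) ;;          # present, nothing to report
            *) echo "$p" ;;       # absent from the running kernel's command line
        esac
    done
}
```

Usage would be along the lines of: missing_params "$(cat /proc/cmdline)" idle=nomwait processor.max_cstate=5 rcu_nocbs=0-11. Empty output means all three are active.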

Eh, usual question: Your repos?


zypper lr -d

All core packages are installed from the official repos.

mircea@linux-qz0r:~> zypper lr -d
#  | Alias                  | Name                               | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                                     | Service
---+------------------------+------------------------------------+---------+-----------+---------+----------+--------+-----------------------------------------------------------------------------------------+--------
 1 | non-oss-addon_2014-0   | openSUSE-Tumbleweed-Non-Oss        | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/tumbleweed/repo/non-oss/                                   |        
 2 | non-oss-addon_2014-0_1 | openSUSE-Tumbleweed-Source-Non-Oss | No      | ----      | ----    |   99     | rpm-md | http://download.opensuse.org/source/tumbleweed/repo/non-oss/                            |        
 3 | openSUSE_20181107-0    | openSUSE-Tumbleweed-Oss            | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/tumbleweed/repo/oss/                                       |        
 4 | openSUSE_20181107-0_1  | openSUSE-Tumbleweed-Source-Oss     | No      | ----      | ----    |   99     | rpm-md | http://download.opensuse.org/source/tumbleweed/repo/oss/                                |        
 5 | openSUSE_20181107-0_2  | openSUSE-Tumbleweed-Debug          | No      | ----      | ----    |   99     | rpm-md | http://download.opensuse.org/debug/tumbleweed/repo/oss/                                 |        
 6 | openSUSE_Factory       | openSUSE-Factory-Network-Telephony | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/repositories/network:/telephony/openSUSE_Factory/          |        
 7 | openSUSE_Leap_15.0     | openSUSE-Tumbleweed-Snowglobe      | Yes     | ( p) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/repositories/home:/lemmy04:/snowglobe/openSUSE_Tumbleweed/ |        
 8 | openSUSE_Tumbleweed    | openSUSE-Tumbleweed-Games          | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/repositories/games/openSUSE_Tumbleweed/                    |        
 9 | openSUSE_Tumbleweed_1  | openSUSE-Tumbleweed-Games-Tools    | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/repositories/games:/tools/openSUSE_Tumbleweed/             |        
10 | openSUSE_Tumbleweed_2  | openSUSE-Tumbleweed-KDE-Unstable   | No      | ----      | ----    |   99     | rpm-md | http://download.opensuse.org/repositories/KDE:/Unstable:/Extra/openSUSE_Tumbleweed/     |        
11 | openSUSE_Tumbleweed_3  | openSUSE-Tumbleweed-Emulators      | No      | ----      | ----    |   99     | rpm-md | http://download.opensuse.org/repositories/Emulators/openSUSE_Tumbleweed/                |        
12 | openSUSE_Tumbleweed_5  | openSUSE-Tumbleweed-Packman        | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://packman.inode.at/suse/openSUSE_Tumbleweed/                                       |        
13 | openSUSE_Tumbleweed_6  | openSUSE-Tumbleweed-Filesystems    | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | https://download.opensuse.org/repositories/filesystems/openSUSE_Tumbleweed              |        
14 | tumbleweed             | openSUSE-Tumbleweed-Update         | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/update/tumbleweed/

Sigh. KDE_Unstable/Extra … , some home: repos. What can I say that I haven’t said before? That using these is bound to break something?

KDE_Unstable is disabled and not used, I believe I only used it to install one package that was too out of date long ago (KVIrc). From the home: repos I only install custom software that isn’t available in the official one, I don’t replace any of the core components or processes running in the background.

Hi
Ignore the widgets and bling for a moment and run the watch -n 1 sensors command. If no freezes are seen there, then it’s a desktop issue…

Run htop in a terminal as well and look for anomalies…

There is no freeze with that command. It seems to be process related though: It even causes Firefox to become temporarily frozen, and that’s not part of KDE nor a Qt application.

Running htop doesn’t seem to turn up anything unusual: I can see the system monitor widget values freezing, but htop continues updating itself correctly as if nothing is happening.

Turns out this might be related to disk sleeping after all: If I run “sudo zypper dup” in the console then look at zypper in KSysGuard, the sudo process does say “disk sleep” for the duration of the freeze.

https://i.imgur.com/ZmpNhvS.png

My bet now is that the new motherboard is causing a new process to be spawned which is freezing the drive for other processes. Or could it be a disk scheduler bug? I seem to be running the proper BFQ scheduler just in case this matters.

mircea@linux-qz0r:~> cat /sys/block/sdb/queue/scheduler
mq-deadline kyber [bfq] none

I seem to be onto something! When sorting by process status in htop, I see a load of processes going into “disk sleep” mode during those freezes. Those shown with the red D are included:

https://i.imgur.com/qdA3StI.png

Question now is what’s triggering it. Nothing appears to be causing unusual drive I/O… that’s why I didn’t suspect this in the first place, even though I knew the behavior is associated with the drive scheduler.
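
The same “disk sleep” view can be captured from the console, which makes it easier to see which process enters that state first. A minimal sketch, with d_state being a made-up filter name: it keeps only the lines of ps output whose state column starts with D (uninterruptible sleep).

```shell
#!/bin/sh
# d_state
# Hypothetical filter: passes through only the lines whose first column
# starts with "D", i.e. processes in uninterruptible ("disk") sleep.
d_state() {
    awk '$1 ~ /^D/'
}
```

A possible way to use it during a freeze: while sleep 1; do date +%T; ps -eo state,pid,wchan:32,comm | d_state; done. The wchan column shows the kernel function each process is waiting in, which may hint at what is actually blocking them.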

Hi
Set the I/O scheduler to none and see if that helps. Also check the output from hdparm -I /dev/sdX.
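
For reference, the active scheduler is the bracketed entry in the sysfs file shown earlier, and writing a name to that file switches it until the next reboot. The snippet below is a sketch under those assumptions; active_scheduler is a made-up helper and sdb is just the device from the earlier post.

```shell
#!/bin/sh
# active_scheduler
# Hypothetical helper: reads a scheduler line such as
# "mq-deadline kyber [bfq] none" on stdin and prints the bracketed entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Switching at runtime (needs root, not persistent across reboots):
#   echo none | sudo tee /sys/block/sdb/queue/scheduler
# Checking the result afterwards:
#   active_scheduler < /sys/block/sdb/queue/scheduler
```

Because the change is lost on reboot, it is a low-risk way to test whether the scheduler has anything to do with the freezes.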

After some more testing I have finally found the culprit: The freezes are being caused by the Network Manager and / or the WPA supplicant process. They are the first to go into disk sleep each freeze as shown by htop, dragging other processes down with them in the next second. Furthermore, clicking the NetworkManager icon in the system tray to list my connections often triggers such a freeze on the spot before the panel shows, identical to the one occurring automatically every minute!

Looking at /var/log/wpa_supplicant.log I’m seeing one message being constantly logged there (every few minutes). Not sure if it’s related to the freeze but just in case:

1573327565.215899: wlan0: Reject scan trigger since one is already pending
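
To see whether that message lines up with the freezes, the timestamps can be pulled out of the log and compared. A rough sketch, assuming every entry starts with an epoch timestamp like the one above; scan_reject_times and scan_reject_gaps are made-up helper names. The first prints the whole-second timestamp of each matching entry, the second prints the number of seconds between consecutive ones.

```shell
#!/bin/sh
# scan_reject_times
# Hypothetical helper: reads the wpa_supplicant log on stdin and prints the
# epoch seconds of every "Reject scan trigger" entry.
scan_reject_times() {
    grep 'Reject scan trigger' | cut -d. -f1
}

# scan_reject_gaps
# Hypothetical helper: prints the seconds elapsed between consecutive
# "Reject scan trigger" entries.
scan_reject_gaps() {
    scan_reject_times | awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Possible usage: sudo cat /var/log/wpa_supplicant.log | scan_reject_gaps. With GNU date, a single timestamp can be made readable with date -d @1573327565. If the gaps match the rhythm of the freezes, that would strengthen the link; if not, the message is probably unrelated.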

I can confirm that disabling Wifi causes the problem to finally go away, a workaround I’ll stick to for now as I don’t currently need wireless internet on my desktop. This likely has something to do with the Network Manager service.