Display Freeze on Laptop and attached HDMI monitor / AMD graphics card

FlyingWhale · May 6, 2021, 4:15pm

First, I apologize if I used the wrong subcategory or made some other mistake. This is my first post and I am not very used to forums, but I am eager to learn.

My problem:
Over the last 3 weeks, the display on my laptop and the attached HDMI monitor have been freezing at random intervals. It mostly happens while I am using Firefox. During a freeze I can still see and move the mouse pointer and can switch to the console with CTRL+ALT+F1 (only once was the system completely frozen).

Because of this, I think the problem is related to the graphics driver and Xorg. I already tried the ideas in this post (http://forums.opensuse.org/showthread.php/544132-Video-freezes-using-HDMI-monitor?p=2962984) and removed the xf86-video-ati and xf86-video-amdgpu drivers.

Some information about my system:


maximilian@linux-nllo:~> inxi -GS
System:    Host: linux-nllo Kernel: 5.3.18-lp152.72-default x86_64 bits: 64 Desktop: KDE Plasma 5.18.6 
           Distro: openSUSE Leap 15.2 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Device-2: Chicony type: USB driver: uvcvideo 
           Display: x11 server: X.Org 1.20.3 driver: modesetting unloaded: fbdev,vesa resolution: 1: 1920x1080~60Hz 
           2: 1920x1080~60Hz 
           OpenGL: renderer: AMD RAVEN (DRM 3.33.0 5.3.18-lp152.72-default LLVM 9.0.1) v: 4.5 Mesa 19.3.4

If you need more information, please let me know.

Thanks in advance

mrmazda · May 6, 2021, 9:38pm

inxi -GS falls a little short of enough information. inxi -GSxxy or inxi -GSay would be best, but only after first running inxi -U. -U should update inxi to a less broken version than the antique provided by 15.2. The broken older versions do not support -y.

I suspect Picasso is new enough that the amdgpu driver is likely to be preferable to the modesetting driver, so I would reinstall xf86-video-amdgpu, but not right away. First:

1. Log out of Plasma, delete the contents of ~/.cache/, then log back into Plasma to see if it helped.
2. If not, go into desktop settings > display & monitor > compositor, deselect enable on startup, apply, then log out and back into Plasma.
3. If cache deletion helped, use it long enough to get a reasonable feel for overall performance, then consider reinstalling xf86-video-amdgpu.

This list is not exhaustive, just starting points.

marel · May 6, 2021, 10:12pm

Although this could well be a graphics driver/Xorg problem, I would not yet rule out the system running low on memory.

You can monitor that using the KSysGuard “System Load” tab, which also shows the CPU usage.
Keep it running somewhere on the screen where it remains visible.

mrmazda · May 6, 2021, 10:47pm

Speaking of low on memory, is this laptop using any type of SSD? If yes, it could be an issue of fstrim configuration not optimized for the use case. If fstrim gets triggered while the system is busy, everything could come to a stop until the fstrim process(es) is/are finished. Or it could be because fstrim is needed but not getting triggered when appropriate.

FlyingWhale · May 7, 2021, 11:16am

Thanks for the great support. You are all awesome.

mrmazda
inxi -GS falls a little short of enough information. inxi -GSxxy or inxi -GSay would be best, but only after first running inxi -U. -U should update inxi to a less broken version than the antique provided by 15.2. The broken older versions do not support -y.


maximilian@linux-nllo:~> inxi -GSxxy
System:
  Host: linux-nllo Kernel: 5.3.18-lp152.72-default x86_64 bits: 64 
  compiler: gcc v: 7.5.0 Desktop: KDE Plasma 5.18.6 tk: Qt 5.12.7 wm: kwin_x11 
  dm: SDDM Distro: openSUSE Leap 15.2 
Graphics:
  Device-1: AMD Picasso vendor: Lenovo ThinkPad E595 driver: amdgpu v: kernel 
  bus-ID: 05:00.0 chip-ID: 1002:15d8 
  Device-2: Chicony type: USB driver: uvcvideo bus-ID: 3-2:3 
  chip-ID: 04f2:b604 
  Display: server: X.Org 1.20.3 compositor: kwin_x11 driver: 
  loaded: modesetting unloaded: fbdev,vesa alternate: ati resolution: 
  1: 1920x1080~60Hz 2: 1920x1080~60Hz s-dpi: 96 
  OpenGL: renderer: AMD RAVEN (DRM 3.33.0 5.3.18-lp152.72-default LLVM 9.0.1) 
  v: 4.5 Mesa 19.3.4 direct render: Yes

log out of Plasma, delete the content of ~/.cache/, then log back into Plasma to see if it helped.

I did this. Also, while observing the system load via KSysGuard, I noticed that CPU usage stayed around 20% after starting Firefox (or other applications, e.g. MATLAB). Then I discovered that /tmp had grown to 26 GiB, so I deleted its contents (I have no idea why it was so full). As expected, my CPU usage is now much lower (between 0 and 2%). Maybe this solves the problem.

If I don’t run into new freezes, I will reinstall xf86-video-amdgpu.

mrmazda Speaking of low on memory, is this laptop using any type of SSD? If yes, it could be an issue of fstrim configuration not optimized for the use case. If fstrim gets triggered while the system is busy, everything could come to a stop until the fstrim process(es) is/are finished. Or it could be because fstrim is needed but not getting triggered when appropriate.

Yes, my laptop has two SSDs. If the first fix does not resolve the problem, I will take a closer look in this direction. Thank you.

mrmazda · May 7, 2021, 6:49pm

/tmp needs to be watched, and if it’s still growing, the cause needs to be found and corrected.
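
To keep an eye on /tmp, something like the following minimal Python sketch could be run from time to time. It is only an illustration: the function names tree_size and largest_entries are made up for this example, and it simply adds up file sizes below each top-level entry, roughly as du would.

```python
import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Return the size in bytes of a file, or of all regular files below a directory."""
    if path.is_symlink():
        return 0  # do not follow symlinks out of the tree
    if path.is_file():
        return path.stat().st_size
    total = 0
    # os.walk does not follow symlinked directories by default;
    # unreadable directories are silently skipped.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            entry = Path(root) / name
            try:
                if not entry.is_symlink():
                    total += entry.stat().st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(directory: str, limit: int = 10):
    """Return (size_in_bytes, name) for the biggest top-level entries, largest first."""
    sizes = []
    for entry in Path(directory).iterdir():
        try:
            sizes.append((tree_size(entry), entry.name))
        except OSError:
            continue
    return sorted(sizes, reverse=True)[:limit]
```

Calling largest_entries("/tmp") once a day and comparing the results should show which entry is growing, which in turn hints at the program responsible.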

FlyingWhale · May 12, 2021, 10:17am

It seems that I am still running into freezes.

One occurred after 4 hours of work, and another today directly after startup (the freeze occurred on the Plasma login screen).

During the first freeze I was monitoring the system with KSysGuard. No spike in CPU usage was observable (~3%). Between the first and the second freeze I reinstalled xf86-video-amdgpu, which improved the graphics performance (especially in Zoom meetings) but did not solve the problem.

I also disabled the compositor on startup, but without any observable change.

mrmazda
If yes, it could be an issue of fstrim configuration not optimized for the use case.

Any idea how I can look into this?


maximilian@linux-nllo:~> inxi -GSxxy
System:
  Host: linux-nllo Kernel: 5.3.18-lp152.72-default x86_64 bits: 64 
  compiler: gcc v: 7.5.0 Desktop: KDE Plasma 5.18.6 tk: Qt 5.12.7 wm: kwin_x11 
  dm: SDDM Distro: openSUSE Leap 15.2 
Graphics:
  Device-1: AMD Picasso vendor: Lenovo ThinkPad E595 driver: amdgpu v: kernel 
  bus-ID: 05:00.0 chip-ID: 1002:15d8 
  Device-2: Chicony type: USB driver: uvcvideo bus-ID: 3-2:3 
  chip-ID: 04f2:b604 
  Display: x11 server: X.Org 1.20.3 compositor: kwin_x11 driver: 
  loaded: amdgpu unloaded: fbdev,modesetting,vesa alternate: ati resolution: 
  1: 1920x1080~60Hz 2: 1920x1080~60Hz s-dpi: 96 
  OpenGL: renderer: AMD RAVEN (DRM 3.33.0 5.3.18-lp152.72-default LLVM 9.0.1) 
  v: 4.5 Mesa 19.3.4 direct render: Yes

mrmazda · May 12, 2021, 11:14am

Has /tmp/ been growing obscenely again?

FlyingWhale
Any idea how I can look into this?

Start with:

systemctl cat fstrim.timer
systemctl status fstrim.timer

FlyingWhale · May 12, 2021, 1:15pm

Has /tmp/ been growing obscenely again?

Partly. It isn’t as big as last week before I cleared it (at the moment ~800 KiB). But there is still data that shouldn’t be there (e.g. a PDF I opened in the browser on Monday).

Start with:


maximilian@linux-nllo:~> systemctl cat fstrim.timer
# /usr/lib/systemd/system/fstrim.timer
[Unit]
Description=Discard unused blocks once a week
Documentation=man:fstrim

[Timer]
OnCalendar=weekly
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target
maximilian@linux-nllo:~> systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
   Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Wed 2021-05-12 13:05:01 CEST; 1min 3s ago
  Trigger: Mon 2021-05-17 00:00:00 CEST; 4 days left
     Docs: man:fstrim
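
As a cross-check of the Trigger line, here is a small Python sketch of how OnCalendar=weekly expands. systemd documents weekly as shorthand for Mon *-*-* 00:00:00. The function name next_weekly_trigger is invented for this example, and the sketch ignores time zones, daylight saving changes and the AccuracySec=1h slack.

```python
from datetime import datetime, timedelta


def next_weekly_trigger(now: datetime) -> datetime:
    """Return the next Monday 00:00 strictly after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (7 - midnight.weekday()) % 7  # Monday is weekday 0
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already Monday and midnight has passed (or is exactly now): next week.
        candidate += timedelta(days=7)
    return candidate
```

For the status output above (Wed 2021-05-12, about 13:06) this gives Mon 2021-05-17 00:00:00, which matches the Trigger line. Because of Persistent=true, a run that was missed while the laptop was off is caught up after the next boot, so the trim can end up running during working hours.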

I am really thankful for your help.

FlyingWhale · May 12, 2021, 5:42pm

I had another crash a few minutes ago. Now I can provide some log information from journalctl -k:


Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:20 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:54:22 linux-nllo kernel: input: 04:CB:88:80:95:86 as /devices/virtual/input/input24
Mai 12 16:54:25 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257
Mai 12 16:54:25 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257
Mai 12 16:54:25 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257
Mai 12 16:55:09 linux-nllo kernel: gmc_v9_0_process_interrupt: 38 callbacks suppressed
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107383000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107385000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737b000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737d000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107381000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107387000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107388000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737f000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107379000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737a000 from 27
Mai 12 16:55:09 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: gmc_v9_0_process_interrupt: 281614 callbacks suppressed
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073c3000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107397000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073c5000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107389000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107396000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073c1000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073bd000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737f000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107379000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073ae000 from 27
Mai 12 16:55:14 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: gmc_v9_0_process_interrupt: 282070 callbacks suppressed
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073a8000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073aa000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073b0000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x00008001073b2000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010737b000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107388000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010738c000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x000080010738a000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107394000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 1900 thread X:cs0 pid 1901)
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000800107392000 from 27
Mai 12 16:55:19 linux-nllo kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Mai 12 16:55:20 linux-nllo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:56:38 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:57:51 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:57:51 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:57:51 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:57:51 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 0
Mai 12 16:57:58 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257
Mai 12 16:57:58 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257
Mai 12 16:57:58 linux-nllo kernel: Bluetooth: hci0: SCO packet for unknown connection handle 257

Interestingly, there are some Bluetooth issues directly before the freeze, which never turned up in the log prior to this point. But I do not really see a connection.
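
For anyone wanting to condense a log like the one above, here is a rough Python sketch. It assumes the message formats are exactly as they appear in this excerpt. The function name summarize_kernel_log and the regular expressions are made up for this illustration and are not part of any amdgpu tooling.

```python
import re
from collections import Counter

# Patterns written against the lines quoted in this thread (an assumption).
FAULT_RE = re.compile(r"amdgpu .*retry page fault .*for process (\S+) pid (\d+)")
ADDRESS_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
SUPPRESSED_RE = re.compile(r"gmc_v9_0_process_interrupt: (\d+) callbacks suppressed")
TIMEOUT_RE = re.compile(r"ring (\w+) timeout")


def summarize_kernel_log(lines):
    """Condense amdgpu page-fault messages from kernel log lines into a dict."""
    faults = Counter()   # (process, pid) -> number of logged faults
    addresses = set()    # distinct faulting page addresses
    suppressed = 0       # messages the kernel rate-limited instead of printing
    timeouts = []        # names of rings that reported a timeout
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults[(match.group(1), int(match.group(2)))] += 1
            continue
        match = ADDRESS_RE.search(line)
        if match:
            addresses.add(match.group(1))
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group(1))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(match.group(1))
    return {
        "faults": dict(faults),
        "distinct_addresses": len(addresses),
        "suppressed": suppressed,
        "timeouts": timeouts,
    }
```

The input can be any iterable of lines, for example the saved output of journalctl -k. In the excerpt above, every logged fault is attributed to process X (pid 1900), and the suppressed counts suggest that far more faults occurred than were printed.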

mrmazda · May 12, 2021, 7:02pm

Please show input/output from:

ls -Gg /var/lib/systemd/timers/

Also run:

mount | grep ' / '

If output shows btrfs, also paste input/output from:

sudo btrfs filesystem df /

Otherwise, paste:

df /

FlyingWhale · May 12, 2021, 9:11pm


maximilian@linux-nllo:~> ls -Gg /var/lib/systemd/timers/
insgesamt 0
-rw-r--r-- 1 0 12. Mai 09:12 stamp-backup-rpmdb.timer
-rw-r--r-- 1 0 12. Mai 09:12 stamp-backup-sysconfig.timer
-rw-r--r-- 1 0 10. Mai 07:34 stamp-btrfs-balance.timer
-rw-r--r-- 1 0 14. Okt 2019  stamp-btrfs-defrag.timer
-rw-r--r-- 1 0  3. Mai 08:46 stamp-btrfs-scrub.timer
-rw-r--r-- 1 0 14. Okt 2019  stamp-btrfs-trim.timer
-rw-r--r-- 1 0 12. Mai 09:12 stamp-check-battery.timer
-rw-r--r-- 1 0 10. Mai 07:34 stamp-fstrim.timer
-rw-r--r-- 1 0 12. Mai 09:12 stamp-logrotate.timer
-rw-r--r-- 1 0 12. Mai 09:12 stamp-mandb.timer


maximilian@linux-nllo:~> mount | grep ' / '
/dev/nvme0n1p6 on / type ext4 (rw,noatime)
maximilian@linux-nllo:~> df /
Dateisystem    1K-Blöcke Benutzt Verfügbar Verw% Eingehängt auf
/dev/nvme0n1p6  49630092 8729056  38350236   19% /

mrmazda · May 13, 2021, 12:11am

The times in the output show when the timer last fired. Does this fstrim time correlate to a freeze instance? Are you normally busy with your laptop at that time of day and that day of the week? If yes, it could be the, or a, problem, so you might want to adjust the fire time.
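
One way to check that correlation is sketched below in Python. It is a hypothetical helper, not an existing tool: the names timer_stamps and timers_near and the 30-minute window are assumptions. It relies on the modification time of each stamp file being the last run, as shown by the ls output.

```python
from datetime import datetime, timedelta
from pathlib import Path


def timer_stamps(directory="/var/lib/systemd/timers"):
    """Map each stamp-*.timer file name to its modification time (local time)."""
    return {
        p.name: datetime.fromtimestamp(p.stat().st_mtime)
        for p in sorted(Path(directory).glob("stamp-*.timer"))
    }


def timers_near(freeze_time, stamps, window=timedelta(minutes=30)):
    """Return the names of timers whose last run lies within `window` of the freeze."""
    return sorted(
        name for name, fired in stamps.items()
        if abs(fired - freeze_time) <= window
    )
```

For example, timers_near(datetime(2021, 5, 12, 16, 55), timer_stamps()) would list any timer that last ran close to the freeze logged on 12 May. Each stamp only records the most recent run, so this has to be checked soon after a freeze, before the timer fires again.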

/dev/nvme0n1p6  49630092 8729056  38350236   19% /

Clearly, free space on / is not your problem.

Because they might be either a problem or normal amdgpu noise, I searched for those messages but found nothing.

marel · May 13, 2021, 12:18am

The Bluetooth problems almost surely have nothing to do with the freeze, but the “[gfxhub] retry page fault - VM_L2_PROTECTION_FAULT_STATUS:0x00101031” looks to me like the culprit; see for example:

mrmazda · May 13, 2021, 1:27am

Reading those two makes me wonder if ucode-amd is installed, or if the installed BIOS is the latest available.

FlyingWhale · May 17, 2021, 9:31am

Reading those two makes me wonder if ucode-amd is installed

ucode-amd is installed and also received an update yesterday. Maybe this will solve it.

marel
The Bluetooth problems almost surely have nothing to do with the freeze, but the “[gfxhub] retry page fault - VM_L2_PROTECTION_FAULT_STATUS:0x00101031” looks to me like the culprit; see for example:

https://arstechnica.com/civis/viewto…f=16&t=1442897

[Solved] System freezing every now and then - Linux Mint Forums

Reading those, and also these two threads:

I could also imagine that the problem is related to the graphics library. I have Mesa 19.3.4-lp152.27.1. openSUSE Leap 15.3 should provide the newer Mesa 20.2. Therefore, I will upgrade my system to 15.3 as soon as I run into a freeze again.

FlyingWhale · May 17, 2021, 1:34pm

After the first freeze today, I upgraded the system to 15.3 with the newer graphics library.

But again, I had a freeze some minutes ago.

mrmazda
The times in the output show when the timer last fired. Does this fstrim time correlate to a freeze instance? Are you normally busy with your laptop at that time of day and that day of the week? If yes, it could be the, or a, problem, so you might want to adjust the fire time.

As before, I checked ‘ls -Gg /var/lib/systemd/timers/’, but the timestamps do not correlate with the freezes.

FlyingWhale · June 3, 2021, 10:53am

Now, after one and a half weeks without a freeze, I think I have found a solution.

First I tried adding some boot parameters suggested in other solutions I found:

amd_iommu=pt iommu=soft (from http://www.reddit.com/r/thinkpad/comments/ckkbej/t495_linux_avoid/)
But that didn’t work either.

In the end the solution was to upgrade the kernel, according to this: http://bugs.mageia.org/show_bug.cgi?id=25882

Now I am using the 5.12.8-1.g9404e18-default x86_64 kernel and everything works fine.
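
When comparing kernel versions like the two in this thread, note that plain string comparison gets it wrong ("5.3.18" sorts after "5.12.8"). A minimal Python sketch, with a made-up helper name kernel_version, that compares the numeric parts instead:

```python
import re


def kernel_version(release: str) -> tuple:
    """Extract the leading numeric (major, minor, patch) part of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


old = kernel_version("5.3.18-lp152.72-default")
new = kernel_version("5.12.8-1.g9404e18-default")
is_newer = new > old  # tuples compare number by number
```

On Linux, kernel_version(platform.release()) from the standard platform module gives the running kernel. Distribution kernels often carry backported fixes, so the number is only a rough indicator, and 5.12.8 is simply the version reported working here, not a known minimum.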

Thanks for your help.

Svyatko · June 3, 2021, 11:08am

Consider filing a bug report for this.

Leap 15.3 is using kernel 5.3 patched up to 5.9.