Constant plasma segfault Leap 42.3 + possible hardware fault?

This is one of those weird mutating errors…

For the past couple of days I have been getting freezes (with the mouse cursor still moving) and error messages during shutdown/reboot from the console. The freezes occur after the computer has been idle for a long time. For example, I left a torrent downloading overnight and in the morning the desktop was frozen, but the computer was still downloading, judging by the ethernet switch light. It happens both on the desktop and on the lock screen. The mouse pointer moves and I can’t type or use keyboard shortcuts, but I can switch to a console with ALT+CTRL+F1.

So at first I thought it could be a problem with Plasma: 42.3 is no longer supported, I should upgrade, etc.

On the first reboot/shutdown attempts from the terminal, however, the console endlessly repeated dmesg-like USB connect/disconnect messages for a basic Logitech wired USB mouse, which nevertheless worked fine in another machine. I have a screenshot if necessary.

Then this message, repeated about 9 times in sequence before shutdown:

nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state

Video is a GTX 1050 with the nvidia blob; both the DP and HDMI outputs are connected to 4K displays.

Searching the nvidia forum, it seems this could be related to hardware degradation, but not necessarily. And this is a relatively new video board (a little over 1 year old). However, their attitude regarding Linux is “you’re on your own”, so there is no definite answer.

I changed the mouse and disconnected the HDMI monitor, just to keep things simple. No more GPU errors at shutdown, but I think this happened before disconnecting the monitor.

From the beginning, Firefox would also close and recover, at first once in a while, but then more and more often. When this happened, Plasma would restart with the segfault report (that window with the ladybug tray icon).

Now it segfaults at the slightest thing; for example, just moving the mouse over a kwrite window does it.

After this, dmesg shows this segment repeated numerous times, with the BUG lines on a red background:


**[  280.828837] BUG: Bad page map in process plasmashell  pte:000000fd pmd:5efe6c067**
[  280.828838] page:ffffea0000000000 count:0 mapcount:-59 mapping:          (null) index:0x0
[  280.828839] flags: 0x410(dirty|reserved)
[  280.828839] page dumped because: bad pte
[  280.828840] addr:00007f2ab246c000 vm_flags:08000070 anon_vma:          (null) mapping:ffff8805ed0f68f0 index:f5
[  280.828853] file:libcomposeplatforminputcontextplugin.so fault:filemap_fault mmap:btrfs_file_mmap [btrfs] readpage:btrfs_readpage [btrfs]
[  280.828853] CPU: 2 PID: 4420 Comm: plasmashell Tainted: P    B      O     4.4.180-102-default #1
[  280.828854] Hardware name: ASUS All Series/H87M-E, BIOS 2201 06/18/2015
[  280.828855]  0000000000000000 ffffffff8134c867 00007f2ab246c000 00000005f05e1000
[  280.828857]  ffffffff811cd2dc
[  280.828857]  00000000000000f5
[  280.828858]  ffff8805efe6c360 00007f2ab246c000
[  280.828858]  ffffea0000000000 00007f2ab246d000 00007f2ab2581000 ffff8805c4a37c60
[  280.828860] Call Trace:
[  280.828862]  [<ffffffff8101b0c9>] dump_trace+0x59/0x350
[  280.828864]  [<ffffffff8101b4ba>] show_stack_log_lvl+0xfa/0x180
[  280.828865]  [<ffffffff8101c2b1>] show_stack+0x21/0x40
[  280.828867]  [<ffffffff8134c867>] dump_stack+0x5c/0x85
[  280.828870]  [<ffffffff811cd2dc>] print_bad_pte+0x1ec/0x290
[  280.828871]  [<ffffffff811cf02b>] unmap_page_range+0x85b/0x8b0
[  280.828873]  [<ffffffff811cfa62>] unmap_vmas+0x42/0x90
[  280.828874]  [<ffffffff811d94f8>] exit_mmap+0x88/0x130
[  280.828876]  [<ffffffff81083c50>] mmput+0x60/0x120
[  280.828878]  [<ffffffff81089c38>] do_exit+0x288/0xbd0
[  280.828879]  [<ffffffff8108a5f9>] do_group_exit+0x39/0xa0
[  280.828880]  [<ffffffff810966eb>] get_signal+0x19b/0x860
[  280.828882]  [<ffffffff81018393>] do_signal+0x23/0x700
[  280.828883]  [<ffffffff8108140c>] exit_to_usermode_loop+0x70/0xc2
[  280.828885]  [<ffffffff81003b0b>] syscall_return_slowpath+0x8b/0xa0
[  280.828887]  [<ffffffff81652eb4>] int_ret_from_sys_call+0x8/0x6d
[  280.829813] DWARF2 unwinder stuck at int_ret_from_sys_call+0x8/0x6d

[  280.829814] Leftover inexact backtrace:

**[  280.973260] BUG: Bad rss-counter state mm:ffff880602dc3040 idx:0 val:-224
[  280.973262] BUG: Bad rss-counter state mm:ffff880602dc3040 idx:2 val:-1**
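
Since the segment repeats many times, one rough way to see whether only plasmashell is affected or whether other processes are hit too is to tally the BUG lines per process. The sketch below is illustrative only: the function name `tally_kernel_bugs` and the sample lines (shortened from the log above) are assumptions made for the example, and it matches just the two message types seen in this log.

```python
import re
from collections import Counter

# Matches lines such as:
# [  280.828837] BUG: Bad page map in process plasmashell  pte:000000fd pmd:5efe6c067
BAD_PAGE_MAP = re.compile(r"BUG: Bad page map in process (\S+)")
BAD_RSS = re.compile(r"BUG: Bad rss-counter state")


def tally_kernel_bugs(lines):
    """Count 'Bad page map' reports per process and 'Bad rss-counter' lines."""
    per_process = Counter()
    rss_errors = 0
    for line in lines:
        match = BAD_PAGE_MAP.search(line)
        if match:
            per_process[match.group(1)] += 1
        elif BAD_RSS.search(line):
            rss_errors += 1
    return per_process, rss_errors


# Sample input shortened from the log above; for real use, pass the lines
# of saved dmesg output (for example, an open text file) instead.
sample = [
    "[  280.828837] BUG: Bad page map in process plasmashell  pte:000000fd pmd:5efe6c067",
    "[  280.828839] page dumped because: bad pte",
    "[  280.973260] BUG: Bad rss-counter state mm:ffff880602dc3040 idx:0 val:-224",
    "[  280.973262] BUG: Bad rss-counter state mm:ffff880602dc3040 idx:2 val:-1",
]

per_process, rss_errors = tally_kernel_bugs(sample)
for name, count in per_process.most_common():
    print(f"{count:5d}  bad page map     {name}")
print(f"{rss_errors:5d}  bad rss-counter lines")
```

If many unrelated processes show up in the tally, that would point more toward a memory, hardware or kernel-level problem than toward a bug in Plasma itself, though it is not proof either way.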


HOWEVER I’ve opened a gnome session a few times with no problems, I’m typing this from one. SO WTF???

I’ll upgrade to 15.1 anyway and see if it improves things, but I’d like to ask if anyone has experienced these errors.

Thanks,

Bruno

It’s a hardware issue. This box dual-boots with Windows 10, and there it goes BSOD with an ATTEMPTED WRITE TO READONLY MEMORY error.

Also, as soon as I finished the post above in GNOME, Firefox crashed and I couldn’t shut down cleanly.

I don’t think it’s related to Firefox: Chrome also crashes in W10.

I’m running the W10 memory diagnostic tool now; no errors so far (this box has 24 GB RAM).

Next step is to remove the nvidia card and switch to the Intel onboard video. We’ll see…

You are probably on the right track, looking at hardware issues.

After exchanging all the parts (PSU, GPU, CPU and RAM) one by one with a same-socket backup box, everything worked except the motherboard. I had to move the good motherboard into the main computer, as the backup case doesn’t offer everything I need, such as the number of drive bays, the custom cooler, fans, etc. Both boxes are GPT/UEFI, dual-boot openSUSE (42.3 on the main and 15.2 on the backup) and Windows 10, and have nvidia cards with the proprietary blob. However, somewhere during the swapping, with both installs (i.e., with the backup and the main box system drives), GRUB got lost, and now the machine boots directly to Windows, although all drives are seen correctly in the BIOS/UEFI. But that is a subject for a new thread.