Sudden reboot issue

I have been experiencing a sudden, random reboot issue for about 4 years on Linux
(first Ubuntu, then openSUSE).

I bought the components (CPU, RAM, motherboard, etc.) in 2019 and installed
Ubuntu (I switched to openSUSE just last week).
Hardware information:

  • ASUS ROG Strix z390-H motherboard
  • Intel i9 9900K
  • 64 GB RAM, 3000 MHz
  • 750 W power supply
  • 1 TB WD SSD SATA

I started to experience the random reboot issue right after I put everything
together. Reboots usually happen anywhere from several hours to several days
after the machine is turned on. The funny thing is that when I stress test the
CPU or run memtest on the RAM, everything is fine, and no reboot happens during
the testing process. I also noticed that if I just ssh into the machine without
actually using the graphical user interface, reboots happen less frequently
(one reboot every couple of days). I tried a lot of different approaches at that
time but could not solve the problem. I assumed that the CPU and RAM I bought
were simply of very bad quality, so I basically just ignored the issue and
lived with it.

This year, I got a new Hynix PCI-E 1 TB SSD and installed Windows 10 on it. I
was surprised to find that there is no reboot issue on Windows: I can leave the
machine running for 15 days without any problem. So I thought Ubuntu was causing
the reboots, and I switched to openSUSE last week.

I am again disappointed: openSUSE still has the reboot issue. I tried the
following approaches and none of them worked:

  • I don’t think there is an overheating problem; the CPU temperature is
    normally 35 - 55 °C.
  • Checked the output of “journalctl -b -1 -e”: no relevant error information
    appears, so it seems the reboot is not caused by the operating system. If
    anyone is interested, I can upload a screenshot of the output.
  • Tried to record the kernel log using the command “dmesg --follow > dmesg.log”.
    After the reboot, I checked the file and there was no relevant error
    information.
  • Set kernel.panic to 0. Didn’t work.
  • Turned off every energy-saving feature for the CPU, including Speed Shift,
    SpeedStep, and C-states. No luck.
  • Installed the Intel graphics driver: xf86-video-intel.
  • Installed the microcode package: ucode-intel.
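
One possible reason the redirected dmesg log shows nothing is that the last lines may still sit in a buffer when the machine resets hard, so they never reach the disk. Below is a minimal sketch of a logger that forces every line to disk before reading the next one. It assumes Python 3 is installed and that “dmesg --follow” is readable (it may need root); the function names and the default file name “dmesg-durable.log” are hypothetical. It can only help if the kernel manages to print something before the reset.

```python
import os
import subprocess


def append_durably(path, lines):
    """Append each line to `path` and force it to disk before taking the next."""
    count = 0
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line if line.endswith("\n") else line + "\n")
            log.flush()             # push Python's buffer to the kernel
            os.fsync(log.fileno())  # ask the kernel to write it to the disk
            count += 1
    return count


def follow_kernel_log(path="dmesg-durable.log"):
    """Stream `dmesg --follow` into `path` (hypothetical default name)."""
    # Same command as in the list above; only the way it is saved differs.
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    return append_durably(path, proc.stdout)
```

Calling follow_kernel_log() from a Python session and leaving it running until the next reboot would then leave the last kernel messages in the file, if there were any.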

I really cannot get any further with debugging because I see a contradiction here:

  1. If the reboot is caused by hardware, then why can Windows run normally?
  2. If the reboot is caused by software, why can’t I find any relevant
    information in the logs? My assumption is that if the reboot is caused by
    software, e.g. a kernel panic or something similar, I should at least see
    some related information in the log.

Could this be related to the desktop environment? My personal experience is that
if I just lock the screen and ssh into the machine, or only use the terminal and
open webpages without multimedia content, then the time until a reboot is much
longer than when I watch movies or join video conferences.

Any information would be useful, thanks.

BTW, I also upgraded the BIOS to the latest version, but that did not solve the issue.

Additional information: when I run the command “last”, the reason given for the reboot is “crash”.
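
As far as I understand, “crash” in the output of “last” only means that a session had no logout record before the next boot, i.e. the shutdown was unclean; it does not by itself say whether the cause was software or hardware. A minimal sketch for pulling those entries out so their timestamps can be compared with the journal, assuming Python 3 and the util-linux version of “last” (the function names are hypothetical):

```python
import re
import subprocess

# Matches the "- crash" logout field that `last` prints for unclean sessions.
CRASH_RE = re.compile(r"\s-\s*crash\b")


def crash_entries(last_output):
    """Return the lines of `last` output whose session ended with 'crash'."""
    return [line for line in last_output.splitlines() if CRASH_RE.search(line)]


def crash_entries_from_last():
    """Run `last -x -F` (full timestamps) and keep only the crash lines."""
    result = subprocess.run(
        ["last", "-x", "-F"], capture_output=True, text=True, check=False
    )
    return crash_entries(result.stdout)
```

Printing each line returned by crash_entries_from_last() gives a list of dates to check against “journalctl --list-boots”.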

Spurious errors are hard to track down and fix: https://www.youtube.com/watch?v=M7Q-4cLyRzw

When the infamous host erlangen exhibited a kernel oops, the root cause was readily detected:

Regarding Windows and Tumbleweed, a single bit can cause the different behaviour. You presented a nice narrative, but little factual information. You may try harder.