System freezes/crashes on a brand-new installation

Greetings!

Recently I built a machine with these specifications:

Processor: AMD Ryzen 5 3600
Motherboard: Biostar B450MH
Memory: 8GB DDR4 Kingston
Storage: SSD M2 Western Digital 240GB NVMe and HD Western Digital 3TB (for user data)
Graphics Card: AMD Radeon RX550 4GB

I installed openSUSE Leap 15.4 with kernel 5.14.21-150400.22-default. I used the following partitioning scheme:

NVMe1p1 - 600 MB for UEFI, mounted on /boot/uefi (formatted as FAT)
NVMe1p2 - 8 GB for swap
NVMe1p3 - 100 GB for the system, mounted on / (formatted as btrfs)
NVMe1p4 - 114 GB, mounted on /var (formatted as btrfs)
sda1 - mounted on /home (formatted as btrfs)

The problem is that the system is unstable. Occasionally it freezes, and occasionally it loses the graphical interface and shows a black screen with only the cursor (in these cases I can still log in with Ctrl+Alt+F3, but only on a text terminal). I have never tried to restart the graphical interface because I don’t know which commands to use.

I looked through the system logs with YaST, but I can’t identify the cause of the problem. So I ask:

In which log files should I search for the problem?

I ran Memtest, and it found no issues with the RAM.

I ran the SSD self-test from the motherboard’s BIOS, and again no problem was found.

I swapped the motherboard for another of the same model and it didn’t fix the issue.

I’d be really thankful for any help.

Carlos Teixeira.

This may be connected to https://forums.opensuse.org/showthread.php/572707-Problem-with-kernel-5-3-18-150300-59-81?p=3143338#post3143338?

Well, I admit that the others are about 15.3 and another kernel. So better forget my suggestion. Sorry.

@cabteixeira:

After my upgrade from Leap 15.3 to Leap 15.4 on this machine –


Operating System: openSUSE Leap 15.4
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.90.0
Qt Version: 5.15.2
Kernel Version: 5.14.21-150400.22-default (64-bit)
Graphics Platform: X11
Processors: 8 × AMD Ryzen 5 3400G with Radeon Vega Graphics
Memory: 13.5 GiB of RAM
Graphics Processor: AMD Radeon™ Vega 11 Graphics

– when I log into a KDE Plasma Wayland session, the Plasmashell intermittently crashes with a Segmentation Fault …

  • Logging into a KDE Plasma X11 session doesn’t suffer this issue.

Everyone else:

  • Yes, yes, I attempted to submit a KDE Bug Report via DrKonqi, but it pointed to many Bug Reports with the same Stack Trace and the same Segmentation Fault …
    Maybe I’ll re-address this issue in a week or two …

I’m using Gnome… But on one of the occasions when I was able to look at the system logs, I did record a segmentation fault as well. My question is: where are the log files that document this error? I can only find a single log using YaST, located at /var/log. Are there other locations where these logs are stored?

Logged in as root on any vtty:

systemctl restart xdm

I have various installations which segfault preventing the greeter from appearing on boot. This works on all (I think - I haven’t been tracking which have this issue).
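For a Gnome install like the original poster's, the same idea can be sketched as below. This is only a sketch and assumes the running display manager is controlled through the generic display-manager.service unit, which may not match every setup. The function name restart_gui and the DRY_RUN variable are made up for this example; setting DRY_RUN=echo prints the command instead of running it.

```shell
# Sketch: restart the graphical login from a text console (Ctrl+Alt+F3).
# Assumption: the display manager runs as display-manager.service.
# restart_gui and DRY_RUN are hypothetical names used only in this example.
restart_gui() {
    # With DRY_RUN=echo the command is printed; with DRY_RUN unset it runs.
    ${DRY_RUN:-} systemctl restart display-manager.service
}
```

Logged in as root on the text console, calling restart_gui should bring the greeter back if only the graphical session died. All graphical sessions are ended by the restart, so unsaved work in them is lost.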

The first time I ran into this routinely, I filed this still-open bug, where the consensus seems to be a timing issue: either the greeter tries to start before Xorg is far enough along in its start process, or Xorg tries to start before the GPU’s kernel driver has loaded. Comment 17 there has a workaround that might be helpful here.

I’ve raised this additional KDE Plasma Bug #456990 against this issue.

The systemd tool to deal with Core Dumps is “coredumpctl” –


 > coredumpctl 
Hint: You are currently not seeing messages from other users and the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
TIME                           PID  UID GID SIG     COREFILE EXE                                   SIZE
Wed 2022-06-15 17:38:20 CEST 26091 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Wed 2022-06-15 18:25:03 CEST  5079 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Wed 2022-06-15 18:44:19 CEST  5059 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Wed 2022-06-15 20:45:59 CEST 11335 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Fri 2022-06-17 12:42:35 CEST 17949 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Sat 2022-06-18 13:43:12 CEST 11040 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Mon 2022-06-20 18:15:28 CEST  1338 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Tue 2022-06-21 18:46:50 CEST  3998 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Wed 2022-06-22 18:22:27 CEST 21644 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Thu 2022-06-23 12:02:42 CEST 11328 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Sat 2022-06-25 18:36:43 CEST 31065 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
 .
 .
 .
Wed 2022-07-13 17:02:12 CEST 16254 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Fri 2022-07-15 19:17:09 CEST 27370 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Fri 2022-07-15 19:17:11 CEST  2887 1000 100 SIGSEGV missing  /usr/bin/plasmashell                   n/a
Fri 2022-07-15 20:20:43 CEST  6825 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Sat 2022-07-16 15:53:21 CEST 12141 1000 100 SIGABRT missing  /usr/bin/kglobalaccel5                 n/a
Mon 2022-07-18 18:22:32 CEST 12578 1000 100 SIGABRT present  /usr/bin/kglobalaccel5              501.3K
Tue 2022-07-19 15:04:11 CEST  6836 1000 100 SIGABRT present  /usr/lib64/firefox/crashreporter      1.0M
Tue 2022-07-19 15:04:38 CEST  6958 1000 100 SIGABRT present  /usr/lib64/firefox/crashreporter      1.0M
Tue 2022-07-19 15:05:08 CEST  7135 1000 100 SIGABRT present  /usr/lib64/firefox/crashreporter      1.0M
Wed 2022-07-20 10:16:29 CEST 32063 1000 100 SIGSEGV present  /usr/sbin/mariadbd                    6.3M
Wed 2022-07-20 10:17:45 CEST  6665 1000 100 SIGSEGV present  /usr/bin/plasmashell                 35.1M
Wed 2022-07-20 10:18:51 CEST  6834 1000 100 SIGSEGV present  /usr/sbin/mariadbd                    3.2M
Wed 2022-07-20 10:20:44 CEST  8015 1000 100 SIGSEGV present  /usr/bin/plasmashell                 34.4M
Wed 2022-07-20 10:28:12 CEST  9203 1000 100 SIGSEGV present  /usr/bin/plasmashell                 61.4M
Wed 2022-07-20 10:28:32 CEST  9373 1000 100 SIGSEGV present  /usr/sbin/mariadbd                    3.0M
Thu 2022-07-21 15:53:17 CEST 20233 1000 100 SIGSEGV present  /usr/sbin/mariadbd                    3.4M
Thu 2022-07-21 19:45:30 CEST 28254 1000 100 SIGSEGV present  /usr/bin/plasmashell                 62.1M
Thu 2022-07-21 19:46:31 CEST 30064 1000 100 SIGSEGV present  /usr/bin/plasmashell                 34.5M
lines 29-76/76 (END)

– type “q” to quit the pager.

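The Core Dump records are stored in the systemd Journal, which is located in a sub-directory of ‘/var/log/’; the dump files themselves, when present, are normally kept under ‘/var/lib/systemd/coredump/’.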
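To dig into one of the entries in a listing like the one above, something along these lines may help. It is a rough sketch: crash_report and DRY_RUN are names invented for this example (DRY_RUN=echo only prints the commands), and the executable path is whatever shows up in the EXE column on your own machine.

```shell
# Sketch: inspect crashes recorded by systemd-coredump for one executable.
# crash_report and DRY_RUN are hypothetical names used only in this example.
crash_report() {
    exe=$1   # path from the EXE column, e.g. /usr/bin/plasmashell
    # List every recorded crash of this executable, without the pager.
    ${DRY_RUN:-} coredumpctl --no-pager list "$exe"
    # Show details (signal, command line, stack trace if the dump is still
    # present) for the most recent matching crash only.
    ${DRY_RUN:-} coredumpctl --no-pager -1 info "$exe"
}
```

For example, crash_report /usr/bin/plasmashell. As the hint in the output above says, a normal user only sees their own crashes; run it as root or as a member of the 'systemd-journal' group to see everything.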

This is a screenshot of /var/log on the machine. I don’t see “systemd” file here…
https://drive.google.com/file/d/15mP2MsPaRF_I6a1HBocVtm18zfgnMJIz/view?usp=sharing

I believe that I didn’t properly explain what my problem was in the first Post.

To explain better: I built a computer that has been crashing/freezing. At various moments it appeared to me that only the graphical part of the system broke, since the screen would go black with a cursor in the top-left corner. When this black screen appeared, if I pressed Ctrl+Alt+F3 I could type the username and password at the localhost login prompt, but I didn’t know which command to use to reload the graphical interface.

I believe the problem is in the hardware. I installed Windows 10 on the machine for testing purposes, and it gets the blue screen of death. I upgraded to Windows 11, and even then the blue screen problems persist. Windows’s error logs indicate something related to “Kernel-Power”.

Searching on Google for this Kernel-Power error, the majority of forums point to a Power Supply Unit issue. I swapped the unit and reinstalled openSUSE, but the issue stays exactly the same! I already swapped the motherboard, ran MemTest (to test RAM memory module) and executed the SSD’s self-test (in the motherboard BIOS), but the issue persists.

I’m lost, because everything indicates a hardware issue, but where? In which component?

The components are all under warranty, and I need to find out which one of them is defective. I swapped the motherboard and Power Supply Unit, and it didn’t solve the issue.

What OpenSUSE Log Files can help me?

I’ll be thankful for any help!

You won’t. Systemd’s journal is in /var/log/journal//, accessed with the journalctl command. If /var/log/journal does not exist, create it, and then the journal will persist across future boots. You can create it either manually, or by setting Storage=persistent in /etc/systemd/journald.conf.
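A minimal sketch of the manual route, to be run as root. The function names and the DRY_RUN variable are invented for this example (DRY_RUN=echo prints the commands instead of running them); the commands themselves are standard systemd tools, but whether the flush takes effect without a reboot may depend on the setup, so rebooting afterwards is the safe choice.

```shell
# Sketch: make the journal persistent, then read messages from earlier boots.
# Function names and DRY_RUN are hypothetical, used only in this example.
enable_persistent_journal() {
    # journald keeps logs on disk once this directory exists.
    ${DRY_RUN:-} mkdir -p /var/log/journal
    # Apply the ownership and permissions systemd expects on that directory.
    ${DRY_RUN:-} systemd-tmpfiles --create --prefix /var/log/journal
    # Ask journald to move what it has collected in /run over to /var.
    ${DRY_RUN:-} journalctl --flush
}

show_previous_boot_errors() {
    # List the boots the journal knows about.
    ${DRY_RUN:-} journalctl --no-pager --list-boots
    # Print messages of priority "err" and worse from the previous boot.
    ${DRY_RUN:-} journalctl --no-pager -b -1 -p err
}
```

After the next freeze and reboot, show_previous_boot_errors should show whatever the system managed to log before it went down. A hard hardware lockup often leaves nothing at all, which is itself a hint.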

I didn’t know which command to use to reload the graphical interface.
I answered this already, but it won’t help. With two versions of Windows behaving similarly badly, it’s quite obviously hardware-related, though hardware-related often means a BIOS upgrade can provide a solution. Have you confirmed you have the latest BIOS installed? Have you reported this trouble to Biostar, or was it your vendor that exchanged your motherboard?

ran MemTest (to test RAM memory module)
How? I have no faith in memtest86+ for DDR4 RAM, or in running any memory tester for merely one pass. RAM tests should run for hours. Memtest86 (not +) defaults to 4 passes. That should be the minimum RAM test. Overnight is better.

Modern motherboards support multi-channel RAM, most commonly dual-channel, such as yours. Dual-channel operation requires RAM sticks installed as matched pairs. Performance suffers greatly when only one RAM stick is installed, by up to nearly 50%. It could be that in your case, if you only have one stick, not having dual channel enabled is sporadically fatal.

What OpenSUSE Log Files can help me?
Ordinarily, journal, via journalctl, and dmesg. However, these aren’t always helpful with hardware diagnosis. Often the only way forward is parts swapping.
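As a starting point, kernel messages can be narrowed down to lines that usually matter for hardware trouble. This is only a sketch: hw_error_lines is a made-up helper, and the keyword list is an illustrative guess for this kind of machine (AMD GPU, NVMe, PCIe), not a complete set.

```shell
# Sketch: keep only kernel log lines that mention typical hardware-error
# keywords. hw_error_lines and its keyword list are hypothetical choices
# for this example, not an exhaustive filter.
hw_error_lines() {
    # Reads log lines on stdin, prints the matching ones (case-insensitive).
    grep -Ei 'mce|machine check|hardware error|amdgpu|nvme|pcie|i/o error'
}

# Typical use, as root:
#   journalctl -k -b -1 --no-pager | hw_error_lines   # kernel log, previous boot
#   dmesg --level=err,warn | hw_error_lines           # current boot, errors/warnings
```

The -b -1 form only works if the journal is persistent; otherwise the previous boot's messages are gone after the reboot.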

Swapping CPU is another option with your motherboard, which supports AMD APUs, a CPU with a GPU on same die, working through the HDMI and VGA ports on your B450MH motherboard. Using an APU instead of the Ryzen™ 5 3600 might pinpoint an incompatibility between the motherboard and discrete GPU, or a problem with that model motherboard’s PCIeX16 slot. You might find AMD’s or Biostar’s RMA department will rent a loaner for several weeks for the purpose of diagnosis, if you ask. IIRC, AMD once made an offer to do this for a customer I was helping, but we identified the problem first.

@cabteixeira:

There was a Kernel patch released yesterday evening (21 Jul 2022 18:35:09 CEST) which may well help to alleviate your problems: openSUSE-SLE-15.4-2022-2520

  • This patch contains repairs relevant to AMD Hardware and, a security repair:
    - CVE-2021-26341: Some AMD CPUs may transiently execute beyond unconditional direct branches, which may potentially result in data leakage (bsc#1201050).

    - drm/amd: Add USBC connector ID (git-fixes).
    - drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj (git-fixes).
    - drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled (git-fixes).
    - drm/amd: Check if ASPM is enabled from PCIe subsystem (git-fixes).
    - drm/amd/display: Add affected crtcs to atomic state for dsc mst unplug (git-fixes).
    - drm/amd/display: Add pstate verification and recovery for DCN31 (git-fixes).
    - drm/amd/display: Add signal type check when verify stream backends same (git-fixes).
    - drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT (git-fixes).
    - drm/amd/display: Cap OLED brightness per max frame-average luminance (git-fixes).
    - drm/amd/display: Cap pflip irqs per max otg number (git-fixes).
    - drm/amd/display: Check if modulo is 0 before dividing (git-fixes).
    - drm/amd/display: DCN3.1: do not mark as kernel-doc (git-fixes).
    - drm/amd/display: Disabling Z10 on DCN31 (git-fixes).
    - drm/amd/display: do not ignore alpha property on pre-multiplied mode (git-fixes).
    - drm/amd/display: Do not reinitialize DMCUB on s0ix resume (git-fixes).
    - drm/amd/display: Enable power gating before init_pipes (git-fixes).
    - drm/amd/display: FEC check in timing validation (git-fixes).
    - drm/amd/display: Fix allocate_mst_payload assert on resume (git-fixes).
    - drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes() (git-fixes).
    - drm/amd/display: fix audio format not updated after edid updated (git-fixes).
    - drm/amd/display: Fix memory leak (git-fixes).
    - drm/amd/display: Fix memory leak in dcn21_clock_source_create (bsc#1190786)
    - drm/amd/display: Fix OLED brightness control on eDP (git-fixes).
    - drm/amd/display: Fix p-state allow debug index on dcn31 (git-fixes).
    - drm/amd/display: fix yellow carp wm clamping (git-fixes).
    - drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels (git-fixes).
    - drm/amd/display: For vblank_disable_immediate, check PSR is really used (git-fixes).
    - drm/amd/display: Protect update_bw_bounding_box FPU code (git-fixes).
    - drm/amd/display: Read Golden Settings Table from VBIOS (git-fixes).
    - drm/amd/display: Remove vupdate_int_entry definition (git-fixes).
    - drm/amd/display: Revert FEC check in validation (git-fixes).
    - drm/amd/display: Update VTEM Infopacket definition (git-fixes).
    - drm/amd/display: Update watermark values for DCN301 (git-fixes).
    - drm/amd/display: Use adjusted DCN301 watermarks (git-fixes).
    - drm/amd/display: Use PSR version selected during set_psr_caps (git-fixes).
    - drm/amd/display: watermark latencies is not enough on DCN31 (git-fixes).
    - drm/amdgpu: add beige goby PCI ID (git-fixes).
    - drm/amdgpu: bypass tiling flag check in virtual display case (v2) (git-fixes).
    - drm/amdgpu: check vm ready by amdgpu_vm->evicting flag (git-fixes).
    - drm/amdgpu: conduct a proper cleanup of PDB bo (git-fixes).
    - drm/amdgpu/cs: make commands with 0 chunks illegal behaviour (git-fixes).
    - drm/amdgpu: disable MMHUB PG for Picasso (git-fixes).
    - drm/amdgpu/display: add support for multiple backlights (git-fixes).
    - drm/amdgpu: do not do resets on APUs which do not support it (git-fixes).
    - drm/amdgpu: do not enable asic reset for raven2 (git-fixes).
    - drm/amdgpu: do not set s3 and s0ix at the same time (git-fixes).
    - drm/amdgpu: do not use BACO for reset in S3 (git-fixes).
    - drm/amdgpu: do not use passthrough mode in Xen dom0 (git-fixes).
    - drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count (git-fixes).
    - drm/amdgpu: Enable gfxoff quirk on MacBook Pro (git-fixes).
    - drm/amdgpu: Ensure HDA function is suspended before ASIC reset (git-fixes).
    - drm/amdgpu: explicitly check for s0ix when evicting resources (git-fixes).
    - drm/amdgpu: fix amdgpu_ras_block_late_init error handler (bsc#1190497)
    - drm/amdgpu: fix logic inversion in check (git-fixes).
    - drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() (git-fixes).
    - drm/amdgpu: Fix recursive locking warning (git-fixes).
    - drm/amdgpu: fix suspend/resume hang regression (git-fixes).
    - drm/amdgpu/sdma: Fix incorrect calculations of the wptr of the doorbells (git-fixes).
    - drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix (git-fixes).
    - drm/amdgpu/smu10: fix SoC/fclk units in auto mode (git-fixes).
    - drm/amdgpu: suppress the warning about enum value 'AMD_IP_BLOCK_TYPE_NUM' (git-fixes).
    - drm/amdgpu/ucode: Remove firmware load type check in amdgpu_ucode_free_bo (git-fixes).
    - drm/amdgpu: unify BO evicting method in amdgpu_ttm (git-fixes).
    - drm/amdgpu: update VCN codec support for Yellow Carp (git-fixes).
    - drm/amdgpu/vcn: Fix the register setting for vcn1 (git-fixes).
    - drm/amdgpu/vcn: improve vcn dpg stop procedure (git-fixes).

Try to install this Leap 15.4 patch and everything else in the Patch List, then reboot (you’ll have to anyway) and see if the issue has been resolved.
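From a terminal, installing the pending patches can be sketched roughly as follows, run as root. apply_patches and DRY_RUN are names invented for this example (DRY_RUN=echo prints the commands instead of running them); YaST Online Update should achieve the same through the GUI.

```shell
# Sketch: refresh the repositories and install all needed patches.
# apply_patches and DRY_RUN are hypothetical names used only in this example.
apply_patches() {
    # Fetch current metadata from the configured repositories.
    ${DRY_RUN:-} zypper refresh
    # Show which patches are still needed on this system.
    ${DRY_RUN:-} zypper list-patches
    # Install all needed patches (zypper asks for confirmation).
    ${DRY_RUN:-} zypper patch
}
```

zypper may first update the package management stack itself and then ask you to run the patch command again; if so, run it a second time before rebooting. To look at the kernel patch by itself, zypper info -t patch openSUSE-SLE-15.4-2022-2520 shows its description.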

OK. I’ll research on Google how to do that. Thank you!

I already upgraded to the latest BIOS version on Biostar’s website. I exchanged the motherboard for another of the same model at the same vendor, and the issue stays the same, so I don’t think the motherboard is the issue. The BIOS upgrade also didn’t fix the issue.

I used Memtest86. There were hours of tests, with 4 phases. I was already able to diagnose a memory issue with it on another machine, a long time ago.
I didn’t know Memtest86 had trouble with DDR4 Memory…

I’m using only 1 8GB stick, on single channel.

I already swapped the motherboard and Power Supply Unit. The problem stayed the same. Swapping parts is very complicated in Brazil because the warranty is only valid after the vendor has verified the problem… and the problem is hard to replicate (intermittent defect), and by Murphy’s Law, the issue never shows itself when the vendor is testing the specific part… and that’s not even counting WHAT tests the vendors do: in the motherboard’s case, all the vendor did was install a processor and a memory stick, plug in the Power Supply and observe whether the motherboard reached the BIOS configuration screen. When I complained about the efficacy of the test, since there wasn’t even an operating system running, the vendor decided to swap the board…

Everything in Brazil is complicated… changing the CPU for an APU is only possible if I buy the component, because I don’t believe AMD will ship one of these components here… This is exactly why I’m trying desperately to identify the problem by using system logs, but so far I haven’t been able to do that. Incompatibility with the graphics card doesn’t seem to be the cause, because I tested with another card (an RX460 from XFX) and the problem still happened.

If you can’t get a rental or loaner from Biostar and/or AMD, I don’t see a way around buying a different motherboard and/or APU, then selling it (or returning it to the vendor within the free return period) after diagnosis is complete. This smells like a BIOS interrupt mishandling of the PCIeX16 slot or PCIe bus, or an outright design defect in the motherboard or its chipset.

You could try NVME removal, running the OS off the HDD, to see whether it has an impact on the issue; and (about which I have extreme doubt) vice versa.

Ok.

I used Memtest86. There were hours of tests, with 4 phases. I was already able to diagnose a memory issue with it on another machine, a long time ago. I didn’t know Memtest86 had trouble with DDR4 Memory… I’m using only 1 8GB stick, on single channel.

Try this one: https://www.memtest86.com/downloads/memtest86-usb.zip Run all of the default tests.

I already swapped the motherboard and Power Supply Unit. The problem stayed the same. Swapping parts is very complicated in Brazil because the warranty is only valid after the vendor has verified the problem… and the problem is hard to replicate (intermittent defect), and by Murphy’s Law, the issue never shows itself when the vendor is testing the specific part… and that’s not even counting WHAT tests the vendors do: in the motherboard’s case, all the vendor did was install a processor and a memory stick, plug in the Power Supply and observe whether the motherboard reached the BIOS configuration screen. When I complained about the efficacy of the test, since there wasn’t even an operating system running, the vendor decided to swap the board…

You need to test another processor.

Before trying this you may want to make sure the freeze is not caused by the power line: https://forums.opensuse.org/showthread.php/561241-Segfault-Trouble-Shooting Use a power strip with a filter.

Something I rarely think about, as I always have mine plugged into UPSes.

I never thought of using UPSes. Electric power has always been extremely reliable in Erlangen. Some years ago, a failure to properly operate the coffeemaker tripped the residual-current device. But the wiring of homes can emit challenging levels of electromagnetic radiation. Older users tend to have fan heaters with a typical capacity of 3 kW.

Thank you! Thank you very much. I will install this patch today.

************** UPDATE ******************

In the last few days, I noticed that when the computer runs ONLY Minecraft, it displays NO ISSUES AT ALL. The problems ONLY show themselves when I try to open ANY WEB BROWSER (Mozilla Firefox and/or Google Chrome). Because of this strange detail, I don’t believe this to be a hardware issue.

The system crashed two times today…

Yesterday I already changed the memory: I put in two brand-new sticks of DDR4 memory (Patriot, 8 GB each, in dual channel).
I really don’t know what else to do… On Monday I’ll try to exchange the processor at the seller, under warranty…

Thanks a lot to everyone!

I don’t know what to do anymore… I replaced the processor, and the problem remains!

Now I can say that all that remains from the original computer I built is the case and the disks (SSD and HD)… All the other components were replaced, and the issue remains unaltered. And to make matters stranger, today I booted the machine with Ubuntu 22.04 LTS from a USB pen drive (connected directly to one of the motherboard’s USB ports), and even then the system rebooted itself!

Could it be hardware incompatibility? Such as Processor X Chipset? Or Chipset X GPU?

I think you should get in touch with a highly experienced support tech at Biostar for suggestions. Before you do that, take the mobo out of the case. Put it on the plastic mat it was on in the box it came in, and hook everything up except the front panel wires. If there was no plastic mat, use an unpainted board or boards, such as paint mixing sticks, or some hard plastic, cardboard, or poster board, to ensure the bottom of the board is insulated against any possible shorting. You can start it by briefly bridging the two pins where the power switch attaches. Use only the SSD or the HDD at a time, to ensure the problem is not either of them or the ports to which they connect. Make sure it’s isolated against child, dog and cat interference.

I have seen several systems with B450 and Ryzen 5 3600, but never heard about issues with the two components. This machine was freezing when idle: https://forums.opensuse.org/showthread.php/519529-Konsole-With-quot-su-quot-Freezing-The-System Issue was solved by moving to Tumbleweed. You may give it a try.