System crash ... I can't see anything in the logs

Hi.

I have changed the hardware on a system running openSUSE Leap 42.3 with a 4-disk RAID 5 array.
The new hardware is:

AMD Ryzen 7 1700 (I have checked: it is not one of the pre-week-25 units)
G.Skill Ripjaws V Series F4-3200C16D-16GVR 16 GB (2x8GB) DDR4
Asrock AM4 AB350 Pro 4

The system seems to run fine for hours or days, and then it crashes.
In the system log I can’t see anything: it is just working… and then it stops working.
I have set up a cron job that dumps the sensors output every minute, and it does not show any overheating or similar problem.
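For anyone wanting to reproduce this kind of per-minute logging, here is a minimal sketch. It is not necessarily what was used here: the function name, the default log path and the script path in the crontab comment are all assumptions. The `sync` call is there so that the last snapshot has a better chance of reaching the disk before a hard freeze.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped "sensors" snapshot to a log file.
# The default log path is an assumption; pass another path as the first argument.
log_sensors() {
    logfile="${1:-/var/log/sensors-trace.log}"
    {
        # Header line so each snapshot can be matched to a time
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # "sensors" comes from lm_sensors; errors are logged too
        sensors
    } >> "$logfile" 2>&1
    # Flush buffers so the newest entry is more likely to survive a crash
    sync
}

# Example crontab entry (assumes this is saved as a script that calls log_sensors):
# * * * * * /usr/local/bin/log-sensors.sh
```

After a crash, the last "===" header in the file gives a rough upper bound on when the machine was still alive.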

I don’t know where to look for clues.

Any help?

regards

I changed the CPU and motherboard, and then it seemed to be solved, but it wasn’t.
I had other issues and thought it was a different problem, but it is the same one. I will document any findings here.

Please check that you have the “ucode-amd” package installed.
AFAICS, ‘microcode_amd_fam17h.bin’ is needed for AMD Ryzen 3, 5 and 7: <https://wiki.gentoo.org/wiki/AMD_microcode>.
Check the CPU family with “grep -F -m 1 "cpu family" /proc/cpuinfo”.
Check that the latest microcode is being loaded at boot by means of “dmesg | grep -i 'microcode'” …


tutatis:~ # zypper se ucode-amd
Loading repository data...
Reading installed packages...

S  | Name      | Summary                        | Type   
---+-----------+--------------------------------+--------
i+ | ucode-amd | Microcode updates for AMD CPUs | package
tutatis:~ # grep -F -m 1 "cpu family" /proc/cpuinfo
cpu family      : 23
You have new mail in /var/spool/mail/root
tutatis:~ # dmesg | grep -i 'microcode' 
[    4.324367] microcode: CPU0: patch_level=0x08001105
[    4.324373] microcode: CPU1: patch_level=0x08001105
[    4.324388] microcode: CPU2: patch_level=0x08001105
[    4.324393] microcode: CPU3: patch_level=0x08001105
[    4.324398] microcode: CPU4: patch_level=0x08001105
[    4.324403] microcode: CPU5: patch_level=0x08001105
[    4.324417] microcode: CPU6: patch_level=0x08001105
[    4.324423] microcode: CPU7: patch_level=0x08001105
[    4.324448] microcode: CPU8: patch_level=0x08001105
[    4.324455] microcode: CPU9: patch_level=0x08001105
[    4.324471] microcode: CPU10: patch_level=0x08001105
[    4.324477] microcode: CPU11: patch_level=0x08001105
[    4.324484] microcode: CPU12: patch_level=0x08001105
[    4.324489] microcode: CPU13: patch_level=0x08001105
[    4.324494] microcode: CPU14: patch_level=0x08001105
[    4.324499] microcode: CPU15: patch_level=0x08001105
[    4.324529] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba




I think it is saying the CPUs are patched.
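A quick way to condense that dmesg output is to count the distinct patch levels, which should show whether every logical CPU reports the same one. This is only a sketch: the function name is made up, and it assumes the “patch_level=0x…” format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style lines on stdin and print each
# distinct microcode patch level together with how many CPUs report it.
summarise_patch_levels() {
    grep -o 'patch_level=0x[0-9a-fA-F]*' | sort | uniq -c
}

# Usage:
#   dmesg | summarise_patch_levels
```

A single output line means all logical CPUs report the same patch level. Whether that level is the newest one available is a separate question that this does not answer.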

It has been working well for 5 days now, ever since I disabled the C6 state, as I posted here.

best regards

@fperal:
Yes, as you mention in the other thread you’ve started, the power supply unit (PSU) may well be the root cause …

Bottom line: if the PSU cannot supply enough current, the voltages will drop and the CPU will become unstable …

  • Always, always, when you install a new (high-performance) CPU or, indeed, any other component, add up the current (amperes) that all the components need and then check that the PSU is rated to supply that current at every voltage in use, so that everything keeps running happily. In particular, check the +12 V current needed: more than a few PSUs on the market fail to supply enough power on the +12 V rails …
  • Also check the +3.3 V and +5 V current needed. This is not usually an issue, but there are quite a few PSUs that claim a capability of hundreds of watts yet fail to deliver the amperes …
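
The per-rail check described above is simple arithmetic (power = volts × amperes), so it can be sketched in a few lines of shell. The function name is invented, and the figures in the usage comment are made-up placeholders, not measurements of this system. Take real values from the PSU label and the component data sheets.

```shell
#!/bin/sh
# Hypothetical helper: rail_headroom VOLTS RATED_AMPS DRAW_AMPS...
# Sums the current draws on one rail and compares the total with the
# current the PSU label says that rail can supply.
rail_headroom() {
    volts=$1
    rated=$2
    shift 2
    printf '%s\n' "$@" | awk -v v="$volts" -v r="$rated" '
        { total += $1 }                      # add up each draw (amperes)
        END {
            printf "%.1f V rail: need %.1f A (%.1f W), rated %.1f A, headroom %.1f A\n",
                   v, total, v * total, r, r - total
        }'
}

# Usage with placeholder numbers (NOT real data):
#   rail_headroom 12 30 8.0 6.5 2.0
```

A negative or very small headroom on any rail is the situation described above, even if the PSU’s headline wattage looks generous.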

The only people who will be happy are the “really good” «read: expensive» PSU vendors and the electricity supply authorities …