Since kernel 6.9, running on only 1 core

Hello!

Since upgrading to kernel 6.9.x on openSUSE Tumbleweed, all my apps run almost entirely on a single core of the CPU.

CPU is AMD FX 8350 8 core(*)
Motherboard is GA-990FXA-UD7 (rev 1.x) (**)

(Yes, I know this is very ancient hardware)

htop while firefox loads with currently latest 6.9.1:


(CORE0 sees almost all of the activity)

cpupower monitor:

 CPU| C0   | Cx   | Freq  || POLL | C1   | C2    
   0| 13.87| 86.13|  2418||  0.00|  1.92| 84.38
   1|  0.35| 99.65|  1414||  0.00|  0.01| 99.65
   2|  0.05| 99.95|  1656||  0.00|  0.01| 99.93
   3|  0.06| 99.94|  1458||  0.00|  0.00| 99.93
   4|  0.18| 99.82|  1677||  0.00|  0.02| 99.80
   5|  0.02| 99.98|  2398||  0.00|  0.00| 99.98
   6|  0.45| 99.55|  2733||  0.00|  0.02| 99.55
   7|  0.08| 99.92|  2446||  0.00|  0.00| 99.97

htop while firefox loads with previous kernel 6.8.9:


(threads are spread across all cores).

cpupower monitor:

    | Mperf              || Idle_Stats         
 CPU| C0   | Cx   | Freq  || POLL | C1   | C2    
   0|  2.86| 97.14|  1430||  0.00|  0.86| 96.39
   7|  8.78| 91.22|  1406||  0.00|  2.31| 89.06
   3|  4.07| 95.93|  1839||  0.00|  0.06| 95.96
   1|  3.40| 96.60|  1575||  0.00|  0.82| 95.95
   6|  2.46| 97.54|  1396||  0.00|  0.02| 97.59
   5|  3.51| 96.49|  1396||  0.00|  1.24| 95.36
   2|  3.19| 96.81|  1472||  0.00|  0.00| 96.87
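
To put a number on the difference between the two tables, here is a minimal sketch that reads `cpupower monitor` output on stdin and reports which CPU holds the largest share of the summed C0 (busy) time. The function name `busiest_c0_share` is made up for illustration, and it assumes the `CPU| C0 | ...` column layout shown above.

```shell
# Hypothetical helper: reads cpupower monitor output on stdin and prints
# the CPU with the highest C0 value plus its share of the summed C0 column.
busiest_c0_share() {
  awk -F'|' '
    # Only data rows: first column is a bare CPU number.
    $1 ~ /^ *[0-9]+ *$/ {
      c0 = $2 + 0
      total += c0
      if (c0 > max) { max = c0; cpu = $1 + 0 }
    }
    END {
      if (total > 0) printf "cpu=%d share=%.1f%%\n", cpu, 100 * max / total
    }'
}
```

Usage would be something like `cpupower monitor | busiest_c0_share`. A share close to 100% on one CPU matches the "everything on core 0" picture, while an even spread over 8 cores would put the busiest one much lower.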

This is the suspicious entry I found in the logs (journalctl --boot):

on kernel 6.9.1, while smpboot is initialising the other cores, I get a bunch of errors:

jun 03 10:02:33 saturn kernel: smpboot: CPU0: AMD FX(tm)-8350 Eight-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
jun 03 10:02:33 saturn kernel: Performance Events: Fam15h core perfctr, AMD PMU driver.
jun 03 10:02:33 saturn kernel: ... version:                0
jun 03 10:02:33 saturn kernel: ... bit width:              48
jun 03 10:02:33 saturn kernel: ... generic registers:      6
jun 03 10:02:33 saturn kernel: ... value mask:             0000ffffffffffff
jun 03 10:02:33 saturn kernel: ... max period:             00007fffffffffff
jun 03 10:02:33 saturn kernel: ... fixed-purpose events:   0
jun 03 10:02:33 saturn kernel: ... event mask:             000000000000003f
jun 03 10:02:33 saturn kernel: signal: max sigframe size: 1776
jun 03 10:02:33 saturn kernel: rcu: Hierarchical SRCU implementation.
jun 03 10:02:33 saturn kernel: rcu:         Max phase no-delay instances is 1000.
jun 03 10:02:33 saturn kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
jun 03 10:02:33 saturn kernel: smp: Bringing up secondary CPUs ...
jun 03 10:02:33 saturn kernel: smpboot: x86: Booting SMP configuration:
jun 03 10:02:33 saturn kernel: .... node  #0, CPUs:      #2 #4 #6
jun 03 10:02:33 saturn kernel: __common_interrupt: 2.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: __common_interrupt: 4.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: __common_interrupt: 6.55 No irq handler for vector
jun 03 10:02:33 saturn kernel:  #1 #3 #5 #7
jun 03 10:02:33 saturn kernel: __common_interrupt: 1.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: ------------[ cut here ]------------
jun 03 10:02:33 saturn kernel: WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:6482 sched_cpu_starting+0x193/0x250
jun 03 10:02:33 saturn kernel: Modules linked in:
jun 03 10:02:33 saturn kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.9.1-1-default #1 openSUSE Tumbleweed c5471a56f12c40709b95530f47f6c0b39e75f136
jun 03 10:02:33 saturn kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD7/GA-990FXA-UD7, BIOS F11d 07/09/2013
jun 03 10:02:33 saturn kernel: RIP: 0010:sched_cpu_starting+0x193/0x250
jun 03 10:02:33 saturn kernel: Code: 00 8b 0d 80 33 fd 01 39 c8 0f 83 6c ff ff ff 48 63 d0 48 8b 3c d5 00 be 4f 8b 4c 01 e7 39 c3 75 c7 4c 89 b7 68 0c 00 00 eb c7 <0f> 0b eb c3 be 04 00 00 00 89 df e8 dd 51 02 00 84 c0 0f 85 71 ff
jun 03 10:02:33 saturn kernel: RSP: 0000:ffffb279800e3e28 EFLAGS: 00010006
jun 03 10:02:33 saturn kernel: RAX: 0000000000000001 RBX: 0000000000000003 RCX: 0000000000000008
jun 03 10:02:33 saturn kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff94586eabc740
jun 03 10:02:33 saturn kernel: RBP: ffff94554004fc98 R08: 0000000000000003 R09: ffff94586eb80000
jun 03 10:02:33 saturn kernel: R10: ffff94554004fc98 R11: 0000000000000006 R12: 000000000003c740
jun 03 10:02:33 saturn kernel: R13: 000000000003c740 R14: ffff94586eb3c740 R15: 0000000000000003
jun 03 10:02:33 saturn kernel: FS:  0000000000000000(0000) GS:ffff94586eb80000(0000) knlGS:0000000000000000
jun 03 10:02:33 saturn kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jun 03 10:02:33 saturn kernel: CR2: 0000000000000000 CR3: 00000001aba36000 CR4: 00000000000406f0
jun 03 10:02:33 saturn kernel: Call Trace:
jun 03 10:02:33 saturn kernel:  <TASK>
jun 03 10:02:33 saturn kernel:  ? sched_cpu_starting+0x193/0x250
jun 03 10:02:33 saturn kernel:  ? __warn.cold+0xa8/0x102
jun 03 10:02:33 saturn kernel:  ? sched_cpu_starting+0x193/0x250
jun 03 10:02:33 saturn kernel:  ? report_bug+0xd8/0x150
jun 03 10:02:33 saturn kernel:  ? handle_bug+0x3c/0x80
jun 03 10:02:33 saturn kernel:  ? exc_invalid_op+0x17/0x70
jun 03 10:02:33 saturn kernel:  ? asm_exc_invalid_op+0x1a/0x20
jun 03 10:02:33 saturn kernel:  ? sched_cpu_starting+0x193/0x250
jun 03 10:02:33 saturn kernel:  ? sched_cpu_starting+0x16a/0x250
jun 03 10:02:33 saturn kernel:  ? __pfx_sched_cpu_starting+0x10/0x10
jun 03 10:02:33 saturn kernel:  cpuhp_invoke_callback+0xf8/0x450
jun 03 10:02:33 saturn kernel:  __cpuhp_invoke_callback_range+0x67/0xb0
jun 03 10:02:33 saturn kernel:  start_secondary+0x9c/0x140
jun 03 10:02:33 saturn kernel:  common_startup_64+0x13e/0x141
jun 03 10:02:33 saturn kernel:  </TASK>
jun 03 10:02:33 saturn kernel: ---[ end trace 0000000000000000 ]---
jun 03 10:02:33 saturn kernel: __common_interrupt: 3.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: __common_interrupt: 5.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: __common_interrupt: 7.55 No irq handler for vector
jun 03 10:02:33 saturn kernel: smp: Brought up 1 node, 8 CPUs
jun 03 10:02:33 saturn kernel: smpboot: Total of 8 processors activated (64306.06 BogoMIPS)
jun 03 10:02:33 saturn kernel: ------------[ cut here ]------------
jun 03 10:02:33 saturn kernel: WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2408 build_sched_domains+0x724/0x1310
jun 03 10:02:33 saturn kernel: Modules linked in:
jun 03 10:02:33 saturn kernel: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.9.1-1-default #1 openSUSE Tumbleweed c5471a56f12c40709b95530f47f6c0b39e75f136
jun 03 10:02:33 saturn kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD7/GA-990FXA-UD7, BIOS F11d 07/09/2013
jun 03 10:02:33 saturn kernel: RIP: 0010:build_sched_domains+0x724/0x1310
jun 03 10:02:33 saturn kernel: Code: 04 41 89 56 3c 48 8b 15 72 2e 8f 02 48 63 4d 14 39 34 8a 0f 8e 9c fe ff ff 25 e9 ef ff ff 80 cc 04 41 89 46 3c e9 8b fe ff ff <0f> 0b 41 be f4 ff ff ff 48 8b 44 24 70 8b 10 85 d2 0f 84 09 02 00
jun 03 10:02:33 saturn kernel: RSP: 0018:ffffb27980033d88 EFLAGS: 00010202
jun 03 10:02:33 saturn kernel: RAX: 00000000ffffff01 RBX: 0000000000000000 RCX: 00000000ffffff01
jun 03 10:02:33 saturn kernel: RDX: 00000000fffffff8 RSI: 0000000000000003 RDI: ffff94554004f660
jun 03 10:02:33 saturn kernel: RBP: ffff945540234a00 R08: ffff94554004f660 R09: 0000000000000000
jun 03 10:02:33 saturn kernel: R10: ffffb27980033d50 R11: 0000000039b6461e R12: 0000000000000001
jun 03 10:02:33 saturn kernel: R13: ffff94554004f018 R14: 0000000000000001 R15: ffff94554004f2c0
jun 03 10:02:33 saturn kernel: FS:  0000000000000000(0000) GS:ffff94586ea00000(0000) knlGS:0000000000000000
jun 03 10:02:33 saturn kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jun 03 10:02:33 saturn kernel: CR2: ffff9455eca01000 CR3: 00000001aba36000 CR4: 00000000000406f0
jun 03 10:02:33 saturn kernel: Call Trace:
jun 03 10:02:33 saturn kernel:  <TASK>
jun 03 10:02:33 saturn kernel:  ? build_sched_domains+0x724/0x1310
jun 03 10:02:33 saturn kernel:  ? __warn.cold+0xa8/0x102
jun 03 10:02:33 saturn kernel:  ? build_sched_domains+0x724/0x1310
jun 03 10:02:33 saturn kernel:  ? report_bug+0xd8/0x150
jun 03 10:02:33 saturn kernel:  ? handle_bug+0x3c/0x80
jun 03 10:02:33 saturn kernel:  ? exc_invalid_op+0x17/0x70
jun 03 10:02:33 saturn kernel:  ? asm_exc_invalid_op+0x1a/0x20
jun 03 10:02:33 saturn kernel:  ? build_sched_domains+0x724/0x1310
jun 03 10:02:33 saturn kernel:  ? build_sched_domains+0x35b/0x1310
jun 03 10:02:33 saturn kernel:  ? alloc_cpumask_var_node+0x23/0x40
jun 03 10:02:33 saturn kernel:  ? alloc_cpumask_var_node+0x23/0x40
jun 03 10:02:33 saturn kernel:  ? __pfx_kernel_init+0x10/0x10
jun 03 10:02:33 saturn kernel:  sched_init_smp+0x3e/0xc0
jun 03 10:02:33 saturn kernel:  ? stop_machine+0x30/0x40
jun 03 10:02:33 saturn kernel:  ? __pfx_kernel_init+0x10/0x10
jun 03 10:02:33 saturn kernel:  kernel_init_freeable+0x137/0x2a0
jun 03 10:02:33 saturn kernel:  ? __pfx_kernel_init+0x10/0x10
jun 03 10:02:33 saturn kernel:  kernel_init+0x1a/0x130
jun 03 10:02:33 saturn kernel:  ret_from_fork+0x34/0x50
jun 03 10:02:33 saturn kernel:  ? __pfx_kernel_init+0x10/0x10
jun 03 10:02:33 saturn kernel:  ret_from_fork_asm+0x1a/0x30
jun 03 10:02:33 saturn kernel:  </TASK>
jun 03 10:02:33 saturn kernel: ---[ end trace 0000000000000000 ]---

whereas with kernel 6.8.9, smpboot seems to run without problems:

jun 03 11:03:48 saturn kernel: smpboot: CPU0: AMD FX(tm)-8350 Eight-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
jun 03 11:03:48 saturn kernel: RCU Tasks: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
jun 03 11:03:48 saturn kernel: RCU Tasks Rude: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
jun 03 11:03:48 saturn kernel: RCU Tasks Trace: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
jun 03 11:03:48 saturn kernel: Performance Events: Fam15h core perfctr, AMD PMU driver.
jun 03 11:03:48 saturn kernel: ... version:                0
jun 03 11:03:48 saturn kernel: ... bit width:              48
jun 03 11:03:48 saturn kernel: ... generic registers:      6
jun 03 11:03:48 saturn kernel: ... value mask:             0000ffffffffffff
jun 03 11:03:48 saturn kernel: ... max period:             00007fffffffffff
jun 03 11:03:48 saturn kernel: ... fixed-purpose events:   0
jun 03 11:03:48 saturn kernel: ... event mask:             000000000000003f
jun 03 11:03:48 saturn kernel: signal: max sigframe size: 1776
jun 03 11:03:48 saturn kernel: rcu: Hierarchical SRCU implementation.
jun 03 11:03:48 saturn kernel: rcu:         Max phase no-delay instances is 1000.
jun 03 11:03:48 saturn kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
jun 03 11:03:48 saturn kernel: smp: Bringing up secondary CPUs ...
jun 03 11:03:48 saturn kernel: smpboot: x86: Booting SMP configuration:
jun 03 11:03:48 saturn kernel: .... node  #0, CPUs:      #2 #4 #6 #1 #3 #5 #7
jun 03 11:03:48 saturn kernel: smp: Brought up 1 node, 8 CPUs
jun 03 11:03:48 saturn kernel: smpboot: Max logical packages: 1
jun 03 11:03:48 saturn kernel: smpboot: Total of 8 processors activated (64313.13 BogoMIPS)
> grep 'smp' boot.6.*
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smpboot: Allowing 8 CPUs, 0 hotplug CPUs
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smpboot: CPU0: AMD FX(tm)-8350 Eight-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smp: Bringing up secondary CPUs ...
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smpboot: x86: Booting SMP configuration:
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smp: Brought up 1 node, 8 CPUs
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smpboot: Max logical packages: 1
boot.6.8.txt:jun 03 11:03:48 saturn kernel: smpboot: Total of 8 processors activated (64313.13 BogoMIPS)
boot.6.9.txt:jun 03 10:02:33 saturn kernel: smpboot: CPU0: AMD FX(tm)-8350 Eight-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
boot.6.9.txt:jun 03 10:02:33 saturn kernel: smp: Bringing up secondary CPUs ...
boot.6.9.txt:jun 03 10:02:33 saturn kernel: smpboot: x86: Booting SMP configuration:
boot.6.9.txt:jun 03 10:02:33 saturn kernel: smp: Brought up 1 node, 8 CPUs
boot.6.9.txt:jun 03 10:02:33 saturn kernel: smpboot: Total of 8 processors activated (64306.06 BogoMIPS)
boot.6.9.txt:jun 03 10:02:33 saturn kernel:  sched_init_smp+0x3e/0xc0
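
For a bug report it can help to boil each saved boot log down to just the source locations of the kernel WARNINGs. The sketch below does that for a log file like the boot.6.8.txt / boot.6.9.txt used above, or for stdin; `list_warning_sites` is a name invented here, and the pattern assumes the `WARNING: CPU: N PID: N at file:line` format seen in the trace.

```shell
# Hypothetical helper: print the unique "file:line" locations of kernel
# WARNING lines found in one saved journal file (or on stdin).
list_warning_sites() {
  grep -o 'WARNING: CPU: [0-9]* PID: [0-9]* at [^ ]*' "$@" |
    sed 's/.* at //' |   # keep only the file:line part
    sort -u
}
```

Running `list_warning_sites boot.6.9.txt` and then the same on boot.6.8.txt gives a short list to compare; an empty result for the 6.8 log would fit the clean boot shown above.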

Does anybody have an idea how to further investigate / report this bug?


(*): 4 modules of 2 half-cores each, sharing L2 cache and FPU between both half-cores.
(**): The rev 1.x uses a BIOS-based firmware (with some EFI compatibility layer), not a UEFI-based firmware like the later rev 3.x.

Searching for this specific sequence of errors brings up these discussions:
https://bbs.archlinux.org/viewtopic.php?id=262899

It seems that this has been fixed with firmware updates in the past, but I am not holding my breath for a 13-year-old motherboard to receive one.

It depends on your frame of reference. I don’t go for the 3 years and buy a new one philosophy. My newest AMD PC is only 15 months newer than yours (tech intro date, not purchase date, and 5 years old when I bought mine; my newest 39 months). Maybe give kernel-longterm-6.6.32 a try? Why does a “very ancient” CPU need the very latest kernel? :slight_smile:

Then it is a regression and you need to submit bug report. https://bugzilla.opensuse.org/, same user/password as here.

Submitting Bug Reports should help with what to do there.

Bug 1225968 opened.

Even the previous 6.8.9 works correctly. It’s 6.9.x specific.

The latest kernel comes with the latest bugfixes in several drivers such as file systems (I use Btrfs extensively, and it still gets bugfixes and improvements very regularly), and it comes as part of the rolling distro on which I rely.
But yes, I could switch to a LTS version.

(Also cool to find another fellow antique hardware hoarder :sweat_smile: )

Is SMT/HT involved? I’d disable it if it is.

I deployed Tumbleweed recently on Phenom II X3 and X4 for a NAS and haven’t noticed anything odd about CPU load, but haven’t had any real reason to check.

6.9.3-1-default

dmesg | grep smpboot
[    0.252212] [    T1] smpboot: CPU0: AMD Phenom(tm) II X4 965 Processor (family: 0x10, model: 0x4, stepping: 0x3)
[    0.253231] [    T1] smpboot: x86: Booting SMP configuration:
[    0.257250] [    T1] smpboot: Total of 4 processors activated (27302.69 BogoMIPS)

Indeed, but a longterm kernel is a currently maintained kernel that gets similar security fixes to the latest stable, so it should be more secure than staying on 6.8.9 until the issue is solved in 6.9.x.

This comes from my second newest obtained new, 5.5 years old this month, only moved to 24/7 service 11 months ago, when the previous 24/7 turned 10. :slight_smile:

Hi. I wanted to chime in here since I seem to be having this exact same problem on Endeavour with an FX6300 cpu. Everything is constrained to one core. It definitely seems to be Linux 6.9-specific. The 6.8 kernel series runs just fine, but I have that same behavior on 6.9.1 - 6.9.3 so far. I ran 6.8.9 for a couple weeks, hoping that it would be fixed in a dot release, but have since fallen back to the LTS kernel.

The thread over on Endeavour’s forums:

My hardware, from inxi -FAZ:

Machine:
Type: Desktop Mobo: Gigabyte model: GA-970A-UD3 serial:
BIOS: Award v: F8f date: 12/16/2013
CPU:
Info: 6-core model: AMD FX-6300 bits: 64 type: MT MCP cache: L2: 6 MiB
Speed (MHz): avg: 1400 min/max: 1400/3500 cores: 1: 1400 2: 1400 3: 1400
4: 1400 5: 1400 6: 1400

Bulldozer / Piledriver architecture doesn’t have SMT / HT.
The cores are physical (separate ALU, etc.), but share some components (cache, FPU) in pairs inside a module.

Phenom II are K10 architecture. I presume that cores initialise differently.
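
To see how the running kernel actually groups the cores, the topology files under sysfs can be dumped per CPU. This is only a sketch: `show_cpu_topology` is a made-up name, the optional argument exists just so the function can be pointed at a copy of the tree, and I am not claiming what the output looks like on an FX under either kernel.

```shell
# Hypothetical helper: print core_id and thread_siblings_list for each CPU.
# $1 is an optional alternative root (defaults to the real sysfs path).
show_cpu_topology() {
  root="${1:-/sys/devices/system/cpu}"
  for dir in "$root"/cpu[0-9]*; do
    # Skip entries without topology information (e.g. offline CPUs).
    [ -r "$dir/topology/core_id" ] || continue
    printf '%s core_id=%s siblings=%s\n' \
      "${dir##*/}" \
      "$(cat "$dir/topology/core_id")" \
      "$(cat "$dir/topology/thread_siblings_list")"
  done
}
```

Saving the output of `show_cpu_topology` once under 6.8.9 and once under 6.9.x and diffing the two would show whether the kernels disagree about which cores are siblings.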

Try:

dmesg | grep mtrr

I got this, only on 6.9.x:

jun 03 10:02:33 saturn kernel: mtrr: your CPUs had inconsistent variable MTRR settings
jun 03 10:02:33 saturn kernel: mtrr: probably your BIOS does not setup all CPUs.
jun 03 10:02:33 saturn kernel: mtrr: corrected configuration.

Interesting, because it’s a very closely related hardware setup.
Could you check the revision of your motherboard?
The various rev 1.nn boards run on what Gigabyte calls Hybrid EFI, i.e. BIOS with an EFI compatibility layer (like mine), whereas rev 3.0 runs on UEFI (like most modern motherboards).

(see this page about the motherboard; the revision is printed on the corner close to the bottommost PCI connector)

The AMD FX-6300 is the mid-range cousin from the same Piledriver family as mine.

I should have posted that in my earlier post. It’s the 1.1 board with the horrid faux-EFI setup.

(Incidentally, there’s no EFIVARS exposed the way they would be for a more modern mobo.)

Also, I’ve tried with both an earlier version of the bios (F4) and the ‘current’ version, F8f, with no change to the ‘locked on one core’ behavior. It’s worth noting that the F8f bios is listed as a ‘beta’ bios, despite the fact that it’s the last bios for this board published by Gigabyte many years ago now.

Yup, so it’s the same class as mine.
And yup there’s no real EFI, just a small shim so EFI-based bootloaders could find just enough of the API to boot on this BIOS.

When both of our boards went to market, AMD CPUs were in the K10 era (Phenom I & II, etc.).
The “beta” BIOS that Gigabyte released was just a quick fix to make sure that Bulldozer/Piledriver (FX) could boot on it (has the latest microcode, etc.). AFAIK Linux used to successfully boot even without that (it can pack updated firmware in the initrd).

For the current specific problem, the firmware might actually play no role.

Update

On the Bugzilla discussion, Smith has pointed to this discussion on LKML.

Tim Teichman and Christian Heusel have git bisect-ed it (while handling an Arch Linux issue), and there are patches, apparently.

Given the patch mentioned (Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater), apparently it’s unrelated to the BIOS and is caused by a recent change in kernel 6.9.x that makes the initialisation use a Zen-specific SMT topology option on the older Piledriver/Bulldozer (AMD FX) CPUs. Hopefully the patch will eventually make it upstream and arrive in some updated kernel package.
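
To check which side of that family cut-off a given machine is on, the CPU family can be read from /proc/cpuinfo, where it is shown in decimal (0x15 is 21, 0x17 is 23). The sketch below only mirrors the condition named in the patch title; it is not the kernel code, and both function names are invented for illustration.

```shell
# Hypothetical helper: print the decimal "cpu family" of the first CPU.
# $1 is an optional file to read instead of /proc/cpuinfo.
cpu_family() {
  awk -F': *' '/^cpu family/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeeds if the family is 0x17 (decimal 23) or
# greater, i.e. the range named in the patch title.
uses_leaf_smt_info() {
  [ "$1" -ge 23 ]
}
```

For example, `uses_leaf_smt_info "$(cpu_family)" && echo "0x17 or newer" || echo "older family"`. The FX-8350 above reports family 0x15 and the Phenom II reports 0x10, so both fall in the "older family" branch.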

Yay! :confetti_ball:

Praise be unto the amazing Linux community! :trophy:

Thank you everyone who’s looked into this from the Suse side! I truly appreciate your help and expertise!

Reply by Christian Heusel on the Arch issue:

Fix is now queued for 6.9.5: Re: [PATCH 6.9.y] x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater - Greg KH

So very soon we could get an update!

I have the same ancient hardware and noticed the same problem. I was just waiting for the release of Leap 15.6 to do a reinstall.

Glad that I bumped into this topic. Now I’ll wait a few days or weeks.

Takashi Iwai mentions in the bugzilla threads that Kernel 6.9.5 is compiled and available for testing in the kernel:stable OBS repository.

I’ve tested it on my machine: it solves the multi core initialisation problem, all cores run normally.
Yuhu! :confetti_ball:

(If somebody else wants to test this experimental build: why not. It’s also showing up with a 1-Click convenience link in the “Show experimental packages” section of Suse Software site).

So now we just have to wait until this package is released in the main channel.
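
Once the update lands, a quick way to confirm that the running kernel is at or past 6.9.5 could look like the sketch below. `kernel_at_least` is a made-up name, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
# Hypothetical helper: succeeds if the kernel version is >= $1.
# $2 is an optional version string; defaults to the running kernel.
kernel_at_least() {
  wanted="$1"
  current="${2:-$(uname -r)}"
  current="${current%%-*}"   # drop the "-1-default" style suffix
  # The wanted version must sort first (or be equal) in version order.
  [ "$(printf '%s\n%s\n' "$wanted" "$current" | sort -V | head -n 1)" = "$wanted" ]
}
```

For example, `kernel_at_least 6.9.5 && echo "fix should be included"`. Checking htop or cpupower monitor afterwards is still the real test.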