CPUs are offline

Hi,

Found something weird today - half my CPUs are offline. I found it by accident in the journalctl output -

Aug 04 16:02:37 asus-roc systemd-udevd[649]: cpu6: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu6/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[637]: cpu7: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu7/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[656]: cpu8: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu8/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[641]: cpu9: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu9/online}, ignoring: Operation not permi>


asus-roc:~ # lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-5
  Off-line CPU(s) list:  6-11
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    BIOS Model name:     Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  1
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            10
    CPU max MHz:         4600.0000
    CPU min MHz:         0.0000
    BogoMIPS:            6399.96
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss 
                         ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_
                         tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp
                         l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid ss
                         e4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes 
                         xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f
                         ault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_sh
                         adow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adj
                         ust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx sma
                         p clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dt
                         herm ida arat pln pts hwp hwp_notify hwp_act_window hwp
                         _epp md_clear flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    1.5 MiB (6 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-5
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushe
                         s, SMT disabled
  Mds:                   Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT disabled
  Retbleed:              Mitigation; IBRS
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; IBRS, IBPB conditional, RSB filling
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Mitigation; TSX disabled
asus-roc:~ #

and

asus-roc:~ # cat /sys/devices/system/cpu/offline
6-11
asus-roc:~ #

Bizarre!

I read that to put them back online I do this -

asus-roc:~ # echo 1 > /sys/devices/system/cpu/cpu6/online
-bash: echo: write error: Operation not permitted
asus-roc:~ # echo 1 > /sys/devices/system/cpu/cpu7/online
-bash: echo: write error: Operation not permitted
asus-roc:~ #
asus-roc:~ # ls -al /sys/devices/system/cpu/cpu6/
total 0
drwxr-xr-x  4 root root    0 Aug  4 16:02 .
drwxr-xr-x 22 root root    0 Aug  4 16:02 ..
-r--------  1 root root 4096 Aug  4 16:14 crash_notes
-r--------  1 root root 4096 Aug  4 16:14 crash_notes_size
lrwxrwxrwx  1 root root    0 Aug  4 16:14 driver -> ../../../../bus/cpu/drivers/processor
lrwxrwxrwx  1 root root    0 Aug  4 16:14 firmware_node -> ../../../LNXSYSTM:00/LNXCPU:06
drwxr-xr-x  2 root root    0 Aug  4 16:06 hotplug
lrwxrwxrwx  1 root root    0 Aug  4 16:14 node0 -> ../../node/node0
-rw-r--r--  1 root root 4096 Aug  4 16:22 online
drwxr-xr-x  2 root root    0 Aug  4 16:06 power
lrwxrwxrwx  1 root root    0 Aug  4 16:02 subsystem -> ../../../../bus/cpu
-rw-r--r--  1 root root 4096 Aug  4 16:02 uevent
asus-roc:~ #

I’m at a loss to understand what’s happening here and why it occurred.

How do I get all my CPUs back online?

Thanks.

You have hyperthreading (SMT, Simultaneous Multithreading) disabled, so only the physical CPU cores are available. You can see this in your lscpu output: "Thread(s) per core: 1" and "SMT disabled" in the Vulnerabilities section. Check your BIOS settings and enable SMT/Hyperthreading if you want it.
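
For reference, on kernels that expose the SMT control interface, the state can also be read directly from sysfs. A minimal sketch, assuming the files /sys/devices/system/cpu/smt/control and /sys/devices/system/cpu/smt/active are present (older kernels may not have them); the function name smt_status and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# Print the kernel's view of SMT (hyperthreading).
# $1 (optional, hypothetical): directory holding the "control" and "active"
# files; defaults to the real sysfs location.
smt_status() {
    dir="${1:-/sys/devices/system/cpu/smt}"
    if [ ! -r "$dir/control" ]; then
        echo "SMT control interface not found in $dir"
        return 0
    fi
    control=$(cat "$dir/control")
    # "active" is 1 when sibling threads are actually in use, 0 otherwise
    active=$(cat "$dir/active" 2>/dev/null || echo unknown)
    echo "control=$control active=$active"
}

smt_status
```

According to the kernel documentation, control reads "on", "off", "forceoff", "notsupported" or "notimplemented". A value of "off" or "forceoff" would mean the kernel itself is keeping the sibling threads offline.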

So how do I turn hyperthreading back on?

You need to check your BIOS settings. This can occur when you update your BIOS or you have changed BIOS settings without knowing what the setting is for. Search for SMT/HT/Hyperthreading/Intel/Multithreading/…specific stuff in your BIOS.

Bingo!!! The clue about SMT led me to reverse a ‘fix’ I made yesterday based on https://forums.opensuse.org/showthread.php/572135-Journalctl-messages?p=3147023#post3147023 .

I removed the mds setting from the kernel parameters, rebooted, and I have my 12 CPUs back.

Thank you very much!!!
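
For anyone landing here with the same symptom: a quick way to see whether a boot parameter is what disabled SMT is to scan the kernel command line. A rough sketch only; the function name find_nosmt_params is hypothetical, and the pattern just catches parameters that carry a nosmt option (for example nosmt or mds=full,nosmt). That this is what the "mds setting" above looked like is an assumption, since the thread does not show it.

```shell
#!/bin/sh
# Print every kernel command-line token that carries a "nosmt" option.
# $1: a kernel command line as a single string.
# Not exhaustive: other parameters may also switch SMT off.
find_nosmt_params() (
    set -f   # no filename expansion while splitting the string into words
    for tok in $1; do
        case "$tok" in
            nosmt|nosmt=*|*,nosmt|*,nosmt,*) printf '%s\n' "$tok" ;;
        esac
    done
)

# Check the command line of the running kernel, if it is readable
if [ -r /proc/cmdline ]; then
    find_nosmt_params "$(cat /proc/cmdline)"
fi
```

If this prints a parameter, removing it from the boot loader configuration and rebooting should bring the sibling threads back, as happened above.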