Hi,
I found something weird today: half my CPUs are offline. I noticed it by accident in journalctl:
Aug 04 16:02:37 asus-roc systemd-udevd[649]: cpu6: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu6/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[637]: cpu7: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu7/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[656]: cpu8: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu8/online}, ignoring: Operation not permi>
Aug 04 16:02:37 asus-roc systemd-udevd[641]: cpu9: /usr/lib/udev/rules.d/80-hotplug-cpu-mem.rules:6 Failed to write ATTR{/sys/devices/system/cpu/cpu9/online}, ignoring: Operation not permi>
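To pull the affected CPU numbers out of the journal instead of reading them by eye, here is a minimal bash sketch. The function name `failed_cpus` is hypothetical. It assumes the log text arrives on stdin and uses exactly the `Failed to write ATTR{/sys/devices/system/cpu/cpuN/online}` wording shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CPUs whose "online" attribute udev failed to
# write, one number per line, reading journal text on stdin.
failed_cpus() {
    # grep -o keeps only the matching fragment of each line,
    # sed reduces that fragment to the CPU number,
    # sort -un drops duplicates and orders the numbers numerically.
    grep -o 'Failed to write ATTR{/sys/devices/system/cpu/cpu[0-9]*/online}' |
        sed 's|.*/cpu\([0-9]*\)/online}|\1|' |
        sort -un
}
```

For example, `journalctl -b | failed_cpus` should list the CPUs named in those udev messages for the current boot.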
asus-roc:~ # lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-5
Off-line CPU(s) list: 6-11
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BIOS Model name: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
CPU max MHz: 4600.0000
CPU min MHz: 0.0000
BogoMIPS: 6399.96
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow
vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap
clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm
ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
md_clear flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 1.5 MiB (6 instances)
L3: 12 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-5
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Mds: Mitigation; Clear CPU buffers; SMT disabled
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Retbleed: Mitigation; IBRS
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling
Srbds: Mitigation; Microcode
Tsx async abort: Mitigation; TSX disabled
asus-roc:~ #
and
asus-roc:~ # cat /sys/devices/system/cpu/offline
6-11
asus-roc:~ #
Bizarre!
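The `offline` file uses the kernel's CPU-list format: comma-separated CPU numbers and `a-b` ranges. A minimal bash sketch for expanding such a list into individual CPU numbers; `expand_cpulist` is a hypothetical name, and it assumes only plain numbers and simple ranges appear in the list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand a kernel CPU list such as "6-11" or "0,2-4"
# into one CPU number per line. An empty list prints nothing.
expand_cpulist() {
    local list=$1 part
    local IFS=,                      # split the list on commas only
    for part in $list; do
        case $part in
            *-*) seq "${part%-*}" "${part#*-}" ;;  # range: every CPU from a to b
            ?*)  echo "$part" ;;                   # a single CPU number
        esac
    done
}
```

With the `6-11` shown above, `expand_cpulist "$(cat /sys/devices/system/cpu/offline)"` prints 6 through 11, one per line.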
I read that to bring them back online I should do this:
asus-roc:~ # echo 1 > /sys/devices/system/cpu/cpu6/online
-bash: echo: write error: Operation not permitted
asus-roc:~ # echo 1 > /sys/devices/system/cpu/cpu7/online
-bash: echo: write error: Operation not permitted
asus-roc:~ #
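Instead of repeating the `echo` for each CPU by hand, a loop can try every offline CPU in turn. This is a hedged sketch: `online_all` and its optional directory argument are hypothetical (the argument exists only so the loop can be tried against a scratch directory). It does nothing to get around the `Operation not permitted` error; it only makes it easy to see which CPUs refuse.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "echo 1 > .../cpuN/online" for every CPU whose
# online file currently reads 0, and report the outcome per CPU.
# Defaults to the real sysfs tree; needs root to succeed there.
online_all() {
    local base=${1:-/sys/devices/system/cpu} f cpu
    for f in "$base"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue             # glob matched nothing, or CPU has no online file
        [ "$(cat "$f")" = 0 ] || continue   # skip CPUs that are already online
        cpu=${f%/online}; cpu=${cpu##*/}    # ".../cpu6/online" -> "cpu6"
        # On failure bash prints its own message to stderr
        # (e.g. "write error: Operation not permitted").
        if echo 1 > "$f"; then
            echo "$cpu: ok"
        else
            echo "$cpu: failed"
        fi
    done
}
```

Run as root with no argument (`online_all`) to target the real `/sys/devices/system/cpu` tree.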
asus-roc:~ # ls -al /sys/devices/system/cpu/cpu6/
total 0
drwxr-xr-x 4 root root 0 Aug 4 16:02 .
drwxr-xr-x 22 root root 0 Aug 4 16:02 ..
-r-------- 1 root root 4096 Aug 4 16:14 crash_notes
-r-------- 1 root root 4096 Aug 4 16:14 crash_notes_size
lrwxrwxrwx 1 root root 0 Aug 4 16:14 driver -> ../../../../bus/cpu/drivers/processor
lrwxrwxrwx 1 root root 0 Aug 4 16:14 firmware_node -> ../../../LNXSYSTM:00/LNXCPU:06
drwxr-xr-x 2 root root 0 Aug 4 16:06 hotplug
lrwxrwxrwx 1 root root 0 Aug 4 16:14 node0 -> ../../node/node0
-rw-r--r-- 1 root root 4096 Aug 4 16:22 online
drwxr-xr-x 2 root root 0 Aug 4 16:06 power
lrwxrwxrwx 1 root root 0 Aug 4 16:02 subsystem -> ../../../../bus/cpu
-rw-r--r-- 1 root root 4096 Aug 4 16:02 uevent
asus-roc:~ #
I’m at a loss to understand what’s happening here and why it occurred.
How do I get all my CPUs back online?
Thanks.