I was running Tumbleweed with kernel 5.13.8, last upgraded in August 2021.
Yesterday I upgraded Tumbleweed to snapshot 20211215 (kernel 5.15.7).
Because of earlier issues, I boot to command line and then start system with:
systemctl start graphical.target
My system is an HP ZBook 15 laptop with an Intel i9-8950HK.
The system rebooted normally after the upgrade.
When I started graphical.target, the system locked up hard, and I had to power off and reboot.
While trying to diagnose the issue, I found out the first time I execute ‘lspci’ I get the output, but
if I run it again, the system locks up hard. The fan spins up a bit but the keyboard is unresponsive, and
the remote login is dead as well.
I rebooted to the previous kernel 5.13.8 and the system also locks up the same way.
I disconnected the laptop from the dock and have no external devices attached.
The SysRq key works until the lockup. I’m able to dump some info to the log with SysRq before the lockup.
I’m looking for help to diagnose the issue.
I plan to test memory next, but I don’t suspect that as the issue as it has been operating normally for many months.
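Regarding the dumps to the log before the lockup: which magic SysRq functions the keyboard may trigger depends on the bitmask in /proc/sys/kernel/sysrq. Below is a minimal sketch for checking that mask. The helper name sysrq_allows is made up for illustration, and the bit values are the ones listed in the kernel's SysRq documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if SysRq mask $1 permits function bit $2.
# Kernel semantics: 0 = SysRq disabled, 1 = all functions enabled,
# anything larger = bitmask of allowed functions.
sysrq_allows() {
    mask=$1
    bit=$2
    [ "$mask" -eq 1 ] || [ $(( mask & bit )) -ne 0 ]
}

# Documented bits include: 8 = debugging dumps of processes,
# 16 = sync, 32 = remount read-only, 128 = reboot/poweroff.
# Fall back to 0 if the file cannot be read (e.g. not on Linux).
current=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo 0)

if sysrq_allows "$current" 8; then
    echo "SysRq debugging dumps are enabled (mask $current)"
else
    echo "SysRq debugging dumps are not enabled (mask $current)"
fi
```

As root, writing a command letter to /proc/sysrq-trigger (for example `echo w > /proc/sysrq-trigger` to dump blocked tasks) should work regardless of the mask, since the mask only restricts the keyboard combination.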
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 07)
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Cannon Lake PCH CNVi WiFi (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #3 (rev f0)
00:1c.4 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P1000 Mobile] (rev ff)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
3a:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
6f:00.0 Non-Volatile memory controller: Silicon Motion, Inc. SM2262/SM2262EN SSD Controller (rev 03)
I am not that fluent with systemctl, but shouldn’t that be
systemctl isolate graphical.target
BTW:
There is an important, but not easy to find feature on the forums.
Please in the future use CODE tags around copied/pasted computer text in a post. It is the # button in the tool bar of the post editor. When applicable copy/paste complete, that is including the prompt, the command, the output and the next prompt.
-- Journal begins at Fri 2021-06-11 19:01:21 CDT, ends at Sun 2021-12-19 12:00:13 CST. --
Dec 19 10:20:28 gojira kernel: **x86/cpu: SGX disabled by BIOS.**
Dec 19 10:20:28 gojira kernel: **ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [CAP1] at bit offset/length 64/32 exceeds size of target Buffer (64 bits) (20210730/dsopcode-198)**
Dec 19 10:20:28 gojira kernel: **ACPI Error: Aborting method \_SB._OSC due to previous error (AE_AML_BUFFER_LIMIT) (20210730/psparse-529)**
Dec 19 10:20:28 gojira systemd-modules-load[254]: **Failed to find module 'bbswitch'**
Dec 19 10:20:32 gojira kernel: **ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210730/exoparg2-393)**
Dec 19 10:20:32 gojira kernel: **ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)**
Dec 19 10:20:32 gojira kernel: **ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)**
Dec 19 10:20:32 gojira kernel: **ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210730/exoparg2-393)**
Dec 19 10:20:32 gojira kernel: **ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)**
Dec 19 10:20:32 gojira kernel: **ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)**
Dec 19 10:20:33 gojira tlp[1139]: **Error: tlp.service is not enabled, power saving will not apply on boot.**
Dec 19 10:20:33 gojira tlp[1139]: **>>> Invoke 'systemctl enable tlp.service' to correct this!**
Dec 19 10:20:41 gojira libvirtd[2235]: **internal error: Unknown PCI header type '127' for device '0000:01:00.0'**
Dec 19 10:23:08 gojira login[2265]: **gkr-pam: unable to locate daemon control file**
I’m investigating the missing bbswitch module and the unknown PCI header type now.
Reposting with unwanted terminal sequences removed …
Yes, it is an NVIDIA Optimus system with suse-prime and the NVIDIA driver installed, but the nvidia driver is not loaded,
as reported by lsmod.
-- Journal begins at Fri 2021-06-11 19:01:21 CDT, ends at Sun 2021-12-19 12:00:13 CST. --
Dec 19 10:20:28 gojira kernel: x86/cpu: SGX disabled by BIOS.
Dec 19 10:20:28 gojira kernel: ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [CAP1] at bit offset/length 64/32 exceeds size of target Buffer (64 bits) (20210730/dsopcode-198)
Dec 19 10:20:28 gojira kernel: ACPI Error: Aborting method \_SB._OSC due to previous error (AE_AML_BUFFER_LIMIT) (20210730/psparse-529)
Dec 19 10:20:28 gojira systemd-modules-load[254]: Failed to find module 'bbswitch'
Dec 19 10:20:32 gojira kernel: ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210730/exoparg2-393)
Dec 19 10:20:32 gojira kernel: ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)
Dec 19 10:20:32 gojira kernel: ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)
Dec 19 10:20:32 gojira kernel: ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210730/exoparg2-393)
Dec 19 10:20:32 gojira kernel: ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)
Dec 19 10:20:32 gojira kernel: ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210730/psparse-529)
Dec 19 10:20:33 gojira tlp[1139]: Error: tlp.service is not enabled, power saving will not apply on boot.
Dec 19 10:20:33 gojira tlp[1139]: >>> Invoke 'systemctl enable tlp.service' to correct this!
Dec 19 10:20:41 gojira libvirtd[2235]: internal error: Unknown PCI header type '127' for device '0000:01:00.0'
Dec 19 10:23:08 gojira login[2265]: gkr-pam: unable to locate daemon control file
I’m investigating the missing bbswitch module and the unknown PCI header type now.
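One possible reading of the "header type '127'" message: 127 is 0x7f, which is what remains of an all-ones byte (0xff) once the multi-function bit is masked off, and the lspci listing shows the same device, 01:00.0, with "(rev ff)". All-ones reads from PCI config space usually mean the device is not responding, for example because it is powered down. Here is a small sketch for spotting such entries in a saved lspci listing. The function name and the file name lspci.txt are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: read lspci output on stdin and print the slot
# address of every device whose revision reads back as ff.
find_rev_ff() {
    awk '/\(rev ff\)[[:space:]]*$/ { print $1 }'
}

# An all-ones header-type byte with the multi-function bit (0x80)
# masked off gives the value libvirtd complained about.
echo "header type of an all-ones read: $(( 0xff & 0x7f ))"

# Only inspect a listing that was saved earlier; lspci itself is not
# run here, since a second run is what hangs this machine.
if [ -r lspci.txt ]; then
    find_rev_ff < lspci.txt
fi
```

Run against the listing posted above, this would be expected to flag only 01:00.0, the NVIDIA GPU.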
Do you normally have a second display connected? I had to report a bug a few days ago because of lockups simply trying to boot with more than one display connected to Intel UHD 730 Graphics. Plasma’s KScreen seems to remember things it shouldn’t sometimes.
Half my TW/KDE installations are running TDM instead of LightDM, SDDM or XDM, all three of which I have zero appreciation for.
mrmazda
Have you tried using LightDM lately?
Do you normally have a second display connected?
I haven’t got X to start yet, but I am using openbox, etc.
I removed all attached devices, including the docking station, and am only using the laptop itself.
susejunky
Will the system work if you switch (with suse-prime) to the NVIDIA card?
Which desktop environment (GNOME, KDE, …) and which graphic stack (X, Wayland, …) do you use?
I ran
prime-select nvidia
and the system locked up hard, just like it does when I run ‘lspci’ a second time.
I don’t think the desktop environment or graphics stack matters, as the system locks up hard before any of that starts.
At this point, the best option seems to be to ‘rmmod’ drivers and run ‘lspci’ until I don’t get a lockup, assuming
it’s one of the modular drivers.
A better option is to configure the boot to use a non-graphics driver for the console, get kernel messages logged to the console,
and see what ‘lspci’ is doing to which device when the lockup occurs.
I think booting the ‘recovery’ kernel option gets me the text terminal.
I’m not sure how to get kernel prints to the console.
I may try gdb on lspci first, to see what it’s doing before the whole thing dies.
I think I’ll also remove the graphics environment drivers (intel, nvidia).
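On getting kernel messages onto a text console: one common approach is to boot without quiet and with the kernel parameters nomodeset, loglevel=7 and ignore_loglevel. Here is a sketch of rewriting a command-line string that way. The function name debug_cmdline is hypothetical, and the sample command line is made up rather than taken from this system.

```shell
#!/bin/sh
# Hypothetical helper: take a kernel command line and return a variant
# suited to debugging on a plain text console.
debug_cmdline() {
    out=""
    # Drop options that hide or limit kernel messages.
    for p in $1; do
        case $p in
            quiet|splash=*|loglevel=*) ;;
            *) out="$out $p" ;;
        esac
    done
    # Add the debugging options unless they are already present.
    for p in nomodeset loglevel=7 ignore_loglevel; do
        case " $out " in
            *" $p "*) ;;
            *) out="$out $p" ;;
        esac
    done
    echo "${out# }"
}

# Made-up example input, not this system's real command line.
debug_cmdline "splash=silent quiet mitigations=auto"
```

The resulting options can be typed in at the GRUB menu (press e on the boot entry) for a one-off boot. On a running system, `dmesg -n 7` raises the console log level, and after a hang `journalctl -k -b -1` shows whatever kernel messages from the previous boot made it to disk.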
I removed these packages:
xf86-video-nouveau-1.0.17-3.1.x86_64
x11-video-nvidiaG05-470.86-46.1.x86_64
xf86-video-intel-2.99.917.916_g31486f40-2.3.x86_64
I reset the selected ‘prime-select’ driver by overwriting the /etc/prime/current_type file.
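The reset described above could look roughly like the following. This is a sketch only: the helper name set_prime_type is hypothetical, the value written (intel here) is an assumption since the post does not say which value was used, and the demonstration runs on a scratch file rather than the real /etc/prime/current_type.

```shell
#!/bin/sh
# Hypothetical helper: overwrite a prime-select state file with a new
# type, keeping a backup of the previous contents next to it.
set_prime_type() {
    file=$1
    new_type=$2
    if [ -e "$file" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    printf '%s\n' "$new_type" > "$file"
}

# Demonstration on a scratch file so nothing on the system is touched.
demo=$(mktemp)
echo nvidia > "$demo"
set_prime_type "$demo" intel    # "intel" is an assumed value
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On the real system the equivalent call would be `set_prime_type /etc/prime/current_type intel`, run as root and followed by a reboot.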
After rebooting, I was able to run lspci without any hangs.
I now have X up and running, and no hangs.
Thanks all for your helpful advice and suggestions.