Unable to boot kernel 6.5.2-1.1 after zypper dup today

Looking at the journal records in /var/log/journal using journalctl, the only mention of anything to do with kernel 6.5.2 is when I installed it on the 13th September; there is nothing recorded relating to actually booting that kernel. Next I’ll try booting again with 6.5.2 and then use my USB system to see if any new journal records are created.

Stuart

It still might. It seems there are issues with this kernel. Uninstall it after you have booted into the previous kernel.
The issue seems broader @broadstairs

You can either try to find out the issue, or just disregard this specific version for now and start up the older kernel.

Can you back this claim with any facts that there are general problems with this kernel? And why uninstall the kernel? This won’t work unless you lock the prior kernel, because with the next dup this kernel will get pulled in again…
Better to use a proper multiversion setting and make a working kernel the default…
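
For anyone wanting to check the multiversion setup mentioned above, here is a minimal sketch. It assumes the stock config path /etc/zypp/zypp.conf, and the helper name show_multiversion is made up for illustration. On a typical Tumbleweed install I would expect something like multiversion.kernels = latest,latest-1,running, but check your own file rather than taking that as given.

```shell
# Hypothetical helper: print the multiversion lines from a zypp.conf-style file.
# The default path is an assumption; pass another file as the first argument.
show_multiversion() {
    grep -E '^[[:space:]]*multiversion(\.kernels)?[[:space:]]*=' "${1:-/etc/zypp/zypp.conf}"
}

# Usage:
#   show_multiversion
```

If "running" is part of multiversion.kernels, the kernel you are currently booted into should be kept when older kernels are purged, which is what you want while 6.4.12 is the only one that works.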

I agree that making the working kernel the default boot entry is best.

However, I have found one bug, 1215328, which I am watching. It reports an issue not dissimilar to mine, although in that bug NVIDIA is involved. So it does seem like there are issues, but as yet I don’t know what the causes are.

I do have (as mentioned) an ASUS laptop which boots 6.5.2 OK, but that has Intel HD graphics and an older i7 CPU, whereas mine is an AMD Ryzen 5 3400G with Radeon Vega Graphics but NOT using any special drivers.

Stuart

I have been able to copy the most recently updated journal file (which was written following a failed boot of 6.5.2) across to my TW system and view it with journalctl. There is nothing in it relating to booting kernel 6.5.2; all it shows is my shutdown of 6.4.12 prior to a reboot. So I am unable to see any errors anywhere relating to the failure of the 6.5.2 kernel.

I am fast running out of ideas now as to why this new kernel refuses to boot successfully.

Stuart
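
One way to double-check which kernels ever got far enough to write to the journal is to look for the "Linux version" line the kernel prints at the start of each boot. A rough sketch follows. The function name booted_kernels and the path in the usage comment are placeholders, and it assumes the journal text is piped in on stdin.

```shell
# Hypothetical helper: read journal text on stdin and list each distinct
# kernel version that logged a "Linux version ..." line.
booted_kernels() {
    grep -o 'Linux version [^ ]*' | awk '{ print $3 }' | sort -u
}

# Usage, reading a journal directory copied from another system
# (the path is a placeholder):
#   journalctl -D /path/to/copied/journal | booted_kernels
#   journalctl -D /path/to/copied/journal --list-boots
```

If a 6.5 version never shows up in that list, the kernel probably died before journald could write anything to disk, which would fit what is described above.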

Just saw kernel 6.5.3 was released so did zypper dup and tested a boot with it. Still no go so back to 6.4.12 for now. Might do some more testing tomorrow.

Stuart

Just tried another boot of 6.5.3 with nomodeset and quiet removed. This time it got as far as the logo screen I usually get on booting, which maybe means it got to SDDM(?), but then it went straight back to booting again. There is still no indication using journalctl that anything from a 6.5 kernel has booted. I am still unable to see anything in any other log files relating to a 6.5 kernel.

Stuart

This is getting serious now. I downloaded the latest TW ISO so I could install a clean system on a spare disk. On booting the USB stick with this latest snapshot and selecting Installation, a load of messages flash through and it then jumps back to booting my system. So I cannot even run the installer now!

Stuart

Well, clutching at straws, I thought why not try acpi=off, and lo and behold it now boots to the desktop. Rubbish display resolutions, but now to find out why it does not like ACPI on my MSI B450-A PRO MAX motherboard with the latest BIOS loaded.

Stuart

From AMD, Intel, & NVidia X graphics driver primer:

IOW, nomodeset is a crude, usability destructive, troubleshooting parameter sometimes necessary as a fallback to get any GUI at all, or to capture logs.

Yes, I realise that. However, as I didn’t specify nomodeset in this case, I am assuming the poor display is a result of acpi=off.

What I’m not sure about is how to find out what in acpi this new kernel does not like. As I said I am on the latest BIOS on my mobo.

Stuart

@broadstairs run the command journalctl -b | grep acpi to see what it’s doing or not doing with acpi off.

You might want to look at a specific acpi option rather than all off, which should narrow down the issue… https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt
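
A small refinement, offered as a sketch rather than the one right way: plain grep acpi is case-sensitive, so it skips any kernel message that only says "ACPI" in capitals and mostly picks up lines that echo the kernel command line. The helper name acpi_lines below is made up.

```shell
# Hypothetical helper: keep ACPI-related lines (any case) from stdin, but drop
# lines that only match because they echo the kernel command line.
acpi_lines() {
    grep -i 'acpi' | grep -v -i 'command line'
}

# Usage (kernel messages from the previous boot):
#   journalctl -k -b -1 | acpi_lines
```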

Well run against the 6.5.3 boot the journal shows

journalctl -b -1 | grep acpi
Sep 16 17:46:48 Tumbleweed.Crowhill kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off
Sep 16 17:46:48 Tumbleweed.Crowhill kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off
Sep 16 17:46:48 Tumbleweed.Crowhill dracut-cmdline[220]: Using kernel command line parameters:  resume=UUID=1957a46b-74d5-40d6-baa5-e5ae1b6ecbcd root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f rootfstype=ext4 rootflags=rw,relatime   BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off
Sep 16 17:46:49 Tumbleweed.Crowhill plymouthd[459]: 00:00:04.530 ply-utils.c:959:ply_get_kernel_command_line                   : Kernel command line is: 'BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off
Sep 16 17:46:52 Tumbleweed.Crowhill plymouthd[695]: 00:00:07.442 ply-utils.c:959:ply_get_kernel_command_line                   : Kernel command line is: 'BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off
Sep 16 17:56:23 Tumbleweed.Crowhill plymouthd[4611]: 00:09:38.685 ply-utils.c:959:ply_get_kernel_command_line                   : Kernel command line is: 'BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=off

and against the working 6.4.12 boot the journal shows

journalctl -b | grep acpi
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug LTR DPC]
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-7f] only partially covers this bridge
Sep 16 17:56:52 Tumbleweed.Crowhill kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Sep 16 17:56:58 Tumbleweed.Crowhill kernel: acpi_cpufreq: overriding BIOS provided _PSD data

Now I’m not sure what this shows for the 6.5.3 boot, as acpi is off. As mentioned before, there is no journal file created when it fails to boot; it only works (so far) with acpi=off. I’ll try some of the obvious options first.
Stuart

Well, the first one I tried was acpi=noirq and it has booted, and I have a good display as well. One thing I did notice as it booted was a message ‘No Southbridge IOAPIC found’; I have no idea whether this was caused by acpi=noirq or not.

This time journal showed

journalctl -b | grep acpi
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=noirq
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=noirq
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug LTR DPC]
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-7f] only partially covers this bridge
Sep 16 22:02:00 Tumbleweed.Crowhill kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
Sep 16 22:02:00 Tumbleweed.Crowhill dracut-cmdline[286]: 841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=noirq
Sep 16 22:02:00 Tumbleweed.Crowhill plymouthd[443]: 00:00:03.644 ply-utils.c:959:ply_get_kernel_command_line                   : Kernel command line is: 'BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=noirq
Sep 16 22:02:06 Tumbleweed.Crowhill kernel: acpi_cpufreq: overriding BIOS provided _PSD data
Sep 16 22:02:06 Tumbleweed.Crowhill plymouthd[1057]: 00:00:10.040 ply-utils.c:959:ply_get_kernel_command_line                   : Kernel command line is: 'BOOT_IMAGE=/boot/vmlinuz-6.5.3-1-default root=UUID=190442a2-32b3-47e7-bac7-a39e6841236f splash=silent resume=/dev/disk/by-id/nvme-KINGSTON_SA2000M8250G_50026B728266A552-part3 quiet mitigations=auto acpi=noirq

Stuart
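
Since acpi=noirq gets 6.5.3 booting here, it may be handy to make it stick until the real cause is found. This is only a sketch: add_kernel_param is a made-up helper, it assumes the usual GRUB_CMDLINE_LINUX_DEFAULT="..." line in /etc/default/grub, and on openSUSE the config would then be regenerated with grub2-mkconfig. Back up the file first.

```shell
# Hypothetical helper: append one parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# Usage: add_kernel_param PARAM [FILE]   (FILE defaults to /etc/default/grub)
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote, then write
    # the result back in place so ownership and permissions are kept.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$file.new" &&
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}

# Usage (as root), then regenerate the GRUB config:
#   add_kernel_param acpi=noirq
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```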

Well I have been trying to determine whether or not there is a bug somewhere in acpi but this is way beyond me. I have been able to extract the various tables etc using the commands in acpica but from here on I am somewhat lost.

Is this worth a bug report against this new kernel? I know it could end up being a bug in my hardware, but at least the guys on bugzilla might help me understand what is happening.

Stuart

@broadstairs did you try any of the others, like acpi_enforce_resources=lax?

That’s the problem: so many options, and no way of knowing which one to try. I’ll try that one and see if I can make a guess at any more. What really concerns me is that for years now I’ve not had to mess around with things like this, even on TW; it’s just worked.

Stuart

@broadstairs What you probably need to do is bisect the kernel; you may be asked to do this if you raise a bug report…

If you raise a bug report, then you will probably need to provide the output from dmidecode, cat /proc/cmdline and journalctl -b, and rock on with the noirq if lax doesn’t work…
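
Those three outputs could be gathered in one go with something like the sketch below. The function name collect_bug_info and the output file names are my own invention. dmidecode normally needs root, and each command is allowed to fail so that the rest still get saved.

```shell
# Hypothetical helper: save the outputs suggested above into one directory.
# Usage: collect_bug_info [DIR]   (DIR defaults to ./bug-info)
collect_bug_info() {
    out=${1:-./bug-info}
    mkdir -p "$out" || return 1
    # dmidecode usually needs root; keep going if it fails or is missing.
    dmidecode > "$out/dmidecode.txt" 2>&1 || true
    cat /proc/cmdline > "$out/cmdline.txt" 2>&1 || true
    journalctl -b --no-pager > "$out/journal-current-boot.txt" 2>&1 || true
}

# Usage (as root):
#   collect_bug_info /tmp/bug-info
```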

Well acpi_enforce_resources=lax fails. Just what is bisecting the kernel?

Stuart

@broadstairs See https://docs.kernel.org/admin-guide/bug-bisect.html
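
In short, bisecting means building and booting kernels from points between a known-good and a known-bad version until git narrows the breakage down to one commit. Here is a bare-bones sketch, assuming a checkout of the kernel git tree and that v6.4 (good) and v6.5 (bad) are the right endpoints for this machine, which nobody in this thread has confirmed. start_kernel_bisect is just an illustrative name.

```shell
# Hypothetical helper: start a bisect between a good and a bad revision.
# Run it from inside the git checkout you want to bisect.
# Usage: start_kernel_bisect GOOD BAD
start_kernel_bisect() {
    good=$1
    bad=$2
    git bisect start &&
        git bisect bad "$bad" &&
        git bisect good "$good"
}

# Usage (the tags are assumptions, see above):
#   start_kernel_bisect v6.4 v6.5
# git then checks out a commit in between. Build and boot it, then report:
#   git bisect good     # this build booted normally
#   git bisect bad      # this build failed to boot
# Repeat until git names the first bad commit, then tidy up with:
#   git bisect log
#   git bisect reset
```

Each round roughly halves the number of candidate commits, so even a full kernel release cycle usually comes down to a dozen or so build-and-boot tests.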