Booting today stalls system and I can only boot from previous kernel using advanced options

Have had a bad experience with this system today. Booting was slow and eventually stalled. I re-booted to windoze just in case I had left the system in a bad state. Windoze found errors and had to check the drive. It started OK in the end and appeared to have been stopped mid windoze update. (I had been getting the machine ready for an office location which uses windoze.) After a couple of updates and restarts of windoze I tried to go back to Leap 15.3 but the system stalled again. I could only get the machine up in Leap 15.3 using the advanced boot option from grub and accepting the previous kernel.
This ran without a problem and updates were waiting so did a sudo zypper up which included a kernel update. I rebooted to the new kernel but the boot stalled so I have had to go back to the version which works.

While I go and find a live distro to help diagnose the problem, has anybody else had this experience? I am running Kernel Version 5.3.18-150300.59.87-default to get this working for now. smartctl --all /dev/sda and /dev/sdb give no errors.

I will have to read up how to check the /dev/nvme0n1p1 etc. so if anybody can advise please let me know

Where should I start with diagnosis? I suspect my hardware as it is unusual to have two consecutive kernels going wrong.

Budge.

If anybody has time here is my boot log.

https://paste.opensuse.org/24537981

For reasons I don’t understand I cannot get any previous boot logs using

sudo journalctl --list-boots

only this last one. At least it did work!

Perhaps a persistent journal is not enabled?

See section 11.1 of “journalctl: query the systemd journal” in the openSUSE Leap 15.5 Reference.

Many thanks Paul. I shall now try to reboot the updated system and see if I can look at it again.

OK, I now have the persistent journal enabled

# 
# See journald.conf(5) for details. 

[Journal] 
Storage=persistent 
#Compress=yes 
#Seal=yes
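
For anyone following along, here is a minimal sketch of how the setting above could be checked and applied. The function names are made up for illustration, `journal_storage` reads only the main config file (drop-ins under journald.conf.d are ignored), and `apply_persistent_journal` needs root and is not run automatically.

```shell
#!/bin/sh
# Print the Storage= value from a journald.conf-style file.
# The last uncommented assignment wins; "auto" is the documented default.
journal_storage() {
    conf=${1:-/etc/systemd/journald.conf}
    value=$(sed -n 's/^[[:space:]]*Storage=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-auto}"
}

# Make the setting take effect and list the boots the journal knows about.
# Run as root, by hand, once Storage=persistent is in place.
apply_persistent_journal() {
    mkdir -p /var/log/journal
    systemctl restart systemd-journald
    journalctl --list-boots
}
```

Only boots that happen after the change are kept, so `journalctl --list-boots` should start showing more than one entry from the next reboot onwards.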

I have tried to boot into the upgraded system but this is still stalling. However when stalled the keyboard Caps Lock and Scroll Lock flash on and off.
The only way I can get out of this situation is to use the off switch on the box.

I can however re-boot to the kernel two updates ago.
I am not sure if this is a bug or my very old but till now faithful PS/2 Northgate Omni Key/102.
In my system settings I can only select the Northgate 101 which is what I have been using to date.
Not sure if this is the problem so will grab a spare modern keyboard and check.
What fun!

I have tried using another keyboard which also has a PS/2 plug and I still get a problem with the latest two updates, again with Caps Lock and Scroll Lock flashing. I have also tried these keyboards with PS/2 to USB adaptor and get the same result.
Since the older kernel is still working fine I am going to attempt to report a bug but not tonight!
Meanwhile if anybody else knows anything on this please post here so I know I am not alone!!!

My one remaining 15.3 machine I updated earlier today to 5.3.18-150300.59.93-default with no issues.

The Caps Lock and Scroll Lock flashing is signalling a kernel panic. The Linux kernel has experienced an unrecoverable error and cannot continue.

Don’t offhand know where your problem lies.

I am very jealous. Had I known Northgate would drop off the planet when it did I would have bought several prior thereto. Those I acquired via employers all disappeared by the time I needed another, so I was stuck with one, and it quit more than a decade ago. I’m stuck using these that have gotten well worn.

I have had trouble similar to the problem you are having within the past month, emergency shell, cannot find root filesystem on NVME. Trouble is, I can’t remember which PC or OS, or well enough when to figure it out, even though I only have 3 PCs with NVME in them. Now that I’ve seen this thread, maybe something useful to you will come to mind sometime soon. Maybe Mageia 9 (devel) on (Kaby Lake) host ab250…

Hi mrmazda,
And I thought I was the only one still with Northgates. My main machine uses an Ultra but am using the 102 on this problem machine.

The problem machine has two processors and a 2TB NVMe drive. This is my basic hardware:-

alastair@HP-Z640-1:~> sudo inxi -F 
[sudo] password for root:  
System:
  Host: HP-Z640-1 Kernel: 5.3.18-150300.59.87-default arch: x86_64 bits: 64 Console: pty pts/1
    Distro: openSUSE Leap 15.3
Machine:
  Type: Desktop System: Hewlett-Packard product: HP Z640 Workstation v: N/A serial: CZC650BZHH
  Mobo: Hewlett-Packard model: 212A v: 1.01 serial: N/A UEFI: Hewlett-Packard v: M60 v02.59
    date: 03/31/2022
CPU:
  Info: 2x 6-core model: Intel Xeon E5-2620 v3 bits: 64 type: MT MCP SMP cache: L2: 2x 1.5 MiB (3
    MiB)
  Speed (MHz): avg: 1385 min/max: 1200/3200 cores: 1: 1498 2: 1220 3: 1844 4: 1197 5: 1388
    6: 1198 7: 1198 8: 1198 9: 1198 10: 1960 11: 1398 12: 1369 13: 1199 14: 1202 15: 1200 16: 1199
    17: 1380 18: 1967 19: 1199 20: 1454 21: 1199 22: 1953 23: 1440 24: 1199
Graphics:
  Device-1: NVIDIA G98 [Quadro NVS 295] driver: nouveau v: kernel
  Display: x11 server: X.org v: 1.20.3 with: Xwayland driver: X: loaded: nouveau
    unloaded: fbdev,modesetting,vesa gpu: nouveau tty: 179x42 resolution: 2560x1440
  Message: GL data unavailable in console for root.
Audio:
  Device-1: Intel C610/X99 series HD Audio driver: snd_hda_intel
  Sound Server-1: ALSA v: k5.3.18-150300.59.87-default running: yes
  Sound Server-2: PulseAudio v: 14.2-rebootstrapped running: yes
  Sound Server-3: PipeWire v: 0.3.24 running: yes
Network:
  Device-1: Intel Ethernet I218-LM driver: e1000e
  IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 70:5a:0f:3a:fa:27
  Device-2: ASIX AX88179 Gigabit Ethernet type: USB driver: ax88179_178a
  IF: eth1 state: up speed: 1000 Mbps duplex: full mac: 7c:c2:c6:33:10:d0
  IF-ID-1: br0 state: up speed: 1000 Mbps duplex: unknown mac: 7c:c2:c6:33:10:d0
Drives:
  Local Storage: total: 2.96 TiB used: 106.26 GiB (3.5%)
  ID-1: /dev/nvme0n1 vendor: Kingston model: SNVS2000GB size: 1.82 TiB
  ID-2: /dev/sda vendor: Samsung model: SSD 850 PRO 256GB size: 238.47 GiB
  ID-3: /dev/sdb vendor: Western Digital model: WD1003FBYX-23 43W7629 42C0401IBM
    size: 931.51 GiB
Partition:
  ID-1: / size: 1.82 TiB used: 106.26 GiB (5.7%) fs: btrfs dev: /dev/nvme0n1p4
  ID-2: /boot/efi size: 511 MiB used: 324 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p2
  ID-3: /home size: 1.82 TiB used: 106.26 GiB (5.7%) fs: btrfs dev: /dev/nvme0n1p4
  ID-4: /opt size: 1.82 TiB used: 106.26 GiB (5.7%) fs: btrfs dev: /dev/nvme0n1p4
  ID-5: /tmp size: 1.82 TiB used: 106.26 GiB (5.7%) fs: btrfs dev: /dev/nvme0n1p4
  ID-6: /var size: 1.82 TiB used: 106.26 GiB (5.7%) fs: btrfs dev: /dev/nvme0n1p4
Swap:
  ID-1: swap-1 type: partition size: 2 GiB used: 0 KiB (0.0%) dev: /dev/nvme0n1p3
Sensors:
  System Temperatures: cpu: 30.0 C mobo: N/A gpu: nouveau temp: 50.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 431 Uptime: 0h 11m Memory: 62.82 GiB used: 2.95 GiB (4.7%) Init: systemd
  target: graphical (5) Shell: Bash inxi: 3.3.21
alastair@HP-Z640-1:~> 

Thanks to your advice I am using the updated inxi!

Thanks for the info on the flashing lights. New to me but a good indication of what might be going wrong.
My question, not yet researched, is how may I test the NVMe in the same way that a hard drive is checked? I think I should start there as the update has not caused problems on other machines so hardware is prime suspect. OTOH this machine is still running fine on earlier kernel version.
I look forward to your advice please when you have time!
Many thanks,
Regards,
Alastair.
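
On the NVMe question: health data is kept per drive, not per partition, so the check goes against /dev/nvme0n1 rather than /dev/nvme0n1p1. A rough sketch, assuming smartmontools and the nvme-cli package are installed. The function names are hypothetical, and `check_nvme` needs root and is not run automatically.

```shell
#!/bin/sh
# Strip a trailing partition suffix: /dev/nvme0n1p4 -> /dev/nvme0n1
nvme_namespace() {
    printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//'
}

# Print the drive's health and error information. Run as root.
check_nvme() {
    dev=$(nvme_namespace "$1")
    smartctl -a "$dev"       # identity plus the SMART/Health Information log
    nvme smart-log "$dev"    # the same health counters via nvme-cli
    nvme error-log "$dev"    # error log entries kept by the controller
}
```

Calling `check_nvme /dev/nvme0n1p4` as root prints the reports for the whole drive. A non-zero "Critical Warning" or "Media and Data Integrity Errors" value in the smartctl output would be the first thing to look for.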

This means kernel panic. Boot with “plymouth.enable=0” and without “quiet” on kernel command line, this may give more output on console.
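
To make that concrete: in the GRUB menu, highlight the entry, press e, find the line starting with linux (or linuxefi), delete quiet, append plymouth.enable=0, then boot with F10 or Ctrl-x. The change lasts for that one boot only. The sketch below only illustrates which words change; `debug_cmdline` is a made-up helper, not a system tool.

```shell
#!/bin/sh
# Take a kernel command line, drop "quiet" and any existing
# plymouth.enable= setting, and append plymouth.enable=0.
debug_cmdline() {
    out=
    for word in $1; do      # unquoted on purpose: split on whitespace
        case $word in
            quiet|plymouth.enable=*) ;;   # drop these words
            *) out="$out $word" ;;
        esac
    done
    echo "${out# } plymouth.enable=0"
}
```

For example, `debug_cmdline "$(cat /proc/cmdline)"` run on the working kernel shows roughly what the edited line should end up containing.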

Not sure how to do this now as there is no longer any line for parameter entry in the boot screen. I used E and then added what you suggested but as the machine cannot complete I cannot examine the log or anything else.

No useful text displayed on the terminal before the kernel panic as it attempted to boot?

You could boot from the working kernel and then look at the log for boot that failed. But if the kernel panic is very early there may not be anything written to the log…
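
A small sketch of that approach, assuming the persistent journal is enabled and the failed boot got far enough to write anything. `panic_lines` is a hypothetical filter, and the pattern list is only a starting point, not an exhaustive set of kernel error markers.

```shell
#!/bin/sh
# Keep only lines that look like a kernel panic, oops or hardware error.
# Reads log text on stdin.
panic_lines() {
    grep -i -E 'kernel panic|BUG:|Oops|Call Trace|Hardware Error|machine check'
}
```

From the working kernel, `sudo journalctl -k -b -1 --no-pager | panic_lines` filters the kernel messages of the previous boot, and `sudo journalctl -b -1 -p err` lists everything logged at error priority or worse. If the failed boot is not the most recent one, `journalctl --list-boots` gives the offset to use in place of -1.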

Hi:
I also had a bad experience with yesterday’s kernel upgrade from 5.14.21-150400.24.18-default (64-bit) to 5.14.21-150400.24.21-default (64-bit). After the reboot the system would no longer start and ended at a “kernel panic” screen. I had to select the previous kernel at the boot screen.
The output of the inxi -SGaz --vs command is as follows:


inxi 3.3.21-00 (2022-08-22)
System:
  Kernel: 5.14.21-150400.24.18-default arch: x86_64 bits: 64 compiler: gcc
    v: 7.5.0 parameters: BOOT_IMAGE=/vmlinuz-5.14.21-150400.24.18-default
    root=UUID=85cb5677-e732-4bcf-b769-b98aee0a5c3c splash=silent
    mitigations=auto quiet
  Desktop: KDE Plasma v: 5.25.5 tk: Qt v: 5.15.5 wm: kwin_x11 vt: 7
    dm: SDDM Distro: openSUSE Leap 15.4
Graphics:
  Device-1: NVIDIA TU106GLM [Quadro RTX 3000 Mobile / Max-Q]
    vendor: Hewlett-Packard driver: nvidia v: 470.141.03
    alternate: nouveau,nvidia_drm non-free: 515.xx+ status: current (as of
    2022-08) arch: Turing code: TUxxx process: TSMC 12nm built: 2018-22 pcie:
    gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 3 speed: 8 GT/s ports:
    active: none off: eDP-1 empty: DP-1,DP-2,DP-3 bus-ID: 01:00.0
    chip-ID: 10de:1f36 class-ID: 0300
  Device-2: Quanta HP HD Camera type: USB driver: uvcvideo bus-ID: 1-7:3
    chip-ID: 0408:5347 class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 1.20.3 with: Xwayland v: 21.1.4
    compositor: kwin_x11 driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,nouveau,vesa alternate: nv gpu: nvidia,nvidia-nvswitch
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 128 s-size: 381x211mm (15.00x8.31")
    s-diag: 436mm (17.15")
  Monitor-1: DP-6 pos: primary res: 1920x1080 hz: 60 dpi: 128
    size: 382x215mm (15.04x8.46") diag: 438mm (17.26") modes: N/A
  Monitor-2: eDP-1-1 size-res: N/A modes: N/A
  OpenGL: renderer: Quadro RTX 3000/PCIe/SSE2 v: 4.6.0 NVIDIA 470.141.03
    direct render: Yes

I sought help at the Linuxquestions forum, where the opinion was that because I am using NVidia’s proprietary drivers, something went wrong when the new kernel’s initrd tried to include them. I was also told that the inxi output shows kernel 5.14.21-150400.24.18 running fine with the proprietary NVidia graphics drivers installed. I was advised to ask on an NVidia forum to attract expert attention.

So, please advise.
Regards,
Bojan
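
One way to test the Linuxquestions theory from the working kernel is to see whether an nvidia module exists for the new kernel at all. This is only a sketch: `has_nvidia_module` is a hypothetical helper, and it assumes the driver installs files named nvidia*.ko somewhere under /lib/modules/<version>, either directly or as symlinks.

```shell
#!/bin/sh
# Succeed if any nvidia*.ko* file or symlink exists for the given
# kernel version. The second argument overrides the module root.
has_nvidia_module() {
    kver=$1
    root=${2:-/lib/modules}
    [ -n "$(find "$root/$kver" -name 'nvidia*.ko*' 2>/dev/null | head -n 1)" ]
}
```

For example, `has_nvidia_module 5.14.21-150400.24.21-default || echo "no nvidia module for that kernel"`. To see whether the initrd picked the driver up, `lsinitrd /boot/initrd-5.14.21-150400.24.21-default | grep -i nvidia` lists any matching files, assuming the initrd follows the usual /boot/initrd-<version> naming.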

Perhaps do as suggested and ask on nvidia’s forum: Linux - NVIDIA Developer Forums

As an aside, you’ve tagged your Leap 15.4 problem onto one concerning 15.3

@Budgie2

You’d be venturing into rather deep waters with this, but, you could use kexec/kdump to save the dump of the crashed kernel.

“Kexec preserves the contents of the physical memory. After the production kernel fails, the capture kernel (an additional kernel running in a reserved memory range) saves the state of the failed kernel. The saved image can help you with the subsequent analysis.”

Details of its use here: https://doc.opensuse.org/documentation/leap/archive/15.3/tuning/html/book-tuning/cha-tuning-kexec.html
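
A rough outline of what the setup involves, as a sketch rather than a tested recipe for this machine. The package names and `kdumptool calibrate` are taken from the SUSE documentation as far as I recall it, so treat them as assumptions and check the guide above. The function names are made up, and `setup_kdump` needs root and is not run automatically.

```shell
#!/bin/sh
# Succeed if a capture kernel is loaded (the flag file contains "1").
crash_kernel_loaded() {
    flag=${1:-/sys/kernel/kexec_crash_loaded}
    [ "$(cat "$flag" 2>/dev/null)" = "1" ]
}

# Succeed if the running kernel was booted with a crashkernel= reservation.
cmdline_has_crashkernel() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}"
}

# Install the tools and enable the service. Run as root, by hand.
setup_kdump() {
    zypper install kdump kexec-tools makedumpfile
    kdumptool calibrate      # suggests crashkernel= sizes for this machine
    systemctl enable kdump
}
```

After adding a crashkernel= value to the kernel command line and rebooting, both checks should succeed. A later panic should then leave a dump behind for analysis, by default under /var/crash if the stock kdump configuration is unchanged.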

Hi Paul,
Sorry for the delayed reply. Not been a good week. In spite of having had all the four previous jabs I have succumbed to the dread Covid virus. No idea where it came from but apparently the present version is much more infectious than earlier ones. The effect has been much worse than I anticipated. At some time during the past week my media NAS died so I am slowly surfacing to a whole bunch of problems. Still too befuddled to get my head around Kexec but it looks like just the tool I need so many thanks. Will report progress in due course.
Thanks again.
Alastair.

Yes, I think you need to be bright eyed and bushy tailed there. Wishing you a full and hopefully fairly swift recovery.

I had my fourth vaccination earlier this week, I’ve managed to avoid covid-19 so far, (hope that’s not talking it up).

Hi Paul,
Covid was not much fun and has left me knackered but my positive test was only last Tuesday and my family say I should give it 10 days.

Meanwhile, having noted some troublesome, inconsistent behaviour on rebooting, I thought my problem might point to a hardware issue.
I gave my HP Z640 a good clean with a compressed air jet, reseated all the memory and the second processor board and now … no more kernel panic or problems with NIC.
No idea why this had been lurking undetected for so long but seems all is now OK.
Will close this now.
Thanks for the help along the way.
Regards,