Really slow boot times

Recently I upgraded my SSD to an NVMe drive (I am a really late adopter). I chose this as a good moment to drop Ubuntu and switch back, after two decades, to OpenSuse. Although the experience has been great, there is one annoying problem: my boot times are really long, in the order of minutes. After I choose OpenSuse in GRUB, I can look at three green dots for over a minute.

To dive straight into what I have done so far:

systemd-analyze gives me

$ systemd-analyze
Startup finished in 33.501s (firmware) + 9.068s (loader) + 570ms (kernel) + 1min 3.106s (initrd) + 8.266s (userspace) = 1min 54.513s
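To make the breakdown easier to compare between boots, the summary line can be split into phases. This is only a minimal sketch in Python: the helper names `to_seconds` and `parse_phases` are made up for illustration, and it assumes the time spans only use the `min`, `s` and `ms` units seen above.

```python
import re

# One duration token such as "1min", "33.501s" or "570ms".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)")
_FACTOR = {"min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(span):
    """Convert a time span like '1min 3.106s' to seconds (hypothetical helper)."""
    return sum(float(num) * _FACTOR[unit] for num, unit in _TOKEN.findall(span))


def parse_phases(line):
    """Return {phase: seconds} for a 'Startup finished in ...' line."""
    pattern = r"((?:\d+(?:\.\d+)?(?:min|ms|s)\s*)+)\((\w+)\)"
    return {name: to_seconds(span) for span, name in re.findall(pattern, line)}


# The summary line exactly as printed by systemd-analyze above.
line = (
    "Startup finished in 33.501s (firmware) + 9.068s (loader) + 570ms (kernel) "
    "+ 1min 3.106s (initrd) + 8.266s (userspace) = 1min 54.513s"
)

phases = parse_phases(line)
for name, secs in phases.items():
    print(f"{name:10s} {secs:8.3f}s")
print("slowest phase:", max(phases, key=phases.get))
```

For the line above, the initrd phase comes out as the largest contributor, with firmware second.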

I have tried systemd-analyze blame, but the output is not very usable to me:

$ systemd-analyze blame
1min 3.819s sys-module-configfs.device
1min 3.762s dev-disk-by\x2did-nvme\x2dLexar_SSD_NQ790_2TB_PLG611R006087P220Q_1\x2dpart2.device
1min 3.762s sys-devices-pci0000:00-0000:00:02.1-0000:01:00.0-0000:02:08.0-0000:0a:00.0-nvme-nvme0-nvme0n1-nvme0n1p2.device
1min 3.762s dev-nvme0n1p2.device
...

This goes on for a while, with many devices taking over 1 minute. As far as I understand it, they are all being blocked by the same issue.
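One way to check whether the entries really share a single delay is to cluster the blame output by duration. The following is a rough Python sketch, not a systemd feature: `parse_blame` and `cluster` are hypothetical helpers, and the 1 second tolerance is an arbitrary assumption.

```python
import re

# Leading time span (e.g. "1min 3.819s") followed by the unit name.
_LINE = re.compile(r"^\s*((?:\d+(?:\.\d+)?(?:min|ms|s)\s+)+)(\S+)")
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)")
_FACTOR = {"min": 60.0, "s": 1.0, "ms": 0.001}


def parse_blame(text):
    """Return a list of (seconds, unit) pairs; lines such as '...' are skipped."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        secs = sum(float(n) * _FACTOR[u] for n, u in _TOKEN.findall(match.group(1)))
        entries.append((secs, match.group(2)))
    return entries


def cluster(entries, tolerance=1.0):
    """Group entries whose duration is within `tolerance` seconds of the previous one."""
    clusters = []
    for secs, unit in sorted(entries, reverse=True):
        if clusters and clusters[-1][-1][0] - secs <= tolerance:
            clusters[-1].append((secs, unit))
        else:
            clusters.append([(secs, unit)])
    return clusters


# The four blame lines quoted above (raw string because of the \x2d escapes).
sample = r"""
1min 3.819s sys-module-configfs.device
1min 3.762s dev-disk-by\x2did-nvme\x2dLexar_SSD_NQ790_2TB_PLG611R006087P220Q_1\x2dpart2.device
1min 3.762s sys-devices-pci0000:00-0000:00:02.1-0000:01:00.0-0000:02:08.0-0000:0a:00.0-nvme-nvme0-nvme0n1-nvme0n1p2.device
1min 3.762s dev-nvme0n1p2.device
...
"""

for group in cluster(parse_blame(sample)):
    print(f"{len(group)} unit(s) at about {group[0][0]:.1f}s")
```

If nearly every slow unit lands in the same cluster, the units themselves are probably not slow; they are more likely waiting on one common event.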

The first thing I tried was to disconnect my card reader. When this did not help, I disconnected the USB 3 ports from the PC’s case. This seemed to help, but only for a while and now the problem is back. I have disabled the TPM module as well, because I don’t use it anyway and it also showed up in systemd-analyze blame, but that did not change much either.

There are two issues that caught my attention, which may or may not have anything to do with the problem. When I start up, before going to the three green dots (which I think is firmware loading), it shows error -19:

hub 0-0:1.0: config failed, hub doesn't have any ports! (err -19)

The second thing is, I have a lot of (virtual) serial ports:

/dev/tty    /dev/tty12  /dev/tty17  /dev/tty21  /dev/tty26  /dev/tty30  /dev/tty35  /dev/tty4   /dev/tty44  /dev/tty49  /dev/tty53  /dev/tty58  /dev/tty62
/dev/tty0   /dev/tty13  /dev/tty18  /dev/tty22  /dev/tty27  /dev/tty31  /dev/tty36  /dev/tty40  /dev/tty45  /dev/tty5   /dev/tty54  /dev/tty59  /dev/tty63
/dev/tty1   /dev/tty14  /dev/tty19  /dev/tty23  /dev/tty28  /dev/tty32  /dev/tty37  /dev/tty41  /dev/tty46  /dev/tty50  /dev/tty55  /dev/tty6   /dev/tty7
/dev/tty10  /dev/tty15  /dev/tty2   /dev/tty24  /dev/tty29  /dev/tty33  /dev/tty38  /dev/tty42  /dev/tty47  /dev/tty51  /dev/tty56  /dev/tty60  /dev/tty8
/dev/tty11  /dev/tty16  /dev/tty20  /dev/tty25  /dev/tty3   /dev/tty34  /dev/tty39  /dev/tty43  /dev/tty48  /dev/tty52  /dev/tty57  /dev/tty61  /dev/tty9

but no physical serial ports. I could see this message in dmesg:

[    0.415783] [      T1] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled

So I tried to disable this using grub:

8250.nr_uarts=0

The message is gone, but the serial ports still show up in /dev.
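To confirm that a parameter added in GRUB actually reached the running kernel, the command line can be read back from /proc/cmdline. A small Python sketch, assuming a Linux system; `kernel_params` is a hypothetical helper name.

```python
import shlex
from pathlib import Path


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


proc = Path("/proc/cmdline")
if proc.exists():  # only present on Linux
    params = kernel_params(proc.read_text())
    if "8250.nr_uarts" in params:
        print("8250.nr_uarts =", params["8250.nr_uarts"])
    else:
        print("8250.nr_uarts is not on the kernel command line")
```

This only shows that the parameter was passed, not that the driver acted on it.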

I have no idea how to continue. I am considering distro hopping, but I don’t want to go through configuring the system again.

I mentioned that the problem seemed to be gone after I disconnected the front USB 3.0 ports of the PC case. However, the problem returned. After I posted this, I decided I could just as well reconnect both the card reader and the front USB 3.0 ports. Now the boot time has dropped to 30s again, which I am okay with.

$ systemd-analyze
Startup finished in 7.330s (firmware) + 8.737s (loader) + 566ms (kernel) + 5.890s (initrd) + 7.994s (userspace) = 30.519s
graphical.target reached after 5.257s in userspace.

However, I expect the problem to return again, some day, so I would still be grateful if somebody can help me debug this.

If this is what is holding you back, it looks as if you should wait for kernel 6.18, which should not be far away: please see bug.cgi?id=220181#c20
If you like troubleshooting, you can install kernel 6.18-rc2 from the Kernel:/HEAD repository which should include the needed patch.

I tried this. The -19 error is indeed gone. I don’t know if this sped up starting my machine, because it is a bit faster now. If anything, the loader took a lot more time now.

$ uname -r
6.18.0-rc2-2.g918ee04-default
$ systemd-analyze
Startup finished in 6.716s (firmware) + 21.603s (loader) + 647ms (kernel) + 14.761s (initrd) + 8.001s (userspace) = 51.730s
graphical.target reached after 5.273s in userspace.

The second reboot is even worse: almost 15 minutes!

$ systemd-analyze
Startup finished in 6.703s (firmware) + 9.062s (loader) + 762ms (kernel) + 5min 1.364s (initrd) + 9min 25.187s (userspace) = 14min 43.081s
graphical.target reached after 9min 22.452s in userspace.

This also shows why I disabled the TPM module for a while:

$ systemd-analyze blame
14min 18.797s sys-devices-LNXSYSTM:00-LNXSYBUS:00-MSFT0101:00-tpmrm-tpmrm0.device
14min 18.797s dev-tpmrm0.device
14min 18.788s sys-module-configfs.device
...

I will go back to kernel 6.17 for now.

Better show

systemd-analyze critical-chain

I had to restart over 5 times to get a slow boot again, but this is slow enough to be annoying:

$ uname -r
6.17.4-1-default
$ systemd-analyze
Startup finished in 6.725s (firmware) + 8.425s (loader) + 604ms (kernel) + 1min 3.609s (initrd) + 8.056s (userspace) = 1min 27.420s
graphical.target reached after 5.294s in userspace.

I tried critical-chain as @hui suggested:

$ systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @5.294s
└─multi-user.target @5.294s
  └─getty.target @5.294s
    └─getty@tty1.service @5.294s
      └─plymouth-quit-wait.service @3.571s +1.721s
        └─systemd-user-sessions.service @3.560s +9ms
          └─nss-user-lookup.target @3.615s

This is way out of normal with an SSD. Anything special in your initrd? Did you try dracut --regenerate-all just to rule out the obvious?
Other times look fairly normal to me…

journalctl --system -b could be interesting.

I am not aware of having done anything unusual, but maybe it detected some hardware during installation that it now tries to read from or write to?

The output is far too long for this forum, so I selected a time jump. Let me know if there’s something more useful.

# journalctl --system -b
...
Oct 24 11:33:08 localhost kernel: ata3: reset failed (errno=-32), retrying in 8 secs
Oct 24 11:33:16 localhost kernel: ata3: limiting SATA link speed to <unknown>
Oct 24 11:33:19 localhost kernel: ata3: reset failed (errno=-32), retrying in 8 secs
Oct 24 11:33:27 localhost kernel: ata3: limiting SATA link speed to <unknown>
Oct 24 11:33:29 localhost kernel: ata3: reset failed (errno=-32), retrying in 33 secs
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0d:00.0: Worker [491] processing SEQNUM=2020 is taking a long time
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0e:00.0: Worker [468] processing SEQNUM=2034 is taking a long time
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0c:00.0: Worker [479] processing SEQNUM=2007 is taking a long time
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0d:00.4: Worker [448] processing SEQNUM=2027 is taking a long time
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0b:00.0: Worker [452] processing SEQNUM=2003 is taking a long time
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0d:00.3: Worker [478] processing SEQNUM=2026 is taking a long time
Oct 24 11:34:05 localhost kernel: ata3: limiting SATA link speed to <unknown>
Oct 24 11:34:06 localhost kernel: ata3: SATA link down (SStatus 0 SControl 3D0)
Oct 24 11:34:06 localhost kernel: ata4: SATA link down (SStatus 0 SControl 300)

...

Does this work with pastebin?
I have no experience with system logs like this, but I guess there is something weird with ata3.
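Finding such time jumps by eye is tedious in a long log, so here is a hedged Python sketch that flags large gaps between consecutive journal lines. It assumes the default short output format shown above. `parse_stamp` and `find_gaps` are made-up helper names, the year is a placeholder because that format does not print one, and the 30 second threshold is arbitrary.

```python
from datetime import datetime


def parse_stamp(line, year=2000):
    """Parse the leading 'Oct 24 11:33:08' stamp; `year` is only a placeholder."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, threshold=30.0):
    """Return (gap_seconds, line_before, line_after) for gaps of at least `threshold`."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        try:
            now = parse_stamp(line)
        except ValueError:
            continue  # "..." and other lines without a timestamp
        if prev_time is not None:
            delta = (now - prev_time).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = now, line
    return gaps


# A few of the journal lines quoted above.
sample = """\
Oct 24 11:33:08 localhost kernel: ata3: reset failed (errno=-32), retrying in 8 secs
Oct 24 11:33:16 localhost kernel: ata3: limiting SATA link speed to <unknown>
Oct 24 11:33:19 localhost kernel: ata3: reset failed (errno=-32), retrying in 8 secs
Oct 24 11:33:27 localhost kernel: ata3: limiting SATA link speed to <unknown>
Oct 24 11:33:29 localhost kernel: ata3: reset failed (errno=-32), retrying in 33 secs
Oct 24 11:34:04 localhost systemd-udevd[442]: 0000:0d:00.0: Worker [491] processing SEQNUM=2020 is taking a long time
Oct 24 11:34:06 localhost kernel: ata3: SATA link down (SStatus 0 SControl 3D0)
"""

for delta, before, after in find_gaps(sample.splitlines()):
    print(f"{delta:.0f}s gap after: {before}")
    print(f"          next: {after}")
```

The line printed just before each gap is usually the best hint at what the system was waiting for; in this excerpt it is the ata3 reset retry.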

https://paste.opensuse.org/

Okay, here is a copy on paste.opensuse.org. Thanks for the tip!

Well, there are problems with two of the four SATA ports; either ports themselves or devices connected to them.

(Very) wild guess: the old ssd you removed was connected to ata3 but the system still “thinks” there should be a disk there, possibly even trying to boot from there.
Maybe there are still stale references to that ata3 disk in the UEFI boot entries or somewhere else, or that ata3 port should have been disabled in the UEFI firmware…

I never removed the drive, usually I just keep adding drives :stuck_out_tongue_closed_eyes:

I disconnected my old SSD and the (older) HDD; I will see if I miss them and if it solves everything. For now, thank you for helping me try solutions and analyse the logs!

Update: I connected and disconnected my 3 SATA devices. I was afraid it was my old HDD, which reminds me that I want a backup policy for that one (the data is recreatable, but it is a lot of manual work). But it seems to be my optical disk drive. I last used it when I bought Corel Aftershot. I have since switched to darktable, so I guess I am not going to need a DVD/CD drive anytime soon.

Thanks again for helping me out, I will let you know if the problem shows up again.