Tumbleweed (20191216) fails to boot after GRUB, Leap 15.1 is fine

I have a new system; the headline specs are dual AMD EPYC 7282 16-core processors on a Supermicro H11DSi-NT motherboard. Leap 15.1 installs (all defaults) and boots fine. Tumbleweed installs fine, but just after a GRUB selection is made, at “Loading initial ramdisk”, the machine resets. All installs are fresh, and it’s not a dual boot.

Notes:

  • It does not hang, as is the case with seemingly every other bug report regarding a failure at this point in the boot.
  • It’s not graphics. I’ve installed Tumbleweed server. Same problem.
  • I have installed with both Legacy and UEFI booting. The machine supports both. Same problem with both.
  • I have installed on both NVMe and SATA drives. Same problem with both.
  • The Tumbleweed live stick behaves exactly the same way, with the same failure, as does the latest Manjaro (18.1.4). None of the other OSes I’ve tried suffer from this issue (openSUSE Leap 15.1, Fedora 30/31, Ubuntu 18.04/19.10).
  • I’ve tried installing Leap 15.1 and zypper dup-ing to Tumbleweed repos. Same problem. I can’t successfully boot the old kernels after the upgrade, but I can snapper back to a working Leap 15.1 system.
  • I’ve tried removing “quiet” and adding “loglevel” settings to the kernel command line in GRUB, but the reset seems to happen too quickly for any output to appear.

Thoughts? Anyone aware of the issue? Any ideas how to get some more logging information?

I can post more hardware details if needed. Any help would be appreciated. I would very much like to run Tumbleweed if possible. I’ve been running it on my last system for a while, and it’s great.

Thanks.

Keep 15.1 for now.

And then, I suggest that you add the kernels repo at

http://download.opensuse.org/repositories/Kernel:/stable/standard/

and install the latest kernel from there. It is probably a 5.4 kernel.

Then see if it boots with this kernel. If not, you should have the previous kernel around (selectable on the grub menu) as a fallback option.

I suggest this as a test of whether you are seeing an incompatibility between your hardware and the latest kernel.
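For anyone following along, the repo-and-kernel test described above could look roughly like this. It is only a sketch: the alias "kernel-stable" is an arbitrary name picked for this example, and the script just prints the commands unless DRY_RUN=0 is set (in which case it needs to run as root).

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_stable_kernel() {
    # "kernel-stable" is a hypothetical alias; any unused alias works.
    run zypper addrepo --refresh \
        http://download.opensuse.org/repositories/Kernel:/stable/standard/ \
        kernel-stable
    run zypper refresh kernel-stable
    # Install the kernel from that repo only, leaving other packages alone.
    run zypper install --from kernel-stable kernel-default
}

install_stable_kernel
```

The previous kernel stays selectable in the GRUB menu, so a failed boot with the new kernel should still leave a way back.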

Before doing that, you might want to edit “/etc/zypp/zypp.conf”. Look for the “multiversion.kernels” line, and change it to read:

multiversion.kernels = oldest,latest,latest-1,running

This is to make sure that, when testing kernels, you always keep a working kernel around.
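If you would rather script that edit than do it by hand, something like the following might work. It is a sketch that assumes GNU sed and an uncommented "multiversion.kernels" line (if none exists, the setting is appended). The function name is made up for this example, and a backup copy is written next to the file first.

```shell
#!/bin/sh
# Set multiversion.kernels in a zypp.conf-style file.
# Usage (as root): set_multiversion_kernels /etc/zypp/zypp.conf
set_multiversion_kernels() {
    conf="$1"
    value="oldest,latest,latest-1,running"

    # Keep a backup before touching the file.
    cp "$conf" "$conf.bak"

    if grep -Eq '^[[:space:]]*multiversion\.kernels[[:space:]]*=' "$conf"; then
        # Replace the existing uncommented setting in place (GNU sed -i).
        sed -i -E "s|^[[:space:]]*multiversion\.kernels[[:space:]]*=.*|multiversion.kernels = $value|" "$conf"
    else
        # No active setting found: append one.
        printf 'multiversion.kernels = %s\n' "$value" >> "$conf"
    fi
}
```

Comment lines that mention the option are left untouched, since only lines that start with the option name are matched.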

Thanks for the response.

I’m afraid glibc on Leap 15.1 isn’t compatible with the latest kernel in that repository. In YaST:


This request will break your system!

kernel-default-5.4.5-1.1.g47eef04.i586 conflicts with libc.so.6()(64bit) provided by glibc-2.26-lp151.18.7.x86_64

On a similar tack, I have tried upgrading the whole system to Tumbleweed (e.g., by doing this https://www.techrepublic.com/article/how-to-upgrade-opensuse-leap-to-opensuse-tumbleweed/). That process also gives me multiple kernels* I can try booting from, all of which fail. This leads me to believe that it’s not the kernel that’s the issue. Might it be the bootloader or something like that?

*Kernels available are 4.12.14-lp151.27.3 (from 15.1 install), 4.12.14-lp151.28.36.1 (from updates to 15.1), and 5.3.12-1.1 (from upgrade to Tumbleweed)

You attempted to install a 32-bit kernel on a 64-bit system.

Look for that “i586” at the end. You cannot use that kernel. You need the one with “x86_64” at the end of the name.
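A quick way to catch this kind of mismatch is to compare the architecture suffix of the package name with what "uname -m" reports. The helper names below are made up for illustration, and the check is deliberately strict (it does not treat "noarch" packages specially).

```shell
#!/bin/sh
# Extract the architecture suffix from an RPM-style package name,
# e.g. "kernel-default-5.4.5-1.1.g47eef04.i586" gives "i586".
pkg_arch() {
    printf '%s\n' "${1##*.}"
}

# Succeed only if the package architecture equals the given machine type.
arch_matches() {
    [ "$(pkg_arch "$1")" = "$2" ]
}

pkg="kernel-default-5.4.5-1.1.g47eef04.i586"
machine="$(uname -m)"   # machine hardware name of the running system

if arch_matches "$pkg" "$machine"; then
    echo "$pkg matches $machine"
else
    echo "$pkg is built for $(pkg_arch "$pkg"), but this machine is $machine"
fi
```

With zypper, the architecture can also be named explicitly as part of the package name (for example "kernel-default.x86_64"), which avoids picking the wrong build by accident.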

Oh, sorry. My mistake.

I have now installed the correct, 64-bit, 5.4.5-1 kernel in Leap 15.1, and the system boots fine.


linux-m5w9:~ # uname -r
5.4.5-1.g47eef04-default

Good.

You could try the Tumbleweed kernel, which is a 5.3 kernel. Add the Tumbleweed repo, install just the kernel, then disable that repo. That would test if there is a problem with 5.3 kernels.
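As a rough sketch, that add/install/disable sequence might look like the following. The repo URL (the main Tumbleweed OSS repo) and the alias "tw-oss" are assumptions for this example, not something given in the thread. The script only prints the commands unless DRY_RUN=0 is set, in which case it needs root.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_tumbleweed_kernel() {
    # Assumed URL of the Tumbleweed OSS repo; "tw-oss" is a hypothetical alias.
    run zypper addrepo http://download.opensuse.org/tumbleweed/repo/oss/ tw-oss
    run zypper refresh tw-oss
    # Pull only the kernel from Tumbleweed, naming the 64-bit build explicitly.
    run zypper install --from tw-oss kernel-default.x86_64
    # Disable the repo again so nothing else gets pulled in later.
    run zypper modifyrepo --disable tw-oss
}

test_tumbleweed_kernel
```

Disabling the repo straight after the install is the important part: it keeps a later "zypper up" or "zypper dup" from dragging in the rest of Tumbleweed.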

My advice, however, would be to skip that test. Tumbleweed should be updating to a 5.4 kernel in the next few days. I would suggest waiting for that, and then trying the Tumbleweed live ISO to see whether it boots. Don’t try updating your installed system to Tumbleweed until you have a live ISO that boots.

I’ve done this. Leap 15.1 boots OK with the 5.3.12-1 kernel, which is the same version as in the snapshots of Tumbleweed that don’t boot.

So, I’d conclude that it’s not the kernel. Any idea what else it might be? Presumably I could test any package upgrade in this way.

It is going to be hard to find what causes this.

Does it boot to single user mode? (Put a " 1" – without the quotes – at the end of the kernel boot line).

Test that with the live media, so that you don’t have to first reinstall.

If it boots to single user mode, then try booting to text mode (put a " 3" at the end of the kernel boot line).

Changing the runlevel didn’t do anything. On a more positive note, though …

… I found it. The offending package is “ucode-amd”. In hindsight that might seem obvious for bleeding edge AMD processors. I, however, figured this out the tedious way by installing the most basic Leap 15.1 system I could, switching the repos over to Tumbleweed, and updating things one-by-one (well, in vaguely logical groups).

So, I can have a Tumbleweed system by installing Leap 15.1, switching the repos over to Tumbleweed, putting a lock on the “ucode-amd” package, and then running zypper dup. Alternatively, I can install Tumbleweed, boot a live stick, manually downgrade that package in a chroot, and then lock it. A locked “ucode-amd” is what I’m currently running, and the lock will remain until the package is updated such that the Tumbleweed live image boots. If that seems like a terrible idea for reasons I am unaware of, please, anyone, let me know.
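For reference, the lock-then-upgrade route could be sketched as below. It assumes the repos have already been switched over to Tumbleweed, and the function names are made up for this example. As written it only prints the commands; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Lock the microcode package at its current version, then upgrade everything else.
lock_ucode_and_upgrade() {
    run zypper addlock ucode-amd
    run zypper locks          # list active locks to confirm it took effect
    run zypper dup
}

# Later, once a fixed ucode-amd is available, drop the lock again.
unlock_ucode() {
    run zypper removelock ucode-amd
}

lock_ucode_and_upgrade
```

The chroot downgrade variant is not shown here, since the device names and the package version to go back to depend on the individual installation.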

Thank you, @nrickert for your help. Before this I really had no idea that you could pull repos in from other releases and selectively upgrade or downgrade in order to test compatibility in this way. Without your suggestions along these lines I would not have cracked this.

Does anyone know who develops/maintains “ucode-amd”? I’d like to report the issue if I can. Does it come from AMD themselves?

I’m glad you found it.

You can probably install from the DVD installer. During the install, click on “Software” on the summary screen and mark “ucode-amd” so that it is not installed. Perhaps you can even lock it there.

Note that I have not tested this. Still, it is a possibility to consider if you need to reinstall.

Thank you, @nrickert for your help. Before this I really had no idea that you could pull repos in from other releases and selectively upgrade or downgrade in order to test compatibility in this way.

We don’t usually recommend that, because it often causes problems. But there are circumstances where it can solve hardware compatibility issues.

Does anyone know who develops/maintains “ucode-amd”? I’d like to report the issue if I can. Does it come from AMD themselves?

You can file a bug report. See:

openSUSE:Submitting bug reports

Yes, I see it. I don’t see an option to lock, but just disabling it here is easier than my previous methods. I can always lock it once booted up.

Yes, of course. I won’t keep the installation that I was messing about with in that way. It’s just a good way of diagnosing the problem.

I will. Thanks.

Quick update in case anyone else is following/experiencing this issue. I raised a bug report here: https://bugzilla.opensuse.org/show_bug.cgi?id=1160204. The conclusion of discussion there is that the latest release of AMD microcode within the kernel firmware repository fixes this issue. We now just need to wait until it becomes part of the Tumbleweed release. Until then, I’m keeping the lock and the package version from Leap 15.1.
