Hyper-V broken in recent 15.2

I don’t know which mailing list is appropriate for 15.2 discussion, especially for this issue, so I decided to post to the forum (I think it’s my first time posting!).

I had installed 15.2 (Build 613.2) in Hyper-V (Windows Server 2016) as a Generation 2 guest with Secure Boot enabled, mostly as an excuse to test 15.2, and have been using it quite successfully for a while. (I also have a number of 15.1 and Tumbleweed guests whose package maintenance I’m temporarily postponing, in light of this issue, until I figure out a good way to mitigate potential failure on those.) I decided it was time to update things, so I did. After rebooting, I get an error during early boot:

hv_vmbus: unable to connect to host

The result is that I have a non-booting system, and it looks like the KVP stuff is all kinds of wonky on the host’s side.

Booting off the 613.2 DVD’s rescue environment, I downgraded kernel-default and hyper-v packages to the versions included on the disc, as they seemed the most likely culprits, but to no avail: I can’t figure out how to get it to boot again.

I fear something has changed very recently in 15.2, as the Build 654.2 media hangs at “Starting udev…” and I haven’t figured out any magics to get that to actually boot either.

As I’m not sure what component is actually failing, I can’t really try to compose a coherent bug report. :frowning:

If anyone else has access to Hyper-V (server or client), can you try to replicate what I’m seeing? Alternatively, if anyone has any ideas what might be going wrong, that’s also helpful.

Tumbleweed appears to work fine when installing fresh on the same VM (I already backed up the few kilobytes of data that I care about on that box). So that leaves me more confused than anything, honestly: I’d expect busted stuff that exhibits this sort of behavior to be busted in both the latest 15.2 and Tumbleweed, not just 15.2.

Can a mod move this to a more appropriate forum, please? I meant to post it to the Virtualization forum, but I’m not sure that’s the correct place anyhow since it may be an issue with something else that Hyper-V is just happening to be the canary in the coal mine for… :frowning:

Moved to Virtualization.

It is absolutely unclear what you did or what you are booting now. When you update the kernel, previous versions (at least one previous version) are preserved and can be selected in the bootloader (GRUB) menu. Post the content of /boot/grub2/grub.cfg (upload to https://susepaste.org) and tell us which kernel version boots successfully and which does not.

Also, if you are using btrfs, you may have previous snapshots containing not only the previous kernel but the complete previous system state. You can select one of the existing snapshots in the bootloader menu as well. Did you try it? Can you boot into a previous snapshot?
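For reference, a minimal sketch of that snapshot workflow with snapper (assuming the default openSUSE btrfs layout and the snapper config named “root”; this obviously only applies if the root filesystem is btrfs):

```shell
# List existing snapshots; the numbers correspond to the
# "Start bootloader from a read-only snapshot" entries in the GRUB menu
snapper -c root list

# After booting a known-good read-only snapshot from the GRUB menu,
# promote it to the new default subvolume for future boots
snapper rollback

# Reboot into the restored system state
systemctl reboot
```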

It is 3AM here, and I should be resting, so after the sun rises I’ll do a fresh install of build 613.2, then a zypper dup and a reboot, then paste both grub.cfg files. I can’t do anything from 654.2, as that install media can’t even boot, let alone install.

This is a Hyper-V VM, so ext4 is used (for all Linux VMs) for performance and virtual disk maintenance reasons. I was already planning to make a Hyper-V snapshot in the morning before and after the zypper dup so that I can hopefully figure out what’s going wrong.

Doing an Internet search on your error, I get plenty of hits over the years.

Skimming the results, the most common cause of the error is launching the VM under the wrong security context. Three security contexts are mentioned:

  • If your machine is part of a Windows Domain, you might require a Domain Admin group account
  • You may require an account part of the Local Administrators group
  • You may require an ordinary User account.

Each of the above possibilities makes a lot of sense, so I’d recommend looking at those before considering any other possibility.

I don’t see anything in the hits I read that explains why so many different security contexts might be required, but I can guess at least for the Domain Admins and Local Administrators accounts, and I can speculate that an ordinary User account might work only in a non-default configuration, something commonly seen in other virtualization products but generally discouraged (except for VirtualBox, which uniquely installs and runs as a user app).

TSU

After that “zypper dup”, it should be possible to boot the older kernel (the one from the 613.2 install). Use the “Advanced options” menu choice. If that boots, but the newer kernel from the update doesn’t boot, then you have a kernel problem that you can report.

So we’re not sidetracked by host-level concerns: the host is configured properly (as a Domain Member server), and I can think of nothing that is set on the host which would be relevant to a recent kernel change causing the VM to not boot.

Indeed, 5.3.18-lp152.8-default does boot properly when selected from GRUB. I can’t see any changes in 5.3.18-lp152.14 which should cause issues. And, very strangely, removing the .14 kernel does not lead me back to a working configuration. I’ve spent a few hours poking at why this might be, but have come up dry. I might be missing something when removing the package?
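For anyone trying to follow along, the downgrade dance looked roughly like this (a sketch; the full .14 package version string below is an assumption based on the kernel names above, so check yours with the first command):

```shell
# List the kernels currently installed side by side
rpm -qa 'kernel-default*' | sort

# Boot the known-good kernel from GRUB's "Advanced options",
# then remove the broken one (version string is an example)
zypper rm kernel-default-5.3.18-lp152.14.1

# Rebuild the boot entries so GRUB only offers what is installed
grub2-mkconfig -o /boot/grub2/grub.cfg
```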

This does certainly look like a kernel issue, rather than the hyper-v package or some other package, though. If someone can validate whether they can replicate the issue, I’ll see what I can do to report it in an actionable way, even if that involves installing each kernel individually, going over the changes in the broken kernel by hand, and then trying to rebuild a working kernel based on that.

As requested, here is the grub.cfg though there is nothing obviously amiss (though there are EFI logic changes):

Before: https://paste.opensuse.org/31731478
After: https://paste.opensuse.org/78860245

This latest testing was done on a throwaway VM, so I can confirm that it’s not some weirdness in an old VM. And at this moment, I’ve basically halted my 15.2 testing since it’s not currently in a bootable state on Hyper-V. :frowning:

If anyone else wants to try it out (it appears to behave the same on Windows 10 Client Hyper-V), here are some notes to get you pointed in the right direction:

[ul]
[li]Grab a 15.2 ISO that’s not too fresh, to ensure it’ll boot[/li]
[li]Create a Generation 2 guest[/li]
[li]Go into the guest’s Settings and make sure you’re using the Microsoft UEFI Certificate Authority for Secure Boot (this is the CA used for non-Windows signing; your physical PC probably ships with both CAs, which is why you’ve probably never noticed there are two)[/li]
[li]Also in the Settings, set the minimum RAM to 1536 MB and turn on Dynamic Memory (ballooning works perfectly fine, but the installation requires more than 1 GB)
[ul]
[li]NOTE THAT SKIPPING THIS ONE WILL PREVENT THE INSTALL MEDIA FROM BOOTING, AT LEAST BACK TO 42.3 IF NOT EARLIER![/li]
[/ul][/li]
[li]During the initial install, have the Network Adapter set to Not Connected so you don’t accidentally pull down the problematic packages on first boot; you can change it over once the installation process starts or after the initial reboot. You will, of course, have to configure the adapter after rebooting.[/li]
[li]For performance reasons, you might want to pre-create your virtual hard disk and select that instead of creating one in the wizard:
[ul]
[li]In PowerShell (may require Admin; not sure, as my creation script elevates), create a Dynamic disk with 1 MB blocks:
```
New-VHD -Path "C:\Path\To\Virtual Hard Disks\15.2 Test.vhdx" -SizeBytes 127GB -Dynamic -BlockSizeBytes 1MB
```
[/li]
[/ul][/li]
[li]For absolute best performance, pre-format your disk as ext4 and use your partitions in the installer (making sure not to reformat the ext4 / partition):
[ul]
[li]Create a GPT label and then create the following partitions:
[ul]
[li]/dev/sda1: 500M EFI (Type 1)[/li]
[li]/dev/sda2: 2G swap (Type 19)[/li]
[li]/dev/sda3: remaining space as / using the default type[/li]
[/ul][/li]
[li]Format the ext4 / partition with Hyper-V-friendly options (the -G 4096 is the important option):
```
mkfs.ext4 -G 4096 -O metadata_csum /dev/sda3
```
[/li]
[/ul][/li]
[li]At the final pre-installation screen, set your elevator to noop[/li]
[/ul]

A lot of those steps aren’t strictly necessary, but if you’re expecting to reinstall a few times, the disk notes should provide a significant disk I/O improvement (on one very heavily loaded host with mostly Linux development VMs, we saw disk utilization drop by nearly 30% simply by tuning the disk container and formatting properly, plus the noop elevator!), especially on non-SSD storage; you just need to mkfs the / partition between reinstalls and everything goes smoothly.
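One extra note on the noop elevator: to make it persist across kernel updates, rather than only setting it at install time, put it on the kernel command line in /etc/default/grub. A sketch (the splash=silent quiet defaults here are just an example, and the demo works on a temp copy so it’s safe to try anywhere):

```shell
# Demo against a temp copy; on the real system you would edit
# /etc/default/grub itself (as root).
conf=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet"' > "$conf"

# Append elevator=noop inside the quoted default command line
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT=".*\)"$/\1 elevator=noop"/' "$conf"

cat "$conf"
# → GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet elevator=noop"

# On the real system, regenerate the GRUB configuration afterwards:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```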

I had problems with the “.8” kernel on one machine (it wouldn’t boot). And I had problems with the “.11” kernel on another machine (no network). The “.14” kernel fixes both of those.

The kernel team are tweaking the kernel, backporting features from newer upstream kernels. And apparently that broke your system.

Indeed, 5.3.18-lp152.8-default does boot properly when selected from GRUB.

In a case such as this, I usually recommend editing “/etc/zypp/zypp.conf”. Look for the “multiversion.kernels” line, and insert “oldest” in the list of kernels to keep. That way your working kernel won’t be removed.
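Concretely, the edited line in /etc/zypp/zypp.conf would end up looking something like this (a sketch; your existing list may differ from the default shown here):

```
# /etc/zypp/zypp.conf (excerpt)
# Keep the oldest installed kernel in addition to the usual set,
# so the known-good kernel is never purged automatically
multiversion.kernels = latest,latest-1,running,oldest
```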

You might then want to try the latest kernel from the kernels repo at

http://download.opensuse.org/repositories/Kernel:/stable/standard/

And then consider posting a bug report about the kernel that is not working.
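A sketch of adding that repo and pulling in its latest kernel (the repo alias “kernel-stable” is my choice; pick whatever name you like):

```shell
# Add the Kernel:stable repo with auto-refresh enabled
zypper ar -f http://download.opensuse.org/repositories/Kernel:/stable/standard/ kernel-stable

# Install the latest kernel-default from that repo
zypper in -r kernel-stable kernel-default
```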

Again, you have not described what you did, so it is impossible to make any useful comment. By the sound of it, you have two kernels, good and bad. You can boot the good kernel by selecting it from the bootloader menu. And after you remove the bad kernel, you can no longer boot the good kernel. Is that what you are saying?

In such troubleshooting every step matters. Even if you think it was entirely obvious.

Although to me, the error more likely points to a host OS problem accessing the guest rather than a guest-only issue.

If you want to focus on possible kernel issues and how to address them, these are the most recent resources (one document updated only last month) I see posted by MS:

General recommendations, though it seems to list mostly filesystem considerations:
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/best-practices-for-running-linux-on-hyper-v

General statement on Linux Integration Services (the Hyper-V drivers and optimizations, now distributed in the Linux kernel itself).
May be important for those who continue to think they need to upgrade to Hyper-V optimized drivers (not so with a recent kernel):
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-linux-and-freebsd-virtual-machines-for-hyper-v-on-windows

Matrix of supported features by SUSE/openSUSE version (too bad it isn’t broken down by kernel, so some work is needed to evaluate properly).
Note that MS recommends turning off Secure Boot (near the bottom of the document).
Covers only up to 15.1, but can be useful to troubleshoot and extrapolate to 15.2:
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-suse-virtual-machines-on-hyper-v

TSU