NVIDIA drivers: bad signature?

I have a headless server in a datacenter for GPU workloads, running MicroOS with UEFI and Secure Boot enabled. It contains a Quadro P2000, so I installed the NVIDIA drivers following the G06 instructions on SDB:NVIDIA_drivers. Everything seems to go well: I reboot at the end and manually use the attached interface to enroll the new MOK key. But when the system comes up after the enrollment, the nvidia driver can’t be loaded because it fails the signature check:

modprobe: ERROR: could not insert 'nvidia': Key was rejected by service

As I understand it, with Secure Boot enabled, kernel lockdown requires every kernel module to be signed by a trusted key, such as one enrolled in the MOK list. The nvidia-driver-G06-kmp-default package includes a %post scriptlet that queues an included MOK public key for enrollment when the drivers are installed, although completing the enrollment requires physical presence at the machine on the next boot. Once the key is enrolled, however, drivers signed with the associated private key should pass the check and be loadable.
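Before digging into keys, it can help to confirm what the kernel is actually enforcing. The following is a minimal sketch, assuming bash and the standard securityfs file /sys/kernel/security/lockdown, which lists every lockdown mode with the active one in brackets; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking Secure Boot / lockdown state.

# Read a lockdown status line on stdin, e.g. "none [integrity] confidentiality",
# and print only the active (bracketed) mode.
active_lockdown_mode() {
  sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Read `mokutil --sb-state` output on stdin; succeed if Secure Boot is on.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}
```

On the server this would be used as `active_lockdown_mode < /sys/kernel/security/lockdown` and `mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on"`. An active mode of integrity or confidentiality means the kernel refuses modules that lack a valid signature from a trusted key.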

However, what I get when I run sudo modprobe nvidia after a successful MOK enrollment is:

modprobe: ERROR: could not insert 'nvidia': Key was rejected by service

This would suggest that either the MOK key wasn’t enrolled, or the signature on the NVIDIA driver doesn’t match the key(s) that were enrolled.
I’ve tried the following: entering transactional-update shell, force-reinstalling nvidia-driver-G06-kmp-default inside that shell, explicitly running mokutil --root-pw --import on all the certificates in /var/lib/nvidia-publickeys/, exiting the shell, rebooting, and accepting the MOK enrollment. The result is the same. When I try to repeat it, mokutil --import just replies that the certificates are already enrolled in the MOK, which pretty conclusively rules out the possibility that the keys aren’t in the MOK.

So are the openSUSE prebuilt G06 NVIDIA drivers signed with the wrong keys? Or is there something else I can check?
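One further check is to find out which certificate the module was actually signed with, and whether that exact certificate is enrolled. This is a minimal sketch assuming bash and openssl, and assuming that `modinfo -F sig_key` reports the signing certificate's serial number (as far as I know, this is what kmod shows for PKCS#7-signed modules); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: match a module's sig_key against certificate serials.

# Normalize a hex identifier from stdin: drop an openssl "serial=" prefix,
# colons and whitespace, lowercase it, and strip leading zeros so that
# "00:AB:CD" and "serial=ABCD" compare equal.
normalize_hex() {
  tr -d ': \t\n' | tr 'A-F' 'a-f' | sed -e 's/^serial=//' -e 's/^0*//'
}

# Print every DER certificate in DIR whose serial equals SIG_KEY.
# Usage: find_signing_cert SIG_KEY DIR
find_signing_cert() {
  local want cert serial
  want=$(printf '%s' "$1" | normalize_hex)
  for cert in "$2"/*.der; do
    [[ -e "$cert" ]] || continue   # glob matched nothing
    serial=$(openssl x509 -inform DER -in "$cert" -noout -serial 2>/dev/null \
      | normalize_hex)
    [[ -n "$serial" && "$serial" == "$want" ]] && printf '%s\n' "$cert"
  done
  return 0
}
```

For example, `find_signing_cert "$(modinfo -F sig_key nvidia)" /var/lib/nvidia-pubkeys` prints the matching certificate file, and `mokutil --test-key FILE` then reports whether that certificate is enrolled. If nothing matches, the module was signed with a certificate that is not in that directory, and `modinfo -F signer nvidia` shows the signer's name.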


Possibly related is that the MicroOS immutable mount layout seems to be incorrectly designed. I get this warning when I run transactional-update pkg install -f nvidia-driver-G06-kmp-default:

Warning: The following files were changed in the snapshot, but are shadowed by
other mounts and will not be visible to the system:
/.snapshots/20/snapshot/var/lib/nvidia-pubkeys/MOK-nvidia-driver-G06-535.104.05-11.1-default.der

And the /var/lib/nvidia-pubkeys folder doesn’t exist on the system after the next boot. It seems that the /var/lib exposed during transactional-update isn’t the same one that is available at run time, so at least some files are missing from the running system. I’m not sure whether this actually affects the issue, since the MOK keys are stored separately, but it doesn’t inspire confidence in the package’s correctness.
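To see exactly which files ended up shadowed, the snapshot's copy of the directory can be compared with the live one. A minimal sketch, assuming bash and assuming the snapshot path from the warning is still readable on the running system; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list entries that exist in a snapshot's copy of a
# directory but not in the directory the running system actually sees.
# Usage: missing_from_live SNAPSHOT_DIR LIVE_DIR
missing_from_live() {
  local snap_dir="$1" live_dir="$2"
  # comm needs both inputs sorted in the same collation, so force C locale.
  # A missing directory simply contributes an empty list.
  LC_ALL=C comm -23 \
    <(ls -1A "$snap_dir" 2>/dev/null | LC_ALL=C sort) \
    <(ls -1A "$live_dir" 2>/dev/null | LC_ALL=C sort)
}
```

For the case above this would be `missing_from_live /.snapshots/20/snapshot/var/lib/nvidia-pubkeys /var/lib/nvidia-pubkeys`, with the snapshot number taken from `snapper list` for later updates.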

Side note: I’m not sure who thought it was acceptable to use a frequently changing MOK certificate to sign prebuilt drivers, but it makes the entire distro unusable in datacenters. MOK enrollment is acceptable once during initial installation, and again in the event of a catastrophic security breach. Physically entering the datacenter to perform software management should never be needed more than once a quarter, and even that is excessive. The current model of silent automatic updates and reboots, combined with silent MOK imports that require physical presence at the machine to be accepted during the next boot, makes it a complete non-starter for any datacenter. It ends up being mandatory to lock the specific driver packages and not take updates, since openSUSE’s key management is so poor that a new MOK key has to be enrolled on almost every update.

Please provide the full output of dmesg (upload it to https://paste.opensuse.org/) as well as the full output of

modinfo nvidia

That needs a bug report. The normal assumption is that the contents of /usr and /var are independent. RPM violated this (/var/lib/rpm contained the database of installed packages, while the packages themselves were installed in /usr, so rolling back /usr invalidated the database), and the RPM database was relocated to /usr to be included in the snapshot. As long as installing the NVIDIA package on MicroOS is supported at all, the signing key must be kept in sync with the package itself. How that is best done is up to the developers.

Nothing prevents you from (re-)signing compiled drivers with your own key and enrolling this key just once during initial deployment.
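Generating such a key is a one-time step. The sketch below assumes bash and openssl; the output directory, file names, common name, and 10-year default lifetime are hypothetical choices, not openSUSE conventions. On MicroOS the key would also have to be stored somewhere readable from inside a transactional-update snapshot when re-signing, which is an assumption to verify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a long-lived key pair for local module signing.
# Usage: make_module_signing_key OUT_DIR [COMMON_NAME] [DAYS]
make_module_signing_key() {
  local out_dir="$1" cn="${2:-Local module signing key}" days="${3:-3650}"
  mkdir -p "$out_dir" || return 1
  # Subshell so the restrictive umask (owner-only private key) does not leak.
  (
    umask 077
    openssl req -new -x509 -newkey rsa:2048 -nodes -sha256 \
      -days "$days" -subj "/CN=${cn}" \
      -keyout "$out_dir/module-signing.priv" \
      -outform DER -out "$out_dir/module-signing.der"
  )
}
```

The DER certificate is then enrolled once with `mokutil --import OUT_DIR/module-signing.der` and confirmed in MokManager on the next boot; after that, modules signed with module-signing.priv should load without further enrollments.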

That is the full output; it prints nothing except that one line. dmesg also contains only that one line, with nothing else related and no earlier mentions of nvidia, and journalctl contains nothing about any of it.

Excellent information, I’ll open a Bugzilla report about that, then.

Is there a hook I can add to zypper that will always run my re-signing process whenever transactional-update automatically applies updates and does its test reboot? One of the draws of MicroOS was that it automatically applies updates and then reboots to test them, rolling back if they don’t work.
I guess I could lock the drivers and update them manually: unlock them, update them, re-sign them, reboot, and then re-lock them. But at that point I’m not sure why I’m using MicroOS anymore; I’d have eliminated the automatic safe-update feature and added more steps to the manual process.

The Fedora immutable variants, for example, don’t have this problem: they generate a unique per-device signing key on first install, enroll it via MOK, and have akmods sign the modules with that per-device key. The drivers aren’t prebuilt, but the MOK also doesn’t require re-enrollment on each update, and realistically almost no one uses or provides prebuilt NVIDIA drivers anymore anyway (it’s all DKMS and akmods now).

It is always possible to add an additional RPM with a trigger that does it. It is also possible to simply rebuild the packages locally (you do not even need to deploy a full-fledged OBS for that; a simple osc build is enough) and provide them as an internal repository instead of the external NVIDIA-hosted one. The latter is certainly preferable for a datacenter deployment (and it eliminates the need to have development packages on every installed system).
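The re-signing step such a trigger would run could look roughly like the following. It is a minimal sketch assuming bash, xz and zstd for compressed modules, and the kernel's scripts/sign-file helper, whose location depends on the installed kernel development package. The function name, and the idea of calling it from a `%triggerin -- nvidia-driver-G06-kmp-default` scriptlet in a local RPM, are assumptions rather than an existing openSUSE mechanism.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-sign one kernel module in place with a local key.
# Usage: resign_module SIGN_FILE PRIVATE_KEY CERT_DER MODULE_FILE
# SIGN_FILE is the kernel's scripts/sign-file; it appends a new signature,
# and the kernel verifies the last one appended.
resign_module() {
  local sign_file="$1" key="$2" cert="$3" module="$4" tmp
  tmp=$(mktemp) || return 1

  # Decompress (or copy) the module into a temporary file.
  case "$module" in
    *.ko.xz)  xz -dc "$module" > "$tmp" ;;
    *.ko.zst) zstd -qdc "$module" > "$tmp" ;;
    *.ko)     cat "$module" > "$tmp" ;;
    *) echo "unsupported module file: $module" >&2; rm -f "$tmp"; return 1 ;;
  esac || { rm -f "$tmp"; return 1; }

  # Sign the uncompressed module in place (no destination argument).
  "$sign_file" sha256 "$key" "$cert" "$tmp" || { rm -f "$tmp"; return 1; }

  # Recompress in the original format (xz with CRC32, as the kernel build
  # does), then replace the module via rename in the same directory.
  case "$module" in
    *.ko.xz)  xz -c --check=crc32 "$tmp" > "$module.new" ;;
    *.ko.zst) zstd -qc "$tmp" > "$module.new" ;;
    *.ko)     cat "$tmp" > "$module.new" ;;
  esac || { rm -f "$tmp" "$module.new"; return 1; }

  mv -f "$module.new" "$module"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

The module files and the sign-file helper can be located first, for instance with `rpm -ql nvidia-driver-G06-kmp-default` and `find /usr/src -name sign-file`, and the function then called once per nvidia module file.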

With Fedora’s per-device approach, the private signing key is stored permanently on the device, and some believe this weakens security (the key could leak).

I do not like frequent signing key updates either (although for different reasons), but someone needs to actually step in and implement an alternative solution, or pay SUSE to have its engineers do it.

1213664 – NVIDIA drivers in kernels earlier than the latest installed version will have the wrong signature after rollback on Tumbleweed with enabled lockdown (opensuse.org)

1211224 – How does Nvidia KMP work with dracut uefi_secureboot_cert solution when no MOK? (opensuse.org)