Nvidia driver permission problems

Hello!

I’m building an openSUSE Leap 15.2 installer using kiwi-ng, with the NVIDIA drivers installed and included in the built disk image. After installing the image I am experiencing permission problems with NVIDIA and GL: the sddm-greeter cannot be displayed, and glxinfo must be run with sudo. Adding the sddm user and normal users to the video group works around the issue, but that is not the real solution. At least it indicates that it is “only” a permission problem. Why aren’t dynamic permissions being assigned by logind?

Installed Nvidia packages


sudo zypper se -i nvidia
Loading repository data...
Reading installed packages...

S  | Name                      | Summary                                                               | Type
---+---------------------------+-----------------------------------------------------------------------+--------
i  | nvidia-computeG05         | NVIDIA driver for computing with GPGPU                                | package
i  | nvidia-gfxG05-kmp-default | NVIDIA graphics driver kernel module for GeForce 600 series and newer | package
i+ | nvidia-glG05              | NVIDIA OpenGL libraries for OpenGL acceleration                       | package
i+ | x11-video-nvidiaG05       | NVIDIA graphics driver for GeForce 600 series and newer               | package

An extract from journalctl for the sddm-greeter


Apr 20 14:34:07 localhost sddm-greeter[1162]: Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize -1, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::DoubleBuffer, swapInterval 1, colorSpace QSurfaceFormat::DefaultColorSpace, profile  QSurfaceForma>
Apr 20 14:34:07 localhost sddm-helper[1154]: [PAM] Closing session
Apr 20 14:34:07 localhost sddm-helper[1154]: [PAM] Ended.

Devices file permissions


ls -la /dev/nv*
crw-rw----  1 root video 195,   0 Apr 20 14:34 /dev/nvidia0
crw-rw----+ 1 root video 195, 255 Apr 20 14:34 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254 Apr 20 14:34 /dev/nvidia-modeset
crw-rw----+ 1 root video 239,   0 Apr 20 14:34 /dev/nvidia-uvm
crw-rw----+ 1 root video 239,   1 Apr 20 14:34 /dev/nvidia-uvm-tools
crw-------  1 root root   10, 144 Apr 20 14:34 /dev/nvram

ls -la /dev/dri/*
crw-rw----+ 1 root video 226,   0 Apr 20 14:34 /dev/dri/card0
crw-rw----+ 1 root video 226, 128 Apr 20 14:34 /dev/dri/renderD128

/dev/dri/by-path:
total 0
drwxr-xr-x 2 root root  80 Apr 20 14:34 .
drwxr-xr-x 3 root root 100 Apr 20 14:34 ..
lrwxrwxrwx 1 root root   8 Apr 20 14:34 pci-0000:02:00.0-card -> ../card0
lrwxrwxrwx 1 root root  13 Apr 20 14:34 pci-0000:02:00.0-render -> ../renderD128

Dynamic permission


getfacl /dev/nv*
getfacl: Removing leading '/' from absolute path names
# file: dev/nvidia0
# owner: root
# group: video
user::rw-
group::rw-
other::---

# file: dev/nvidiactl
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-modeset
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-uvm
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvidia-uvm-tools
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

# file: dev/nvram
# owner: root
# group: root
user::rw-
group::---
other::---

getfacl /dev/dri/*
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/by-path
# owner: root
# group: root
user::rwx
group::r-x
other::r-x

# file: dev/dri/card0
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

# file: dev/dri/renderD128
# owner: root
# group: video
user::rw-
user:myuser:rw-
group::rw-
mask::rw-
other::---

Kernel modules


lsmod | grep nvidia
nvidia_drm             61440  2
nvidia_modeset       1232896  3 nvidia_drm
nvidia_uvm           1118208  0
nvidia              34168832  66 nvidia_uvm,nvidia_modeset
drm_kms_helper        229376  1 nvidia_drm
drm                   544768  5 drm_kms_helper,nvidia_drm


/sbin/lspci -nnk | grep -iA3 vga
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K2200] [10de:13ba] (rev a2)
        Subsystem: Hewlett-Packard Company Device [103c:1097]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

Xorg.0.log: https://pastebin.pl/view/8c9af808

I will follow up with output from loginctl and glxinfo tomorrow; I only have access via ssh as I post this. But as mentioned above, I need sudo to get output from glxinfo; otherwise I get “X Error of failed request: BadValue”.

I noticed that re-installing nvidia-gfxG05-kmp-default (originally installed by kiwi) solved the problem, and I could see that I now got the correct dynamic permissions.


getfacl /dev/nvidia0 
getfacl: Removing leading '/' from absolute path names 
# file: dev/nvidia0 
# owner: root 
# group: video 
user::rw- 
user:sddm:rw- 
group::rw- 
mask::rw- 
other::---

At first I tried to compare the system before and after re-installing the package, but couldn’t find any differences in the places I would expect. Then I checked the install script run by nvidia-gfxG05-kmp-default, and it became fairly obvious what went wrong.


sudo zypper install --download-only nvidia-gfxG05-kmp-default

rpm -q --scripts /var/cache/zypp/packages/NVIDIA/x86_64/nvidia-gfxG05-kmp-default-460.73.01_k5.3.18_lp152.19-lp152.37.1.x86_64.rpm

...
# Create symlinks for udev so these devices will get user ACLs by logind later (bnc#1000625)
mkdir -p /run/udev/static_node-tags/uaccess
mkdir -p /usr/lib/tmpfiles.d
ln -snf /dev/nvidiactl /run/udev/static_node-tags/uaccess/nvidiactl 
ln -snf /dev/nvidia-uvm /run/udev/static_node-tags/uaccess/nvidia-uvm
ln -snf /dev/nvidia-uvm-tools /run/udev/static_node-tags/uaccess/nvidia-uvm-tools
ln -snf /dev/nvidia-modeset /run/udev/static_node-tags/uaccess/nvidia-modeset
cat >  /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf << EOF
L /run/udev/static_node-tags/uaccess/nvidiactl - - - - /dev/nvidiactl
L /run/udev/static_node-tags/uaccess/nvidia-uvm - - - - /dev/nvidia-uvm
L /run/udev/static_node-tags/uaccess/nvidia-uvm-tools - - - - /dev/nvidia-uvm-tools
L /run/udev/static_node-tags/uaccess/nvidia-modeset - - - - /dev/nvidia-modeset
EOF
devid=-1
for dev in $(ls -d /sys/bus/pci/devices/*); do 
  vendorid=$(cat $dev/vendor)
  if [ "$vendorid" == "0x10de" ]; then
    class=$(cat $dev/class)
    classid=${class%00}
    if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then
      devid=$((devid+1))
      ln -snf /dev/nvidia${devid} /run/udev/static_node-tags/uaccess/nvidia${devid}
      echo "L /run/udev/static_node-tags/uaccess/nvidia${devid} - - - - /dev/nvidia${devid}" >> /usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf
    fi
  fi
done
...

The command in the for loop, $(ls -d /sys/bus/pci/devices/*), does not give the same result on the kiwi build machine as on the installed system. Running the two steps inside that for loop manually resolves the issue (I think even doing only the last step is enough if the system is rebooted afterwards). I cannot say I completely understand how this configuration solves the ACL problem for logind; I thought the rules in /lib/udev/rules.d were enough, so if someone could explain this a bit I would be grateful. Otherwise, at least the problem has now been resolved.
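For anyone hitting the same thing, the two manual steps mentioned above can be wrapped in a small shell function and run on the installed system, where /sys/bus/pci/devices reflects the real hardware. This is only a sketch based on the package script quoted earlier, not something shipped by the package: the function name regen_nvidia_uaccess and the overridable variables SYSFS_PCI, TAG_DIR and CONF are my own additions (the overrides exist so the loop can be tried against a scratch directory first), and the duplicate check before appending to the tmpfiles.d file is also an addition.

```shell
#!/bin/sh
# Sketch: redo the per-GPU part of the nvidia-gfxG05-kmp-default install
# script on the installed system. Paths default to the ones used in the
# package script; the three variables below are hypothetical overrides
# added here for dry runs and are not part of the package.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"
TAG_DIR="${TAG_DIR:-/run/udev/static_node-tags/uaccess}"
CONF="${CONF:-/usr/lib/tmpfiles.d/nvidia-logind-acl-trick-G05.conf}"

regen_nvidia_uaccess() {
  devid=-1
  mkdir -p "$TAG_DIR"
  for dev in "$SYSFS_PCI"/*; do
    # Skip entries without a readable vendor file (e.g. an empty glob).
    [ -r "$dev/vendor" ] || continue
    vendorid=$(cat "$dev/vendor")
    # 0x10de is the NVIDIA PCI vendor ID, as in the package script.
    [ "$vendorid" = "0x10de" ] || continue
    class=$(cat "$dev/class")
    classid=${class%00}
    # 0x0300 = VGA controller, 0x0302 = 3D controller.
    if [ "$classid" = "0x0300" ] || [ "$classid" = "0x0302" ]; then
      devid=$((devid+1))
      # Step 1: symlink for the running system.
      ln -snf "/dev/nvidia${devid}" "$TAG_DIR/nvidia${devid}"
      # Step 2: tmpfiles.d entry so the symlink is recreated at boot.
      # Appended only if not already present (my addition).
      line="L $TAG_DIR/nvidia${devid} - - - - /dev/nvidia${devid}"
      grep -qxF "$line" "$CONF" 2>/dev/null || echo "$line" >> "$CONF"
    fi
  done
}

# To apply on the installed system, call as root:
#   regen_nvidia_uaccess
```

With the default paths this needs root. In a kiwi build, something like this would presumably have to run on first boot of the installed system rather than at image build time, since the build host's PCI devices are what the package script sees there.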