This problem (at least, I consider it a problem) seems specific to openSUSE. From what I've read, it doesn't happen on other hosts.
opened 01:29PM - 21 Jan 24 UTC
closed 10:58AM - 25 Jan 24 UTC
* Running podman rootless returns the error `Failed to initialize NVML: Insufficient Permissions`.
* Running with root works as expected.
I only had to change one setting in `config.toml` for my system.
```toml
user = "root:video"
```
I compared the relevant files to a fresh Ubuntu 23 install, where rootless worked. The only difference was the permissions on `/dev/nvidia*`. My distro, Gentoo, installs a [config](https://github.com/gentoo/gentoo/blob/master/x11-drivers/nvidia-drivers/files/nvidia-545.conf#L31) that changes the defaults for the device file parameters. This was introduced with a [commit on 2021-07-21](https://github.com/gentoo/gentoo/commit/701b87679ae89e02d11be22d235081fa55ae58be#diff-4e0a8c08b7ffe1db012efe78cde0532749149066d61541c218ceb2b09c4ca9b5R14). The [NVIDIA driver FAQ](https://us.download.nvidia.com/XFree86/Linux-x86_64/535.154.05/README/faq.html) describes the defaults and gives an example of overriding them:
#### How and when are the NVIDIA device files created?
```
Whether a user-space NVIDIA driver component does so itself, or invokes nvidia-modprobe, it will default to creating the device files with the following attributes:
UID: 0 - 'root'
GID: 0 - 'root'
Mode: 0666 - 'rw-rw-rw-'
```
```
For example, the NVIDIA driver can be instructed to create device files with UID=0 (root), GID=44 (video) and Mode=0660 by passing the following module parameters to the NVIDIA Linux kernel module:
NVreg_DeviceFileUID=0
NVreg_DeviceFileGID=44
NVreg_DeviceFileMode=0660
```
This looks *reasonable* to me.
Is this a bug in the container toolkit, or is it expected? I would assume that with `ModifyDeviceFiles = 1` I should not have to change my distro config.
Possibly relevant information below.
## Rootless Error
```bash
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
Failed to initialize NVML: Insufficient Permissions
```
```bash
Octal Permissions Size User Group Date Modified Name
0660 crw-rw---- 195,254 root video 21 Jan 13:39 /dev/nvidia-modeset
0666 crw-rw-rw- 509,0 root root 21 Jan 13:39 /dev/nvidia-uvm
0666 crw-rw-rw- 509,1 root root 21 Jan 13:39 /dev/nvidia-uvm-tools
0660 crw-rw---- 195,0 root video 21 Jan 13:39 /dev/nvidia0
0660 crw-rw---- 195,255 root video 21 Jan 13:39 /dev/nvidiactl
```
```bash
cat /proc/driver/nvidia/params
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 432
```
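Note that `DeviceFileMode` is reported in decimal, which is easy to misread: 432 decimal is 0660 octal, matching the `crw-rw----` entries in the listing. A quick sketch of the conversion (`mode_to_octal` is a made-up helper name, not part of any NVIDIA tool):

```bash
# Hypothetical helper: print a decimal file mode as 4-digit octal.
mode_to_octal() {
    printf '%04o\n' "$1"
}

mode_to_octal 432   # prints 0660
mode_to_octal 438   # prints 0666
```

The same kind of lookup applies to `DeviceFileGID`: running `getent group 27` on the host should show which group that ID belongs to there.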
## Rootless Success
After adding a config override with file mode `0666`, podman rootless works as expected.
```
options nvidia NVreg_DeviceFileMode=0666
```
```bash
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-...)
```
```bash
Octal Permissions Size User Group Date Modified Name
0666 crw-rw-rw- 195,254 root video 20 Jan 16:27 /dev/nvidia-modeset
0666 crw-rw-rw- 509,0 root root 20 Jan 16:27 /dev/nvidia-uvm
0666 crw-rw-rw- 509,1 root root 20 Jan 16:27 /dev/nvidia-uvm-tools
0666 crw-rw-rw- 195,0 root video 20 Jan 16:27 /dev/nvidia0
0666 crw-rw-rw- 195,255 root video 20 Jan 16:27 /dev/nvidiactl
```
```bash
cat /proc/driver/nvidia/params
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 438
```
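The difference between the failing and the working run comes down to the "other" permission bits: 432 (0660) sets none of them, while 438 (0666) grants read and write to everyone. A minimal sketch of that check, assuming the decimal values from `/proc/driver/nvidia/params`; `mode_is_world_rw` is a hypothetical helper name:

```bash
# Hypothetical helper: succeed if a decimal mode grants read+write to "other".
# The low three bits are the "other" permissions: r=4, w=2, x=1.
mode_is_world_rw() {
    [ $(( $1 & 6 )) -eq 6 ]
}

for mode in 432 438; do
    if mode_is_world_rw "$mode"; then
        echo "$mode: world read/write"
    else
        echo "$mode: owner/group only"
    fi
done
```

With 0660, only root and members of the owning group can open the device nodes, which would fit the rootless failure if the container process does not end up with that group.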
Ultimately, I need to run my podman container rootless, with a normal user inside the container using GPU acceleration. But the normal user cannot use the GPU, I think because the device files show up as owned by nobody:nogroup:
```bash
user@91585e6a257a:/$ ls -l /dev/nvidia*
crw-rw----+ 1 nobody nogroup 195, 0 Jan 26 20:12 /dev/nvidia0
crw-rw----+ 1 nobody nogroup 195, 255 Jan 26 20:12 /dev/nvidiactl
crw-rw----+ 1 nobody nogroup 195, 254 Jan 26 20:12 /dev/nvidia-modeset
crw-rw-rw-+ 1 nobody nogroup 237, 0 Jan 26 20:12 /dev/nvidia-uvm
crw-rw-rw-+ 1 nobody nogroup 237, 1 Jan 26 20:12 /dev/nvidia-uvm-tools
```
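As far as I understand it, `nobody:nogroup` is what a rootless container shows for any host owner or group that is not mapped into its user namespace, so the listing alone does not say whether access is actually denied (the `+` also indicates ACLs are in play). A small sketch for checking effective access from inside the container; `can_use_device` is a made-up helper name, and it only tests read/write permission as seen by the current user, so an actual open could still fail for other reasons:

```bash
# Hypothetical helper: succeed if the current user has read and write
# permission on the given path.
can_use_device() {
    [ -r "$1" ] && [ -w "$1" ]
}

for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue   # glob matched nothing
    if can_use_device "$dev"; then
        echo "ok      $dev"
    else
        echo "denied  $dev"
    fi
done
```

Running this as the normal user and again as root inside the same container should show which nodes are the ones blocking access.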
Thank you for any help!