Cuda + Nvidia + bumblebee + codecs "safe" way

Hello everyone,

In short, I work on a laptop with an Nvidia dGPU and write CUDA code on it, and I also watch movies on the same laptop. The following set of instructions is therefore what I run right after installation. I put it together from what I learned, mostly from this thread (https://forums.opensuse.org/showthread.php/534832-Installing-NVIDIA-on-modern-machines-ending-disastrously/page2).

  0. I install OpenSUSE LEAP 42.3/15.0/15.1+ with the nomodeset boot option. This means the laptop should have in
/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet mitigations=auto nomodeset"

with no updates applied whatsoever. Just a fresh install with nomodeset in the default boot entry.

  1. I add the default Nvidia repository from the community packages.
sudo zypper addrepo --refresh https://download.nvidia.com/opensuse/leap/15.1 NVIDIA

Mind that the version number (15.1 here) should match whichever version you are using. For Tumbleweed, use /tumbleweed in place of /leap/15.1.
source (https://en.opensuse.org/SDB:NVIDIA_drivers)
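A minimal sketch of picking the repository URL per release, so the version is not typed by hand. The function name `nvidia_repo_url` is made up for illustration; the URL layout is the one described on the SDB page linked above.

```shell
#!/bin/sh
# Hypothetical helper: print the NVIDIA repository URL for a given release.
# Usage: nvidia_repo_url leap 15.1    or    nvidia_repo_url tumbleweed
nvidia_repo_url() {
    base="https://download.nvidia.com/opensuse"
    case "$1" in
        leap)
            # A Leap release needs an explicit version such as 15.1.
            [ -n "$2" ] || { echo "leap needs a version" >&2; return 1; }
            echo "$base/leap/$2"
            ;;
        tumbleweed)
            echo "$base/tumbleweed"
            ;;
        *)
            echo "unknown release: $1" >&2
            return 1
            ;;
    esac
}
```

It could then be used as `sudo zypper addrepo --refresh "$(nvidia_repo_url leap 15.1)" NVIDIA`.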

Remember: do not update or install anything yet.

  2. Set up the cuda repository from Nvidia’s website (https://developer.nvidia.com/cuda-downloads).
    They typically provide an rpm package which adds the necessary repos. You can also add the repo from a terminal.
sudo zypper addrepo http://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/cuda-opensuse15.repo

Please note that they don’t distinguish OpenSUSE 15.1 from 15.0. Once you add their repo, it keeps the necessary cuda-related packages up to date.

  3. Now make sure all of the repositories are loaded and refreshed.
sudo zypper ref

Choose "trust always" for every repository key.

4. Install cuda. This in turn installs and locks the nvidia driver that it considers most compatible, locks the kernel, and also blacklists nouveau automatically.

sudo zypper in cuda-tools-10-1 cuda-toolkit-10-1 cuda

Do not reboot yet.

5. Remove nomodeset from the following line in

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet mitigations=auto nomodeset"

Make sure that grub is updated with

sudo mkinitrd
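A small sketch of the nomodeset removal from step 5, done with sed instead of by hand. `strip_nomodeset` is a hypothetical name; it only prints the edited file, so the result can be reviewed before anything is overwritten.

```shell
#!/bin/sh
# strip_nomodeset FILE: print FILE with "nomodeset" removed, but only from
# the GRUB_CMDLINE_LINUX_DEFAULT line. The file itself is not modified.
strip_nomodeset() {
    sed -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/ {
        s/ nomodeset//g
        s/nomodeset //g
        s/nomodeset//g
    }' "$1"
}
```

For example, `strip_nomodeset /etc/default/grub > /tmp/grub.new`, then compare the two files and copy the new one into place with sudo before regenerating the boot files.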

6. Reboot.

7. Update
sudo zypper up

8. Set up codecs

sudo zypper addrepo -f http://packman.inode.at/suse/openSUSE_Leap_15.1/ packman
sudo zypper addrepo -f http://opensuse-guide.org/repo/openSUSE_Leap_15.1/ dvd

Be sure the repo URLs match your version, then install.

sudo zypper install --allow-vendor-change ffmpeg lame gstreamer-plugins-bad gstreamer-plugins-ugly gstreamer-plugins-ugly-orig-addon gstreamer-plugins-libav libavdevice56 libavdevice58 libdvdcss2 vlc-codecs

Then make sure there are no stray packages clinging to other repos.

sudo zypper dup --allow-vendor-change --from http://packman.inode.at/suse/openSUSE_Leap_15.1/

Again, be careful that the repo matches your version.
source (https://opensuse-guide.org/codecs.php)

9. Reboot.
10. Update.
11. If anything was updated, reboot again.

12. Install bumblebee.
sudo zypper in bumblebee bbswitch

sudo usermod -aG bumblebee $USER
sudo usermod -aG video $USER
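Group changes made with usermod only take effect after logging out and back in, so it can help to check what the current session actually sees. A minimal sketch (the function name `missing_groups` is an assumption, not part of bumblebee):

```shell
#!/bin/sh
# missing_groups "GROUP LIST" NEEDED...: print every needed group that is
# absent from the space-separated group list given as the first argument.
missing_groups() {
    have=" $1 "
    shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;        # already a member
            *) echo "$g" ;;     # not in the list
        esac
    done
}
```

Typical use would be `missing_groups "$(id -nG)" bumblebee video`, which prints nothing when both groups are active in the current session.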

Turn it on:

sudo systemctl enable bumblebeed
sudo systemctl start bumblebeed

Install 32 bit libraries:

sudo zypper in Mesa-libGL1-32bit libX11-6-32bit primus-32bit

Install Mesa-demo for testing.

sudo zypper in Mesa-demo-x Mesa-demo

configure

/etc/bumblebee/bumblebee.conf

Change the following lines under [bumblebeed]:


TurnCardOffAtExit=true
Driver=nvidia

Change the following lines under [driver-nvidia]:


LibraryPath=/usr/lib64/:/usr/lib/
XorgModulePath=/usr/lib64/nvidia/xorg/,/usr/lib64/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia
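The edits to bumblebee.conf above can also be scripted. This is a sketch under the assumption that each key already exists in its section (as it does in the stock file on my reading of the steps above); `set_ini_key` is a hypothetical helper and only prints the result.

```shell
#!/bin/sh
# set_ini_key FILE SECTION KEY VALUE: print FILE with KEY=VALUE set inside
# [SECTION]. Keys with the same name in other sections are left alone.
# A key that does not exist in the section is NOT added.
set_ini_key() {
    awk -v section="[$2]" -v key="$3" -v value="$4" '
        /^\[/ { in_section = ($0 == section) }      # track the current section header
        in_section && index($0, key "=") == 1 {     # the key inside the wanted section
            print key "=" value
            next
        }
        { print }
    ' "$1"
}
```

For example, `set_ini_key /etc/bumblebee/bumblebee.conf bumblebeed Driver nvidia > /tmp/bumblebee.conf` writes an edited copy that can be reviewed before replacing the original.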

Set up the Xorg module path and symbolic link:

sudo mkdir -p /usr/lib64/nvidia/xorg/modules/extensions
sudo ln -s /usr/lib64/xorg/modules/extensions/nvidia/nvidia-libglx.so /usr/lib64/nvidia/xorg/modules/extensions/libglx.so

Regenerate the initrd:

sudo mkinitrd

At least all of my laptops needed the following. You need to change a line in:

/etc/modprobe.d/50-bbswitch.conf
options bbswitch load_state=-1 unload_state=1

source: (https://en.opensuse.org/SDB:NVIDIA_Bumblebee)

13. Reboot, update, reboot, and test. First with:

optirun --status
Bumblebee status: Ready (3.2.1). X inactive. Discrete video card is off.

Then test with mesa demo package

optirun glxgears 
optirun glxspheres

If everything works, edit the nvidia-settings entry in your application launcher so that it runs

optirun -b none nvidia-settings -c :8

Also, within nvidia-settings (“NVIDIA X Server Settings”), NEVER use “Save to X Configuration File”, or touch anything at all in the “X Server Display Configuration” tab. Doing so will most likely leave you unable to boot into the DE.

From my experience the above procedure had to be followed in the specified sequence. Any missing step or wrong sequence broke my system.

I also found a way to make Steam use optirun properly in most cases, but I will not cover that here.

-SJL

Hi
A few observations…

After changes to /etc/default/grub, the command to run should be:


grub2-mkconfig -o /boot/grub2/grub.cfg

What about blacklisting the nouveau module? Perhaps this should be added temporarily to the grub options?

I prefer the manual install of CUDA, as I can set override options. I also install cuDNN, but this is a manual (or scriptable) process.

Hello malcolmlewis,
Instead of

grub2-mkconfig -o /boot/grub2/grub.cfg

I typically run

mkinitrd

which is a less efficient method of updating grub.

Also, regarding step 0: I didn’t go into detail because this should have been set under “Boot Loader” during installation.

As for blacklisting nouveau, this is typically done automatically by the nvidia driver under

/etc/modprobe.d/nvidia-default.conf 
blacklist nouveau

during installation, so I didn’t mention it separately.
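To confirm that the blacklist really is in place before rebooting, a check along these lines may help. It is only a sketch: `nouveau_blacklisted` is a made-up name, and it simply looks for a `blacklist nouveau` line in any .conf file of the directory it is given.

```shell
#!/bin/sh
# nouveau_blacklisted DIR: succeed if any *.conf file in DIR contains a
# "blacklist nouveau" line; fail otherwise (including when DIR has no .conf files).
nouveau_blacklisted() {
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$1"/*.conf
}
```

For example: `nouveau_blacklisted /etc/modprobe.d && echo "nouveau is blacklisted"`.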

I have tried manual installation of Nvidia and CUDA, but I am not quite sure how to do it stably without breaking it during an update, let alone how to write a set of instructions for it. Hence the “safe” way in the title.

Thank you.
-SJL

For future reference, a recent update of the cuda + nvidia driver + bumblebee combination broke the X server. I put together a workaround, which can be found here: https://forums.opensuse.org/showthread.php/538299-Recent-Cuda-Nvidia-driver-on-bumblebee-system-breaks-x-server-here-s-a-workaround?p=2921688#post2921688

Hello,

It is right now 2020-07-22 and I have just installed Tumbleweed (the latest version in repo) on an external USB-3.1<->NVMe<->USB3.0 enclosure.

To my surprise, my bumblebee method still works 100%. I am a little bit puzzled because I thought that no one was supporting Bumblebee anymore. Has that changed? I have not yet tested on LEAP 15.2.

For my future reference, TW with kernel 5.7.9-1-default
requires


/etc/modprobe.d/50-bbswitch.conf       

to be set to:


options bbswitch load_state=0 unload_state=1

on Alienware 15 (2015)
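Since the right bbswitch values differ between machines, here is a small sketch that builds the options line and rejects values bbswitch does not define. As far as I know from the bbswitch documentation, -1 means "leave the card as it is", 0 means off and 1 means on; the function name is hypothetical.

```shell
#!/bin/sh
# bbswitch_options LOAD_STATE UNLOAD_STATE: print the modprobe options line.
# Accepted states: -1 (do not change), 0 (card off), 1 (card on).
bbswitch_options() {
    for state in "$1" "$2"; do
        case "$state" in
            -1|0|1) ;;
            *) echo "invalid state: $state" >&2; return 1 ;;
        esac
    done
    echo "options bbswitch load_state=$1 unload_state=$2"
}
```

The output would go into /etc/modprobe.d/50-bbswitch.conf, for example `bbswitch_options 0 1 | sudo tee /etc/modprobe.d/50-bbswitch.conf`.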

On Lenovo T480 with NVIDIA MX150 dGPU the above method works still on LEAP 15.2 kernel 5.3.18-lp152.36-default with nvidia-gfxG05…

Odd development on LEAP 15.2 on Sept 7 2020.
Kernel:

~> uname -a
Linux Zooricker 5.3.18-lp152.36-default #1 SMP Tue Aug 18 17:09:44 UTC 2020 (885251f) x86_64 x86_64 x86_64 GNU/Linux

when running optirun as a regular user, I get:

~> optirun glxspheres
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x21
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  29
  Current serial number in output stream:  30

with sudo permissions:

~> sudo optirun glxspheres
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
ERROR in line 637:
Could not open display

but as root:

 optirun glxspheres
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: GeForce MX150/PCIe/SSE2
141.912480 frames/sec - 158.374327 Mpixels/sec

I’ve tried re-setting the user permissions

sudo usermod -aG bumblebee $USER
sudo usermod -aG video $USER

and

sudo gpasswd -a $USER bumblebee  
sudo gpasswd -a $USER video

yielded no change… Still looking.

Solution:

  1. Comment out the contents of
/etc/modprobe.d/09-nvidia-modprobe-pm-G05.conf
#options nvidia NVreg_DynamicPowerManagement=0x01

Apparently the power management requires superuser privileges.

  2. Re-link the broken symbolic link for nvidia-libglx.so to the appropriate file.

sudo ln -sf /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so /usr/lib64/nvidia/xorg/modules/extensions/libglx.so

The naming convention changes from libglx.so to libglxserver_nvidia.so apparently.
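Because the module name differs between driver generations, the link target can be chosen by checking which file actually exists. A sketch, using only the two file names that appear in this thread (`pick_glx_module` is a made-up helper name):

```shell
#!/bin/sh
# pick_glx_module MODULE_DIR: print the first NVIDIA GLX server module found
# under MODULE_DIR, trying the newer name first. Fails if neither exists.
pick_glx_module() {
    for candidate in \
        "$1/extensions/libglxserver_nvidia.so" \
        "$1/extensions/nvidia/nvidia-libglx.so"
    do
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no NVIDIA GLX module under $1" >&2
    return 1
}
```

Usage could look like `target=$(pick_glx_module /usr/lib64/xorg/modules) && sudo ln -sf "$target" /usr/lib64/nvidia/xorg/modules/extensions/libglx.so`, so that no link is created when nothing was found.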

Then regenerate the initrd:


sudo mkinitrd

I got my TW bumblebee to work again by installing the G05 driver package (nvidia-gfxG05) and using the same symlink as I did for LEAP 15.2. I had to make one more change:
in the file

 /etc/bumblebee/xorg.conf.d/xorg.conf.nvidia

uncomment the BusID line, i.e. change

#   BusID "PCI:01:00:0"

to

   BusID "PCI:01:00:0"

For me this works perfectly fine, but the value depends on the bus ID of the Nvidia card, which can be seen using lspci.

 sudo lspci  |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K2000M] (rev ff)

It may not be the same for everyone.
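A sketch of deriving the BusID value from the lspci slot. One caveat, stated as my understanding rather than something from this thread: lspci prints the slot in hexadecimal, while Xorg reads the BusID fields as decimal, so the two only look identical for small numbers such as 01:00.0. The helper name `lspci_to_busid` is hypothetical.

```shell
#!/bin/sh
# lspci_to_busid SLOT: convert an lspci slot (bus:device.function, hex)
# into an Xorg BusID string with decimal fields, e.g. 01:00.0 -> PCI:1:0:0.
lspci_to_busid() {
    slot=$1
    case "$slot" in
        *:*:*) slot=${slot#*:} ;;   # drop a leading PCI domain such as 0000:
    esac
    bus=${slot%%:*}
    rest=${slot#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    echo "PCI:$((0x$bus)):$((0x$dev)):$((0x$fn))"
}
```

For the card shown above, `lspci_to_busid 01:00.0` gives a value equivalent to the "PCI:01:00:0" used in the config file.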

Typo:

 /etc/bumblebee/xorg.conf.nvidia

not

 /etc/bumblebee/xorg.conf.d/xorg.conf.nvidia

Hi guys.

I’m having issues in making the recipe work in a fresh install. :frowning:

(Unfortunately I’m also not sure how long I’ll be able to keep doing fresh installs, since this notebook has been “off duty” for too long: I’ll have to put it back to work soon.)

I followed the instructions to the letter on my openSUSE 15.2 installation to have CUDA and bumblebee working with my GP107M (GTX1050M).

The output of my commands follows:

optirun --status
Bumblebee status: Ready (3.2.1). X inactive. Discrete video card is off.
optirun glxgears
[  182.087121] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
[  182.087150] [ERROR]Aborting because fallback start is disabled.
optirun glxspheres
[  189.007213] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
[  189.007258] [ERROR]Aborting because fallback start is disabled.
sudo lspci  |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)
glxinfo | grep OpenGL
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2) 
OpenGL core profile version string: 4.6 (Core Profile) Mesa 19.3.4
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 19.3.4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.3.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

Worst: “lsmod | grep nvidia” gives no output, and neither “/proc/driver/nvidia/version” nor any “/dev/nvi*” exists! :’(

Can anybody please help me here somehow? :slight_smile:

Hi
This is really a FAQ, start a new thread in the forum and just link to this :wink: To be honest, try suse-prime or switcheroo to use the discrete gpu…

Sorry, my bad. Already started a new thread on Hardware. Trying to find where to delete this previous post.

At this time, if you have your repositories configured correctly, using zypper to install cuda will install a specific version of G05 (cuda 460.27.04_k5.3.18_lp152.19_lp152.33.1.x86_64) that works with cuda, and the address issue (when trying to run cuda examples requiring an X interface) will be resolved.

Also note that after the Nvidia drivers are installed, you need to register the MOK during the next boot, otherwise the driver will not run.
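Whether the MOK step matters depends on Secure Boot being active. A minimal sketch for checking that, assuming the mokutil package is installed; it only interprets the text that `mokutil --sb-state` prints, and the function name is made up.

```shell
#!/bin/sh
# secure_boot_enabled: succeed if the text on stdin reports Secure Boot as
# enabled. Intended input: the output of `mokutil --sb-state`.
secure_boot_enabled() {
    grep -qi '^SecureBoot enabled'
}
```

Example: `mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on: the driver modules need an enrolled key"`.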

Not true, the bug is due to a VirtualGL version mismatch. CUDA will throw

"CUDA error at GpuArray.h:244 code=219(cudaErrorInvalidGraphicsContext) "cudaGraphicsGLRegisterBuffer(&m_cuda_vbo_resource[0], m_vbo[0], cudaGraphicsMapFlagsWriteDiscard)""

and

XORG will throw "[ERROR][XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied"

All due to VGL throwing:

[VGL] ERROR: Could not load GLX/OpenGL functions
[VGL]    /usr/lib64/libvglfaker.so: undefined symbol: glXGetProcAddressARB

I commented on bug report 1176422 (VirtualGL: unresolved symbol in libvglfaker.so), but it is not yet resolved. The current workaround is to downgrade to the last stable LEAP 15.1 package, VirtualGL-32bit. A long-term solution will follow.

Since this will take some time to get sorted out, I came up with the following:

Quick, dirty fix:
Downgrade VirtualGL to the LEAP 15.1 version and back up /usr/lib64/libvglfaker.so from that older version. Then update back to the LEAP 15.2/TW repo version and overwrite /usr/lib64/libvglfaker.so with the copy from LEAP 15.1.

Semi-long-term solution:
Get the VirtualGL source from git, compile your own libvglfaker, and use it instead of what’s available in the repository.

Long-term solution:
Get the VirtualGL source from git, compile your own VirtualGL, and use update-alternatives to point to your VGL components.

I am going to demonstrate the second one, since the long-term solution is a bit more tedious.

  1. Get VGL source:
cd ~/Downloads
git clone https://github.com/VirtualGL/virtualgl.git

2. Install prerequisites:

sudo zypper in libjpeg8-devel libjpeg8-devel-32bit

3. Make a build and a target directory:

mkdir ~/Downloads/virtualgl/build
mkdir ~/Downloads/virtualgl/compiled

4. Go to the build directory:

cd ~/Downloads/virtualgl/build

5. Use cmake to configure the build:

cmake ../ -DCMAKE_INSTALL_PREFIX=~/Downloads/virtualgl/compiled -DVGL_FAKEXCB=OFF -DVGL_FAKEOPENCL=OFF

I am disabling fakexcb and OpenCL because they can be problematic at times. xcb is easy to resolve by installing

sudo zypper in xcb-util-keysyms-devel

but OpenCL is not so trivial.

6. Build and install into the compiled directory:

make -j"$(nproc)" install

At this point VGL is compiled at ~/Downloads/virtualgl/compiled

7. The semi-long-term solution is to back up the old faker and replace it with the one you’ve just compiled:

sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.backup
sudo cp ~/Downloads/virtualgl/compiled/lib64/libvglfaker.so /usr/lib64/libvglfaker.so

This method currently works for LEAP 15.1, 15.2 and TW as of 2021-01-12
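One thing to watch with the backup-and-replace step: running the two cp commands a second time would overwrite the backup with the already replaced file. A sketch of a slightly safer variant (the function name `replace_with_backup` is an assumption, not part of VirtualGL):

```shell
#!/bin/sh
# replace_with_backup NEW TARGET: keep one pristine copy of TARGET as
# TARGET.backup, then copy NEW over TARGET.
replace_with_backup() {
    new=$1
    target=$2
    backup=$2.backup
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    # Never overwrite an existing backup, so repeated runs keep the original file.
    if [ -f "$target" ] && [ ! -e "$backup" ]; then
        cp -p "$target" "$backup" || return 1
    fi
    cp "$new" "$target"
}
```

For the faker library this would be called from a root shell (or via `sudo sh -c`) with ~/Downloads/virtualgl/compiled/lib64/libvglfaker.so as NEW and /usr/lib64/libvglfaker.so as TARGET.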

Almost forgot, GLproto also needs to be installed to compile VirtualGL

sudo zypper in glproto-devel

For a while, I had to retire my Lenovo T480 with its Nvidia MX150 and my good old Lenovo W530 with its Nvidia K2000M. No modern drivers supported the GPU anymore, so I could not really apply or test these steps.

Now I am making a new effort with my Lenovo P51 and Nvidia M2200m to renew/verify these steps on LEAP 15.3/15.4 and TW.

So far, on 2022-06-10, just a few days after the launch of LEAP 15.4, I unfortunately can’t get anything to work properly. My biggest concern is that for some reason bbswitch isn’t even being loaded into ACPI or even mentioned in dmesg. I am going to try with LEAP 15.3.

Also need to mention that the packman repos seem to have moved more or less permanently from http://packman.inode.at/suse/openSUSE_Leap_15.1/ to https://ftp.gwdg.de/pub/linux/misc/packman/suse/openSUSE_Leap_$releasever/

I had thought the move would be temporary.

The method works perfectly once Secure Boot is disabled… Time for me to read up on how to register bbswitch and the Nvidia modules with shim.