nVidia + Cuda + Bumblebee + Codecs: "Recipe" not working on openSUSE 15.2

Hi SJLPHI! :slight_smile:

Did you just download the RPM with no conflicts found, or was there something more to it? I’m afraid it might break my X if it does not work. But that would already be a great solution!

Unfortunately, such a performance drop is a no-go for me: I do need CUDA for its performance in some computationally expensive calculations (gromacs and gamess), and these are the benchmarks for future CPU/GPU acquisitions for several people here… :frowning: But thanks anyway, it would be a much cleaner solution (than mixing 15.1 and 15.2 version files)! :slight_smile:

I do hope that solving it for optirun paves the way to making offload work too…

The safest option, and what I do, is to keep the libvglfaker.so from Leap 15.1 around. On my Leap 15.2 machine, I first make a backup:

sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.2

then replace it with the one from 15.1:

sudo cp /usr/lib64/libvglfaker.so.15.1 /usr/lib64/libvglfaker.so

For me this is easy because I have many machines to pass it back and forth between. In your case, you should install VirtualGL from the official Leap 15.1 repos via openSUSE Software, and be sure not to keep the Leap 15.1 repository afterwards. After that, copy your libvglfaker with

sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.1

then make sure the Leap 15.1 repository is no longer in your zypper/YaST list and update, which will upgrade VGL back to the broken 15.2 version. From there on you can do

sudo cp /usr/lib64/libvglfaker.so.back.15.1 /usr/lib64/libvglfaker.so
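On the command line, that cleanup-and-update step could look like this (a sketch; the repo alias here is hypothetical, so check `zypper lr` for the real name on your system):

```shell
zypper lr                      # list repositories; look for any Leap 15.1 entry
sudo zypper rr leap-15.1-oss   # hypothetical alias: remove the 15.1 repository
sudo zypper up                 # update; this brings VGL back to the 15.2 build
```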

I think the long-term solution is that VGL 2.6.4+ will fix this and/or an official developer will fix it in the next update. It seems the bumblebee repository has its own VGL 2.6.4, but I cannot tell you whether that would work.

Yes, bumblebee loads/unloads the driver on demand for specific applications. For example, I run

~> optirun glxspheres &
~> /sbin/lspci -nnk | egrep -A3 "VGA|Display|3D"
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07)
        Subsystem: Lenovo ThinkPad T480 [17aa:225e]
        Kernel driver in use: i915
        Kernel modules: i915
--
01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
        Subsystem: Lenovo ThinkPad T480 [17aa:225e]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

but there are no changes in xrandr, since right now nothing is connected other than the laptop panel.

~> xrandr --listproviders 
Providers: number : 1 
Provider 0: id: 0x47; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 3; outputs: 5; associated providers: 0; name: modesetting 
   output eDP-1 
   output DP-1 
   output HDMI-1 
   output DP-2 
   output HDMI-2

Now if I plug in an external monitor, the number of providers in xrandr changes depending on which port I use, and the NVIDIA driver does get loaded automatically.
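One way to watch this on-demand loading (a sketch; it assumes optirun and the glxspheres demo from VirtualGL are installed) is to compare lsmod before and during a job:

```shell
check_nvidia() {
    # report whether the nvidia kernel module is currently loaded
    lsmod | grep -q '^nvidia ' && echo "nvidia loaded" || echo "nvidia not loaded"
}
check_nvidia                                   # before: should be unloaded
command -v optirun >/dev/null 2>&1 && optirun glxspheres >/dev/null 2>&1 &
sleep 2
check_nvidia                                   # during: bumblebee loads it
```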

To be perfectly clear, I do not recommend this, but you can copy, paste, and save the following as vgl151.ymp:

<metapackage xmlns:os="http://opensuse.org/Standards/One_Click_Install" xmlns="http://opensuse.org/Standards/One_Click_Install">
  <group distversion="openSUSE Leap 15.2">
    <repositories>
      <repository recommended="true">
        <name>openSUSE:Leap:15.2</name>
        <summary>openSUSE Leap 15.2</summary>
        <description></description>
        <url>http://download.opensuse.org/distribution/leap/15.1/repo/oss/</url>
      </repository>
    </repositories>
    <software>
      <item>
        <name>VirtualGL</name>
        <summary>A toolkit for displaying OpenGL applications to thin clients</summary>
        <description>VirtualGL is a library which allows most Linux OpenGL applications to be
remotely displayed to a thin client without the need to alter the
applications in any way.  VGL inserts itself into an application at run time
and intercepts a handful of GLX calls, which it reroutes to the server's
display (which presumably has a 3D accelerator attached.)  This causes all
3D rendering to occur on the server's display.  As each frame is rendered
by the server, VirtualGL reads back the pixels from the server's framebuffer
and sends them to the client for re-compositing into the appropriate X
Window.  VirtualGL can be used to give hardware-accelerated 3D capabilities to
VNC or other remote display environments that lack GLX support.  In a LAN
environment, it can also be used with its built-in motion-JPEG video delivery
system to remotely display full-screen 3D applications at 20+ frames/second.


VirtualGL is based upon ideas presented in various academic papers on
this topic, including "A Generic Solution for Hardware-Accelerated Remote
Visualization" (Stegmaier, Magallon, Ertl 2002) and "A Framework for
Interactive Hardware Accelerated Remote 3D-Visualization" (Engel, Sommer,
Ertl 2000.)</description>
      </item>
    </software>
  </group>
</metapackage>

then double-click it to execute, and be sure not to stay subscribed to the repository used to downgrade VirtualGL.

Hi SJLPHI! :slight_smile:

I’ll try this solution as soon as possible! Let’s hope.

From what I gather from your discussions with Malcolm, this loading/unloading would be the reason that offloading and primus are not working here? I tried stopping the bumblebeed service for that matter, but it didn’t work. What would actually be needed to make it work?

Anyway: Malcolm, in case my command outputs were missed (previous page), I’m reproducing them below:

/sbin/lspci -nnk | egrep -A3 "VGA|Display|3D"

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Mobile) [8086:3e9b]
        Subsystem: Acer Incorporated [ALI] Device [1025:1264]
        Kernel driver in use: i915
        Kernel modules: i915
--
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev ff)
        Kernel modules: nouveau, nvidia_drm, nvidia
06:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader [10ec:5287] (rev 01)
        Subsystem: Acer Incorporated [ALI] Device [1025:1264]

xrandr --listproviders

Providers: number : 1
Provider 0: id: 0x43; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 3; outputs: 1; associated providers: 0; name: modesetting
    output eDP-1

optirun glxspheres > glxspheres.out &
/sbin/lspci -nnk | egrep -A3 "VGA|Display|3D"

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Mobile) [8086:3e9b]
        Subsystem: Acer Incorporated [ALI] Device [1025:1264]
        Kernel driver in use: i915
        Kernel modules: i915
--
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
        Subsystem: Acer Incorporated [ALI] Device [1025:1265]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

xrandr --listproviders

Providers: number : 1
Provider 0: id: 0x43; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 3; outputs: 1; associated providers: 0; name: modesetting
    output eDP-1

No need to add the repo. Just download the packages.

Ok, good news (needs some extra testing, but…)

The 1-click install script did not work: it seems my system does not like to be blindly downgraded in any way… :stuck_out_tongue:

So I did the following (I should have made a system snapshot first, my bad):

  1. Add a “TEMPORARY” repository with priority 100 pointing to “http://download.opensuse.org/distribution/leap/15.1/repo/oss/” and refresh.

  2. sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.2

  3. In YaST, Software Management, I looked for VirtualGL and manually chose the “TEMPORARY” repository version. Install.

  4. sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.1

  5. Again in YaST, Software Management, I looked for VirtualGL and manually chose the up-to-date (Leap 15.2) version. Install.

  6. sudo cp /usr/lib64/libvglfaker.so.back.15.1 /usr/lib64/libvglfaker.so

  7. optirun ./nbody runs with proper graphics and an astonishing ~300-400 GFLOP/s with 5120 bodies (the same parameters on the CPU yielded only 1 GFLOP/s). Doubling the number of bodies reaches ~800 GFLOP/s! rotfl!
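For reference, steps 1-6 could also be done entirely from the shell (a sketch, not what I actually ran; the “TEMPORARY” alias matches step 1, and the zypper flags assume Leap 15.2’s zypper):

```shell
# step 1: add the Leap 15.1 OSS repo at low priority and refresh
sudo zypper ar -p 100 http://download.opensuse.org/distribution/leap/15.1/repo/oss/ TEMPORARY
sudo zypper ref
# step 2: back up the Leap 15.2 library
sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.2
# step 3: downgrade VirtualGL from the temporary repo
sudo zypper in --from TEMPORARY --oldpackage VirtualGL
# step 4: back up the Leap 15.1 library
sudo cp /usr/lib64/libvglfaker.so /usr/lib64/libvglfaker.so.back.15.1
# step 5: drop the repo and reinstall the Leap 15.2 package
sudo zypper rr TEMPORARY
sudo zypper in -f VirtualGL
# step 6: restore the working 15.1 library on top
sudo cp /usr/lib64/libvglfaker.so.back.15.1 /usr/lib64/libvglfaker.so
```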

I’ll reboot just in case to see if everything keeps working, and then run the other tests. I’ll come back here to report! :slight_smile:

P.S.: Now that we know that bumblebee loads and unloads the device, it was to be expected that offloading still wouldn’t work. Still, I tested it and got the same error.

Ok folks, thanks a lot!

Working like a charm now with bumblebee and optirun (top speed of ~1.4 TFLOP/s apparently). rotfl!rotfl!

I would really like to give primus offload a try (if, of course, it does not break the current installation). :wink:

SJLPHI, one question: how did you find out that there was some sort of issue in VirtualGL? I was expecting something along the lines of “out of bounds” for the error you previously described for the “smoke test” (which also works beautifully, btw), and unless I’m mistaken there is no message in my outputs complaining loudly about any library (especially that one specifically): so, how? :slight_smile:

Thanks a lot you all!

And Malcolm, I’m still open to making some attempts at primus offload (as long as it does not risk the rest of the installation) if you are still willing (however, probably only after Xmas now).

I am glad that you got it all sorted out. To answer your question… I did a lot of reading online, looking at the errors thrown by

sudo systemctl status bumblebeed

then I got to checking porting nvidia-settings:

optirun -vv nvidia-settings -c :8

which ended up reporting that libvglfaker.so has an undefined symbol, glXGetProcAddressARB,
similar to https://www.gitmemory.com/issue/VirtualGL/virtualgl/139/690349348
(by the way, VGL_VERBOSE=1 does work for nvidia-settings, but it does not for CUDA graphics porting), which had me checking the dependencies of libvglfaker.so:

ldd /usr/lib64/libvglfaker.so

and it was missing the linkage to libGLX and others. I tried looking for a cheap and dirty solution (https://forums.opensuse.org/showthread.php/548214-compiling-shared-library-with-already-built-library-(libvglfaker-so)), but really the best solution is to test the new VGL and then let the responsible developer know whether the build is good.
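For anyone who wants to reproduce the dependency check, a small sketch (library path as in the posts; the fallback messages are mine):

```shell
check_vglfaker() {
    # report whether libvglfaker.so resolves any GLX linkage
    local lib=/usr/lib64/libvglfaker.so
    if [ -e "$lib" ]; then
        ldd "$lib" | grep -i glx || echo "no libGLX linkage found"
    else
        echo "libvglfaker.so not installed"
    fi
}
check_vglfaker
```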

Long story short, you basically stumbled upon a problem I had learned to work around, so I decided to do some research to fix it, and well… we have a temporary solution until VGL gets patched.

VirtualGL has been updated to 2.6.5 in the Experimental repos: https://software.opensuse.org/package/VirtualGL
You may test it.

Just checked the TW official repo and also the experimental X11:Bumblebee repo. Still no good.

libvglfaker.so from the experimental bumblebee repo:

Dec 24 10:18 /usr/lib64/libvglfaker.so
ldd /usr/lib64/libvglfaker.so      
        linux-vdso.so.1 (0x00007fff7e75d000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f089eee9000)
        libturbojpeg.so.0 => /usr/lib64/libturbojpeg.so.0 (0x00007f089ee4a000)
        libXv.so.1 => /usr/lib64/libXv.so.1 (0x00007f089ee42000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007f089ecfd000)
        libXext.so.6 => /usr/lib64/libXext.so.6 (0x00007f089ece8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f089ecc6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f089eb7e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f089e9b3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f089f00c000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007f089e988000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007f089e983000)

libvglfaker.so from TW official repo:


Dec 24 19:50 /usr/lib64/libvglfaker.so
ldd /usr/lib64/libvglfaker.so       
        linux-vdso.so.1 (0x00007ffd27131000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f373ef0f000)
        libturbojpeg.so.0 => /usr/lib64/libturbojpeg.so.0 (0x00007f373ee70000)
        libXv.so.1 => /usr/lib64/libXv.so.1 (0x00007f373ee68000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007f373ed23000)
        libXext.so.6 => /usr/lib64/libXext.so.6 (0x00007f373ed0e000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f373ecec000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f373eba4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f373e9d9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f373f032000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007f373e9ae000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007f373e9a9000)

Still missing the proper linkage for CUDA to work.

Thank You!

I posted a bit of an ugly solution for libvgl here: https://forums.opensuse.org/showthread.php/537906-Cuda-Nvidia-bumblebee-codecs-quot-safe-quot-way/page2

Also @BrendanAI89, what GPU do you have?

Hi
I have an HP Pavilion laptop with an nVidia GeForce 8400M GS card.
I’ve installed openSUSE Leap 15.2 with KDE.
I would like to install the NVIDIA driver, but I see two options; which one should I choose, 400 or 600?

palomin:~ # zypper se x11-video-nvidiaG0*
Loading repository data…
Reading installed packages…

S | Name                | Summary                                                 | Type
--+---------------------+---------------------------------------------------------+--------
  | x11-video-nvidiaG04 | NVIDIA graphics driver for GeForce 400 series and newer | package
  | x11-video-nvidiaG05 | NVIDIA graphics driver for GeForce 600 series and newer | package

palomin:~ # lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation G86M [GeForce 8400M GS] (rev a1)

palomin:~ # lscpu
Architecture: x86_64

Where can I check which cards are supported by the 400 or 600 series?

thanks, Eduardo

Hi
Neither; the card is not supported by either driver. It needs G03/340.108 with patches, installed the hard way…

Hi, how are you all? :slight_smile:

Just one question that occurred to me now, because of the upgrade suggestions (from nvidia):

Would upgrading nvidia/kernel/gcc be safe, or does it carry a high risk of breaking everything?

Getting it working took us all a lot of time, and I would really not like to jeopardize it in any possible way… :wink:

Can I proceed, or should I not risk it and “taboo” these packages?

The chance of breaking things is low at this point. Just make sure that after the upgrade nothing creates anything NVIDIA-related in /etc/xorg.conf.d/; otherwise just refer to the original instructions I’ve compiled. In the worst case, feel free to create a new thread about it.

Hi SJLPHI!

I thought about backing up the whole directory, but are you certain about the path you entered? There is no /etc/xorg.conf.d on my system. :frowning:

Just noticed it: /etc/X11/xorg.conf.d. Just to be on the safe side, I also backed up /etc/bumblebee/xorg.conf.d and /usr/share/X11/xorg.conf.d.
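A minimal backup sketch for those three directories (paths from this thread; the SUDO variable is only there so the function can be exercised without root):

```shell
SUDO=${SUDO:-sudo}   # override with SUDO="" when already root

backup_dir() {
    # copy a directory tree to <dir>.bak, preserving attributes; skip if absent
    [ -d "$1" ] || return 0
    $SUDO cp -a "$1" "$1.bak"
}

# the three directories mentioned in this thread:
for d in /etc/X11/xorg.conf.d /etc/bumblebee/xorg.conf.d /usr/share/X11/xorg.conf.d; do
    backup_dir "$d"
done
```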

Thanks! :slight_smile:

Yes, you are absolutely right. Sorry, I have been sloppy lately; I’ve been overwhelmed with work. Also, for your information, your computer will still function even if you delete everything in

/etc/X11/xorg.conf.d/

you may just have to reconfigure a couple of things here and there as a result.

I have a USB 3.1 NVMe SSD “traveling” Linux stick, and when I move from computer to computer, I typically have to erase all contents of that directory and reboot.