13.2: Cannot get CUDA 6.5 to work. Is bumblebee the problem?

devnoob · December 19, 2014, 5:10am

Hi,

I’m on openSUSE 13.2 and trying to get CUDA to work. My NVIDIA driver is the newest available at the moment (340.65-36.1), newer than the one in the CUDA repo, but CUDA (6.5 prod) complains:

./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Both are from the NVIDIA repos. Note that the CUDA repo would provide version 340.29-0 of the drivers, but a newer driver should work (there is no 13.2 repo for CUDA at the moment).

I also have nvidia-bumblebee 343.36-1.1 installed, and it works (optirun, primusrun).

How do I find out which NVIDIA driver version Bumblebee is using? How can I find out which driver is running at the moment?
Am I right that the NVIDIA drivers from the repository are not used at all and that I can remove them? How can I check whether they are used somehow?
Does anybody have CUDA running with Bumblebee? Must I downgrade to an older CUDA version? (And: can you use it without being root?)

Thanks and regards

Details:

zypper lr -uP
#  | Alias                     | Name                               | Enabled | Refresh | Priority | URI                                                                       
---+---------------------------+------------------------------------+---------+---------+----------+---------------------------------------------------------------------------
 5 | cuda                      | cuda                               | Yes     | No      |   99     | http://developer.download.nvidia.com/compute/cuda/repos/opensuse131/x86_64
 6 | nVidia Graphics Drivers   | nVidia Graphics Drivers            | Yes     | Yes     |  100     | http://download.nvidia.com/opensuse/13.2/

drivers:

S | Name                          | Type       | Version                | Arch   | Repository             
--+-------------------------------+------------+------------------------+--------+------------------------
i | nvidia-bumblebee              | package    | 343.36-1.1             | x86_64 | Bumblebee                 
v | nvidia-computeG03             | package    | 340.29-0               | x86_64 | cuda                   
i | nvidia-computeG03             | package    | 340.65-36.1            | x86_64 | nVidia Graphics Drivers
v | nvidia-gfxG03-kmp-desktop     | package    | 340.29_k3.11.6_4-0     | x86_64 | cuda                   
i | nvidia-gfxG03-kmp-desktop     | package    | 340.65_k3.16.6_2-36.1  | x86_64 | nVidia Graphics Drivers
v | nvidia-glG03                  | package    | 340.29-0               | x86_64 | cuda                   
i | nvidia-glG03                  | package    | 340.65-36.1            | x86_64 | nVidia Graphics Drivers
v | nvidia-uvm-gfxG03-kmp-desktop | package    | 340.29_k3.11.6_4-0     | x86_64 | cuda                   
i | nvidia-uvm-gfxG03-kmp-desktop | package    | 340.65_k3.16.6_2-36.1  | x86_64 | nVidia Graphics Drivers

cuda:

S | Name                        | Type    | Version  | Arch   | Repository
--+-----------------------------+---------+----------+--------+-----------
i | cuda                        | package | 6.5-14   | x86_64 | cuda      
i | cuda-6-5                    | package | 6.5-14   | x86_64 | cuda      
i | cuda-command-line-tools-6-5 | package | 6.5-14   | x86_64 | cuda      
i | cuda-core-6-5               | package | 6.5-14   | x86_64 | cuda      
i | cuda-cublas-6-5             | package | 6.5-14   | x86_64 | cuda      
i | cuda-cublas-dev-6-5         | package | 6.5-14   | x86_64 | cuda      
i | cuda-cudart-6-5             | package | 6.5-14   | x86_64 | cuda      
i | cuda-cudart-dev-6-5         | package | 6.5-14   | x86_64 | cuda      
i | cuda-cufft-6-5              | package | 6.5-14   | x86_64 | cuda      
i | cuda-cufft-dev-6-5          | package | 6.5-14   | x86_64 | cuda      
i | cuda-curand-6-5             | package | 6.5-14   | x86_64 | cuda      
i | cuda-curand-dev-6-5         | package | 6.5-14   | x86_64 | cuda      
i | cuda-cusparse-6-5           | package | 6.5-14   | x86_64 | cuda      
i | cuda-cusparse-dev-6-5       | package | 6.5-14   | x86_64 | cuda      
i | cuda-documentation-6-5      | package | 6.5-14   | x86_64 | cuda      
i | cuda-driver-dev-6-5         | package | 6.5-14   | x86_64 | cuda      
i | cuda-drivers                | package | 340.29-0 | x86_64 | cuda      
i | cuda-license-6-5            | package | 6.5-14   | x86_64 | cuda      
  | cuda-minimal-build-6-5      | package | 6.5-14   | x86_64 | cuda      
i | cuda-misc-headers-6-5       | package | 6.5-14   | x86_64 | cuda      
i | cuda-npp-6-5                | package | 6.5-14   | x86_64 | cuda      
i | cuda-npp-dev-6-5            | package | 6.5-14   | x86_64 | cuda      
i | cuda-repo-opensuse131       | package | 6.5-14   | x86_64 | cuda      
i | cuda-runtime-6-5            | package | 6.5-14   | x86_64 | cuda      
i | cuda-samples-6-5            | package | 6.5-14   | x86_64 | cuda      
i | cuda-toolkit-6-5            | package | 6.5-14   | x86_64 | cuda      
i | cuda-visual-tools-6-5       | package | 6.5-14   | x86_64 | cuda

devnoob · December 19, 2014, 5:38am

nvidia-settings gives me the version number of nvidia-bumblebee (343.36-1.1):
optirun -b none nvidia-settings -c :8

So what version is this in nvidia universe?

After reading https://en.opensuse.org/SDB:NVIDIA_Bumblebee it seems like I have to remove everything and start again. Or not?
Bumblebee doc on github states:
“If you want to use the proprietary nvidia driver, it is going to be more difficult because certain nvidia libraries must be moved to avoid conflicts with the Mesa libraries for 3D acceleration.”

But I don’t have this problem, stuff runs except cuda.

devnoob · December 19, 2014, 5:46am

More data:

optirun ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
Error: API mismatch: the NVIDIA kernel module has version 343.36,
but this NVIDIA driver component has version 295.20.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
cuInit(0) returned 100
-> CUDA_ERROR_NO_DEVICE (no CUDA-capable devices were detected)
Result = FAIL

Where does the 295.20 come from?

devnoob · December 19, 2014, 5:54am

Something looks wrong here:

ls -la /usr/lib64/nvidia/*
lrwxrwxrwx 1 root users       15 Oct  7 16:17 /usr/lib64/nvidia/libGL.so -> libGL.so.295.20
lrwxrwxrwx 1 root root        33 Dec 17 17:49 /usr/lib64/nvidia/libGL.so.1 -> /usr/lib64/nvidia/libGL.so.343.36
-rwxr-xr-x 1 root users  1052240 Oct 24  2013 /usr/lib64/nvidia/libGL.so.295.20
-rwxr-xr-x 1 root root   1274552 Oct  8 03:29 /usr/lib64/nvidia/libGL.so.343.22
-rwxr-xr-x 1 root root   1274520 Dec 17 17:49 /usr/lib64/nvidia/libGL.so.343.36
lrwxrwxrwx 1 root root        36 Dec 17 17:49 /usr/lib64/nvidia/libOpenCL.so -> /usr/lib64/nvidia/libOpenCL.so.1.0.0
-rwxr-xr-x 1 root root     21712 Dec 17 17:49 /usr/lib64/nvidia/libOpenCL.so.1.0.0
lrwxrwxrwx 1 root users       17 Oct  7 16:17 /usr/lib64/nvidia/libcuda.so.1 -> libcuda.so.295.20
-rwxr-xr-x 1 root users  8582228 Oct 24  2013 /usr/lib64/nvidia/libcuda.so.295.20
lrwxrwxrwx 1 root users       20 Oct  7 16:17 /usr/lib64/nvidia/libnvcuvid.so -> libnvcuvid.so.295.20
lrwxrwxrwx 1 root users       20 Oct  7 16:17 /usr/lib64/nvidia/libnvcuvid.so.1 -> libnvcuvid.so.295.20
-rwxr-xr-x 1 root users  2215680 Oct 24  2013 /usr/lib64/nvidia/libnvcuvid.so.295.20
lrwxrwxrwx 1 root users       23 Oct  7 16:17 /usr/lib64/nvidia/libnvidia-cfg.so -> libnvidia-cfg.so.295.20
lrwxrwxrwx 1 root users       23 Oct  7 16:17 /usr/lib64/nvidia/libnvidia-cfg.so.1 -> libnvidia-cfg.so.295.20
-rwxr-xr-x 1 root users   136616 Oct 24  2013 /usr/lib64/nvidia/libnvidia-cfg.so.295.20
-rwxr-xr-x 1 root users 27731728 Oct 24  2013 /usr/lib64/nvidia/libnvidia-compiler.so.295.20
-rwxr-xr-x 1 root users 34625520 Oct 24  2013 /usr/lib64/nvidia/libnvidia-glcore.so.295.20
lrwxrwxrwx 1 root users       22 Oct  7 16:17 /usr/lib64/nvidia/libnvidia-ml.so.1 -> libnvidia-ml.so.295.20
-rwxr-xr-x 1 root users   243784 Oct 24  2013 /usr/lib64/nvidia/libnvidia-ml.so.295.20
-rwxr-xr-x 1 root users    11416 Oct 24  2013 /usr/lib64/nvidia/libnvidia-tls.so.295.20
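
A quick way to spot this kind of mismatch, as a minimal sketch only: compare the version reported by the loaded kernel module with the version suffix of the libraries in the directory. The directory /usr/lib64/nvidia is taken from the listing above. The function names are made up for illustration, and the parsing assumes the usual "NVRM version: ... Kernel Module  343.36 ..." line in /proc/driver/nvidia/version, which only exists while the module is loaded.

```shell
#!/bin/sh
# Sketch: list NVIDIA libraries whose version suffix differs from the
# loaded kernel module. Function names are hypothetical.

# Print the driver version found in the "NVRM version:" line of the
# given file (default: /proc/driver/nvidia/version).
kernel_module_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' \
        "${1:-/proc/driver/nvidia/version}"
}

# Print every regular file in directory $2 that carries a driver-style
# suffix (.so.NNN.*) other than version $1. Symlinks are skipped.
stale_libs() {
    want=$1
    dir=$2
    for f in "$dir"/lib*.so.[0-9][0-9][0-9].*; do
        [ -f "$f" ] || continue      # glob matched nothing
        [ -L "$f" ] && continue      # only report real files
        case "$f" in
            *".so.$want") ;;         # same version as the kernel module
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Only do something when the module is loaded and the version parses.
if [ -r /proc/driver/nvidia/version ]; then
    ver=$(kernel_module_version)
    if [ -n "$ver" ]; then
        stale_libs "$ver" "${1:-/usr/lib64/nvidia}"
    fi
fi
```

With the card powered on (for example inside optirun), every file this prints is a candidate for the "API mismatch" error; whether it is safe to delete still has to be checked by hand, e.g. with rpm -qf to see if any package owns it.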

devnoob · December 19, 2014, 2:56pm

CUDA runs now. After deleting the stale library versions shown in the previous post, the next error was a missing nvidia-uvm module.

Had a problem similar to the one described here:
http://www.blackmoreops.com/2014/06/30/kali-linux-1-0-7-kernel-3-14-install-nvidia-driver-kernel-module-cuda-pyrit/#Step_5_Fixing_ERROR_could_not_insert_nvidia_uvm_Invalid_argument

So running make in /usr/src/nvidia-343.36/ builds nvidia.ko, but not nvidia-uvm.ko.

However, the ‘make -C uvm’ from the blog did not work for me, while ‘cd uvm; make’ seems to do the trick (I also ran ‘make module’ and other things in between, which may have influenced the build finally working). I then copied nvidia-uvm.ko into place by hand.

So the final two issues are:

1. Since nvidia-bumblebee is used, I wanted to remove the nvidia drivers, but this wants to remove the cuda stuff (I’m on slow internet, so redownloading those would be bad).
2. optirun will no longer switch off nvidia after use (probably does not unload nvidia_uvm), so I must run the following after usage:

rmmod nvidia_uvm
rmmod nvidia
tee /proc/acpi/bbswitch <<<OFF
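
The same three steps as a small script, as a sketch only. It must run as root, the function names are made up, and it uses a plain echo instead of the bash here-string so it also works in /bin/sh. It only calls rmmod for modules that are actually listed in /proc/modules.

```shell
#!/bin/sh
# Sketch: power the NVIDIA card down when optirun leaves modules loaded.
# Needs root. Function names are hypothetical.

# Succeed if module $1 is listed in $2 (default: /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

gpu_off() {
    # nvidia_uvm depends on nvidia, so it has to be removed first.
    for mod in nvidia_uvm nvidia; do
        if module_loaded "$mod"; then
            rmmod "$mod" || return 1
        fi
    done
    echo OFF > /proc/acpi/bbswitch
}

# Act only when asked to, e.g. "sh gpu-off.sh off".
if [ "${1:-}" = "off" ]; then
    gpu_off
fi
```

Afterwards, cat /proc/acpi/bbswitch shows whether the card is really reported as OFF.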

gogalthorp · December 19, 2014, 6:29pm

So you have the solution: use bbswitch. As has been noted, Bumblebee is just a bad idea. If you want to do more than surf, get a machine that does graphics correctly with one family of GPU. Or get on NVIDIA’s case and tell them to supply a solution for Linux. What you did to get CUDA to work is a hack and cannot be thought of as a general solution. I’m amazed you even got it to work as well as you have.

devnoob · December 19, 2014, 11:29pm

I want to use cuda for machine learning (theano for python), so I should test the performance first before opening the Champagne.

Did I understand right that you mean I can switch between integrated Intel and Nvidia with bbswitch without needing bumblebee?

I’m on a notebook with optimus without any bios settings for it. Always using Nvidia is no option, because of battery life and because it gets very hot as long as the Nvidia card is not turned off explicitly.
I thought the only solution to that problem is bumblebee with nvidia drivers if I also need cuda from time to time?

devnoob · December 19, 2014, 11:49pm

Ah sorry, now I think I get your post: call bbswitch by hand for now, as I do, but it would be better to buy a desktop with an NVIDIA card so that no such hacks are needed.

gogalthorp · December 20, 2014, 12:18am

There you go. I think NVIDIA invented Optimus to sell chips when Intel started to put GPUs on their CPUs. The basic goal is not bad, and maybe it has better support on Windows, but Linux, being the poor stepchild, gets little attention. Since the NVIDIA drivers are proprietary binaries, there is not much the community can do but ask NVIDIA to please pay some attention to its needs. You could create some scripts to start and stop each app you need to run and use them as launchers.
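
A minimal sketch of such a launcher, under stated assumptions: optirun is on the PATH, bbswitch exposes /proc/acpi/bbswitch, and the cleanup part runs with enough privileges to call rmmod (how to arrange that, e.g. via sudo rules, is left open). The names cuda_run, gpu_off, RUNNER and BBSWITCH are made up for illustration.

```shell
#!/bin/sh
# Sketch of a launcher: run one program on the NVIDIA card, then unload
# the modules and switch the card off again. All names are hypothetical.

RUNNER=${RUNNER:-optirun}                  # assumption: Bumblebee's optirun
BBSWITCH=${BBSWITCH:-/proc/acpi/bbswitch}  # assumption: bbswitch is loaded

# Best-effort cleanup; needs root, failures are ignored on purpose.
gpu_off() {
    rmmod nvidia_uvm 2>/dev/null || true
    rmmod nvidia 2>/dev/null || true
    if [ -w "$BBSWITCH" ]; then
        echo OFF > "$BBSWITCH"
    fi
}

# Run the given command through $RUNNER, clean up, and hand back the
# command's own exit status.
cuda_run() {
    status=0
    "$RUNNER" "$@" || status=$?
    gpu_off
    return "$status"
}

# Example: sh cuda-run.sh ./deviceQuery
if [ "$#" -gt 0 ]; then
    cuda_run "$@"
fi
```

Keeping the cleanup in its own function makes it easy to swap in whatever power-off steps turn out to be needed on a given machine.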