Tumbleweed + Nvidia + Bumblebee: cannot prevent fallback to Nouveau

I’m trying to troubleshoot a Bumblebee problem on TW. I don’t use the dGPU very often, so I don’t know when this broke, but my configuration had been working for quite some time.

When I try to run something on the dGPU, e.g.,

optirun glxgears

I get the error ‘Cannot access secondary GPU …’. I can see that the problem is that the nouveau driver is loaded instead of the nvidia driver.


> sudo lspci -nnk | egrep -A3 'VGA|3D'
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P630 [8086:591d] (rev 04)
    DeviceName:  Onboard IGD
    Subsystem: Dell Device [1028:07bf]
    Kernel driver in use: i915
--
01:00.0 3D controller [0302]: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] [10de:13b6] (rev ff)
    Kernel modules: nouveau

I am using the correct Nvidia drivers from Bumblebee. My /etc/bumblebee/bumblebee.conf has

Driver=nvidia

and my /etc/modprobe.d directory still contains the file 50-blacklist.conf with the

blacklist nouveau

line.

For some reason, the nouveau drivers are loading instead of the nvidia drivers. Searching the web for the problem returns far too many results that are not similar enough to be worth chasing down any further.

Hi
Perhaps the initrd was not rebuilt to take care of the blacklist entry; rebuild as the root user with:


mkinitrd

Reboot and see how that goes.

Thanks Malcolm,
I should have given an accounting of all the things I had already tried that didn’t work. …that was one of them.

Generally, I’ve been searching around the web looking at forums/reddit at similar questions. Things I’ve tried so far:

  1. Uninstalled and reinstalled bumblebee, the nvidia drivers (32- and 64-bit), and bbswitch
  2. Wrote blacklist nouveau in 50-blacklist.conf, blacklist.conf, and 99-local.conf
  3. Added nouveau.blacklist=1 to the kernel command line in /etc/default/grub (and ran grub2-mkconfig -o)
  4. Added options bbswitch load_state=0 unload_state=1 to 50-bbswitch.conf
  5. Restarted bumblebeed.service and ran mkinitrd and/or dracut -f after each of the above and their combinations

Is it possible that mkinitrd is working off a cached file that’s not getting purged/re-written?
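
In case it helps anyone checking the same thing, here is a self-contained sketch of how to confirm what the modprobe fragments actually say. It is demonstrated against a scratch directory so it is safe to run anywhere; on the real system, point MODPROBE_D at /etc/modprobe.d, and use lsinitrd (which ships with dracut) to inspect the initrd itself:

```shell
# Sketch: list every modprobe.d fragment that mentions nouveau, with file and
# line numbers, so a stray or conflicting entry is easy to spot. Shown on a
# scratch directory; set MODPROBE_D=/etc/modprobe.d on the real system.
MODPROBE_D=$(mktemp -d)                       # stand-in for /etc/modprobe.d
echo "blacklist nouveau" > "$MODPROBE_D/50-blacklist.conf"

grep -rn "nouveau" "$MODPROBE_D"

# On the real system, also check the initrd itself (lsinitrd comes with dracut):
#   sudo lsinitrd /boot/initrd-$(uname -r) | grep -i blacklist
```

If the lsinitrd line shows no blacklist fragment after a rebuild, that would point at the initrd rather than the on-disk config.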

Sounds like the kernel upgrade to 5.2 may have broken the driver.
If you install nvidia-bumblebee drivers from command line do you get any errors?
What version of Nvidia driver are you using? Is it 340.1070?
https://forums.opensuse.org/showthread.php/536612-Nvidia-Optimus-(using-Bumblebee)-No-Longer-works?p=2907484#post2907484

Ooh! You’re on to something. …besides the fact that other users have recently had working systems break with updates. I’m on a newish laptop with a newish card, so that solution isn’t directly related, but uninstalling/reinstalling from the command line does produce errors.


DKMS make.log for nvidia-418.74 for kernel 5.2.3-1-default (x86_64)
Sat Aug  3 07:32:15 EDT 2019
make[1]: Entering directory '/usr/src/linux-5.2.3-1'
make[2]: Entering directory '/usr/src/linux-5.2.3-1-obj/x86_64/default'

---snip---

CONFTEST: is_export_symbol_gpl_refcount_dec_and_test
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-frontend.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-instance.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-acpi.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-chrdev.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-cray.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-dma.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-gvi.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-i2c.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-mempool.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-mmap.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-p2p.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-pat.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-procfs.o
/var/lib/dkms/nvidia/418.74/build/nvidia/nv-procfs.o: warning: objtool: .text.unlikely: unexpected end of section
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-usermap.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-vm.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-vtophys.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/os-interface.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/os-mlock.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/os-pci.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/os-registry.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/os-usermap.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-modeset-interface.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-pci-table.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-kthread-q.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-kthread-q-selftest.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-memdbg.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-ibmnpu.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-report-err.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-rsync.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv-msi.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nv_uvm_interface.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/nvlink_linux.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia/linux_nvswitch.o
  SYMLINK /var/lib/dkms/nvidia/418.74/build/nvidia/nv-kernel.o
  LD [M]  /var/lib/dkms/nvidia/418.74/build/nvidia.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm_utils.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm_common.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm_linux.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/nvstatus.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/nvCpuUuid.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8.o
  CC [M]  /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8_tools.o
/var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8_tools.c:209:13: error: conflicting types for ‘put_user_pages’
  209 | static void put_user_pages(struct page **pages, NvU64 page_count)
      |             ^~~~~~~~~~~~~~
In file included from /var/lib/dkms/nvidia/418.74/build/common/inc/nv-pgprot.h:17,
                 from /var/lib/dkms/nvidia/418.74/build/common/inc/nv-linux.h:20,
                 from /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm_linux.h:41,
                 from /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm_common.h:48,
                 from /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8_tools.c:23:
/usr/src/linux-5.2.3-1/include/linux/mm.h:1080:6: note: previous declaration of ‘put_user_pages’ was here
 1080 | void put_user_pages(struct page **pages, unsigned long npages);
      |      ^~~~~~~~~~~~~~
make[3]: *** [/usr/src/linux-5.2.3-1/scripts/Makefile.build:280: /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8_tools.o] Error 1
make[2]: *** [/usr/src/linux-5.2.3-1/Makefile:1609: _module_/var/lib/dkms/nvidia/418.74/build] Error 2
make[2]: Leaving directory '/usr/src/linux-5.2.3-1-obj/x86_64/default'
make[1]: *** [Makefile:179: sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-5.2.3-1'
make: *** [Makefile:81: modules] Error 2


Which looks like a bug/conflict.

Any idea whether I should report this to the kernel folks or to Nvidia? The first error is in /var/lib/dkms/nvidia/418.74/build/nvidia-uvm/uvm8_tools.c and the other is in /usr/src/linux-5.2.3-1/include/linux/mm.h, but they both reference the same function, put_user_pages.
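
For what it’s worth, the clash itself is easy to characterize from the log: kernel 5.2 added its own put_user_pages() declaration to <linux/mm.h>, so the driver’s private static helper of the same name (with a different signature) no longer compiles against it. Since the kernel owns its headers, the fix has to land on the driver side. A minimal sketch of the kind of rename that resolves such a clash, done on a scratch file rather than the real source tree (nv_put_user_pages is a hypothetical name):

```shell
# Demo only: rename a driver-local symbol so it no longer collides with one the
# kernel now declares. The real fix belongs in the nvidia-uvm sources.
src=$(mktemp)
printf 'static void put_user_pages(struct page **pages, NvU64 page_count)\n' > "$src"
sed -i 's/put_user_pages/nv_put_user_pages/' "$src"   # hypothetical new name
cat "$src"
```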

Update, in case it helps the next person get closer (I still haven’t solved the Bumblebee problem).

I found this: https://garajau.com.br/2019/07/compiling-nvidia-418-on-kernel-52

I went into /var/lib/dkms/nvidia/418.74/source/nvidia-uvm and commented out those lines, then ran

sudo /usr/sbin/dkms build -m nvidia -v 418.74

The Nvidia 418.74 drivers seem to have built successfully on kernel 5.2.3-1, so I ran

~> sudo mkinitrd

and rebooted.

The Bumblebee service runs without complaining and I see that the Nvidia drivers are installed and available. Even the optirun --status output looks promising again. But I still can’t actually run anything with optirun.


~> systemctl status bumblebeed.service 


● bumblebeed.service - Bumblebee C Daemon
   Loaded: loaded (/usr/lib/systemd/system/bumblebeed.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-08-03 13:32:44 EDT; 2min ago
 Main PID: 14706 (bumblebeed)
    Tasks: 1 (limit: 4915)
   Memory: 828.0K
   CGroup: /system.slice/bumblebeed.service
           └─14706 /usr/sbin/bumblebeed


~> sudo lspci -nnk | egrep -A3 'VGA|3D'

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P630 [8086:591d] (rev 04)
    DeviceName:  Onboard IGD
    Subsystem: Dell Device [1028:07bf]
    Kernel driver in use: i915
--
01:00.0 3D controller [0302]: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] [10de:13b6] (rev ff)
    Kernel modules: nouveau, nvidia_drm, nvidia

~> optirun --status
Bumblebee status: Ready (3.2.1). X inactive. Discrete video card is off.


~> optirun glxgears
[12379.499107] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card


[12379.499192] [ERROR]Aborting because fallback start is disabled.

~> optirun --status
Bumblebee status: Error (3.2.1): Could not enable discrete graphics card

Restarting Bumblebee returns the hopeful optirun --status result, but I can’t seem to get any further than this.

My understanding (I don’t have the hardware) is that bumblebee is not fully supported at the moment, so most users are moving to suse-prime. That is just an observation from the current threads.

Someone who actually has Optimus hardware should comment.

Another update and another near-solution.

After reading through this thread https://github.com/Bumblebee-Project/bbswitch/issues/140

I understood that bbswitch may be conflicting with tlp and/or powertop.

Uninstalling bbswitch allows optirun to use the Nvidia drivers for the dGPU, but the dGPU is switched on all the time.

Similarly, with bbswitch installed, setting PMMethod=none in the /etc/bumblebee/bumblebee.conf file allows optirun to work, but still cannot turn the dGPU off.


~> cat /proc/acpi/bbswitch
0000:01:00.0 ON

~> sudo tee /proc/acpi/bbswitch <<<OFF
OFF

~> cat /proc/acpi/bbswitch
0000:01:00.0 ON
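
One thing this output does establish: the write mechanics themselves are fine, and it is bbswitch (or the ACPI methods underneath it) that is refusing the OFF transition. Against an ordinary file the same idiom sticks, as this self-contained sketch shows (scratch file standing in for /proc/acpi/bbswitch):

```shell
scratch=$(mktemp)                     # stand-in for /proc/acpi/bbswitch
echo OFF | tee "$scratch" >/dev/null  # same write as the sudo tee <<<OFF above
cat "$scratch"                        # the value sticks here; bbswitch rejects it
```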

Not sure what the effect of just commenting out the offending lines will be.
I have bumblebee working correctly with the setup in the link I posted:

https://forums.opensuse.org/showthread.php/536612-Nvidia-Optimus-(using-Bumblebee)-No-Longer-works?p=2907484#post2907484

This patches the driver for the current version of the kernel, but uses version 390.116 of the nvidia driver for my legacy Optimus card.
Remember: in order to use Bumblebee, it is necessary to add your regular user to the bumblebee group:

# gpasswd -a your-user-name bumblebee

Following the bumblebee post-install instructions might be helpful, just in case you didn’t:

INFO: Please ensure that users using bublebee/video card are in following group(s):
INFO: gpasswd -a <USER> bumblebee
INFO: If going to use nvidia binary driver:
INFO: gpasswd -a <USER> video
INFO: Also ensure the nouveau module is blacklisted (even if you plan to use it):
INFO: echo "blacklist nouveau" >> /etc/modprobe.d/50-blacklist.conf
INFO: mkinitrd
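
A quick, non-destructive way to check whether those memberships are already in place (group names taken from the notes above; this only reads, never modifies):

```shell
# Report the current user's membership in each group the post-install notes mention.
for g in bumblebee video; do
    if id -nG | tr ' ' '\n' | grep -qx "$g"; then
        echo "$g: yes"
    else
        echo "$g: no"
    fi
done
```

Note that after a gpasswd -a the new membership only shows up in a fresh login session, so log out and back in before re-checking.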

bbswitch is the kernel module that makes it possible to power off the NVIDIA card entirely.

So uninstalling it will probably result in you not being able to shut down the discrete GPU.

I’d suggest rolling back your system if possible and starting again following the guide at

https://en.opensuse.org/SDB:NVIDIA_Bumblebee

using the nvidia-bumblebee package from https://download.opensuse.org/repositories/X11:/Bumblebee/openSUSE_Tumbleweed/x86_64/
instead of using packages from the main Tumbleweed repository or building it yourself.
Or use the nvidia-bumblebee driver from my repo as mentioned, if there are problems installing with the current kernel and driver version 390.116 works for your card.

Thanks Franky. I think the Nvidia drivers and Bumblebee are squared away now.

What’s left is to sort out which of bbswitch, GDE, tlp, powertop, etc. gets to decide when and whether the dGPU is powered. It looks like there’s a lot of coordination going on between those development teams, too.

I think I’ll have to sit on this for a while while they do their thing.

In the meantime, I’m getting an estimated ~4 hours from the battery (instead of 7), even when bbswitch reports that the dGPU is switched on. I can live with that for a while.

No problem, you could try **suse-prime** in the meantime as suggested earlier in the thread, which might be a better solution for you.

With bumblebee the dGPU will not usually be activated unless the primusrun or optirun command is used, i.e.

primusrun glxgears

or

optirun glxgears

so if you aren’t using them, bumblebee is probably redundant on your system.
I’d imagine it might take a while for the teams to provide the collaborative solution I think you’re hoping for.