Older laptop, Tumbleweed, nvidia and nouveau drivers

As per a previous post, I have an older laptop with NVIDIA graphics that runs Leap 15.1 without problems but requires “nomodeset” on the kernel command line to boot Tumbleweed. The following summarizes where I stand after a week of hacking, before coming back to ask for help again. Infinitely more details available upon request.

  1. I installed and tried to run the “nvidia…G03…” packages from https://download.opensuse.org/repositories/home:/wkazubski:/G03/openSUSE_Tumbleweed/x86_64/ because the main repos don’t provide them. That gives no errors but doesn’t bring up X11.

  2. I tried https://en.opensuse.org/SDB:NVIDIA_the_hard_way without success. Neither the patches from https://nvidia.if-not-true-then-false.com/patcher/NVIDIA-340xx/ nor the sources in nvidia-gfxG03-kmp-default-340.108_k6.3.2_1-165.1.x86_64.rpm help. Even with the “devel_C_C++” and “devel_kernel” patterns installed the Makefile can’t find many basic “#include” files. I added several EXTRA_CFLAGS += <dir> entries which fixed that problem but the build still fails with fatal errors, including a problem with varargs (it’s getting a C++ include file with a “namespace” line that isn’t guarded by #ifdef __cplusplus so it of course fails when compiling C source code).

  3. Using nouveau, unless “nomodeset” is in the kernel parameters, the system hangs during boot. dmesg shows crashes and the laptop’s keyboard does nothing, including no response to ALT-F1, CTRL-ALT-F1, or CTRL-ALT-DEL. I can ssh in and do anything that’s not X11-related, but while shutdown -r or shutdown -P do kill the system, the laptop remains frozen until hardware power is turned off.

  4. With “nomodeset”, the system boots to X11 and the display manager, and I can log in to a window manager or desktop environment. There are no errors or mention of nouveau in dmesg. It’s documented that nouveau needs kernel mode setting to work, and that seems to be correct. The nouveau modules are loaded:

$ lsmod | egrep -i 'nv|nou'
nouveau              3403776  0
mxm_wmi                16384  1 nouveau
i2c_algo_bit           20480  1 nouveau
drm_display_helper    212992  1 nouveau
drm_ttm_helper         16384  1 nouveau
ttm                   102400  2 drm_ttm_helper,nouveau
video                  73728  3 dell_wmi,dell_laptop,nouveau
button                 24576  1 nouveau
wmi                    45056  7 video,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau

But nouveau doesn’t seem to be running:

$ xrandr --query          
Screen 0: minimum 320 x 200, current 1920 x 1200, maximum 4096 x 4096
None-1 connected primary 1920x1200+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1920x1200     60.00*+


$ inxi -Ga
  Device-1: NVIDIA G92GLM [Quadro FX 3800M] vendor: Dell driver: N/A
    alternate: nouveau non-free: series: 340.xx status: legacy (EOL) last:
    release: 340.108 kernel: 5.4 xorg: 1.20 arch: Tesla process: 40-80nm
    built: 2006-13 pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 2
    speed: 5 GT/s bus-ID: 01:00.0 chip-ID: 10de:061f class-ID: 0300
  Device-2: Ricoh Dell Laptop Integrated Webcam driver: uvcvideo type: USB
    rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-1.3:3
    chip-ID: 05ca:1815 class-ID: 0e02
  Display: x11 server: X.Org v: 21.1.8 driver: X: loaded: modesetting,vesa
    unloaded: fbdev gpu: N/A display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 96 s-size: 508x317mm (20.00x12.48")
    s-diag: 599mm (23.57")
  Monitor-1: Unknown-1 mapped: None-1 res: 1920x1200 hz: 60 size: N/A
    modes: 1920x1200
  API: OpenGL v: 4.5 Mesa 23.0.3 renderer: llvmpipe (LLVM 16.0.2 128 bits)
    direct-render: Yes

To me the above says that I’m running completely unaccelerated framebuffer graphics, because on Leap, xrandr shows “VGA-0”, “LVDS-0” and several other devices, and inxi shows “driver: nvidia” instead of “driver: N/A” (and there are posts here that show “driver: nouveau”).
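For anyone who wants to run the same check directly, here is a rough sketch. It assumes the GPU is at PCI address 0000:01:00.0 (the 01:00.0 bus-ID from inxi plus the usual 0000 domain prefix, which is an assumption) and that sysfs exposes a driver symlink for a bound device. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none".
# $1: sysfs directory of the device (assumed here:
#     /sys/bus/pci/devices/0000:01:00.0)
bound_driver() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo "none"
    fi
}

# Succeed if a kernel command line (passed as one string) contains nomodeset.
has_nomodeset() {
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the real system:
#   bound_driver /sys/bus/pci/devices/0000:01:00.0
#   has_nomodeset "$(cat /proc/cmdline)" && echo "booted with nomodeset"
```

If the first call prints “none” while lsmod still lists nouveau, the module is loaded but not driving the card, which would match what inxi reports as “driver: N/A”.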

So, questions …

  1. Am I right that nouveau isn’t running?
  2. Does nouveau support this “Quadro FX 3800m”/“G03” hardware?
  3. There’s lots of online documentation about “early” kernel mode setting. Is there a way to “nomodeset” until after the kernel and systemd have started and then somehow set the kernel mode so nouveau comes up without hanging?
  4. Would completely removing plymouth help? Is it the reason why early mode setting is required, e.g. for graphical displays during the boot process? I’ve tried many combinations of “splash=silent” and “quiet” but still always get the hang unless “nomodeset” is also included.

You could start with showing Xorg.log and output of

journalctl -b --full --no-pager

with these drivers.

I’ll start with nouveau. I’ve uploaded Xorg.log and journalctl. These are for the hang that happens without nomodeset in the kernel commandline, and the latter shows nouveau Oops-ing the kernel with a null pointer dereference. As I said, with nomodeset there isn’t a problem (except no nouveau), and there is no mention of nouveau in journalctl.
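In case the pastes are unreachable, something like the following should pull the relevant lines out of the journal. It is only a sketch: it assumes the journal is persistent, so that the boot that hung is available as the previous boot (-b -1). The function name and the list of patterns are my own guesses at what is worth matching.

```shell
#!/bin/sh
# Keep only log lines that mention nouveau or look like a kernel fault.
# Reads log text on stdin and writes the matching lines to stdout.
filter_nouveau_faults() {
    # "|| true": finding no matching line is not treated as an error
    grep -iE 'nouveau|BUG:|Oops|NULL pointer dereference' || true
}

# Example use (kernel messages from the previous boot, i.e. the hung one):
#   journalctl -k -b -1 --no-pager | filter_nouveau_faults
```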

Also, the journalctl has:

ACPI: \_SB_.PCI0.AGP1.VID_: failed to evaluate _DSM
Console: switching to colour dummy device 80x25

With nomodeset that doesn’t happen, but I don’t think it’s the root of the problem. I found an online post mentioning it and coming to the same conclusion, and if I add acpi=off (without nomodeset) to the kernel commandline it doesn’t happen and X11 does come up. But still without nouveau, and the graphics performance (and even the boot process) is noticeably slower.

I’d appreciate any insights you can offer. If you’re more interested in what happens with the nvidia drivers instead of nouveau I can re-install them and paste those logs, but I’d like to hear what you think about nouveau first.

This is a nouveau bug and has to be reported. Xorg.log actually looks quite good: it loads the modesetting driver, which completes initialization. Of course, after the driver crash the system could have been left in some undetermined state.

Should I do that here at Bugzilla or upstream at Nouveau?

The latter has:

If you are using packages from your distribution and are unable/unwilling to test the latest versions of all the pieces of nouveau, send the bug reports to your distribution and not directly to us. If you’re using an out-of-date software version, our first question will probably be “does it still happen on latest”.

I have a Tesla 8 months older than OP’s on which everything seems to be fully as expected:

# inxi 3.3.27-00 (2023-05-07)
  Host: g5eas Kernel: 6.2.12-1-default arch: x86_64 bits: 64 compiler: gcc
    v: 13.0.1 parameters: root=LABEL=<filter> ipv6.disable=1 net.ifnames=0
    noresume consoleblank=0 preempt=full mitigations=none
  Desktop: KDE Plasma v: 5.27.5 tk: Qt v: 5.15.9 wm: kwin_x11 vt: 7 dm: SDDM
    Distro: openSUSE Tumbleweed 20230522
  Device-1: NVIDIA G98 [GeForce 8400 GS Rev. 2] vendor: PNY driver: nouveau
    v: kernel non-free: series: 340.xx status: legacy (EOL) last:
    release: 340.108 kernel: 5.4 xorg: 1.20 arch: Tesla process: 40-80nm
    built: 2006-13 pcie: gen: 1 speed: 2.5 GT/s lanes: 1 link-max: lanes: 16
    ports: active: DVI-I-1,VGA-1 empty: none bus-ID: 0b:00.0
    chip-ID: 10de:06e4 class-ID: 0300
  Display: x11 server: X.Org v: 21.1.8 compositor: kwin_x11 driver: X:
    loaded: modesetting unloaded: fbdev,vesa alternate: nouveau,nv,nvidia
    dri: nouveau gpu: nouveau display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3600x1200 s-dpi: 120 s-size: 762x254mm (30.00x10.00")
    s-diag: 803mm (31.62")
  Monitor-1: DVI-I-1 pos: primary,left model: NEC EA243WM serial: <filter>
    built: 2011 res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2
    size: 519x324mm (20.43x12.76") diag: 612mm (24.1") ratio: 16:10 modes:
    max: 1920x1200 min: 640x480
  Monitor-2: VGA-1 pos: right model: Dell P2213 serial: <filter> built: 2012
    res: 1680x1050 hz: 60 dpi: 90 gamma: 1.2 size: 473x296mm (18.62x11.65")
    diag: 558mm (22") ratio: 16:10 modes: max: 1680x1050 min: 720x400
  API: OpenGL v: 3.3 Mesa 23.0.3 renderer: NV98 direct-render: Yes
# rpm -qa | grep -Ei 'vidia|veau|mesa' | sort

portions of journal and dmesg: openSUSE Paste

OP’s susepastes are giving me empty pages. Which kernel(s) is/are he producing trouble with?

Just my luck that mine isn’t working. :frowning:

Yes, there’s the driver that I’m not getting. :frowning: The only thing I can think of now is that I have a “NVIDIA G92GLM [Quadro FX 3800M]” on a laptop (“M” suffix is for “mobile”) and I’ve seen online posts before about the mobile chipsets being different from the desktop/PCI-(whatever) ones.

This was my first time using openSUSE Paste but I swear I unchecked the “Private” box. I can download the pastes without being logged into the site, and it seems that @arvidjaar also could. Maybe it’s been temporarily down like the reports in Is download.opensuse.org working today?

I’m on a zypper dup to 20230521 (haven’t done 20230522 yet – just some apps in that one, plus the kernel’s been unchanged for the last 2 or 3 snapshots) with:

$ uname -a
Linux harriet 6.3.2-1-default #1 SMP PREEMPT_DYNAMIC Mon May 15 15:59:38 UTC 2023 (70ea6f6) x86_64 x86_64 x86_64 GNU/Linux

and with:

# rpm -qa | grep -Ei 'vidia|veau|mesa' | sort

Completely unrelated and unimportant: I’m guessing we have different $LC_ALL environment variables (mine is unset), which is why your sort order is case-insensitive and mine is strict ASCII.

Possibly important: I have rpms which you don’t. Should I zypper rm the “kernel-firmware-nvidia” package that probably came in with my nvidia experiments? I thought I’d removed them all but obviously not. I did manually remove some /etc/modprobe.d/* files containing “blacklist nouveau” which the uninstall incorrectly left behind. (That problem threw me off for a day or two.)
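For anyone else cleaning up after the same kind of experiment, this is roughly how I’d look for leftover blacklist files. It is a sketch: the function name is invented, and the directory list in the example is taken from what modprobe.d normally searches, so adjust it if your system differs. It only looks at files on disk, not at copies inside the initrd.

```shell
#!/bin/sh
# List modprobe config files under the given directories that still
# contain an uncommented "blacklist nouveau" line.
find_nouveau_blacklists() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue   # skip directories that do not exist
        # "|| true": a directory without matches is not an error
        grep -rlE '^[[:space:]]*blacklist[[:space:]]+nouveau' "$dir" 2>/dev/null || true
    done
}

# Example use:
#   find_nouveau_blacklists /etc/modprobe.d /usr/lib/modprobe.d /run/modprobe.d
```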

And what about “libvdpau_nouveau” and “xf86-video-nouveau”? I installed the latter manually as one of my many desperate/random experiments to get this working. The “libvdpau_nouveau” came in with my initial Tumbleweed install of Snapshot20230420 and has been there ever since. But I also have “libvdpau1-1.5-1.5.x86_64”. Does your system (which works) have that one? Could it be conflicting with the “libvdpau_nouveau” on mine?

Thanks for your help, @mrmazda, and apologies if any of this has been excessive/irrelevant information.

Those are completely unrelated to the driver crash. As to where to report it - I would say upstream. Tumbleweed kernel is pretty new. You could try installing vanilla kernel to check whether it could be local SUSE patches. If vanilla kernel works - report to openSUSE bugzilla, otherwise upstream.

There is recent report coming from openSUSE, but those kernel messages look somewhat different:

Removing kernel-firmware-nvidia should tell you if there’s a bad piece in it for your G92GLM, as long as the bad piece isn’t included in your initrd. One would not think blacklisting is any issue given the match between my and your lsmod output:

# lsmod | egrep 'nv|vidia|veau' | sort
button                 24576  1 nouveau
drm_display_helper    212992  1 nouveau
drm_ttm_helper         16384  1 nouveau
i2c_algo_bit           20480  1 nouveau
mxm_wmi                16384  1 nouveau
nouveau              3403776  4
ttm                   102400  2 drm_ttm_helper,nouveau
video                  73728  1 nouveau
wmi                    45056  3 video,mxm_wmi,nouveau

I suppose yes:

# cat /usr/local/bin/zypsei
zypper --no-refresh se -s -i $*  | grep -Ev 'debug|devel|srcp|openSUSE-20' | grep -E 'x86|noarch'| sort
# zypsei vdpau
i | libvdpau1 | package | 1.5-1.5 | x86_64 | OSS
# alias rpmqa
alias rpmqa='rpm -qa | sort | grep $*'
# rpmqa vdpau

Not having xf86-video-nouveau installed is the easy way to enjoy the upstream default display driver configuration.

After years of trying to help NVidia GPU users make pure FOSS work, I can only surmise that nouveau kernel module not loading results from imperfect purging of a proprietary NVidia driver installation, whether or not it was successfully installed. I’ve never attempted to load NVidia’s proprietary drivers on any hardware I own. It could be that such attempts by others have left something behind that blocks the nouveau kernel driver from doing its job, if it even loads in the first place. Your lsmod output clearly shows not loading is not your issue, but I suppose there could be a modified or extra library left behind that’s getting in your way; or something left behind in /etc/X11/.

Is /dev/dri/card# getting created at all, or soon enough? Not infrequently here, with various hardware, X tries to start before it has been created, causing X to fail to start on the first try. If the DM doesn’t retry after /dev/dri/card# is created, then X fails in trying to start. It’s not unusual to find /dev/dri/card0 didn’t get created, but /dev/dri/card1 eventually did, followed by X finally starting. I find when this happens that TDM and KDM3 at least will eventually succeed, usually after a 90s timeout, but sometimes it takes more than one timeout for it to happen. I don’t use LightDM or SDDM very often, or GDM ever, so am not familiar with their behavior in regard to retries or timeouts.
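A minimal sketch of how one could test for that timing issue follows. The function name is hypothetical, and where you would call it (for example from a console login right after boot) is up to you. It simply polls once a second until a card node appears or the timeout runs out.

```shell
#!/bin/sh
# Wait until at least one DRM card node exists, or give up.
# $1: directory to watch (normally /dev/dri)
# $2: timeout in seconds (0 means check once and return)
# Prints the first node found and returns 0, or returns 1 on timeout.
wait_for_drm_card() {
    dir=$1
    remaining=$2
    while :; do
        for node in "$dir"/card[0-9]*; do
            # An unmatched glob stays literal and fails the -e test
            if [ -e "$node" ]; then
                echo "$node"
                return 0
            fi
        done
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Example use:
#   wait_for_drm_card /dev/dri 90 || echo "no /dev/dri/card* after 90s"
```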

Reminds me a little bit

…I finally gave up and installed 15.5beta…

And this one describes exactly the same crash, with an analysis (which kernel commits may be responsible), and even has a proposed patch. With it you could try an openSUSE bug report.

[Nouveau] Panic report and patch against master (Quadro FX) - Monty Montgomery (kernel.org)

…published one month ago and nothing happened since then? Wow…

Thanks for all the input, everyone. Even though it’s beginning to look like the situation is pretty hopeless. :frowning:

FWIW, I agree:

$ rpm -qil libvdpau1
This package contains the libvdpau wrapper library and the libvdpau_trace
debugging library, along with the header files needed to build VDPAU
applications.  To actually use a VDPAU device, you need a vendor-specific
implementation library.  Currently, this is always libvdpau_nvidia.  You can
override the driver name by setting the VDPAU_DRIVER environment variable.

$ rpm -qil libvdpau_nouveau

This package contains the VDPAU state tracker for Nouveau.

So despite what “libvdpau1” says about “this is always libvdpau_nvidia” I’m assuming the rest of the description is correct: It’s a wrapper around the actual libvdpau implementation which in my case is “libvdpau_nouveau”, which in turn is needed for GPU video acceleration.

So as much as I’m out of my league disagreeing with @mrmazda, I’m leaving both “libvdpau1” and “libvdpau_nouveau” in for now.

Very interesting information about X startup, but I don’t think nouveau is getting past the kernel “oops” and to that point. /dev/dri isn’t created at all, much less anything underneath it like in my nomodeset/“nouveau isn’t running” boots which have:

$ ls -AlRF /dev/dri
total 0
drwxr-xr-x  2 root root      60 May 24 13:34 by-path/
crw-rw----+ 1 root video 226, 0 May 24 13:34 card0

total 0
lrwxrwxrwx 1 root root 8 May 24 13:34 platform-simple-framebuffer.0-card -> ../card0

Note that’s one more proof (if any is needed) that nouveau isn’t running.

Thanks for the suggestion. I tried removing it (zypper also removed kernel-firmware-all) but it made no difference.

Does more than remind me “a little bit”. This looks to be exactly the same problem.

That’s what I’m going to do.

BTW, I tried nosimplefb=1 from @suse_rasputin’s other thread. That didn’t help (same nouveau crash/hang), plus it made the boot-message console scrolling flickery and very slow.

I’m going to comment on this in my next post.

Off-topic, and I’m probably going to get in trouble for posting this but …

Some cherry-picked quotes from the No boot with kernel 6.2 thread:

It is a 15 years old museum piece…. As there is continuous development of hardware and software you can’t keep backwards compatibility forever…

@suse_rasputin nvidia cards have a product lifetime of 10 years of support. Something has to break at some point, as the user base declines for old cards I suspect the kernel developers don’t have the hardware to test (do you expect them to test everything?) so rely on bug reports to try and fix.

For myself personally, I don’t expect them to have and test all and/or old hardware. I would hope they’d respond to detailed bug reports, including suggestions on how to build debug kernels/modules and capture their output.

And in this case there’s a post showing exactly the problem and providing a patch for it. There shouldn’t be any “we can’t accept this patch because it might break something else” when the patch clearly shows a coding error (out of bounds array access) that’s obvious from static code analysis and should be fixed even if it coincidentally never caused a problem before. And if there is some secondary or tertiary effect, that’s also a newly exposed bug which needs to be addressed.

In my little world: change nothing, nothing will break. So it’s win7 times? Really?

I’ve been a Linux booster for almost 30 years now, making fun of the Apple and M$ bigots who don’t have any of these problems … because their only support option is, “Your machine is no longer supported. Please buy a new one.”

I accept that there will be problems and I’ll have to help however I can with them, but this issue – or more specifically how it’s been handled – is shaking my faith.

that’s business as usual with TW, change over to leap and done

This is my first experience with Tumbleweed, and it’s something I resisted for a very long time. I feel I was forced into it because I can no longer live with the 3+ year release cycle of Leap. I’m getting killed being stuck on GCC 7 and Python 3.6 and have to move forward.

I’ve always thought along the lines of the following, that there’s a conflict between the claims about Tumbleweed stability and the reality of using it:

From Latest updates killed my GUI (4th week of Jan'23) - Kernel 6.8.1?:

Those are all valid points, of course, but it might be nice to put that on the website before people install it. Currently, it only says “Tumbleweed – Get the newest Linux packages with our rolling release. Fast! Integrated! Stabilized! Tested!” It’s not until you get to the second “learn more” option on the installation page that there are any warnings about who should not use Tumbleweed or what can happen.

And as before, note that I’m not averse to working around problems. I understand the limitations of the OBS automated testing. I’d like to see an openSUSE flavor that’s halfway between Leap and Tumbleweed, but think that between delaying my zypper dups and snapper I can probably achieve that by myself.

Practical question: I originally installed TW from Snapshot20230420, which I think already had kernel 6.2 and this problem, so I can’t snapper back to before it. Can I zypper dup to some earlier repo and get back to where it might work on my system?


My practice with Tumbleweed is to keep one kernel from the previous series. At present, that’s a 6.1 kernel. And I will keep it around until 6.3 kernels show up.

Can anyone point me to documentation on how to do this? And I thought TW always had to be kept in dist-upgrade sync, that you couldn’t mix packages compiled against a different kernel. Or is that only for modules, like if I ever can get the closed-source NVIDIA driver working?

Thanks for listening to my complaints.

I think it’s too simple for anyone to have bothered to document:

# inxi -S
  Host: g5eas Kernel: 6.3.2-1-default arch: x86_64 bits: 64 Console: pty pts/0
    Distro: openSUSE Tumbleweed 20230522
# systemctl list-unit-files | grep purge
purge-kernels.service                  masked          enabled
# zypper ll | grep kernel
21 | kernel-de*               | package | (any)      |
# lsattr /boot/initrd*t
----i---------e------- /boot/initrd-5.19.13-1-default
----i---------e------- /boot/initrd-6.0.12-1-default
----i---------e------- /boot/initrd-6.1.12-1-default
----i---------e------- /boot/initrd-6.2.12-1-default
----i---------e------- /boot/initrd-6.3.2-1-default
# rpm -qa | grep kernel-de | sort

When you wish a newer kernel, you install one with zypper, rpm or yast.
When you wish to remove a kernel, you reverse the process.

You can check /history - openSUSE Download to see if it still has what you need. I think it keeps about a couple of dozen previous snapshots. But note that downgrades are generally not considered or tested. While packages are explicitly prepared to handle migration from earlier versions (adapting configuration files, changed file locations etc), nothing is done for a downgrade.

As an alternative to the suggested manual kernel installation, you can disable automatic kernel removal.

SDB:Keep multiple kernel versions - openSUSE Wiki

If you comment out multiversion.kernels, no kernel will be removed automatically, it is up to you to clean them as needed.
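To see what the setting currently is before changing it, a small sketch like the following may help. It assumes the option lives in /etc/zypp/zypp.conf as a plain “multiversion.kernels = …” line; the function name is made up.

```shell
#!/bin/sh
# Print the active (uncommented) multiversion.kernels value from a
# zypp.conf-style file. Prints nothing if the option is absent or
# commented out.
# $1: path to the config file (assumed: /etc/zypp/zypp.conf)
multiversion_kernels() {
    sed -n 's/^[[:space:]]*multiversion\.kernels[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example use:
#   multiversion_kernels /etc/zypp/zypp.conf
```

Empty output means the line is commented out or missing, which is the state described above where nothing is removed automatically.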

Hi! I had a very similar problem: Tumbleweed, Quadro FX880M (hwinfo: GT216GLM) and after an update from 6.1.12-1 to 6.3.2-1 I could only boot with option nomodeset into an unusable 640x480 resolution.

I fixed it by booting once with nomodeset and deinstalling xf86-video-nouveau and using xf86-video-nv instead.

Afterwards the laptop boots just fine (without nomodeset) into 1920x1080 and also with a second screen simultaneously enabled also running in 1920x1080 as an extended desktop, like before the fatal update.

The system works fine, even video works without problems. Hope this helps you too!

To clarify: The old Dell M4500 laptop worked fine with Tumbleweed for several years, occasionally running zypper dup. However, a few weeks ago, this caused the machine to freeze completely shortly after presenting the loading screen. One could barely type in the password before the freeze occurred. The freeze was complete: no reaction to mouse or keyboard, no blinking LEDs, and the screen remaining as it was.

I had once used the proprietary drivers, but at some point a few years ago, I switched to the nouveau drivers, again due to similar update problems. So at first I tried to switch back to the proprietary drivers, but this only got me a high-resolution black screen with a nicely moving mouse pointer, but nothing else on it.

Update from my situation …

I think I’ve found a fix for the nouveau kernel fault at boot problem. See the ongoing discussion at https://gitlab.freedesktop.org/drm/nouveau/-/issues/213

In addition to this, the wizard responsible for https://download.opensuse.org/repositories/home:/wkazubski:/G03/openSUSE_Tumbleweed/x86_64/ has updated the now abandoned-by-Nvidia “G03” official driver packages there. (I am majorly impressed – I tried to compile the Nvidia sources and found them to be totally broken against current GCC and/or kernels.)

There’s also a matching Leap 15.5 repo. I haven’t had time to try either, but if they work the general opinion has always been that the closed-source Nvidia drivers are superior to their open-source nouveau, etc. counterparts. (I’ve used them for years on earlier openSUSE releases but they were removed from the official repos at or before 15.5, I suppose because Nvidia stopped updating them and the “unbuildable” problems started.)

On that subject, I’d also always read that the “nv” driver (xf86-video-nv??) was old, deprecated, unmaintained, poor-performing, etc. But you, @STurtle, are getting good results with it? Maybe my information was incorrect, and if so it’s nice to know there’s another alternative for this hardware.

Finally, I just yesterday came across @mrmazda’s excellent https://forums.opensuse.org/t/amd-intel-nvidia-x-graphics-driver-primer-third-edition/148576 post but am even more confused after reading it (my fault, not the document’s or its author’s). I had thought that including “nomodeset” on the kernel commandline to disable the nouveau (or whatever) driver resulted in having no GPU acceleration at all, falling back to the framebuffer (FBDEV??) or “vesa” CPU-only code, as indicated by seeing no driver listed in:

$ inxi -Ga
  Device-1: NVIDIA G92GLM [Quadro FX 3800M] vendor: Dell driver: N/A

and xrandr having only one resolution and no named screens, etc. as I’ve posted before.

But now I’m understanding(??) that instead I’m getting an implicit, unnamed “modesetting” driver that’s built into the X server? That’s believable because the graphics performance I’ve been seeing all this time with “nomodeset” to prevent the nouveau crash hasn’t been terrible. But I don’t do any heavy GPU tasks on this machine (moving windows around, etc. has been fine) so I have nothing to compare it to. I just assumed that the CPU was so fast that pure, un-accelerated, software-only rendering was keeping pace.

So all this has been (for me) for nothing? I don’t need nouveau, nvidia, nv, etc? Not really, because I see no reason to bog down the CPU when the system has a perfectly capable GPU sitting idle. Again, I’d like to understand what’s going on if anyone knowledgeable can explain it.
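One data point I can check myself: the inxi output I posted earlier shows “renderer: llvmpipe”, which is Mesa’s CPU-based rasterizer, while @mrmazda’s working nouveau system shows “renderer: NV98”. A rough sketch for classifying the renderer string follows. The function name is invented, the list of software renderer names is my assumption about what Mesa reports, and the example assumes glxinfo is installed.

```shell
#!/bin/sh
# Classify an OpenGL renderer string as software or hardware rendering.
# $1: renderer string, e.g. the text after "OpenGL renderer string:"
renderer_kind() {
    case "$1" in
        "") echo "unknown" ;;
        *llvmpipe*|*softpipe*|*swrast*) echo "software" ;;
        *) echo "hardware" ;;
    esac
}

# Example use:
#   renderer_kind "$(glxinfo -B | sed -n 's/^OpenGL renderer string: //p')"
```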

Thanks for all the continuing interest in the subject, and I still think it’s important because the current Tumbleweed and Leap 15.5 install media don’t boot on systems with these GPUs. There’s information on how to handle that in https://en.opensuse.org/SDB:Configuring_graphics_cards and other documentation, but I claim that most users hitting it will just give up, go to another distro, or back to Windows. (And the other distros will have the same problem once they go to 6.3 and beyond if their install media boots to nouveau on Nvidia systems, at least on the older GPUs.)