Hi,
I have a suse 12.1 x64 installation (kde) up and running on a desktop pc.
Recently after applying some online-update of the suse-repository the GUI fails to load after reboot. Only shell available.
Luckily I have a rather fresh backup (partition image) and that works.
But after going through the same cycle (apply online-update, xserver stops working, restore the backup) for the 3rd time in 5 weeks, I would like to solve the problem rather than do without any updates.
Unfortunately I’m at a loss as to what the problem is. Looking through log-files did not help me much.
I thought the message “xhci-hcd failed to enable msi-x” could be the problem. But according to forum postings this has something to do with USB 3.0. I have no USB 3.0 device.
The only thing I can think of is the graphics-card (Nvidia Geforce GTX 560). The nouveau-driver with the suse 12.1 DVD did not work during installation so I had to download the driver from the Nvidia Homepage. That worked fine.
After having problems with the online-update I updated the driver to the newest version. It also worked fine (in my backup-system) but after online-update the problem remains the same. The Nvidia driver seems to load, although log says it taints the kernel.
Typically the proprietary nvidia driver is built (or packaged) with a specific kernel version. If the kernel version changes, then typically the packaged version of the proprietary driver needs to be changed to a version that works with the new kernel (or one needs to rebuild the proprietary driver against the new kernel source).
If your update installed a new kernel, then it could be you need to rebuild your graphic driver (or find an rpm with the driver already rebuilt).
Sometimes with an xorg-X11-server update, the same problem can occur (albeit less likely).
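One way to check the kernel/driver match described above is to look for a driver module under the module directory of the running kernel. This is only a minimal sketch: the helper name `module_built_for_kernel` is made up for illustration, and it assumes the driver installs a file named `nvidia.ko` (possibly compressed) somewhere below `/lib/modules/<kernel version>`.

```shell
# Hypothetical helper: succeed if a file named <module>.ko (optionally with a
# compression suffix) exists under <modules_root>/<kernel_version>.
module_built_for_kernel() {
    modules_root=$1     # normally /lib/modules
    kernel_version=$2   # normally "$(uname -r)"
    module=$3           # e.g. nvidia
    [ -d "$modules_root/$kernel_version" ] || return 1
    find "$modules_root/$kernel_version" -name "$module.ko*" -print | grep -q .
}

# Usage: check the kernel that is running right now.
if module_built_for_kernel /lib/modules "$(uname -r)" nvidia; then
    echo "nvidia module found for kernel $(uname -r)"
else
    echo "no nvidia module for kernel $(uname -r); the driver may need a rebuild"
fi
```

If the check fails right after an online-update that brought a new kernel, that points to the driver not having been rebuilt for it. If it succeeds, the cause is more likely elsewhere.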
I tried that already. I installed the newest nvidia driver after online-update and it compiled itself against the new kernel. So no progress there.
Is there a tool or a logfile I can check to get more information on the problem? I looked in /var/log but no information why xserver died.
The log files to check are /var/log/Xorg.0.log and /var/log/Xorg.0.log.old. Sometimes /var/log/messages will provide helpful information. Sometimes ‘dmesg | less’ will provide useful information.
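To narrow those logs down, it can help to show only the lines Xorg marks as errors, (EE), or warnings, (WW). The following is a rough sketch rather than a complete diagnosis; the function name `xorg_problems` is made up, and the pattern assumes the marker sits at the start of the line, optionally after a bracketed timestamp.

```shell
# Print only error (EE) and warning (WW) lines from the Xorg log given as $1.
# Anchoring at the start of the line skips the marker legend in the log header.
xorg_problems() {
    grep -E '^(\[[ 0-9.]+\] )?\((EE|WW)\)' "$1"
}

# Usage: scan the current and the previous X server log, if readable.
for log in /var/log/Xorg.0.log /var/log/Xorg.0.log.old; do
    if [ -r "$log" ]; then
        echo "== $log"
        xorg_problems "$log" || echo "(no EE/WW lines)"
    fi
done

# Kernel messages that mention either graphics driver.
dmesg | grep -i -E 'nvidia|nouveau' || echo "(no nvidia/nouveau kernel messages, or dmesg not readable)"
```

Posting the (EE) lines together with the exact file name they came from usually gives people helping a much better starting point.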
I know it’s a PAIN to type the exact file names of the log files that you looked at, but I have had a number of times in the past where users stated they “looked in /var/log”, and they looked all right, but they never opened one file! They simply looked at the directory and noted, ‘yep, there are files in there!’ So it’s a general statement that leaves someone providing support wondering: just what did one look at? Please don’t take this the wrong way, it’s just difficult to help when generalities are provided.
Now you note you updated with the ‘newest’ nvidia driver. What exact technique did you follow in updating? What version was that ‘newest’ driver? I recall a number of times where users stated they updated with the ‘newest’ driver, only to find out, after a waste of a dozen posts, that what they thought was the ‘newest’ driver was 3 versions old. It’s far better to state, “I installed the newest 295.20 nvidia proprietary driver via a custom compilation, with no errors”. I KNOW that it is a REAL PAIN to type those details, as it takes more time, but I do believe that a bit of time spent there saves time in the long run in trying to find the root cause of a problem.
OK, I know my Information about the driver was not precise. I downloaded the driver yesterday, so I assume it is the “newest”. It is an executable binary doing everything for me.
I checked the logfiles you proposed, but did not find anything helpful apart from the already mentioned.
But this becomes secondary now, because I’ve got xserver running again!
I don’t know exactly what did the trick. I reinstalled the nvidia-driver several times (old and new), finally uninstalled it, updated the nouveau-driver from the suse-repository and then reinstalled the nvidia-driver (of course rebooting in between every step). Only after the last step did the system boot with the GUI coming up and kde starting.
So this leaves me with a mixture between relief and bad feeling about the next online-update supplied by suse (there are again 5 Updates available…).
I am very glad you succeeded. I do note that one can encounter strange hiccups with the proprietary nvidia driver wrt Kernel Mode Setting, and I typically ensure KMS is set OFF via YaST, and also I ensure the nouveau driver is blacklisted, and I boot with the ‘nomodeset 3’ boot code when building the proprietary driver in run level 3. I am told that not all of those steps are necessary, but I have kept doing them for now, as I do not know how reliable the mitigating/replacement implementations (for what I do) work.
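Two of the precautions above can be verified on a running system: whether the kernel was booted with ‘nomodeset’, and whether the nouveau module got loaded anyway. The sketch below is only illustrative; the function names are made up, and it reads the standard /proc/cmdline and /proc/modules files.

```shell
# Succeed if the kernel command line stored in file $1 contains "nomodeset".
has_nomodeset() {
    grep -qw 'nomodeset' "$1"
}

# Succeed if the module list in file $1 (same format as /proc/modules,
# where each line starts with the module name) contains nouveau.
nouveau_loaded() {
    grep -q '^nouveau ' "$1"
}

# Usage on the running system.
if [ -r /proc/cmdline ] && has_nomodeset /proc/cmdline; then
    echo "booted with nomodeset"
else
    echo "nomodeset not found on the kernel command line"
fi

if [ -r /proc/modules ] && nouveau_loaded /proc/modules; then
    echo "nouveau is loaded (blacklist not effective?)"
else
    echo "nouveau is not loaded"
fi
```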
One thing I do, when troubleshooting, is have my digital camera handy, and a paper notebook. And I painfully document everything I try. This means I do not ‘fiddle’ (at least not without having a detailed step-by-step list or picture showing EXACTLY what was changed) and it means I can usually reproduce failed behaviour and successful behaviour, as the steps are documented on camera or on paper.
But I digress, and my apologies for that.
The bottom line is this is working, and I am glad to read that.
The postinstall script of the nvidia driver (in package x11-video-nvidiaG02) checks if KMS is set in /etc/sysconfig/kernel and if so, it turns it off and rebuilds initrd:
**$ rpm -q --scripts x11-video-nvidiaG02 | grep -A 5 "recreate initrd without KMS"**
# recreate initrd without KMS, if the use of KMS is enabled in initrd
if grep -q NO_KMS_IN_INITRD=\"no\" /etc/sysconfig/kernel; then
sed -i 's/NO_KMS_IN_INITRD.*/NO_KMS_IN_INITRD="yes"/g' /etc/sysconfig/kernel
mkinitrd
fi
exit 0
But it doesn’t look in initrd. In other words, it checks whether KMS is supposed to be disabled (from a YaST point of view) but not whether it really is. This might explain why it fails in some cases (like if you rebuild initrd yourself or edit /etc/sysconfig/kernel manually without rebuilding initrd).
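A crude way to spot the "edited /etc/sysconfig/kernel without rebuilding initrd" case is to compare modification times. This is a heuristic sketch, not something the package does: the helper name `initrd_may_be_stale` is made up, and the `/boot/initrd` path is an assumption that may differ on your system. It does not inspect the contents of initrd at all.

```shell
# Hypothetical helper: succeed when the config file ($1) is newer than the
# initrd ($2), which suggests initrd was not rebuilt after the last edit.
initrd_may_be_stale() {
    [ "$1" -nt "$2" ]
}

cfg=/etc/sysconfig/kernel
initrd=/boot/initrd   # assumption: symlink to the current initrd image

if [ -e "$cfg" ] && [ -e "$initrd" ]; then
    # Show what the configuration currently asks for.
    grep '^NO_KMS_IN_INITRD' "$cfg" || echo "NO_KMS_IN_INITRD not set in $cfg"
    if initrd_may_be_stale "$cfg" "$initrd"; then
        echo "$cfg is newer than $initrd; consider running mkinitrd"
    else
        echo "$initrd was built after the last change to $cfg"
    fi
fi
```

An initrd that is newer than the config file only shows that it was built after the setting was last changed; it still does not prove that KMS is actually absent from it.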
The package also includes the /etc/modprobe.d/nvidia.conf, which blacklists nouveau: