Diagnosing a KDE or nouveau crash?

Dear openSUSE users: I think I’m seeing a KDE problem in my 13.1 x64 install on a big Opteron machine. When I come in in the morning, the screen is blank. If I tap the keyboard a few times, I get a login: prompt. So KDE is gone. The machine doesn’t seem to be hung, but I can’t startx if I log on, so I guess it could be a nouveau driver issue.

What would be the recommended troubleshooting procedure? Just look through every log under /var/log that I can?

TYAN4:/home/patti # ipmiutil alarms
ipmiutil ver 2.88
ialarms ver 2.88
-- BMC version 2.4, IPMI version 2.0 
ipmiutil alarms, completed successfully
TYAN4:/home/patti # ipmiutil sel -e
ipmiutil ver 2.88
isel: version 2.88
-- BMC version 2.04, IPMI version 2.0 
SEL Ver 0 Support 03, Size = 512 records (Used=24, Free=488)
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
ipmiutil sel, completed successfully

Thank You!

Check: dmesg | grep -i segfault
There was a broken systemd patch released a while back; it would show up as a segmentation fault in dmesg and prevent any applications from starting, including X.

Then: systemctl status xdm

Then: grep -i "(EE)" /var/log/Xorg.0.log
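
One caveat on that last check: Xorg replaces Xorg.0.log each time the server starts and keeps the previous one as Xorg.0.log.old, so after a reboot the errors from the session that died may be in the .old file instead. A minimal sketch of the check as a shell function, assuming the default log location; xorg_errors is just a made-up helper name, and it also drops the marker legend line, which matches "(EE)" in every log:

```shell
#!/bin/sh
# Sketch only: xorg_errors is a hypothetical helper, not an existing tool.
# Print the (EE) lines from an Xorg log, skipping the legend line
# "(WW) warning, (EE) error, ..." that appears near the top of every log.
xorg_errors() {
    log="${1:-/var/log/Xorg.0.log}"   # assumed default path
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -F '(EE)' "$log" | grep -v -F '(WW) warning, (EE) error'
}

# Example use:
#   xorg_errors /var/log/Xorg.0.log.old   # session before the last X restart
#   xorg_errors                           # current session
```

Comparing the two outputs should show whether an error belongs to the crashed session or only to the restart afterwards.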

Wow, thanks Miuku! These seem to be dated at about the time I turned the monitor on to log into the computer and KDE was gone… I should have kept better track of exact times. I’m not sure whether these events happened as I tried to log into the computer (and got only a login: console prompt) or after I rebooted to get KDE back.

TYAN4:/home/patti # dmesg | grep -i segfault
TYAN4:/home/patti # systemctl status xdm
xdm.service - LSB: X Display Manager
   Loaded: loaded (/etc/init.d/xdm)
   Active: active (running) since Tue 2015-02-24 14:57:53 PST; 1h 31min ago
  Process: 20424 ExecStart=/etc/init.d/xdm start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/xdm.service
           ├─20496 /usr/bin/kdm
           └─20565 /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-oUamcb

Feb 24 14:57:52 TYAN4 systemd[1]: Starting LSB: X Display Manager...
Feb 24 14:57:53 TYAN4 xdm[20424]: Starting service kdm..done
Feb 24 14:57:53 TYAN4 systemd[1]: Started LSB: X Display Manager.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [General] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [Xdmcp] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [X-*-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [X-*-Greeter] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [X-:*-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm_config[20497]: Multiple occurrences of section [X-:0-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Feb 24 14:57:53 TYAN4 kdm[20496]: plymouth is running
Feb 24 14:57:53 TYAN4 kdm[20496]: plymouth is active on VT 7, reusing for :0
Feb 24 14:57:53 TYAN4 kdm[20496]: plymouth should quit after server startup
Feb 24 14:57:57 TYAN4 kdm[20496]: Quitting Plymouth with transition
Feb 24 14:57:57 TYAN4 kdm[20496]: Is Plymouth still running? no
Feb 24 14:57:58 TYAN4 kdm[20656]: :0[20656]: pam_unix(xdm-np:session): session opened for user patti by (uid=0)
TYAN4:/home/patti # grep -i "(EE)" /var/log/Xorg.0.log 
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   250.917] (EE) Failed to load module "nvidia" (module does not exist, 0)
TYAN4:/home/patti # 

Thank you very much for all your help so far… This is starting to be an issue. This time it happened while I was watching, and I’m afraid it’s due to some system update.
I was doing normal KDE stuff in Dolphin and suddenly the screen went black, and I received:

[0.001000] tsc: Fast TSC calibration failed

Welcome to Opensuse ...  (blah, blah, bottle)  3.11.10-25-desktop tty1
login:

The only way out of this appears to be a power cycle. Logging on as root and issuing a reboot command doesn’t work.

Does anyone know what this means? This is a 64-CPU motherboard (Opteron) which works really hard doing matrix inversions (it’s not a fileserver or any other kind of server).
Is there a better (more robust) kernel to use than “desktop”?

It only started doing this in the last month or so, and it’s been up and running doing the same hard work for ~6 months. No hardware errors at all.

Requested checks (I think these are all errors from the reboot):

patti@TYAN4:~> su
Password: 
TYAN4:/home/patti # dmesg | grep -i segfault
TYAN4:/home/patti # systemctl status xdm
xdm.service - LSB: X Display Manager
   Loaded: loaded (/etc/init.d/xdm)
   Active: active (running) since Tue 2015-03-03 11:28:25 PST; 7min ago
  Process: 20353 ExecStart=/etc/init.d/xdm start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/xdm.service
           ├─20477 /usr/bin/kdm
           └─20524 /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth /var/lib/kdm/AuthFiles/A:0-B7z9hc

Mar 03 11:28:24 TYAN4 systemd[1]: Starting LSB: X Display Manager...
Mar 03 11:28:25 TYAN4 xdm[20353]: Starting service kdm..done
Mar 03 11:28:25 TYAN4 systemd[1]: Started LSB: X Display Manager.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [General] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [Xdmcp] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [X-*-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [X-*-Greeter] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [X-:*-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm_config[20478]: Multiple occurrences of section [X-:0-Core] in /usr/share/kde4/config/kdm/kdmrc. Consider merging them.
Mar 03 11:28:25 TYAN4 kdm[20477]: plymouth is running
Mar 03 11:28:25 TYAN4 kdm[20477]: plymouth is active on VT 7, reusing for :0
Mar 03 11:28:25 TYAN4 kdm[20477]: plymouth should quit after server startup
Mar 03 11:28:29 TYAN4 kdm[20477]: Quitting Plymouth with transition
Mar 03 11:28:29 TYAN4 kdm[20477]: Is Plymouth still running? no
Mar 03 11:28:30 TYAN4 kdm[20647]: :0[20647]: pam_unix(xdm-np:session): session opened for user patti by (uid=0)
TYAN4:/home/patti # grep -i "(EE)" /var/log/Xorg.0.log
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   181.189] (EE) Failed to load module "nvidia" (module does not exist, 0)
TYAN4:/home/patti # 

Check the NVIDIA driver. Which flavor are you running? If I recall correctly, you are running some high-end NVIDIA cards, so maybe you need to move to the new G04 flavor.

Thank you very much for the reply. Oh… no, that’s my home machine that’s using crazy cards. This one just has an old PNY Nvidia clone GeForce 8400GS, but the motherboard also has onboard graphics (disabled, but it shows up in sysinfo:/ as MGA G200eW WPCM450).

I should have googled as soon as I got that “Fast TSC calibration failed” message… Something seems to be going on, but I don’t understand it.
https://lkml.org/lkml/2014/3/10/137
http://comments.gmane.org/gmane.linux.kernel/1663687

EDIT: Oops, YaST says I’m running... oops again, there used to be a “G0x” number but now I don’t see any. It lists xf86-video-nv and xorg-x11-driver-video-nouveau.

xf86-video-nv - NVIDIA video driver for the Xorg X server

nv is an Xorg driver for NVIDIA video cards. The driver supports 2D acceleration and provides support for the following framebuffer depths: 8, 15, 16 (except Riva128) and 24. All visual types are supported for depth 8, TrueColor and DirectColor visuals are supported for the other depths with the exception of the Riva128 which only supports TrueColor in the higher depths.

It is probably running nouveau. nv is rather old and really only a backstop.
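
One quick way to confirm that is to look at which GPU kernel modules are loaded. A small sketch, reading /proc/modules (one module per line, name in the first field); gpu_modules is a made-up helper name, and mgag200 is my assumption for the module behind the onboard MGA G200eW. As far as I know nv is an Xorg-only driver with no kernel module of its own, so it would not show up here:

```shell
#!/bin/sh
# Sketch only: gpu_modules is a hypothetical helper, not an existing tool.
# Print which of the GPU-related kernel modules are currently loaded.
# The module names checked (nouveau, nvidia, mgag200) are assumptions
# based on the hardware mentioned in this thread.
gpu_modules() {
    src="${1:-/proc/modules}"
    awk '$1 == "nouveau" || $1 == "nvidia" || $1 == "mgag200" { print $1 }' "$src"
}

# Example use:
#   gpu_modules        # empty output means none of the three is loaded
```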

Try the proprietary driver; it may be more stable, you never know.

I figured that since this box is a number cruncher, it could offload some calculations to the GPU. That does require the proprietary driver(s), and of course the software has to be written for it, but the software may use the GPU if it is available. So that calls for the proprietary driver, and if it speeds up processing, drop a hot new NVIDIA card in and really see it fly.

This morning KDE was gone (as described in this thread) and I was at a login prompt. I tried logging in as root, then issuing “reboot”:

Failed to open /dev/initctl: No such device or address
Failed to talk to init daemon

So I hard-bounced it. I am turning off desktop effects and uninstalling KVM, etc., to see if I can get stability back. Can anyone think of a logfile that might have information on what is actually causing the problem(s)?
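
On the logfile question, a rough sketch of a keyword scan, assuming the system still writes a traditional syslog file at /var/log/messages. crash_hints is a made-up helper name, and the keyword list is only a guess at common culprits (kernel oops, out-of-memory kills, segfaults, driver errors), not a complete set:

```shell
#!/bin/sh
# Sketch only: crash_hints is a hypothetical helper, not an existing tool.
# Print numbered lines from a syslog-style file that match a few keywords
# often seen around crashes. The keyword list is an assumption.
crash_hints() {
    log="${1:-/var/log/messages}"   # assumed default path
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -n -i -E 'segfault|out of memory|oom-killer|machine check|kernel bug|call trace|nouveau.*(fail|error|timeout)' "$log"
}

# Example use:
#   crash_hints | tail -n 40
# If the systemd journal is kept across reboots, "journalctl -b -1"
# shows the previous boot instead.
```

The lines just before the time the screen went black are the interesting ones, so it helps to note the time of the next crash.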

Thank You!!

I had somewhat similar problems once; it turned out the fan had died on my old 6800+ card. It ran fine until the chip overheated, then boom. I bought a 630-based card (Asus) with a honking big heat sink and no fan for about $65. Good card.

Run the NVIDIA settings program (nvidia-settings, which comes with the proprietary driver); it will show you the temperature.

Status: I turned off Desktop Effects, and uninstalled Xen/KVM. This seems to have cleared up the problem <<fingers crossed>> :)

Oh, Xen can do odd stuff, and you had it mixed with KVM. Definitely a witches’ brew.

Well, that may have been the problem. Trying to rush, not reading documentation, you know the story. And I agree about big heat sinks vs. fans!