Computer crashes and I don't understand the root cause - see system log

Hopefully someone can guide me on this annoying issue.
I'm running openSUSE 12.1; see the hardware details in my signature.

I did some searching, and the suggestion was that the latest Flash update could be the cause, but I uninstalled it and the lockups are still occurring.

I can use the PC for some time, but if I walk away and come back later it will have crashed, with the last line in the log being the watchdog.
See the pastebin link:
06/03/12 02:07:07 PM LINUX-PD cpufreq[698] Loading CPUFreq modules - hardware su - Pastebin.com (http://pastebin.com/HFBn9fFs)

Thanks
qu1nn

On 2012-06-03 22:26, qu1nn wrote:
>
> Hopefully someone can guide me on this annoying issue.
> OpenSuse 12.1, see hardware detail in my signature.
>
> I did some searching and it was thought that it could have been the
> latest FLASH update… but I uninstalled it and the lockup are still
> occurring.
>
> I can use the pc for some time and if I walk away and come back later
> it will crash, last line being the watchdog…
> see the pastebin link
> ‘06/03/12 02:07:07 PM LINUX-PD cpufreq[698] Loading CPUFreq modules -
> hardware su - Pastebin.com’ (http://pastebin.com/HFBn9fFs)

I don’t know whether the watchdog is related; the last line in the log only
says that it is running. Or do you see something else on the display?

A watchdog is a combination of hardware and a daemon that checks whether the
CPU is locked up. For example, the daemon has to be run by the CPU every
second and reset a counter; the hardware accesses that counter and decrements
it. If the OS gets stuck, the daemon does not run, the counter is not reset,
and the count reaches zero. The hardware sees that and sends a pulse to the
computer to force a hardware reset.

This way the machine will never stay locked for long, as if an operator were
there to press the reset button.
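
To make the countdown idea concrete, here is a minimal Python sketch of the mechanism described above. It is a toy model only, not real watchdog code: the names `WatchdogTimer` and `simulate` and the 5-tick timeout are made up for illustration.

```python
class WatchdogTimer:
    """Toy model of a hardware watchdog: a counter that must be reset regularly."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks

    def kick(self):
        """What the daemon does while the OS is healthy: reload the counter."""
        self.counter = self.timeout_ticks

    def tick(self):
        """What the 'hardware' does once per time unit: count down.

        Returns True when the counter reaches zero, i.e. a reset is forced.
        """
        self.counter -= 1
        return self.counter <= 0


def simulate(total_ticks, hang_at, timeout_ticks=5):
    """Run the model; the daemon stops kicking once the OS 'hangs' at hang_at.

    Returns the tick at which the watchdog forces a reset, or None if it
    never fires within total_ticks.
    """
    wd = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if t < hang_at:      # OS still alive: daemon gets scheduled
            wd.kick()
        if wd.tick():        # hardware counts down regardless of OS state
            return t
    return None


if __name__ == "__main__":
    print(simulate(total_ticks=20, hang_at=10))  # hang at tick 10
    print(simulate(total_ticks=20, hang_at=20))  # never hangs
```

In this model a hang at tick 10 leads to a forced reset a few ticks later, while a system that keeps resetting the counter never triggers one.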

You might have such a watchdog enabled unintentionally.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

> You might have such a watchdog enabled not intentionally.

How would you check?
Is there a specific application?

Amazingly, I have been “up” without crashing for some 6 hrs now.
Did you see anything funky in the logfile that you would be concerned about?

thanks for your time
qu1nn

The watchdog in the log looks to be normal. It is related to rtkit and pulseaudio. I glanced at the log and nothing seems abnormal to me.

For us to really be helpful, you need to tell us a lot more about this computer. I doubt that a Flash update, loaded into your browser, would be the cause of a delayed system lockup. In fact, most often in older computers, it's just due to a memory problem.

I did find a few curious things in the listing:


06/03/12 02:07:08 PM    LINUX-PD        kernel  [    1.036136] cpuidle: using governor ladder
06/03/12 02:07:08 PM    LINUX-PD        kernel  [    1.036139] cpuidle: using governor menu

This has to do with CPU idle (low-power) settings, most often associated with a laptop computer.

06/03/12 02:07:08 PM    LINUX-PD        kernel  [    1.178576] ata1.00: limited to UDMA/33 due to 40-wire cable

This message, on the other hand, is associated with a desktop that is using the wrong IDE cable (a 40-wire cable where an 80-wire one is expected).

06/03/12 02:07:08 PM    LINUX-PD        kernel  [    2.324389] [drm]  nouveau 0000:01:00.0: Detected an NV50 generation card (0x0a5000a2)

This seems to indicate an nVIDIA video chipset which, if true, can cause problems unless you use the nVIDIA proprietary video driver or perhaps a newer kernel version with a better nouveau driver. Your listing has a lot of noise from the nouveau driver.



06/03/12 02:07:13 PM    LINUX-PD        nmb[1984]       Starting Samba NMB daemon ..done
06/03/12 02:07:13 PM    LINUX-PD        smb[2004]       Starting Samba SMB daemon ..done

Looks like you are using Samba, I guess.


06/03/12 02:07:14 PM    LINUX-PD        vmware[2120]    Blocking file system#033[71Gfailed
06/03/12 02:07:14 PM    LINUX-PD        vmware[2120]    Virtual ethernet#033[71Gfailed

Perhaps you don’t have VM support enabled in your BIOS setup?


06/03/12 02:07:23 PM    LINUX-PD        dbus-daemon[792]        **** pci ADDING /sys/devices/pci0000:00/0000:00:00.0
06/03/12 02:07:23 PM    LINUX-PD        dbus-daemon[792]        **** pci IGNORING ADD /sys/devices/pci0000:00/0000:00:00.0

This goes on for a long time, and I'm not sure what it is telling us about your startup.

06/03/12 02:14:15 PM    LINUX-PD        dbus[792]       [system] Activating via systemd: service 

You are still using systemd, as it has not been disabled.

06/03/12 03:20:01 PM    LINUX-PD        dbus-daemon[792]        (packagekitd:14135): PackageKit-Zypp-DEBUG: zypp_backend_destroy

This also goes on for a while, showing some problems.

While you are worried about the end of the log, I suggest there is more going on before then. More hardware info, including the computer's age and condition, would be helpful to know.
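
For anyone who wants to skim a long log the same way, here is a small Python sketch that splits lines in the format shown above and keeps only the ones whose message contains a suspicious keyword. The function names, the keyword list, and the assumption that fields are separated by tabs or runs of spaces are all illustrative guesses, not part of any openSUSE tool.

```python
import re

# Fields in the pasted log appear to be separated by tabs or runs of spaces:
# timestamp, host, source, message. This is an assumption about the format.
FIELD_SEP = re.compile(r"\s{2,}|\t")


def parse_line(line):
    """Split one log line into (timestamp, host, source, message), or None."""
    parts = FIELD_SEP.split(line.strip(), maxsplit=3)
    if len(parts) != 4:
        return None
    return tuple(parts)


def find_suspicious(lines, keywords=("fail", "error", "limited")):
    """Return parsed entries whose message contains any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # not in the expected four-field format
        message = entry[3].lower()
        if any(word in message for word in keywords):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    sample = [
        "06/03/12 02:07:08 PM    LINUX-PD        kernel  [    1.178576] ata1.00: limited to UDMA/33 due to 40-wire cable",
        "06/03/12 02:07:13 PM    LINUX-PD        nmb[1984]       Starting Samba NMB daemon ..done",
        "06/03/12 02:07:14 PM    LINUX-PD        vmware[2120]    Virtual ethernet#033[71Gfailed",
    ]
    for timestamp, host, source, message in find_suspicious(sample):
        print(source, "->", message)
```

A keyword filter like this only narrows down where to look; it cannot tell which of the flagged lines, if any, is related to the lockups.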

Thank You,

My hardware is as follows
Pentium D 820, 2.80GHz, Intel D945PSN ATX,
evga geforce 7600GT 256MB PCIE,
Geil DDR2, (1GB x 2) Dual Channel 667MHz
Antec True Power Trio 550W supply
Antec Nine Hundred Case
24" Samsung

Yep, it is old; I custom purchased and assembled it in 2007.
It's a nice machine, always on … and used to be quite stable!

There are (2) SATA internal hard drives
there is (1) external firewire LaCie Drive

I have VMware installed; I wasn't aware of any specific BIOS settings for it.
Yep, I use Samba.

What is systemd?
It looks like it is a part of the openSUSE 12.1 OS.

Kinfocenter states the following regarding video:
3d accelerator = unknown
Driver = nouveau
renderer= Gallium 0.4 on NVa5
OpenGL/ES version = 2.1 Mesa 7.11

YaST shows: xorg-x11-driver-video-nouveau has version 0.0.16_20110720_b806e3f-2.1.2
I also saw this: NVIDIA DRIVERS 285.05.09 Certified
but I guess that I will have to look into the video drivers a bit more…
When I did the openSUSE 12.1 install, I just accepted the defaults.
I don't do anything fancy with 3D, but looking at it, maybe I should; it could offload the CPU a bit…

any further input would be great
qu1nn

On older systems I have to ask: when was the last time you really cleaned out your system? Computers have lots of fans, and fans fail and suck dust into everything. I suggest that all PCs be cleaned out once a year with a can or two of duster spray, best used outdoors in bright sunlight, with the PC disconnected from all of its cables and power. I normally unplug and re-seat all memory, adapter cards and cables, one at a time. I blow out all heat sinks and in general remove all dirt from the PC. When I take it back in and reconnect it, I leave the side panel off and turn it back on. I use a flashlight to inspect each fan to make sure it is rotating. Be careful, and do not dismiss the need for such a cleaning on any PC more than one year old, particularly if it has never been cleaned before.

You might want to invest in an 80-wire IDE cable (40 signal + 40 ground) for your Blu-ray/DVD burner or reader. If you can’t find anything else, you might want to consider replacing or upgrading the memory.

After all of the above, you can look into upgrading your video driver, but make sure it's on a clean PC:

Installing the nVIDIA Video Driver the Hard Way - Blogs - openSUSE Forums

LNVHW - Load NVIDIA (driver the) Hard Way from runlevel 3 - Version 1.45 - Blogs - openSUSE Forums

S.A.N.D.I. - SuSE Automated NVIDIA Driver Installer - Version 1.46 - Blogs - openSUSE Forums

Thank You,

On 06/04/2012 04:36 AM, jdmcdaniel3 wrote:
> When was the last time you really cleaned out your system?

+1 ! !

i don’t see any info about system temperature in your log…you should
(if hardware allows) enable logging of CPU temp, at least (have you
installed and enabled [the software package named] “sensors”? [it's in
the standard repo, i think])…
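
As an illustration of what temperature logging could look like, here is a minimal Python sketch. It does not use the “sensors” package; instead it assumes the kernel exposes thermal zones under /sys/class/thermal with a `temp` file in millidegrees Celsius, which may or may not be the case on this hardware. The function names and the log line layout are made up for the example.

```python
import glob
import os
import time


def read_thermal_zones(sysfs_root="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for each readable thermal zone.

    Assumes each zone directory has a 'temp' file holding millidegrees Celsius.
    """
    temps = {}
    for zone in sorted(glob.glob(os.path.join(sysfs_root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone missing or unreadable: skip it
        temps[os.path.basename(zone)] = millidegrees / 1000.0
    return temps


def format_log_line(temps, now=None):
    """Build one timestamped line such as '2012-06-04 10:00:00 thermal_zone0=52.0C'."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{name}={value:.1f}C" for name, value in sorted(temps.items()))
    return f"{stamp} {readings}"


if __name__ == "__main__":
    # Print one reading per minute; redirect to a file to keep a record
    # that survives until the next lockup.
    while True:
        print(format_log_line(read_thermal_zones()), flush=True)
        time.sleep(60)
```

If the machine freezes, the last few lines of such a record show whether the temperature was climbing just before the lockup.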

when cleaning, be sure and blow the junk out of the power supply (blow
from inside the case and out)…

AND, although your power supply is a hefty critter, even it can wear out
over time…and, eventually cause strange problems (“crashes”) which
don’t point to an understandable root cause…

“I have been “up” without crashing for some 6 hrs now” may go on for
many hours and if suddenly the power supply delivers a tiny microsecond
of too low power, or no power, it will be boom time again…

by the way, when you say “crash” what do you mean?
does the machine power off? (totally off, nothing moving, whirring?)
or the screen freeze but you can move the mouse?
or screen freeze with no mouse or keyboard input?
or, screen and inputs frozen along with the num and caps lock LEDs flashing?
screen blank, but num lock LED can be turned on and off with the “Num
Lock” key?

with any of the above, can you Ctrl+Alt+F1 to a terminal, log in
as root, and then kill X with “init 3” and restart it with “init 5”?

if the “crash” leaves the machine powered on and keyboard inputs are
accepted: have you tried shutting down with REISUB? or, how are you
regaining control.

does it ever crash when the external drive is not connected?
or when the samba-shared devices are connected but not being used?

oh, and when it is shut down for the cleaning, and the panel is off:
gently disconnect and reconnect all the connections to the motherboard
and disk one at a time…remove and replace the RAM…

if after the cleaning and cable/RAM contacts ‘polishing’ (from the
remove-reinstall) you still have problems whose root cause is murky,
then i’d guess you have a weakening power supply or a CPU overheat
problem…the former can be tested with a power supply tester, and the
CPU overheat is often cured by removing the old and applying a fresh
application of CPU grease/paste between the CPU and cooler…

let us know how you get on…please. . .


dd

On 2012-06-04 02:36, qu1nn wrote:
>
> You might have such a watchdog enabled not intentionally.
>
>
>
> How would you check?
> Is there a specific application?

I have never used one. It would be a kernel thing. And it should reboot the
machine, not crash it.
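
As a rough way to answer the "how would you check?" question, here is a Python sketch that looks for signs of a kernel-level watchdog. It rests on two assumptions that are heuristics, not guarantees: watchdog driver modules often have "wdt" in their name (for example iTCO_wdt) or are called softdog, and an active watchdog driver usually creates a /dev/watchdog device node. The function names are made up for the example.

```python
import os


def find_watchdog_modules(proc_modules_text):
    """Return loaded module names that look like watchdog drivers.

    Expects the text of /proc/modules, where the module name is the first
    field on each line. Matching on 'wdt' / 'softdog' is only a heuristic.
    """
    names = [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]
    return [name for name in names if "wdt" in name.lower() or name == "softdog"]


def watchdog_device_present(dev_dir="/dev"):
    """Return True if a device node named watchdog* exists in dev_dir."""
    return any(name.startswith("watchdog") for name in os.listdir(dev_dir))


if __name__ == "__main__":
    with open("/proc/modules") as f:
        print("watchdog-like modules:", find_watchdog_modules(f.read()))
    print("watchdog device node present:", watchdog_device_present())
```

An empty module list and no device node would suggest that no hardware watchdog driver is active, although a driver built into the kernel would not show up in /proc/modules.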


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

I will look into the sensors.
I cleaned out the computer; it wasn't that bad, though still a bit dusty.
Disconnected the video card, RAM and drives and reconnected them.
Set the 4 internal case fans from Low to MEDIUM.

When it crashes, the following happens:
nothing on screen…
pc still on
caps lock and scroll lock will blink at 1/s
there is no response from the keyboard
I can shut the computer down by holding the reset button.

Haven't tried running without it; I use the external drive for some of my shares.

will look into these… thanks for the detailed input.

qu1nn

On 2012-06-05 02:56, qu1nn wrote:
> caps lock and scroll lock will blink at 1/s

That’s a kernel panic. If you can go to console #10 you might see the cause
there.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Sorry for not following up in a timely manner; I have been busy with other things.
I think it had something to do with the video card: when monitoring its temperature, it would be stable at 127F and then randomly take off. I checked the fan during one of these runaways and the fan WAS ON…
Maybe some sort of internal calculation that hangs up? No idea. But the temperature would spike to over 220F and then the computer would lock up.
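
Since most GPU temperature figures are quoted in Celsius, a quick conversion of the two readings above may help when comparing them with other sources. This is plain arithmetic; the function name is just for illustration.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


if __name__ == "__main__":
    for reading in (127, 220):
        print(f"{reading}F = {fahrenheit_to_celsius(reading):.1f}C")
```

That puts the stable reading at roughly 53 °C and the spike at over 104 °C.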

Things “appear” to be better (meaning I have been up for 5 days without a crash, instead of crashing several times a day!)

  1. Got rid of the nouveau video driver that the default install sets up, added the NVIDIA repository, installed the NVIDIA driver from it, and rebooted.
  2. YaST software management > updated a lot of packages that had older revisions, and rebooted.

I will post back if something happens
thanks for your time and efforts
qu1nn

220F is very bad, but it sounds like using the proprietary video driver is working better for you. Any time overheating is a problem, you have to make sure all heat sinks are clear of heavy dust, which you can blow out with duster spray. Clogged heat sinks can and will cause anything to overheat. In any event, it is good to hear that things are working better now.

Thank You,