System Monitor

Morning All,

I’ve searched the forums (and the internet) for “System Monitor” and “log files” and am not getting anywhere, so maybe I’m the first person in the world to have this problem - though I doubt it! I’m having some system stability issues - random kernel panics with no log messages at all - and I was hoping to log some system parameters to a file to try and work out what the system was doing next time it freezes.

The System Monitor seems to have a “log to file” option when you drag a sensor on to a worksheet, but there is no way of creating a new file when it comes up with the file browser. (Obviously?) if I create an empty text file for it, it doesn’t use it. Does this feature work? The Help doesn’t help. Am I being a bit dim?

Ta,

Nick

my personal opinion is you are more likely to find the cause of your
freezes by searching these fora for the terms

freeze “kernel panic”

or similar…

then without dragging a sensor to a worksheet or digging through logs
you would learn that the most common cause of a kernel panic is faulty
ram…

if you search and read enough you would probably find where someone
would suggest you carefully remove the ram and gently clean the metal
contacts with a pencil eraser…then boot from an openSUSE CD or DVD
and run memtest…overnight, at least…

if all your ram passes, then it is likely some other hardware
problem…i’d start by . . .

well, search around…


goldie
Give a hacker a fish and you feed him for a day.
Teach man and you feed him for a lifetime.

Well, actually some of the searches resulted in people suggesting that logging information (usually by installing some logging software) might lead to some clues as to what the computer was up to when it fell over. I’m trying to get as much information as possible - for example, if it always crashes when the PC is overloaded and the temperature is high, then that would indicate a cooling issue. Other than that, there seem to be as many solutions to system freezes as there are people who’ve had the problem - I’m struggling to work out which ones are worth looking into and which aren’t. Examples include Beagle, Firefox, Thunderbird, compiz, graphics drivers, some network services, etc., etc.

Besides all that, I would still like to know if the sensor logger works in System Monitor or whether it’s a known bug. I can’t find anything useful in Google / forums. And I would like to be able to log information to file regardless of my system stability issues.

Was going to run memtest tonight, so great minds think alike!

Ta for reply.

Nick

> Beside all that, I would still like to know if the sensor logger works
> on System Monitor or whether it’s a known bug.

i do not know anything about that…

as for system logging, in my opinion, you can’t get much better than atop…

that and the usual in /var/log is pretty complete…(so complete it
is not ‘easy’ to use if you don’t also know about grep…)
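For anyone who doesn’t know grep yet, a rough Python equivalent of `grep -inE 'kernel panic|oops|i/o error' /var/log/messages` is sketched below. /var/log/messages is assumed to be the syslog file (reading it usually needs root), and the pattern list is an illustrative guess, not an authoritative set of crash signatures. Keep in mind that a panic which kills the machine outright may never reach the disk, so an empty result proves little.

```python
import re

# Illustrative patterns only (an assumption, not an official list of
# crash messages); extend them with whatever your own logs show.
DEFAULT_PATTERNS = (r"kernel panic", r"\boops\b", r"ata\d+.*error", r"i/o error", r"smartd")


def scan_log(path, patterns=DEFAULT_PATTERNS):
    """Return (line_number, text) for lines matching any pattern, ignoring case."""
    regex = re.compile("|".join("(?:%s)" % p for p in patterns), re.IGNORECASE)
    hits = []
    # errors="replace" keeps odd bytes in old log lines from aborting the scan.
    with open(path, errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if regex.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Usage: `for number, text in scan_log("/var/log/messages"): print(number, text)`.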

oh, and search/find how to install lm_sensors and run
‘sensors-detect’ as root…if your bios/mb works well with those it
will shove some heat info into some logs, somewhere (maybe ‘messages’,
i don’t recall this second)…
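If the System Monitor’s “log to file” option won’t cooperate, a small script can cover the original goal of logging system parameters to a file. A minimal sketch in Python, assuming a Linux kernel that exposes the load average in /proc/loadavg and lm_sensors-style hwmon readings as millidegree files matching /sys/class/hwmon/hwmon*/temp*_input (on some older kernels they may sit one level deeper, under hwmon*/device/, so adjust the pattern). The log path, the 10-second interval and all function names are illustrative, not part of any existing tool:

```python
import glob
import os
import time


def read_load_average(path="/proc/loadavg"):
    """Return the 1-minute load average (first field of /proc/loadavg)."""
    with open(path) as f:
        return float(f.read().split()[0])


def read_hwmon_temps(pattern="/sys/class/hwmon/hwmon*/temp*_input"):
    """Return {sensor_path: degrees Celsius}; hwmon files hold millidegrees."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
    return temps


def append_sample(log_path, load, temps):
    """Append one timestamped line, force it to disk, and return it."""
    fields = [time.strftime("%Y-%m-%d %H:%M:%S"), "load=%.2f" % load]
    for path, celsius in sorted(temps.items()):
        label = "/".join(path.split(os.sep)[-2:])  # e.g. hwmon0/temp1_input
        fields.append("%s=%.1f" % (label, celsius))
    line = " ".join(fields) + "\n"
    with open(log_path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # don't leave the newest samples in a cache
    return line


def run(log_path, interval=10.0, samples=None):
    """Take a sample every `interval` seconds; loop forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        append_sample(log_path, read_load_average(), read_hwmon_temps())
        taken += 1
        time.sleep(interval)
```

Running `run(os.path.expanduser("~/sysmon.log"))` in a terminal leaves a timestamped trail to read after the next freeze. The `os.fsync` after each line is the main design choice: without it the last few minutes of samples may still be sitting in memory when the machine locks up, and those are the lines you actually want.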

my bottom line: i personally wouldn’t be trying to solve this “what is
causing it” problem only via GUI things… computers have been shoving
health info into log files before the mouse was invented…


goldie

Still, it would be nice if the logger worked, seeing it is there!
That aside, the memory checks out ok, but I am getting a lot of SMART messages (many to do with ECC correction) on both of my disks, and the computer seems less stable during heavy disk usage (for example md5deep -rl /home/nick, which contains 40GB of data). So it could be my hard drives are knackered.
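Since the ECC messages are the main clue at this point, here is a sketch for reading the attribute table that smartmontools prints with `smartctl -A /dev/sda` (run as root; the device name is an assumption) and flagging only what SMART itself treats as failing: a normalised VALUE at or below its non-zero THRESH, or a WHEN_FAILED column other than "-". The column layout assumed is smartctl’s usual ATA attribute table, and the class and function names are hypothetical. Raw values such as Hardware_ECC_Recovered are vendor-specific, so a large raw count on its own is not evidence of a dying disk.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int        # normalised current value (higher is better)
    worst: int
    thresh: int       # vendor failure threshold for the normalised value
    when_failed: str  # "-" unless smartctl reports a failure
    raw: str          # vendor-specific raw value, kept as text


def parse_smart_attributes(text):
    """Parse the attribute rows printed by `smartctl -A`."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # headers and banner lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttribute(
            attr_id=int(fields[0]), name=fields[1], value=int(fields[3]),
            worst=int(fields[4]), thresh=int(fields[5]),
            when_failed=fields[8], raw=" ".join(fields[9:])))
    return attrs


def failing(attrs):
    """Attributes at or below threshold, or already marked failed by smartctl."""
    return [a for a in attrs
            if (a.thresh > 0 and a.value <= a.thresh) or a.when_failed != "-"]
```

The text can come from `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`; an empty `failing(...)` list on all three drives would support the idea that the ECC messages are red herrings.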

> a lot of SMART messages (many to do with ECC correction) on both of
> my disks

i think you should (quickly) back up your data…and THEN deal with
the stability issues…

it could be a heat issue…
you might try cleaning out the cat hair and chicken bones…

this might help:
http://www.endpcnoise.com/cgi-bin/e/computercooling.html

or you might have an overtaxed or failing power supply…

did atop show anything?


goldie

Well, the temperatures reported appear to be in Fahrenheit - the drives are cool to touch, so I don’t think it’s a heat issue, sadly.

I’ve re-installed on a completely different hard drive on an IDE cable (as opposed to the original installation on two SATA drives), and strangely, this drive (brand new) is also reporting lots of ECC errors and heat messages. I’m pretty sure all three of my hard drives can’t be broken in exactly the same way! Predictably, the reinstalled system is also horrendously unstable. It’s almost impossible to run the updates without it hanging. Again, no messages at all.

This system has been running SuSE 10.3 and Windows XP flawlessly for two years (well, almost: it crashed maybe once every three or four months), so I’m finding it hard to believe it’s a hardware issue. Windows XP still runs flawlessly, in fact, but it’s not my cup of tea really.

Haven’t tried atop yet - trying to get the system to install anything is a bit of a nightmare at the moment!

sounds more and more like a hardware problem…

everyone has their own set of hardware diagnosis tricks…
i’ve given mine in several different places here and don’t wanna retype
them…

HINT: use google’s site specific trick to find stuff…like this:

site:forums.opensuse.org “hardware problem” heat “power supply” cables

turns up a fairly good rundown of hardware problem sleuthing (though
the symptom [uncommanded reboot] is different from yours [kernel panic]):

http://forums.opensuse.org/hardware/386304-reboots-during-websearch.html

change around, add or subtract terms [freeze stall “kernel panic” hang
PSU “power supply” heat cable shorts cracked {many more}] in the
search string and find many other such discussions in these
forums…leave out the “site:forums.opensuse.org” and you will find
millions of hits on the net–including some good step-by-step decision
matrices…

by the way, you wrote “the memory checks out ok” and i wonder how long
you ran memtest…overnight is just barely enough to get
started…if you have LOTS of RAM you should let it run 24 hours, at
least…

many say (google it) that it is best to pull RAM and leave only one
stick to test at a time…and, once each stick has proven itself (at
least “overnight”) then move the sticks around until you find which
slot is messed up…

it just takes one TINY electrical fault to mess up the whole
machine…frayed cable and vibration, tiny crack in mother board and
heat changes enough to open a circuit…

on and on and on…if i were you i’d spend time tracking down the
hardware problem…

btw, kernel panic is most often (but not exclusively) caused by bad
RAM (which might also mean bad RAM slot, bad circuits to/from RAM, etc
etc etc etc)


goldie

I’m pretty sure it’s not a heat issue - I barely see temperatures above 30C in any of the components, and everything is cool to touch. The RAM was tested for three or four days, sadly both together as they’re dual channel sticks, so taking one out isn’t an option. It ran through all 2GB at least a few hundred times, so I’m suspecting that the RAM isn’t the issue! And, as I said before, Windows XP runs flawlessly.

I’ve just got a stable install going on a separate hard drive on IDE, and completely removed the RAID drives from the system. So far, it has crashed once before I did a system update (seems normal for SuSE - seems to do this on almost any machine I install on!). Since then it’s been running fine without problems. But then it used to run for four or five days sometimes. I might try cleaning all the dust out of the machine too. Oh, and the new hard drive also reports ECC SMART messages. My other machine (which runs SuSE flawlessly) doesn’t. I’m hoping the ECC messages are red herrings, or else I have a large stack of duff hard drives!

Well, the power supply exploded last week. Since I’ve bought a new one, everything seems ok even with the RAID array plugged back in, so maybe it was just on the way out. Will try the RAID presently. However, I still haven’t had anyone answer my original question - does Log to File work in System Monitor? It doesn’t for me.

nickelarse wrote:
> Well, the power supply exploded last week… However, I
> still haven’t had anyone answer original question - does the Log to File
> work on System Monitor? It doesn’t for me.

yep, flaky power can cause all kinds of hard-to-track-down problems…
congrats on getting it fixed…finally…

as to your original Q, when you say “The ‘System Monitor’” just
what do you mean? that is, there must be about a thousand “system
monitors” loose…as far as i recall i’ve never added any but Menu >
System > System Monitor shows THIRTEEN programs available, including
Gnome System Monitor and KDE4 System Monitor…

perhaps begin a new thread, and this time don’t confuse the issue by
having TWO issues (a freeze problem and an application problem)…and,
specify exactly which OS version and DE (KDE3, KDE4, Gnome, Xfce, etc)
you use, as well as which specific “System Monitor” application won’t
do what for you (i can’t tell for sure if drag and drop works, or
not…exactly what are you not able to do?)


platinum

Note: Accuracy, completeness, legality, or usefulness of this posting
may be illusive.

To be fair, I didn’t post two problems originally - I just mentioned that while trying to solve a stability issue I was trying to use the System Monitor to log to a file. All the people trying to help then tried to help me solve the stability issue rather than the actual question! I should have just not mentioned the stability issue…

It wasn’t that hard to solve, in that the PSU exploded, so I had no choice but to replace it! It was actually the last thing I suspected of being faulty.

Anyway, sorry for not being clear (in many ways) - the System Monitor I’m talking about is the KDE 4 one.

Ta,

Nick

> the KDE 4 one.

i don’t use KDE4, so i pass…

i have read that there are many not-yet-working features in KDE4
(which is one of the reasons i’m still using KDE3.5.7)…

therefore, if you look on the KDE site <http://kde.org/> you might
find the answer to your question already exists in some long list of
bugs or dreams not yet implemented…

or, if after looking you don’t find it, you might ask in their forum:
<http://forum.kde.org/>


platinum

As for my system stability issue - I’ve started booting off my RAID array again, and the instability came back within 20 minutes. So I guess it had nothing to do with the power supply after all. I’ll start a new thread if I can’t find anything about RAID rendering the system unusable.