Troubleshooting Spontaneous System Shutdown

I very likely have a hardware failure that is causing infrequent, random system shutdowns.

OS: Linux 3.11.10-21-desktop x86_64
System: openSUSE 13.1 (x86_64)
KDE: 4.11.5

AMD Athlon 64 X2 Dual Core Processor 4800+ with 2 GB RAM, because this aging ASUS A8N32-SLI Deluxe is a royal pain in the butt about running 4 GB.
Powered by an Enermax EG851AX-VH EPS 12V 660 W power supply, which has got to be a good 5 years old at least, and probably older. I tend to extract every ounce of useful life I can from things…

Question 1 is… what log files might I look at that would reveal a sudden and spontaneous shut down event?
And might they reveal my culprit or …sigh… is this just going to be an easter egg hunt?

Question 2 is… what s/w tools do we have to monitor processor temps? 'twould be nice to have something that would print a history of the processor temp over time. Maybe correlate a spike in temperature with a sudden shutdown event… I can dump the system at any point and run the BIOS to see what the temp says but that’s kind of pointless I think.
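
A minimal sketch of the kind of temperature history asked about here, assuming a sensor driver is loaded and exposes readings through the kernel's hwmon sysfs files (`temp*_input`, in millidegrees Celsius). The function names, the CSV layout and the output file name are made up for illustration; nothing in this thread confirms which sensors this particular board exposes.

```python
import csv
import glob
import os
import time
from datetime import datetime

# Depending on kernel version, the files sit directly under hwmonN/ or
# under hwmonN/device/, so both locations are checked.
PATTERNS = ("hwmon*/temp*_input", "hwmon*/device/temp*_input")


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every temp*_input file found."""
    temps = {}
    for pattern in PATTERNS:
        for path in sorted(glob.glob(os.path.join(hwmon_root, pattern))):
            try:
                with open(path) as f:
                    # The kernel reports millidegrees Celsius as an integer.
                    temps[path] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # unreadable or malformed sensor file; skip it
    return temps


def log_temps(csv_path, samples, interval_s=10.0, hwmon_root="/sys/class/hwmon"):
    """Append `samples` rounds of timestamped readings, one row per sensor."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for i in range(samples):
            now = datetime.now().isoformat(timespec="seconds")
            for sensor, celsius in read_temps(hwmon_root).items():
                writer.writerow([now, sensor, f"{celsius:.1f}"])
            f.flush()
            os.fsync(f.fileno())  # push rows to disk before a possible power-off
            if i < samples - 1:
                time.sleep(interval_s)
```

Calling `log_temps("cpu_temps.csv", samples=360, interval_s=10)` would cover roughly an hour. Because each round is flushed to disk, the last rows written before a sudden shutdown should still be in the file and can be compared with the time of the event.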

Thanks for your consideration!

gkrellm: a nice tool to monitor the system. Very configurable, nice interface, many sensors, has alarms, etc.
So that’s a big help. I can at least monitor live while I’m at the keyboard and can define some alarms.

I did find some log files. /var/log/warn seems to reveal the last crash. Nothing in particular that I understand, other than verifying the crash… the system was up and running, and I had been using it up to about midnight or so. The last entry of my activity is 2014-10-01T00:39:59.112009-04:00, and then these lines:

    2014-10-01T01:04:02.120388-04:00 Popeye kernel: [306925.871323] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.120396-04:00 Popeye kernel: [306925.871351] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.120399-04:00 Popeye kernel: [306925.871373] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.231391-04:00 Popeye kernel: [306925.982118] IPv4: martian source 10.10.40.6 from 70.54.124.242, on dev tun0
    2014-10-01T01:04:02.302397-04:00 Popeye kernel: [306926.053761] IPv4: martian source 10.10.40.6 from 139.55.45.27, on dev tun0
    2014-10-01T01:04:02.388393-04:00 Popeye kernel: [306926.139527] IPv4: martian source 10.10.40.6 from 67.240.103.15, on dev tun0
    2014-10-01T01:04:02.393384-04:00 Popeye kernel: [306926.144993] IPv4: martian source 10.10.40.6 from 112.158.174.133, on dev tun0
    2014-10-01T09:51:49.271600-04:00 Popeye kernel: [    0.000000] ACPI: RSDP 00000000000fb520 00024 (v02 ACPIAM)
    2014-10-01T09:51:49.271602-04:00 Popeye kernel: [    0.000000] ACPI: XSDT 000000007ffb0100 00044....

So I assume it shut down at 0104… then I restarted the system the following morning at 0951 from power off…
Then this set of details from /var/log/messages:


    2014-10-01T00:45:02.056013-04:00 Popeye /usr/sbin/cron[28085]: pam_unix(crond:session): session opened for user root by (uid=0)
    2014-10-01T00:45:02.193481-04:00 Popeye systemd[1]: Starting Session 341 of user root.
    2014-10-01T00:45:02.194601-04:00 Popeye systemd[1]: Started Session 341 of user root.
    2014-10-01T00:45:02.812684-04:00 Popeye /USR/SBIN/CRON[28085]: pam_unix(crond:session): session closed for user root
    2014-10-01T01:00:02.003511-04:00 Popeye /usr/sbin/cron[28430]: pam_unix(crond:session): session opened for user root by (uid=0)
    2014-10-01T01:00:02.142737-04:00 Popeye systemd[1]: Starting Session 342 of user root.
    2014-10-01T01:00:02.143205-04:00 Popeye systemd[1]: Started Session 342 of user root.
    2014-10-01T01:00:02.653526-04:00 Popeye /USR/SBIN/CRON[28430]: pam_unix(crond:session): session closed for user root
    2014-10-01T01:04:02.120388-04:00 Popeye kernel: [306925.871323] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.120396-04:00 Popeye kernel: [306925.871351] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.120399-04:00 Popeye kernel: [306925.871373] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0
    2014-10-01T01:04:02.231391-04:00 Popeye kernel: [306925.982118] IPv4: martian source 10.10.40.6 from 70.54.124.242, on dev tun0
    2014-10-01T01:04:02.302397-04:00 Popeye kernel: [306926.053761] IPv4: martian source 10.10.40.6 from 139.55.45.27, on dev tun0
    2014-10-01T01:04:02.388393-04:00 Popeye kernel: [306926.139527] IPv4: martian source 10.10.40.6 from 67.240.103.15, on dev tun0
    2014-10-01T01:04:02.393384-04:00 Popeye kernel: [306926.144993] IPv4: martian source 10.10.40.6 from 112.158.174.133, on dev tun0
    2014-10-01T01:04:02.553469-04:00 Popeye dbus[582]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
    2014-10-01T01:04:02.691095-04:00 Popeye systemd[1]: Starting Network Manager Script Dispatcher Service...
    2014-10-01T01:04:03.109806-04:00 Popeye dbus[582]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
    2014-10-01T01:04:03.111525-04:00 Popeye systemd[1]: Started Network Manager Script Dispatcher Service.
    2014-10-01T01:15:01.772866-04:00 Popeye /usr/sbin/cron[28732]: pam_unix(crond:session): session opened for user root by (uid=0)
    2014-10-01T01:15:01.788760-04:00 Popeye systemd[1]: Starting Session 343 of user root.
    2014-10-01T01:15:01.789224-04:00 Popeye systemd[1]: Started Session 343 of user root.
    2014-10-01T01:15:01.896891-04:00 Popeye /USR/SBIN/CRON[28732]: pam_unix(crond:session): session closed for user root
    2014-10-01T01:30:01.905423-04:00 Popeye /usr/sbin/cron[28872]: pam_unix(crond:session): session opened for user root by (uid=0)
    2014-10-01T01:30:01.918156-04:00 Popeye systemd[1]: Starting Session 344 of user root.
    2014-10-01T01:30:01.918705-04:00 Popeye systemd[1]: Started Session 344 of user root.
    2014-10-01T01:30:01.971024-04:00 Popeye /USR/SBIN/CRON[28872]: pam_unix(crond:session): session closed for user root
    2014-10-01T09:51:49.270759-04:00 Popeye rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="618" x-info="http://www.rsyslog.com"] start
    2014-10-01T09:51:49.271252-04:00 Popeye kernel: [    0.000000] Initializing cgroup subsys cpuset
    2014-10-01T09:51:49.271257-04:00 Popeye kernel: [    0.000000] Initializing cgroup subsys cpu
    2014-10-01T09:51:49.271260-04:00 Popeye kernel: [    0.000000] Initializing cgroup subsys cpuacct

Again, no clue what this all means. What’s this business about martians? lol
What is all the opening and closing of user root sessions?
That seems disturbing, as I rarely work as root or even as su… is any of this revealing?
I don’t see a thing that helps me define what caused this last system shut down.
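
One thing the logs can at least pin down is when the machine stopped writing to them. The sketch below walks a syslog file, looks for the rsyslogd start line that marks a boot, and reports the last entry logged before it. In the /var/log/messages excerpt above that last entry is the cron session at 01:30:01, so the machine was evidently still running then; /var/log/warn only shows the 01:04 martian warnings because it records warnings, not routine activity. The helper names are invented for illustration, and treating an rsyslogd line ending in "start" as the boot marker is an assumption based on this excerpt.

```python
from datetime import datetime


def parse_stamp(line):
    """Return the leading ISO 8601 timestamp of a syslog line, or None."""
    first = line.strip().split(" ", 1)[0]
    try:
        return datetime.fromisoformat(first)
    except ValueError:
        return None


def is_boot_start(line):
    # Assumption: rsyslogd announces itself with a line ending in "start",
    # as in the /var/log/messages excerpt quoted in this post.
    return "rsyslogd:" in line and line.rstrip().endswith("start")


def last_entries_before_boot(lines):
    """Yield (last_entry_time, boot_time, last_line) for each boot found."""
    prev_stamp, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # blank or unparseable line
        if is_boot_start(line) and prev_stamp is not None:
            yield prev_stamp, stamp, prev_line.strip()
        prev_stamp, prev_line = stamp, line


# Two lines copied from the excerpt above.
sample = [
    "2014-10-01T01:30:01.971024-04:00 Popeye /USR/SBIN/CRON[28872]: "
    "pam_unix(crond:session): session closed for user root",
    '2014-10-01T09:51:49.270759-04:00 Popeye rsyslogd: [origin software="rsyslogd" '
    'swVersion="7.4.7" x-pid="618" x-info="http://www.rsyslog.com"] start',
]
for last, boot, text in last_entries_before_boot(sample):
    print("last entry:", last, "| boot:", boot, "| silent for:", boot - last)
```

To scan the real file, pass an open file instead of the sample list, e.g. `last_entries_before_boot(open("/var/log/messages"))`, which will usually need root. If the last line before a boot is ordinary activity with no shutdown messages after it, that is consistent with an abrupt power-off rather than an orderly shutdown, though it does not say what caused it.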

Thanks for your consideration!

On 2014-10-02 06:56, SomeSuSEUser wrote:

> Question 2 is… what s/w tools do we have to monitor processor temps?
> 'twould be nice to have something that would print a history of the
> processor temp over time.

Perhaps sensord. I noticed it yesterday while looking for something else.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

> Perhaps sensord…

looks interesting.
I’m going to replace the power supply here.
The Enermax is going on 10 years old and is putting out only 2.85-3.02 volts on the 3.3 V rail. The 5 V and 12 V rails are staying within ±5%.
I’ve been monitoring live with gkrellm for about 3 days now, and it’s been alarming consistently since I set up the sensors.
I was able to catch it shutting down today. It got a little warm in the house and I had it churning on some video processing, so both AMD cores were grinding hard and memory was maxed out. I watched the rail voltage keep dropping as the case and mobo temps went up, and it dropped dead somewhere in the 2.8 V range.

I opened a window to get some cooler air in the room and set a desk fan blowing directly at it on high… after a few minutes I restarted the system and the video processing. The 3.3 V rail held at 3.02 volts throughout that operation, with no failure… even though that is approaching a 10% deviation from nominal.
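
For reference, the deviations behind those readings can be worked out as below. This is just arithmetic on the numbers in this post; the ±5% limit is the figure mentioned above for the 5 V and 12 V rails and is assumed here to apply to the 3.3 V rail as well, and the function names are made up for the example.

```python
def rail_deviation_pct(measured_v, nominal_v):
    """Signed deviation of a measured rail voltage from nominal, in percent."""
    return (measured_v - nominal_v) / nominal_v * 100.0


def within_tolerance(measured_v, nominal_v, tolerance_pct=5.0):
    """True if the reading is within +/- tolerance_pct of nominal."""
    return abs(rail_deviation_pct(measured_v, nominal_v)) <= tolerance_pct


# Readings reported in this post for the 3.3 V rail.
for reading in (3.02, 2.85, 2.80):
    dev = rail_deviation_pct(reading, 3.3)
    status = "ok" if within_tolerance(reading, 3.3) else "OUT OF TOLERANCE"
    print(f"3.3 V rail at {reading:.2f} V: {dev:+.1f}% ({status})")
```

So 3.02 V is about 8.5% low and 2.85 V about 13.6% low; even the "good" reading is well outside a ±5% window. These are software sensor readings taken via the motherboard, so a multimeter on the rail would be the way to confirm them.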

Good plug for Enermax though…
10 years is a darned good run for a power supply in my opinion, though this thing was in the $200+ range when it was new, if I recall correctly.
Found a brand new overstocked one online today… $24.00 lol.

Think I’ll upgrade to a fully modular one this time though…
Thanks Carlos.

JeepNut

SomeSuSEUser wrote:

<snip>

>
> Again, no clue what this all means. What’s this business about martians?
> lol
> What are all the opening and closing of user root sessions?
> That seems disturbing, as I rarely work as root or even as su… is any
> of this revealing?
> I don’t see a thing that helps me define what caused this last system
> shut down.
>
> Thanks for your consideration!
>
>

IPv4 martians can be caused by a badly configured network interface:

http://en.wikipedia.org/wiki/Martian_packet
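
To see which interface and which senders are behind the messages, the log lines can be tallied. The sketch below is only an illustration: the regular expression is written against the lines quoted in this thread, and the function and group names are made up. In the quoted excerpt every martian line is on tun0, which is a tunnel device (usually created by VPN software), so that is the interface whose configuration would be worth a look.

```python
import re
from collections import Counter

# Pattern written against the kernel lines quoted in this thread, e.g.
# "IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0"
MARTIAN_RE = re.compile(
    r"martian source (?P<first>\d+\.\d+\.\d+\.\d+) "
    r"from (?P<sender>\d+\.\d+\.\d+\.\d+), on dev (?P<dev>\S+)"
)


def tally_martians(lines):
    """Count martian log entries per (device, 'from' address)."""
    counts = Counter()
    for line in lines:
        m = MARTIAN_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("sender"))] += 1
    return counts


# Lines copied from the log excerpt quoted earlier in the thread.
sample = [
    "2014-10-01T01:04:02.120388-04:00 Popeye kernel: [306925.871323] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0",
    "2014-10-01T01:04:02.120396-04:00 Popeye kernel: [306925.871351] IPv4: martian source 10.10.40.6 from 173.194.65.102, on dev tun0",
    "2014-10-01T01:04:02.231391-04:00 Popeye kernel: [306925.982118] IPv4: martian source 10.10.40.6 from 70.54.124.242, on dev tun0",
]
for (dev, sender), n in tally_martians(sample).most_common():
    print(f"{dev}: {n} martian(s) from {sender}")
```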

As to the crashing, it is probably down to your power supply, but I would also
keep an eye on the temps. Have you cleaned the dust bunnies off the CPU heatsink
lately, or is the fan speed set a little slow?

Plus, maybe the heatsink needs to be removed and some new thermal paste
applied. Some pastes tend to go brittle after a while and do not transfer
the heat as well to the sink.

HTH


Mark
Nullus in verba
Caveat emptor
Nil illigitimi carborundum

On 2014-10-04 05:46, SomeSuSEUser wrote:
>
> Code:
> --------------------
> Perhaps sensord…
> --------------------
>
>
> looks interesting.
> I’m going to replace a power supply here.
> The Enermax is going on 10yrs old and is throwing only 2.85-3.02 volts
> from the 3.3 supply. 5 & 12v supplies staying within
> 5%+/-
> Monitoring live for about 3 days now, with gkrellm and
> it’s been consistently alarming since I setup the sensors.
> Was able to catch it shutting down today. Got a little warm in the
> house and I had it churning on some video processing, so both AMD cores
> were grinding hard, memory maxed out and I watched the rail keep
> dropping power as the case & mobo temps went up and it dropped dead
> somewhere in the 2.8v range.

Sometimes I run ffmpeg with “trickle” to limit the max cpu power it uses.

> I opened a window to get some cooler air in the room and set a deskfan
> directly toward it on high… after a few minutes I restarted the
> system and the processing of the video. The 3.3v rail stayed up on
> 3.02volts throughout that operation, no failure… even though
> approaching a 10% deviation from norm.

Good hunting!

Do your computer fans speed up with heat?
Is your hardware the same as when you bought the power supply?
Say, did you add/replace hard disks?

Maybe your internal fans do not turn smoothly, and need more electricity
to turn at the same speed, or do not reach the needed speed, and thus
the cpu overheats.

> Good plug for Enermax though…
> 10years is a darned good run for a power supply in my opinion, though
> this thing was in the $200+ range when it was new if I recall correctly.
> Found a brand new overstocked one online today…$24.00 lol.

Maybe it is not bad, simply not enough power.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)