Suse 10.1 Server Dies

sd_read · December 13, 2013, 8:52pm

I have a couple of servers based on 10.1: one is the main server, and the other does nightly backups and provides printing services and some other things. These are for home use. They are many years old (since 10.1 was released) but they run 24/7 flawlessly, other than a few drive crashes over the years.

To answer the first question I am sure many are thinking: I haven’t upgraded because when they were new (I replaced an NT4.0 server at the time) I was just happy to get them running, and later on they worked as needed, so why bother. Today I would like to upgrade for security reasons, but this hardware is now too old and incompatible.

Also, I have been running a home server for probably 15 years (Win 95) and the only time I have ever lost data was when I migrated from Microsoft to Linux.

So, I am having a problem with the main server. That is, it will just suddenly disappear (hang), and at this point I don’t know why. Nothing has changed, either with the servers or with anything else around them.

Over the last week this has happened 3 times where the only choice is to power cycle. Something is clearly wrong as they never hang.

Here is the log file of the most recent showing the last few entries before the hang:

Dec 12 17:49:56 SambaServer smbd[12549]:   read_data: read failure for 4 bytes to client 192.168.0.51. Error = No route to host
Dec 12 18:01:23 SambaServer syslog-ng[2505]: STATS: dropped 0
Dec 12 18:21:07 SambaServer zmd: NetworkManagerModule (WARN): Failed to connect to NetworkManager
Dec 12 18:21:39 SambaServer zmd: Daemon (WARN): Not starting remote web server
Dec 12 18:26:06 SambaServer zmd: ShutdownManager (WARN): Preparing to sleep...
Dec 12 18:26:08 SambaServer zmd: ShutdownManager (WARN): Going to sleep, waking up at 12/13/2013 18:11:06

My question is, what can I do next to determine what is happening?

Thank you for any help.

hcvv · December 13, 2013, 10:04pm

As it is unlikely that anything changed in the software (simply because there have been no updates for years already), IMHO it is something in the hardware. And because it seems to happen intermittently, it will be difficult to find. But the messages point to the network. Network card trying to die?

hendersj · December 13, 2013, 10:37pm

I would be looking at the output from dmesg for indications of a hardware
issue.

Jim

Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

sd_read · December 15, 2013, 10:05pm

Thank you for your replies.

If you are referring to the “Failed to connect to NetworkManager” I think this is normal as both servers are identical and I get this same error all through the log files on both servers.

Regarding the dmesg file I am not sure where to find this log file. It is not in /var/log nor do I see anything in yast?

Typing dmesg in a terminal window shows a bunch of stuff but nothing that sounds like a problem.

Without knowing where the log file is I guess I will run it right after rebooting from the next hang.

Here is a sample from dmesg in a terminal (obviously the server is not hung at this point):

SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=224.0.0.251 LEN=67 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=47
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:19:db:48:07:4f:08:00 SRC=192.168.0.51 DST=224.0.0.251 LEN=94 TOS=0x00 PREC=0x00 TTL=255 ID=29625 PROTO=UDP SPT=5353 DPT=5353 LEN=74
SFW2-INint-ACC-RPC IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=16835 DF PROTO=TCP SPT=890 DPT=2049 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0008B43F0000000001030307)
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=224.0.0.251 LEN=67 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=47
SFW2-INint-ACC-RPC IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25431 DF PROTO=TCP SPT=830 DPT=2049 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A000F56780000000001030307)
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:19:db:48:07:4f:08:00 SRC=192.168.0.51 DST=224.0.0.251 LEN=94 TOS=0x00 PREC=0x00 TTL=255 ID=17209 PROTO=UDP SPT=5353 DPT=5353 LEN=74
SFW2-INint-ACC-TCP IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39650 DF PROTO=TCP SPT=48487 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A001249BC0000000001030307)
SFW2-INint-ACC-TCP IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=17446 DF PROTO=TCP SPT=48488 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A00128DDE0000000001030307)
SFW2-INint-ACC-TCP IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63994 DF PROTO=TCP SPT=48489 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A001336700000000001030307)
SFW2-INint-ACC-RPC IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=36065 DF PROTO=TCP SPT=794 DPT=2049 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0013BE770000000001030307)
SFW2-INint-ACC-RPC IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35416 DF PROTO=TCP SPT=842 DPT=2049 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A001668640000000001030307)
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=190 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=170
SFW2-INint-ACC-TCP IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52945 DF PROTO=TCP SPT=48628 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A0018321F0000000001030307)
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=110 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=90
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=181 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=161
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=123 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=103
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=181 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=161
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=123 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=103
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=95 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=75
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=190 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=170
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=190 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=170
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=190 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=170
SFW2-INint-ACC-RPC IN=eth1 OUT= MAC=00:d0:c9:98:27:53:00:19:db:6b:46:a0:08:00 SRC=192.168.0.50 DST=192.168.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45869 DF PROTO=TCP SPT=702 DPT=2049 WINDOW=14600 RES=0x00 SYN URGP=0 OPT (020405B40402080A001A289C0000000001030307)
SFW2-INint-DROP-DEFLT IN=eth1 OUT= MAC=01:00:5e:00:00:fb:00:26:ab:8c:f8:b6:08:00 SRC=192.168.0.54 DST=224.0.0.251 LEN=190 TOS=0x00 PREC=0x00 TTL=255 ID=0 DF PROTO=UDP SPT=5353 DPT=5353 LEN=170

hendersj · December 16, 2013, 12:21am

On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:

> Here is a sample from dmesg in a terminal (obviously the server is not
> hung at this point):

All that’s showing is iptables information, not anything useful. You
may need to tweak the log settings to disable this. There are bound to
be useful messages in there somewhere, but the iptables info is drowning
it out.
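
Short of changing the log settings, the firewall lines can be filtered out when reading the output. A minimal sketch, assuming every packet-log line carries the "SFW2-" prefix seen in the sample above; the function name drop_firewall_lines is made up for illustration:

```shell
# Drop firewall packet-log lines from whatever arrives on stdin.
# Assumption: every such line contains the "SFW2-" prefix shown in the sample.
drop_firewall_lines() {
    grep -v 'SFW2-'
}

# Typical use (not run here):
#   dmesg | drop_firewall_lines | less
```

Whatever is left should be mostly kernel and driver messages. Note that after a power cycle dmesg only holds messages from the current boot.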

Jim

robin_listas · December 16, 2013, 12:53am

On 2013-12-16 00:21, Jim Henderson wrote:
> On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:
>
>> Here is a sample from dmesg in a terminal (obviously the server is not
>> hung at this point):
>
> All that’s showing is iptables information, not anything useful. You
> may need to tweak the log settings to disable this. There are bound to
> be useful messages in there somewhere, but the iptables info is drowning
> it out.

Instead of looking at dmesg, look at the contents of
“/var/log/messages”, it should not have the iptables messages, which
instead should go to “/var/log/firewall”
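
To see what was logged just before each restart, one option is to search the file for the syslog daemon's startup line and print the lines above it. This is only a sketch: the function name lines_before_restart is hypothetical, the default marker pattern 'syslog-ng.*starting' is an assumption about what the daemon writes at startup, and the -B option needs GNU grep.

```shell
# Print the N lines logged just before each syslog startup marker,
# followed by the marker line itself.
# $1 = log file
# $2 = number of context lines (default 5)
# $3 = marker regex (assumption: adjust to what your syslog daemon logs at startup)
lines_before_restart() {
    log_file=$1
    context=${2:-5}
    marker=${3:-'syslog-ng.*starting'}
    grep -B "$context" -e "$marker" "$log_file"
}

# Typical use (not run here):
#   lines_before_restart /var/log/messages 10
```

Each group of lines ends with the startup marker, so the entries directly above it are the last ones that reached the disk before the hang.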

–
Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

sd_read · December 16, 2013, 1:21am

Hi Jim, I don’t know how to tweak the log settings as I don’t even know where to find dmesg. What I posted was the results of typing dmesg in a terminal.

I looked through yast and googled this but was not able to figure out how I would change the log settings.

Carlos, sorry I wasn’t clear in my initial post but what I pasted there is from /var/log/messages. It is what I believe was last recorded before it hung.

Steve

robin_listas · December 16, 2013, 2:03am

On 2013-12-16 01:26, sd read wrote:

> Carlos, sorry I wasn’t clear in my initial post but what I pasted there
> is from /var/log/messages. It is what I believe what was last recorded
> before it hung.

Oh. Well, you can send iptables messages to the firewall log instead,
but your machine being 10.1 I don’t know what syslog daemon you use.
Knowing that, I might find the rule to add.

hendersj · December 16, 2013, 4:27am

On Sun, 15 Dec 2013 23:53:06 +0000, Carlos E. R. wrote:

> On 2013-12-16 00:21, Jim Henderson wrote:
>> On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:
>>
>>> Here is a sample from dmesg in a terminal (obviously the server is not
>>> hung at this point):
>>
>> All that’s showing is iptables information, not anything useful. You
>> may need to tweak the log settings to disable this. There are bound to
>> be useful messages in there somewhere, but the iptables info is
>> drowning it out.
>
> Instead of looking at dmesg, look at the contents of
> “/var/log/messages”, it should not have the iptables messages, which
> instead should go to “/var/log/firewall”

I usually find that hardware-related error messages are somewhat clearer
in dmesg. Maybe that’s only me, though.

Jim

hendersj · December 16, 2013, 4:33am

On Mon, 16 Dec 2013 00:26:01 +0000, sd read wrote:

> Hi Jim, I don’t know how to tweak the log settings as I don’t even know
> where to find dmesg. What I posted was the results of typing dmesg in a
> terminal.
>
> I looked through yast and googled this but was not able to figure out
> how I would change the log settings.
>
> Carlos, sorry I wasn’t clear in my initial post but what I pasted there
> is from /var/log/messages. It is what I believe what was last recorded
> before it hung.

Check in YaST to see if the magic sysrq keys are enabled - if they’re
not, enable them - then when it hangs, hit:

SysRq+S (Forces the kernel to sync the disks)

SysRq+U (Forces the kernel to remount the partitions read-only)

SysRq+B (causes the system to reboot)

That should get the last of what is in the logs flushed to disk, might
give us more to go on.

10.1 is a long ways out of support, so it’s going to be somewhat
difficult to point you precisely where you need to make those settings -
on 13.1, it’s at YaST -> Kernel Settings; there’s a checkbox to enable
the keys. You can also enable them from the terminal with:

su -
echo 1 > /proc/sys/kernel/sysrq

This won’t persist after a reboot, though.
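
To make it stick, the usual approach is a kernel.sysrq line in a sysctl configuration file. A rough sketch, untested on 10.1: it assumes /etc/sysctl.conf is read at boot there (SUSE releases of that era may instead use an ENABLE_SYSRQ setting in /etc/sysconfig/sysctl), and the function name persist_sysrq is made up for illustration.

```shell
# Append "kernel.sysrq = 1" to a sysctl config file unless a
# kernel.sysrq line is already present.
# $1 = config file (default /etc/sysctl.conf; needs root for the real file)
# Assumption: the system applies this file at boot.
persist_sysrq() {
    conf=${1:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*kernel\.sysrq[[:space:]]*=' "$conf" 2>/dev/null; then
        echo "kernel.sysrq already set in $conf"
    else
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}

# Typical use as root (not run here):
#   persist_sysrq
```

The function leaves an existing kernel.sysrq line alone, so if one is already there with a different value it has to be edited by hand.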

Jim

knurpht · December 16, 2013, 12:32pm

I’m with Henk here. Software hasn’t changed in years, nor have config files, ergo conclusio: hardware failure. And since the machine is old, only one conclusion: back up now while it’s still possible, then replace.

robin_listas · December 16, 2013, 2:13pm

On 2013-12-16 04:27, Jim Henderson wrote:

> I usually find that hardware-related error messages are somewhat clearer
> in dmesg. Maybe that’s only me, though.

Don’t know, I have not compared both thoroughly. I assumed they contain
the same.

I prefer the logs because they are filtered and timestamped - well, a
timestamp that humans can read, anyway…

If you refer to the boot messages, which contain hardware-related
info: prior to systemd they went to /var/log/boot.msg, and now they
go to /var/log/messages.

sd_read · December 20, 2013, 5:17pm

Sorry for the delay but I needed time to see if I fixed it and I think I have.

The same day as my last post my server hung again, only this time it would not turn on. That is, no video, no BIOS code, no beeps, nothing.

So, the bad news was that the board is bad. But there is still good news in that I have another module I can install. You see, these are ETX boards from a website

So I simply replaced it and reconfigured the Ethernet, since it wasn’t happy that the MAC address disappeared.

I guess I can’t complain as these have been running 24/7 (excluding vacations) since I think 2007 with only a few boot drive failures.

Although, they don’t get heavy use.

Thank you for your help - Steve

hendersj · December 20, 2013, 6:42pm

On Fri, 20 Dec 2013 16:26:01 +0000, sd read wrote:

> Although, they don’t get heavy use.

That usually isn’t the thing that causes a failure of this sort - being
powered is what wears out electrical components and increases the chance
of a failure.

So being on 24x7, from an electrical component standpoint, is heavy use,
regardless of how much the CPU/network utilization is over time.

Jim

caprus · December 20, 2013, 10:37pm

Several years ago I got a call from an old customer who was ordering a replacement CPU fan. In the course of the conversation I heard him say: “I’m going to hate to have to power this machine down. It’s a Linux server that’s been running continuously since I installed it seven years ago, and it’s never even been rebooted. I wanted to see how long it could go, but now the fan’s so noisy it’ll fail soon if I don’t replace it.”