I have a couple of servers based on 10.1: one is the main server, and the other does nightly backups and provides printing services and a few other things. They are for home use. They are many years old (I've had them since 10.1 was released), but they run 24/7 flawlessly apart from a few drive crashes over the years.
To answer the question I'm sure many are thinking of: I haven't upgraded because when they were new (they replaced an NT 4.0 server at the time) I was just happy to get them running, and later on they worked as needed, so why bother? Today I would like to upgrade for security reasons, but the hardware is now too old and incompatible.
Also, I have been running a home server for probably 15 years (since Win 95), and the only time I have ever lost data was when I migrated from Microsoft to Linux.
So, I am having a problem with the main server: it will suddenly disappear (hang), and at this point I don't know why. Nothing has changed, either on the servers or in anything around them.
Over the last week this has happened three times, and each time the only option was to power cycle. Something is clearly wrong, as they never hang.
Here is the log from the most recent hang, showing the last few entries before it:
Dec 12 17:49:56 SambaServer smbd[12549]: read_data: read failure for 4 bytes to client 192.168.0.51. Error = No route to host
Dec 12 18:01:23 SambaServer syslog-ng[2505]: STATS: dropped 0
Dec 12 18:21:07 SambaServer zmd: NetworkManagerModule (WARN): Failed to connect to NetworkManager
Dec 12 18:21:39 SambaServer zmd: Daemon (WARN): Not starting remote web server
Dec 12 18:26:06 SambaServer zmd: ShutdownManager (WARN): Preparing to sleep...
Dec 12 18:26:08 SambaServer zmd: ShutdownManager (WARN): Going to sleep, waking up at 12/13/2013 18:11:06
My question is, what can I do next to determine what is happening?
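One way to compare the hangs is to pull out the last few lines logged before every restart and look for a common pattern. Below is a minimal sketch in Python 3 (it can run on any other machine against a copy of /var/log/messages). It assumes, and this is only an assumption to check against your own log, that syslog-ng writes a line containing "syslog-ng starting up" each time it starts at boot; the function name `entries_before_restarts` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List

# ASSUMPTION: syslog-ng logs a line containing this text every time it
# starts, i.e. roughly once per boot. Change it to whatever your own
# /var/log/messages shows at startup.
RESTART_MARKER = "syslog-ng starting up"


def entries_before_restarts(lines: Iterable[str],
                            marker: str = RESTART_MARKER,
                            context: int = 5) -> List[List[str]]:
    """For each restart marker, return the `context` lines logged just before it."""
    recent: Deque[str] = deque(maxlen=context)  # rolling window of latest lines
    blocks: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            if recent:  # skip restarts with nothing logged before them
                blocks.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return blocks
```

For example, open the copied log file and print each returned block separated by a blank line; if every block ends with the same daemon or the same network message, that is a lead worth following.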
As it is unlikely that anything changed in the software (simply because there have been no updates for years already), IMHO it is something in the hardware. And because it seems to happen intermittently, it will be difficult to find. But the messages point to the network. Is the network card about to die?
If you are referring to the "Failed to connect to NetworkManager" message, I think this is normal, as both servers are identical and I get the same error throughout the log files on both servers.
Regarding dmesg, I am not sure where to find this log file. It is not in /var/log, and I don't see anything in YaST either.
Typing dmesg in a terminal window shows a bunch of stuff, but nothing that sounds like a problem.
Since I don't know where the log file is, I guess I will run it right after rebooting from the next hang.
Here is a sample from dmesg in a terminal (obviously the server is not hung at this point):
On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:
> Here is a sample from dmesg in a terminal (obviously the server is not
> hung at this point):
All that’s showing is iptables information, not anything useful. You
may need to tweak the log settings to disable this. There are bound to
be useful messages in there somewhere, but the iptables info is drowning
it out.
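As an alternative to changing the log settings, the iptables lines can be filtered out of saved dmesg output after the fact. A rough sketch, assuming the firewall lines come from the iptables LOG target, whose messages carry packet fields such as `IN=` and `OUT=`; matching on those two fields is a heuristic, and the helper name is made up for illustration.

```python
import re
from typing import Iterable, Iterator

# Lines written by the iptables LOG target look roughly like
# "... IN=eth0 OUT= MAC=... SRC=... DST=...". Any line with an IN= field
# directly followed by an OUT= field is treated as firewall noise.
_IPTABLES_FIELDS = re.compile(r"\bIN=\S*\s+OUT=")


def without_iptables_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not look like iptables LOG output."""
    for line in lines:
        if not _IPTABLES_FIELDS.search(line):
            yield line
```

For example, save the output with `dmesg > dmesg.txt`, then iterate over that file through `without_iptables_noise` and print what remains.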
On 2013-12-16 00:21, Jim Henderson wrote:
> On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:
>
>> Here is a sample from dmesg in a terminal (obviously the server is not
>> hung at this point):
>
> All that’s showing is iptables information, not anything useful. You
> may need to tweak the log settings to disable this. There are bound to
> be useful messages in there somewhere, but the iptables info is drowning
> it out.
Instead of looking at dmesg, look at the contents of
"/var/log/messages"; it should not contain the iptables messages, which
should go to "/var/log/firewall" instead.
–
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)
Hi Jim, I don't know how to tweak the log settings, as I don't even know where to find dmesg. What I posted was the output of typing dmesg in a terminal.
I looked through YaST and googled this, but I was not able to figure out how to change the log settings.
Carlos, sorry I wasn't clear in my initial post, but what I pasted there is from /var/log/messages. It is, I believe, what was last recorded before it hung.
> Carlos, sorry I wasn’t clear in my initial post but what I pasted there
> is from /var/log/messages. It is what I believe what was last recorded
> before it hung.
Oh. Well, you can send iptables messages to the firewall log instead,
but since your machine runs 10.1, I don't know which syslog daemon you use.
If I knew that, I might be able to find the rule to add.
On Sun, 15 Dec 2013 23:53:06 +0000, Carlos E. R. wrote:
> On 2013-12-16 00:21, Jim Henderson wrote:
>> On Sun, 15 Dec 2013 21:06:02 +0000, sd read wrote:
>>
>>> Here is a sample from dmesg in a terminal (obviously the server is not
>>> hung at this point):
>>
>> All that’s showing is iptables information, not anything useful. You
>> may need to tweak the log settings to disable this. There are bound to
>> be useful messages in there somewhere, but the iptables info is
>> drowning it out.
>
> Instead of looking at dmesg, look at the contents of
> “/var/log/messages”, it should not have the iptables messages, which
> instead should go to “/var/log/firewall”
I usually find that hardware-related error messages are somewhat clearer
in dmesg. Maybe that’s only me, though.
On Mon, 16 Dec 2013 00:26:01 +0000, sd read wrote:
> Hi Jim, I don’t know how to tweak the log settings as I don’t even know
> where to find dmesg. What I posted was the results of typing dmesg in a
> terminal.
>
> I looked through yast and googled this but was not able to figure out
> how I would change the log settings.
>
> Carlos, sorry I wasn’t clear in my initial post but what I pasted there
> is from /var/log/messages. It is what I believe what was last recorded
> before it hung.
Check in YaST whether the magic SysRq keys are enabled (if they're
not, enable them); then, when it hangs, press:
SysRq+S (Forces the kernel to sync the disks)
SysRq+U (Forces the kernel to remount the partitions read-only)
SysRq+B (causes the system to reboot)
That should flush the last of what is in the logs to disk, which might
give us more to go on.
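Whether the S, U and B keys will actually work can be checked by reading /proc/sys/kernel/sysrq. A small sketch, based on the bit values in the Linux kernel's sysrq documentation (0 disables SysRq, 1 enables every function, any other value is a bitmask); the function names are hypothetical, and I have not verified how a kernel as old as the one in 10.1 interprets values other than 0 and 1, so treat the result as a hint.

```python
from typing import Dict

# Bit values from the kernel's sysrq documentation. Only the bits that
# matter for the S, U and B keys are listed here.
SYNC = 0x10      # SysRq+S: emergency sync of all filesystems
REMOUNT = 0x20   # SysRq+U: remount all filesystems read-only
REBOOT = 0x80    # SysRq+B: immediate reboot


def keys_allowed(value: int) -> Dict[str, bool]:
    """Report whether SysRq+S, +U and +B are permitted for a sysrq setting."""
    if value == 0:   # SysRq disabled completely
        return {"S": False, "U": False, "B": False}
    if value == 1:   # every SysRq function enabled
        return {"S": True, "U": True, "B": True}
    return {"S": bool(value & SYNC),
            "U": bool(value & REMOUNT),
            "B": bool(value & REBOOT)}


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting as an integer."""
    with open(path) as fh:
        return int(fh.read().strip())
```

For example, `keys_allowed(read_sysrq())` on the server should report `True` for all three keys before relying on the S, U, B sequence during the next hang.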
10.1 is a long way out of support, so it's going to be somewhat
difficult to point you to exactly where you need to make those settings.
On 13.1, it's at YaST -> Kernel Settings, where there's a checkbox to
enable the keys. You can also enable them from the terminal with:
I'm with Henk here. The software hasn't changed in years, nor have the config files, so the conclusion is: hardware failure. And since the machine is old, there's only one sensible course: back up now while it's still possible, then replace it.
> I usually find that hardware-related error messages are somewhat clearer
> in dmesg. Maybe that’s only me, though.
I don't know; I have not compared the two thoroughly. I assumed they
contain the same messages.
I prefer the logs because they are filtered and timestamped (well, with a
timestamp that humans can read, anyway…).
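The difference in timestamps comes from dmesg stamping each line with seconds since boot rather than the time of day. A sketch of converting those stamps, assuming the usual `[ seconds.micro]` prefix and using the `btime` field of /proc/stat as the moment of boot; the result is only approximate, since the kernel's clock can drift from wall-clock time, and the function names are illustrative.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# dmesg lines normally start with seconds since boot, e.g. "[ 1234.567890] ...".
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def boot_time(stat_path: str = "/proc/stat") -> datetime:
    """Boot time (local time) from the 'btime' line of /proc/stat."""
    with open(stat_path) as fh:
        for line in fh:
            if line.startswith("btime "):
                return datetime.fromtimestamp(int(line.split()[1]))
    raise ValueError("no btime line in " + stat_path)


def wall_clock(line: str, booted: datetime) -> Optional[datetime]:
    """Approximate wall-clock time of a dmesg line, or None if it has no stamp."""
    match = _STAMP.match(line)
    if match is None:
        return None
    return booted + timedelta(seconds=float(match.group(1)))
```

Newer util-linux versions of dmesg can print human-readable times themselves with `-T`, but the dmesg shipped with 10.1 predates that option.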
If you mean the boot messages, which contain hardware-related info:
before systemd they went to /var/log/boot.msg, and now they go to
/var/log/messages.
Sorry for the delay, but I needed time to see whether I had fixed it, and I think I have.
On the same day as my last post, my server hung again, only this time it would not turn on: no video, no BIOS code, no beeps, nothing.
So, the good news was that the board was clearly bad. There was more good news, too: I had another module I could install. You see, these are ETX boards from a website
So I simply replaced it and reconfigured the Ethernet interface, since it wasn't happy that the old MAC address had disappeared.
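After a board swap like this, a quick way to see which MAC addresses the kernel now reports, and whether the one an existing network configuration expects is still present, is to read them from sysfs. A minimal sketch, assuming /sys/class/net is available (it is on 2.6 kernels) and that you know the old MAC your configuration referred to; both function names are hypothetical.

```python
import os
from typing import Dict, Iterable, List


def interface_macs(net_dir: str = "/sys/class/net") -> Dict[str, str]:
    """Map each network interface name to the MAC address the kernel reports."""
    macs: Dict[str, str] = {}
    for name in sorted(os.listdir(net_dir)):
        addr_file = os.path.join(net_dir, name, "address")
        try:
            with open(addr_file) as fh:
                macs[name] = fh.read().strip().lower()
        except OSError:
            continue  # entries that are not interfaces have no address file
    return macs


def missing_macs(expected: Iterable[str],
                 net_dir: str = "/sys/class/net") -> List[str]:
    """Return the expected MACs (lower-cased) that no interface currently has."""
    present = set(interface_macs(net_dir).values())
    return sorted(m.lower() for m in expected if m.lower() not in present)
```

If `missing_macs` returns the old board's address, any configuration still keyed to that address will need updating, which matches the symptom described above.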
I guess I can't complain, as these have been running 24/7 (excluding vacations) since, I think, 2007, with only a few boot drive failures.
On Fri, 20 Dec 2013 16:26:01 +0000, sd read wrote:
> Although, they don’t get heavy use.
That usually isn't what causes a failure of this sort. Being powered
is what wears out electrical components and increases the chance of a
failure.
So being on 24x7 is, from an electrical component standpoint, heavy use,
regardless of how high the CPU/network utilization has been over time.
Several years ago I got a call from an old customer who was ordering a replacement CPU fan. In the course of the conversation I heard him say: “I’m going to hate to have to power this machine down. It’s a Linux server that’s been running continuously since I installed it seven years ago, and it’s never even been rebooted. I wanted to see how long it could go, but now the fan’s so noisy it’ll fail soon if I don’t replace it.”