SUSE hangs for no apparent reason

Hi all,

I have been using SUSE for a long time, but I have now started to have problems with it.

My openSUSE machine is always running, with X running as well, and it sits idle overnight. Something has started to go wrong with it, though, and this has
happened at least 4 times over the last month.

The problem is that when I try to log in using Exceed, it doesn’t work. If I log in over ssh, the login succeeds, but I can’t run binaries or do ls.

Also, when I go to the actual machine, it shows a shell on tty1, whereas I think I had the X server running.

When I log in as root
> user login: root

this comes up, and it keeps repeating indefinitely:

init: Id “1” respawning too fast: disabled for 5 minutes.
init: Id “1” respawning too fast: disabled for 5 minutes.

The only way out is to reboot using the machine’s restart button.

What could be the problem? Can anybody suggest a way out of this?

Best regards,
rui

Just an addition to the problem already mentioned.
I just had a look, and there is no X running on tty7. If I press Ctrl+Alt+F10,
I see messages like this looping on the screen:
rejecting I/O to offline device
and also something related to /var/lib/samba failing.

rui123 wrote:
>
> What could be the problem? Can anybody suggest a way out of this?

Study the logs to see the reason for X crashing.

Which log? I have analyzed /var/log/messages,
/var/log/warn etc., but nothing particular is showing.

This is what I have in the messages file. Nothing special was logged, and the problem occurred on the 24th.

Apr 22 11:52:06 m4msuse01 smbd[11908]: getpeername failed. Error was Transport endpoint is not connected
Apr 22 11:52:06 m4msuse01 smbd[11908]: [2009/04/22 11:52:06, 0] lib/util_sock.c:write_socket_data(413)
Apr 22 11:52:06 m4msuse01 smbd[11908]: write_socket_data: write failure. Error = Connection reset by peer
Apr 22 11:52:06 m4msuse01 smbd[11908]: [2009/04/22 11:52:06, 0] lib/util_sock.c:write_socket(438)
Apr 22 11:52:06 m4msuse01 smbd[11908]: write_socket: Error writing 4 bytes to socket 22: ERRNO = Connection reset by peer
Apr 22 11:52:06 m4msuse01 smbd[11908]: [2009/04/22 11:52:06, 0] lib/util_sock.c:send_smb(630)
Apr 22 11:52:06 m4msuse01 smbd[11908]: Error writing 4 bytes to client. -1. (Connection reset by peer)
Apr 22 11:59:00 m4msuse01 /USR/SBIN/CRON[12276]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Apr 22 12:22:07 m4msuse01 -- MARK --
Apr 22 12:29:33 m4msuse01 sshd[15948]: Accepted publickey for ctm from ::ffff:10.9.1.50 port 42674 ssh2
Apr 22 12:37:54 m4msuse01 su: (to root) ctm on /dev/pts/6
Apr 22 12:37:54 m4msuse01 su: pam_unix2: session started for user root, service su
Apr 22 12:59:00 m4msuse01 /USR/SBIN/CRON[17174]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Apr 22 13:03:07 m4msuse01 sshd[17359]: Accepted keyboard-interactive/pam for m4mbuild from ::ffff:10.9.1.87 port 4630 ssh2
Apr 22 13:15:10 m4msuse01 smbd[18781]: [2009/04/22 13:15:10, 0] lib/util_sock.c:get_peer_addr(978)
Apr 22 13:15:10 m4msuse01 smbd[18781]: getpeername failed. Error was Transport endpoint is not connected
Apr 22 13:15:10 m4msuse01 smbd[18781]: [2009/04/22 13:15:10, 0] lib/util_sock.c:write_socket_data(413)
Apr 22 13:15:10 m4msuse01 smbd[18781]: write_socket_data: write failure. Error = Connection reset by peer
Apr 22 13:15:10 m4msuse01 smbd[18781]: [2009/04/22 13:15:10, 0] lib/util_sock.c:write_socket(438)
Apr 22 13:15:10 m4msuse01 smbd[18781]: write_socket: Error writing 4 bytes to socket 5: ERRNO = Connection reset by peer
Apr 22 13:15:10 m4msuse01 smbd[18781]: [2009/04/22 13:15:10, 0] lib/util_sock.c:send_smb(630)
Apr 22 13:15:10 m4msuse01 smbd[18781]: Error writing 4 bytes to client. -1. (Connection reset by peer)

rui123 wrote:
> Which log? I have analyzed /var/log/messages,
> /var/log/warn etc., but nothing particular is showing.
>
>
> This is what I have in the messages file. Nothing special was logged, and the
> problem occurred on the 24th.

Those messages are all Samba related.

When the problem occurs, you should look at dmesg (if you can).

You will be most likely to capture the reason for the failure by using
netconsole, but that is a pain to set up. It requires a network connection to
another Linux machine. Do you meet that requirement?
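
For what it is worth, the receiving end of netconsole is just a UDP listener, so the second machine needs very little. Below is a minimal Python sketch of such a receiver. It assumes the failing box has already been configured to send its kernel messages to this machine on UDP port 6666 (netconsole's default target port; change it if you configure something else). The function names and the log file name are made up for illustration.

```python
import socket
import time


def format_message(data, sender_ip):
    """Turn one received datagram into a timestamped log line."""
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {sender_ip} {text}"


def listen(port=6666, logfile="netconsole.log"):
    """Append every datagram received on `port` to `logfile` until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    with open(logfile, "a") as out:
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(4096)
            line = format_message(data, sender_ip)
            print(line)
            out.write(line + "\n")
            out.flush()  # write through immediately so nothing is lost on Ctrl+C
```

Call listen() on the second machine and leave it running overnight. Whatever the kernel prints before the disk goes away should then end up in the file, even if the failing box can no longer write its own logs.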

Hi,

Just to be on the safe side, I saved the dmesg output to a file, which is here:
dmesg pastebin

This was taken while the system was in the strange state I mentioned. I have now rebooted the system and it is working fine, but the problem can come back to bite at any time.

From the log, the only thing I see is:
ReiserFS: sda2: There were 2 uncompleted unlinks/truncates. Completed

The root filesystem is mounted from sda2, by the way.

Can you tell me what to do about this? I fear that if the problem continues, the system might become unusable or crash completely.

Best regards,
rui

Still waiting for the solution to this problem.
Iwfinger or anybody …

Definitely looks like some piece of hardware is failing. If this is a production server, I’d make sure that some kind of backup is ready to replace it. ReiserFS is good, and I still use it on some disks myself, but messages like these are usually indications that something is going terribly wrong.
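
If you keep saving dmesg output each time this happens, a small filter makes it easier to spot the storage-related lines among the Samba noise. This is only a rough Python sketch: the first and third patterns are taken from the messages quoted in this thread, the generic "I/O error" pattern is my own assumption about what else may show up, and the function name is hypothetical.

```python
import re

# Patterns from this thread, plus a generic "I/O error" match (an assumption).
STORAGE_PATTERNS = [
    re.compile(r"rejecting I/O to offline device", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"ReiserFS", re.IGNORECASE),
]


def find_storage_messages(lines):
    """Return (line_number, text) pairs for lines matching any storage pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in STORAGE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the saved dmesg file, pass the file object to find_storage_messages(), and print the pairs it returns. A growing number of hits from one capture to the next would support the failing-hardware theory.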