Headless system almost crashes

Hi all,

I have a big problem with my home router / server. The symptom is an almost complete death. The other computers in the house can still use the internet, i.e., the PC is not completely dead and still routes. However, it drops all direct traffic: SSH becomes unresponsive, SMB shares disappear, and so on. It even cuts traffic to a VMware guest it runs. I can ping it, but that’s about it.

The worrying thing is that it starts slowly. It begins by cutting SMB pipes. It then becomes very slow, to the point where it won’t execute a “shutdown” command.

I run a P2P program on it, and I suspect the two are related.

One pattern I see in logs is:


Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Connection reset by peer
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 51 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 51 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 51 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:738(write_data)
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] lib/util_sock.c:1491(get_peer_addr_internal)
Jul 13 16:11:07 battlecruiser smbd[13036]:   getpeername failed. Error was Transport endpoint is not connected
Jul 13 16:11:07 battlecruiser smbd[13036]:   write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Jul 13 16:11:07 battlecruiser smbd[13036]: [2010/07/13 16:11:07,  0] smbd/process.c:62(srv_send_smb)
Jul 13 16:11:07 battlecruiser smbd[13036]:   Error writing 75 bytes to client. -1. (Transport endpoint is not connected)

Then I sometimes get a flood on port X, “sending cookies”. That is certainly due to the P2P program, but could a sustained flood affect communications on the virtual network adapter vmnet8?

And then it enters this catatonic state with no special mention in the log.

I have no idea what logs to look at.

TIA.

On Tue July 13 2010 03:56 pm, lucisandor wrote:

> <snip>
lucisandor;

Most of the above smbd log entries are fairly normal except possibly:

Jul 13 16:11:07 battlecruiser smbd[13036]: write_data: write failure in
writing to client 0.0.0.0. Error Broken pipe

I’m not sure I’ve seen the “Error Broken pipe”.
When a Windows machine tries to connect, it will try both port 139 and 445.
As soon as it connects on one of these, it silently drops the other. This
leads to most of the errors you reported. These log entries are harmless,
but if you want to eliminate these pesky log entries add ONE of the following
to the [global] section of your /etc/samba/smb.conf:


smb ports = 139
smb ports = 445

If you have Win7 or Vista clients, use the second; if all the clients are XP
or earlier, choose the first. You will need to restart smbd for the change to
take effect.


su
rcsmb restart
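
If you want to see how the write failures break down by reason before and after the change, a rough sketch like the one below might help. It assumes the smbd entries look like the lines quoted above and land in a single file such as /var/log/messages; the function name smbd_write_errors is made up for illustration.

```shell
#!/bin/sh
# Count smbd "write failure" log lines, grouped by the error reason
# that follows ". Error " at the end of each line.
# Usage (log path is an assumption): smbd_write_errors /var/log/messages
smbd_write_errors() {
    grep 'write_data: write failure' "$1" |
        sed 's/.*\. Error //' |      # keep only the reason text
        sort | uniq -c | sort -rn    # most frequent reason first
}
```

Each output line is a count followed by the reason, so a sudden jump in one reason stands out.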

It sounds like this is a home network and not for production, so I would
suggest you turn off the various services one at a time to find the offending
service.

Like you, I suspect the P2P program; a “flood” of any network traffic could
certainly slow all the services to a snail’s pace. You should also check
/var/log/messages
to see if any critical errors are being reported. If it is not already
running, you should enable smartd; there could be a failing hard drive.
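
As a starting point for that check, here is a small sketch that counts two kinds of kernel messages in the log. The search strings are assumptions: "SYN flooding" is meant to match the “sending cookies” warnings mentioned earlier, and "oom-killer" the kernel's out-of-memory reports. The function name scan_messages is hypothetical, and the wording in your log may differ by kernel version.

```shell
#!/bin/sh
# Count suspicious kernel messages in a syslog file.
# Usage: scan_messages [logfile]   (defaults to /var/log/messages)
scan_messages() {
    log=${1:-/var/log/messages}
    # grep -c prints the number of matching lines (0 if none)
    printf 'SYN flood warnings: %s\n' "$(grep -c 'SYN flooding' "$log")"
    printf 'OOM killer events: %s\n' "$(grep -ci 'oom-killer' "$log")"
}
```

If either count is non-zero, grep for the same string without -c to see the timestamps and compare them with the time the box went quiet.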

Good luck diagnosing this.


P. V.
“We’re all in this together, I’m pulling for you.” Red Green

system almost crashes

… so let’s almost fix it lol! Jokes aside, I have no idea yet what happens. But when it’s becoming slow I would run the command ‘top’ on the server in a console to see if something is hogging the CPU. The other possibility is that your line is flooded by the P2P application. If you feel adventurous you could play a bit with ‘wireshark’, an application which will show you exactly what happens on the outgoing line.
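
Since the box may be too slow to run anything by the time you notice, one option is to record a snapshot every minute or so while it is still healthy. This is only a sketch: the function name snapshot and the log path in the comment are made up, and the ps options assume the procps version of ps found on most Linux systems.

```shell
#!/bin/sh
# Append a timestamped snapshot of load and the busiest processes to a file.
# Usage: snapshot /var/log/system-snapshot.log   (hypothetical path)
# Could be run from cron once a minute for later post-mortem reading.
snapshot() {
    out=$1
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        cat /proc/loadavg                                    # 1/5/15 min load
        ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n 6   # top 5 by CPU
    } >> "$out"
}
```

The last few snapshots before the hang should show whether CPU or memory use was climbing.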

Don’t hesitate to come back with any additional evidence you may gather.

It sounds to me like the system is low on memory. With the main memory full, it falls back to swap, which slows everything way, way down. And then the swap fills, so each service that requests more memory crashes, leading to the services shutting down one by one. As each one shuts down, a bit of memory is freed, letting the system continue forward for a bit longer.

Check either ‘top’ or ‘more /proc/meminfo’. Either one will show you how much memory and how much swap is in use.
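
If the raw /proc/meminfo output is hard to read, a short sketch like this condenses it to two lines. It assumes the usual MemTotal, MemFree, Buffers, Cached, SwapTotal and SwapFree fields (values in kB); the function name mem_summary is just for illustration.

```shell
#!/bin/sh
# Summarise RAM and swap usage from a meminfo-style file.
# Usage: mem_summary [file]   (defaults to /proc/meminfo)
mem_summary() {
    awk '
        /^MemTotal:/  { mt = $2 }
        /^MemFree:/   { mf = $2 }
        /^Buffers:/   { b  = $2 }
        /^Cached:/    { c  = $2 }
        /^SwapTotal:/ { st = $2 }
        /^SwapFree:/  { sf = $2 }
        END {
            # Buffers and page cache can be reclaimed, so leave them out
            printf "RAM used (excl. buffers/cache): %d MiB of %d MiB\n", (mt - mf - b - c) / 1024, mt / 1024
            printf "Swap used: %d MiB of %d MiB\n", (st - sf) / 1024, st / 1024
        }' "${1:-/proc/meminfo}"
}
```

Swap use that keeps growing while RAM stays nearly full would fit the low-memory theory.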

Hi all and thanks for your replies.

Unfortunately I run a VMware server on this computer, and that is guaranteed to top everything else. :(

I have only port 445 in smb.conf. The errors are not completely benign, because I do a lot of things, including heavy file management, on the VMware Windows guest. These errors prevent almost anything from succeeding on the guest, including longer downloads with a regular browser.

I will look into buying more RAM.

I was hoping there is a logging option I could change to get a better post-mortem analysis.

On Wed July 21 2010 12:23 pm, lucisandor wrote:

> <snip>
lucisandor;

For Samba you can increase the detail of the logs by setting the following
parameter in /etc/samba/smb.conf.


log level = value

where value is an integer from 0 to 10. The higher the value, the more
detailed the log. Setting a log level greater than 3 will significantly slow
the system, and the logs are likely to rotate out before you find anything
interesting. I would suggest a level of 3. You can also set the size of the
log file with:


max log size = value

value is the size in kilobytes.

See man smb.conf for details or here:
http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html

P. V.
“We’re all in this together, I’m pulling for you.” Red Green