Networks stop working

Hello,
I can’t find any help for this problem. I run openSUSE 11.4 as a web server, with one SMB share mounted. Recently a problem started to appear: after running online for a while (a few days or weeks), the machine goes offline because it loses its network connection. I couldn’t leave the system in that state long enough for in-depth diagnostics, so I reboot it, which fixes the problem for some time.
I couldn’t find any suspicious log entries, only this one: “kernel: CIFS VFS: No response for cmd 50 mid 15130”.
The system is running on Hyper-V.

It seems to be happening sooner and sooner. It might be that we are getting more traffic.

Any ideas?

Thanks.

On 10/13/2011 08:26 AM, aladin7 wrote:

> can’t find any help about this problem. I have OpenSuse 11.4, I use it
> as web server. And I have one smb share mounted. Recently one problem
> started to appear, after some time of online working (for few days or
> weeks) it goes offline, and the reason is that there were no network.

is this a rack mounted server in a hosted environment?
or, a privately owned machine at home (or in the office)?

if the latter: in what way is your server connected to the internet?
and, do you have a static IP assigned by your ISP? if not, it could be
the ISP bumps you off line periodically…they do that to encourage you
to pay extra for a static IP…

or there are hundreds of other potential reasons your network goes
down…some inside your machine, and lots out…

> I
> couldn’t leave system like that to make a more in depth diagnostics, so
> rebooting the system fixes the problem for some time.
> I couldn’t find any suspicious log entries, only one “kernel: CIFS
> VFS: No response for cmd 50 mid 15130”.

how have you set up your logging of the networking functions?
and, did they also show nothing?

> System is running on hyper-v.

maybe your hyper-v is set to call home, and when it does (every few
days or weeks) it is told it has to be ‘updated’ and rebooted a few
times…few networks persist through such…

> Any ideas?

i’m not a networking guru…hopefully one will happen along and help you!!


DD
Caveat-Hardware-Software-
openSUSE®, the “German Automobiles” of operating systems

It is a rack-mounted server. The network problem occurs only on the openSUSE virtual machine.
I haven’t set up any specific network logging, only what is already logged in /var/log.

On 2011-10-13 10:56, aladin7 wrote:
>
> It is rack mounted server. The network problem persists only on openSuse
> virtual machine.

Then what happens on the host, and what type of host and the virtualization
software used, is relevant.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

No solution yet. Restarting the network with “/etc/init.d/network restart” does not fix the problem, even though it seems to connect to the network, because it gets the right IP address by DHCP. So the only solution so far is a cron job that checks network connectivity and restarts the server.
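A cron-driven check like the one described could be sketched as follows. This is only an illustration, not the script actually used in this thread: the state file path, the threshold of three failures and the function names are all assumptions. Counting consecutive failures avoids rebooting on a single lost ping.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: reboot only after several failed checks in a row.
STATE_FILE="${STATE_FILE:-/var/run/netwatch.failures}"   # assumed location
MAX_FAILURES="${MAX_FAILURES:-3}"                        # assumed threshold

# Update the consecutive-failure counter and print its new value.
# $1 is "ok" or "fail".
record_result() {
    count=0
    if [ -f "$STATE_FILE" ]; then
        count=$(cat "$STATE_FILE")
    fi
    if [ "$1" = "ok" ]; then
        count=0
    else
        count=$((count + 1))
    fi
    echo "$count" > "$STATE_FILE"
    echo "$count"
}

# Succeed once the counter ($1) has reached the threshold.
should_reboot() {
    [ "$1" -ge "$MAX_FAILURES" ]
}

# One watchdog pass; $1 is the address to probe (for example the gateway).
watchdog_run() {
    if ping -c 3 -W 2 "$1" > /dev/null 2>&1; then
        record_result ok > /dev/null
    else
        failures=$(record_result fail)
        logger -t netwatch "ping to $1 failed ($failures in a row)"
        if should_reboot "$failures"; then
            /sbin/shutdown -r now
        fi
    fi
}
```

To use it, the script would end with a call such as watchdog_run followed by the gateway address, and be run from root's crontab every minute or so. The logger line leaves a trace in /var/log/messages, so the time of each failure is recorded before the reboot.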

I don’t even know where to look for possible problems, any advice about that would be great :].

On 2011-10-21 07:16, aladin7 wrote:
> I don’t even know where to look for possible problems, any advice about
> that would be great :].

Well, last time I posted two questions.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Not specific to openSUSE,

In the past I’ve usually traced this type of problem to

  • The NIC “goes to sleep” because of various system problems like power management. Check ACPI, openSUSE power management, and BIOS settings.
  • The NIC “goes to sleep” because of lack of traffic. If for some reason the network thinks your machine has gone offline, the network may no longer maintain records for you. Typical ways to address this include sending ARP traffic or pinging your machine regularly during slow periods.

Since this is a VM running on hyper-v, you’ll need to do these checks on both your Host and Guest. In fact, I’d highly recommend you disable NIC “autosense” on your Windows host so that if you have intermittent connectivity related to NIC hardware issues your NIC won’t disable itself.

HTH,
Tony

It’s not clear if the problem pertains to Samba, Apache (or whatever web server you’re using), or the whole network. Next time it happens you should run this command to see where the fault lies:

su -c "rcnmb status; rcsmb status; rcapche2 status; rcnetwork status"

This will allow better focus on the culprit.

After running those commands I got the same output both when everything is OK and when it is not working.


Checking for Samba NMB daemon                                        unused
Checking for Samba SMB daemon                                        unused
bash: rcapche2: command not found
Checking optional network interfaces:
    eth0      name: Virtual Ethernet Card 0
    eth0      IP address: 192.168.7.2/24
    eth0                                                             running
    eth1      device: Digital Equipment Corporation DECchip 21140 [FasterNet] (rev 20)
              No configuration found for eth1
    eth1                                                             unused
Checking mandatory network interfaces:
    lo
    lo        IP address: 127.0.0.1/8
    secondary lo IP address: 127.0.0.2/8
    lo                                                               running
Checking service network .  .  .  .  .  .  .  .  .  .  .             running

It looks like it is failing more and more frequently. There are 3 virtual machines in total on that server and only this one has the problem, so it must be a SUSE problem.

And there is one more problem: after my connectivity-check script reboots the system, it stops at the GRUB menu, so I have to press Enter manually to load SUSE, that sucks!

The response to “rcapache2 status” shows you are not using Apache as a web server, so what web server are you using?
The responses to “rcnmb status” and “rcsmb status” show that Samba is not on. Interestingly, since you said the response is the same whether the network is working or not, it follows that you have never had Samba working from the get-go.

It’s really hard to divine exactly what your problem is. We don’t know what your web server is, and you seem not to be using samba by design, so can you tell us exactly what you mean by “it goes offline”?

So, in summary, there are four questions regarding the server that would help us to define your problem:

  1. I think you have nothing connected to eth1, is that right?
  2. Do you have samba switched on? (Look in yast –> runlevels –> nmb and smb.)
  3. What webserver are you using?
  4. What exactly is the manifestation of “it goes offline”?

PS there’s a Samba/AppArmor bug in 11.4 that could well be throwing an extra spanner into this mix. Do this to fix it: go to Yast → Apparmor and enter the Control Panel → configure profiles area. Highlight usr.sbin.smbd and use the ToggleMode button to flip it to “complain”. Similarly flip usr.sbin.nmbd to “complain”. Click Done to exit.

Thanks for the reply. Sorry that I am not too clear about my situation.

eth1 is set up on hyper-v the same way as eth0, but it just doesn’t work, and nothing has been configured on suse for that interface. So it is not used. I could remove it if you think it might have something to do with the problem.

Samba is switched off. I basically mount two windows shares manually and I don’t use any other samba functionality.

I am using apache2, I am not sure why it didn’t work that time.

rcapache2 status
Checking for httpd2:                                                 running

The problem is this: I cannot ping the server, and from the server I cannot ping anything. The only solution that worked for me is /etc/init.d/reboot

p.s. I changed usr.sbin.smbd to “complain”.

I don’t think eth1 is the culprit, but you could remove it and see if it makes any difference.
And separately: next time the failure occurs, run this command as root and see if eth0 has switched off: ifconfig eth0

Same problem here, opensuse 11.4 and vmware vsphere 4.1. The server goes to sleep and then there is no network. network status says it’s running but you cannot even ping it. But if I ping 8.8.8.8 or its gateway then it works.

I’ve tried to set “Low Latency computing” in pm-profile and change from “kernel-desktop” to “kernel-default” but it happens again. The only temporary solution I found is a cron job with a ping every minute.
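A keepalive of that kind might look like the sketch below. It is an illustration under assumptions, not the poster's actual job: the function names are made up, and the target is taken to be the default gateway as reported by `ip route show`.

```shell
#!/bin/sh
# Keepalive sketch: ping the default gateway once so the guest keeps generating traffic.

# Read `ip route` output on stdin and print the first default gateway address.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Send a single ping to the current default gateway, if there is one.
keepalive() {
    gw=$(ip route show | default_gateway)
    if [ -n "$gw" ]; then
        ping -c 1 -W 2 "$gw" > /dev/null 2>&1
    fi
}
```

Saved as a script that ends with a call to keepalive and scheduled with a crontab entry of the form "* * * * *", this sends one ping per minute. Reading the gateway from the routing table instead of hard-coding it keeps the job working if DHCP hands out a different gateway.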

So the problem still exists. Sometimes it works for several days, sometimes it fails every day. Today I noticed one thing: when it stops working (no network connection possible) and I restart the network, the network works for about 5 pings and then goes down again. The output of ifconfig eth0 looks like everything is OK.
There are no messages in /var/log/messages. Can I somehow enable more logging, or are there other log files that I could look into?
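One way to get more information without extra logging daemons is to save a snapshot of the network state at the moment of failure, before restarting anything. The following is a rough sketch under assumptions: the output directory and function names are hypothetical, and the commands are just the usual candidates (interface counters, routes, ARP/neighbour table, the tail of the kernel ring buffer).

```shell
#!/bin/sh
# Diagnostic snapshot sketch: run when the network is down, before restarting anything.
SNAP_DIR="${SNAP_DIR:-/var/log/netdiag}"   # assumed directory, not from the thread

# Run one command, recording its name and output (or the fact that it failed).
capture() {
    echo "=== $* ==="
    "$@" 2>&1 || echo "(command failed or not available)"
    echo
}

# Write a timestamped snapshot of interface, routing, ARP and kernel state
# for interface $1, then print the path of the file that was written.
snapshot() {
    iface="$1"
    mkdir -p "$SNAP_DIR"
    out="$SNAP_DIR/netdiag-$(date +%Y%m%d-%H%M%S).log"
    {
        capture date
        capture ifconfig "$iface"
        capture ip -s link show "$iface"
        capture ip route show
        capture ip neigh show
        capture sh -c 'dmesg | tail -n 50'
    } > "$out"
    echo "$out"
}
```

Calling snapshot eth0 from the connectivity-check script just before it reboots would leave one file per failure. Comparing the RX/TX counters and the neighbour table between a good and a bad snapshot may show whether packets stop leaving the guest or stop arriving.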

I had this on 11.4 on hyper-v: network loss related to traffic volume.

What is the output of uname -a?

The below is a very important stability fix for hv_vmbus

[PATCH 02/28] staging: hv: use sync_bitops when interacting with the hypervisor (http://driverdev.linuxdriverproject.org/pipermail/devel/2011-April/014864.html)

Backported and included in the openSUSE 11.4 kernel 2.6.37.6-8 and beyond.
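To compare a running kernel against that version, something like the sketch below could be used. It makes several assumptions: the threshold string is copied as written above, the helper names are made up, the comparison relies on GNU `sort -V`, and a trailing flavour suffix such as "-desktop" in the `uname -r` output is simply stripped. The openSUSE package release numbering may not line up exactly with this string, so treat the result as a hint only.

```shell
#!/bin/sh
# Kernel version check sketch. The threshold is copied from the post above.
FIXED_IN="${FIXED_IN:-2.6.37.6-8}"

# Drop a trailing flavour suffix such as "-desktop" or "-default".
strip_flavour() {
    echo "${1%-[a-z]*}"
}

# Succeed if version $1 is greater than or equal to version $2 (GNU sort -V order).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the running kernel is at or beyond the threshold.
check_running_kernel() {
    running=$(strip_flavour "$(uname -r)")
    if version_ge "$running" "$FIXED_IN"; then
        echo "kernel $running is at or beyond $FIXED_IN"
    else
        echo "kernel $running is older than $FIXED_IN"
    fi
}
```

Running check_running_kernel on the guest prints one line, which can be posted together with the full uname -a output.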

Mike