Cannot Ping Remote Host After Network Interruption Without Restarting Firewall

Hello all, I have done plenty of searching and have been unable to resolve this problem, so I am hoping to find help here. I am running openSUSE 13.1 as a host monitoring system on our network; in the most basic sense, if it can’t ping a remote branch router, it triggers an e-mail alert to our IT department so we are aware the router is offline. The host also has two NICs, one of which runs in promiscuous mode, monitoring internet traffic on a switch mirror port with Ntopng.

I have a very strange issue: when a remote branch router goes offline due to a network outage and then comes back online, it still cannot be pinged from the openSUSE server. From the openSUSE server I can ping any other host on the remote subnet except the router, and I can ping the router from any other host on the same subnet as the openSUSE server. I am not sure if this is due to some form of IP redirect/routing issue, but it only affects the openSUSE server and no other machines. I did discover that stopping and starting the firewall on openSUSE clears the erroneous state, but it does not prevent it from recurring. Does anybody have any insight into such an issue? Forgive me if I left out any pertinent information.

My first reaction is that there is probably a “discovery” problem.
So, for instance…
It can make a difference whether you’re pinging the remote machine (the router) by name or by IP address.
Is this router a managed device? Is there an app that “knows” this device, where is that app located, and is it part of overall network security?
Looking at how your openSUSE is set up, is it different from your other machines? E.g., are your other machines part of a network security system like LDAP or AD while your openSUSE is not?

There may be other possible questions; the above are all similar in that they relate to how machines in your network are “found.”

If you deploy Wireshark and analyze packets, you may also get better insight into why and where your openSUSE is failing in finding the remote router.

HTH,
TSU

Thanks for the response Tsu2.

I am pinging by IP address.

The router is a Cisco 2811 managed router.

I am not sure I understand your question about an “app” that “knows” the device.

To answer your question about AD and LDAP: I can ping the same host after a network outage from another openSUSE box on the same subnet as the problematic openSUSE installation, and neither machine is part of Active Directory or LDAP.

I will try to get a tcpdump capture when this occurs and analyze it with Wireshark but it is clearly difficult to anticipate an outage.

Is your other machine running the same openSUSE version as the one giving you this problem? I ask because I recall this old thread, where it was mentioned that the ARP cache behaviour seems to have changed. I’m not completely sure about the implications of this, but perhaps that is what is affecting you.

The other machine is the same version. I will look at the ARP issue, but I don’t believe it is related: the host that I am pinging is not on the same subnet as the openSUSE server, so it would not be in the ARP cache anyway.

Hard to get a handle on…some questions:

Is there a chance that this particular machine is assigned a duplicate IP address? (You haven’t told us if the addresses are statically or dynamically assigned).
Are you pinging by hostname or IP address?
Can you provide traceroute output?

No duplicate IP and using only static IP addresses on all hosts.
Pinging by IP address only.
I cannot provide traceroute output until it is broken again. Here is my ifconfig output and routing table, though; I am not sure whether the auto-assigned 169.254 address on ens192:av is of any relevance.

ens160    Link encap:Ethernet  HWaddr 00:50:56:93:5B:8C
          inet addr:10.1.2.99  Bcast:10.1.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe93:5b8c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6720235 errors:0 dropped:3442 overruns:0 frame:0
          TX packets:3709952 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3491068182 (3329.3 Mb)  TX bytes:3210855222 (3062.1 Mb)


ens192    Link encap:Ethernet  HWaddr 00:50:56:93:38:5D
          inet6 addr: fe80::250:56ff:fe93:385d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:33492316 errors:0 dropped:707 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15253933532 (14547.2 Mb)  TX bytes:8245 (8.0 Kb)


ens192:av Link encap:Ethernet  HWaddr 00:50:56:93:38:5D
          inet addr:169.254.6.103  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1


lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:12807199 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12807199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1609701149 (1535.1 Mb)  TX bytes:1609701149 (1535.1 Mb)


linux-nor8:/etc # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.1.100.1      0.0.0.0         UG    0      0        0 ens160
default         *               0.0.0.0         U     1003   0        0 ens192
10.1.0.0        *               255.255.0.0     U     0      0        0 ens160
loopback        *               255.0.0.0       U     0      0        0 lo
link-local      *               255.255.0.0     U     0      0        0 ens192
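
One way to read the routing table above is to work out which entry the kernel would pick for a given destination. The sketch below is only an illustration: it assumes that the `loopback` and `link-local` names in the output stand for 127.0.0.0/8 and 169.254.0.0/16 (the usual /etc/networks mapping), it uses the simplified rule "longest prefix wins, then lowest metric", and `ROUTES` and `select_route` are hypothetical names, not part of any tool.

```python
import ipaddress

# Routes transcribed from the `route` output above as
# (network, gateway, metric, interface). A gateway of None means on-link.
# Assumption: "loopback" = 127.0.0.0/8 and "link-local" = 169.254.0.0/16.
ROUTES = [
    ("0.0.0.0/0", "10.1.100.1", 0, "ens160"),
    ("0.0.0.0/0", None, 1003, "ens192"),
    ("10.1.0.0/16", None, 0, "ens160"),
    ("127.0.0.0/8", None, 0, "lo"),
    ("169.254.0.0/16", None, 0, "ens192"),
]


def select_route(destination, routes=ROUTES):
    """Return the matching route with the longest prefix, then the lowest metric."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r[0]).prefixlen, -r[2]),
    )


if __name__ == "__main__":
    # The remote branch router is off-subnet, so it should go via the default gateway.
    print(select_route("192.168.12.1"))
    # A host inside 10.1.0.0/16 is reached directly on ens160.
    print(select_route("10.1.2.50"))
```

Under these assumptions, traffic to 192.168.12.1 leaves through the default gateway 10.1.100.1 on ens160, so the only ARP entry involved on the server is the one for that gateway, not one for the remote router.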




Well, the 169.254.x.y subnet is reserved for Automatic Private IP Addressing (APIPA). (I could be wrong, but I think openSUSE self-assigns one when no IP address is assigned, e.g. by a DHCP server.) Anyway, it’s a non-routable link-local address.

http://en.wikipedia.org/wiki/Reserved_IP_addresses

That was my understanding as well, but I figured at this point anything could be relevant. The good news is it broke again over the weekend, so I can provide the previously requested traceroute. I also performed a packet capture but only see the echo request and nothing else useful. I turned up logging sensitivity on the firewall, and it restarted to apply the settings, automagically fixing the dead host issue in the process. So it definitely has something to do with the firewall; I am just not sure what.

traceroute to 192.168.12.1 (192.168.12.1), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
linux-nor8:/etc #



Well, if you think it is a firewall problem, then enable firewall logging and look at the logs :) Here’s a link to a post on how to enable firewall logging via YaST:
https://forums.opensuse.org/showthread.php/456671-Firewall-logs-are-in-var-log-firewall-warn-messages-clutter?p=2312008#post2312008

Still thinking about possibilities, but I don’t think that it is definitely the firewall.
It could be a dependency (like a network restart).

So, for instance just throwing mud at the wall…
If it is actually an ARP cache issue, then restarting network services could have flushed the cache and triggered a fresh ARP request from your machine. As far as I remember, Linux does not re-ARP on a fixed schedule; it keeps entries in its neighbour cache and only re-resolves one once it has gone stale and is needed again. This behavior can of course be tuned so that your openSUSE box is chattier on the network… :)

So, just sayin…
Keep possibilities open…

TSU

That is exactly what I did, just waiting for it to break again so I can poke around the logs. Thanks for the helpful link.

Ok, it broke yesterday. I see an ICMP redirect being sent from our core router in the firewall log:

[2211933.555402] SFW2-IN-ACC-REL IN=ens160 OUT= MAC=0A:50:56:93:5b:8D:ac:f2:c5:49:61:0F:08:00 SRC=10.1.100.1 DST=10.1.2.99 LEN=56 TOS=0x00 PREC=0x00 TTL=255 ID=9330 PROTO=ICMP TYPE=5 CODE=0 GATEWAY=10.1.1.5 [SRC=10.1.2.99 DST=192.168.12.1 LEN=74 TOS=0x00 PREC=0x00 TTL=63 ID=13348 DF PROTO=UDP SPT=47355 DPT=161 LEN=54 ]
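
For anyone reading a log entry like this one, here is a rough sketch of how it breaks down. ICMP type 5 is a Redirect and code 0 means "redirect for the network"; the bracketed part at the end is the original packet that triggered it. The function name `parse_redirect` and the returned keys are made up for illustration, and the regular expressions assume the `KEY=value` layout seen in this particular entry.

```python
import re

# The log entry from this thread, with the forum's line wrapping removed.
LOG_LINE = (
    "[2211933.555402] SFW2-IN-ACC-REL IN=ens160 OUT= "
    "MAC=0A:50:56:93:5b:8D:ac:f2:c5:49:61:0F:08:00 SRC=10.1.100.1 "
    "DST=10.1.2.99 LEN=56 TOS=0x00 PREC=0x00 TTL=255 ID=9330 PROTO=ICMP "
    "TYPE=5 CODE=0 GATEWAY=10.1.1.5 [SRC=10.1.2.99 DST=192.168.12.1 LEN=74 "
    "TOS=0x00 PREC=0x00 TTL=63 ID=13348 DF PROTO=UDP SPT=47355 DPT=161 LEN=54 ]"
)


def _fields(text):
    """Collect KEY=value pairs; an empty value (e.g. 'OUT=') becomes ''."""
    return dict(re.findall(r"(\w+)=(\S*)", text))


def parse_redirect(line):
    """Return a summary dict for an ICMP redirect entry, or None otherwise."""
    inner_match = re.search(r"\[(SRC=[^\]]*)\]", line)
    if inner_match is None:
        return None
    outer = _fields(line[: inner_match.start()])
    inner = _fields(inner_match.group(1))
    if outer.get("PROTO") != "ICMP" or outer.get("TYPE") != "5":
        return None
    return {
        "sent_by": outer["SRC"],               # router issuing the redirect
        "new_gateway": outer["GATEWAY"],       # next hop it tells us to use
        "for_destination": inner["DST"],       # destination being redirected
        "trigger_proto": inner.get("PROTO"),   # packet that provoked it
        "trigger_dport": inner.get("DPT"),
    }


if __name__ == "__main__":
    print(parse_redirect(LOG_LINE))
```

Read this way, the entry says that 10.1.100.1 told 10.1.2.99 to reach 192.168.12.1 through 10.1.1.5 instead, and that the packet which provoked it was UDP to port 161 (the standard SNMP port) rather than a ping.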

I don’t know if it is wise or not but I disabled ICMP redirects according to http://linuxpoison.blogspot.com/2010/01/how-to-disable-icmp-redirects-in-linux.html. I will see if that remedies the problem.
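
One thing that may be worth checking after a change like this is whether redirects are really off for the interface in question. As I understand the kernel's ip-sysctl documentation, when forwarding is disabled on an interface, redirects are accepted if either `conf/all/accept_redirects` or `conf/<interface>/accept_redirects` is set, so clearing only one of the two may not be enough. The sketch below just reads those files and applies that rule; `effective_accept_redirects` is a hypothetical helper, and whether this applies to the machine in this thread has not been established.

```python
from pathlib import Path


def effective_accept_redirects(interface, base="/proc/sys/net/ipv4/conf"):
    """Estimate whether ICMP redirects are accepted on `interface`.

    Rule assumed from the kernel's ip-sysctl documentation:
      - forwarding enabled on the interface: both 'all' and the interface
        value must be set for redirects to be accepted;
      - forwarding disabled: it is enough for either one to be set.
    """

    def read_flag(scope, name):
        return Path(base, scope, name).read_text().strip() != "0"

    all_flag = read_flag("all", "accept_redirects")
    if_flag = read_flag(interface, "accept_redirects")
    forwarding = read_flag(interface, "forwarding")

    if forwarding:
        return all_flag and if_flag
    return all_flag or if_flag


if __name__ == "__main__":
    # ens160 is the interface name from the ifconfig output in this thread.
    print(effective_accept_redirects("ens160"))
```

On the server this would be run as is; a result of True would suggest that redirects are still being honoured on ens160 despite the change.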

In case anyone else experiences this issue: disabling ICMP redirects fixed my problem. Whether it will cause other problems remains to be seen.

No joy, disabling ICMP redirects does not fix the problem.