IPv6 failing after 4 hours on openSUSE 15.6 server at OVH

@arvidjaar Thanks for bearing with me on this, I’m very sorry for the confusion.

So I’m dealing with a physical machine running openSUSE Leap 15.6 as a Xen hypervisor. Its primary interface is br0. It is configured with both IPv4 and IPv6, and it is losing IPv6 access. This loss of access happens whether guests are running or not, so I’ve been focusing on debugging just the host, which is why my output refers to br0.

When guests are running, they experience the same thing: the moment the host loses IPv6 access, so do the guests. However, after outside access is lost, the guests and the host can still communicate with each other over IPv6… and the guests and the host can still ping fe80::1.

Despite my efforts at sysctl tweaking, IPv6 access was lost again last night. So, right now, I have the host and two guests running on the server. All are accessible to the entire world via IPv4. Over IPv6, the guests can communicate with each other and with the host, and vice versa. None of them can reach, or be reached by, the outside world… except fe80::1.

All of them can still ping fe80::1, which is the OVH-controlled upstream router. So, for example, from the host:

# ping6 fe80::1%br0
PING fe80::1%br0(fe80::1%br0) 56 data bytes
64 bytes from fe80::1%br0: icmp_seq=1 ttl=64 time=0.258 ms
64 bytes from fe80::1%br0: icmp_seq=2 ttl=64 time=0.203 ms
64 bytes from fe80::1%br0: icmp_seq=3 ttl=64 time=0.203 ms

and from the guests:

# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1%eth0) 56 data bytes
64 bytes from fe80::1%eth0: icmp_seq=1 ttl=64 time=0.350 ms
64 bytes from fe80::1%eth0: icmp_seq=2 ttl=64 time=0.299 ms
64 bytes from fe80::1%eth0: icmp_seq=3 ttl=64 time=0.269 ms

The routing tables are still in place. On the host:

# ip -6 r
2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
fe80::/64 dev vif2.0 proto kernel metric 256 pref medium
fe80::/64 dev vif3.0 proto kernel metric 256 pref medium
default via fe80::1 dev br0 metric 1024 onlink pref medium

And on the guest:

# ip -6 r
2607:5000:1:1::/56 dev eth0 proto kernel metric 256 pref medium
default via fe80::1 dev eth0 metric 1024 onlink pref medium
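Both tables still show the onlink default via fe80::1. As an extra check that the kernel would actually select that route for outside traffic, the kernel can be asked directly with ip -6 route get (the destination here is just an arbitrary outside address):

```shell
# Ask the kernel which route and nexthop it would pick for a given
# destination; print a note instead if no IPv6 route exists at all.
ip -6 route get 2a04:4e42:e00::773 2>/dev/null || echo "no IPv6 route to destination"
```

On a healthy box this prints the destination, the fe80::1 nexthop, and the chosen device and source address on one line.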

And the neighbor tables from the host:

# ip -6 n
fe80::560f:2cff:fe4d:850d dev br0 lladdr 54:0f:2c:4d:85:0d router STALE
fe80::560f:2cff:fe50:de05 dev br0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router STALE
2607:5000:1:1::b dev br0 lladdr 02:00:01:34:09:71 STALE
2607:5000:1:1::a dev br0 lladdr 02:00:01:34:09:70 STALE
fe80::1ff:fe34:971 dev br0 lladdr 02:00:01:34:09:71 STALE

and the guest:

# ip -6 n
fe80::560f:2cff:fe4d:850d dev eth0 lladdr 54:0f:2c:4d:85:0d router STALE
2607:5000:1:1::5 dev eth0 lladdr c4:70:bd:89:6b:20 STALE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router STALE
fe80::560f:2cff:fe50:de05 dev eth0 lladdr 54:0f:2c:50:de:05 router STALE

As I’ve been typing this, the neighbor tables do not seem to be changing. Nothing is in DELAY or FAILED; everything on both host and guest seems stuck in STALE. In fact, it’s almost as if the neighbor tables are frozen. Yet IPv6 itself is still up. From the article you linked:

watch -n0.3 ip -6 neigh show

Nothing seems to be changing at all, on either the host or the guest. So something is “shut down” somewhere.
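To make the freeze easier to spot in that watch output, the states can be tallied instead of eyeballed. A quick sketch (the summarize_nud name is my own):

```shell
# Tally NUD states (REACHABLE, STALE, DELAY, PROBE, FAILED, ...) from
# "ip -6 neigh" output read on stdin. The state is the last field of
# each line, so a one-line awk count is enough.
summarize_nud() {
  awk '{ count[$NF]++ } END { for (s in count) printf "%-10s %d\n", s, count[s] }'
}
```

With everything frozen, ip -6 neigh show | summarize_nud prints a single STALE line with the full entry count; any movement shows up as other states appearing.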

Now, working again from that article, I will follow the steps to try to unstick the host:

# ip -6 n flush dev br0

# ip -6 n

# ping6 fe80::1%br0
PING fe80::1%br0(fe80::1%br0) 56 data bytes
64 bytes from fe80::1%br0: icmp_seq=1 ttl=64 time=0.482 ms
64 bytes from fe80::1%br0: icmp_seq=2 ttl=64 time=0.209 ms
64 bytes from fe80::1%br0: icmp_seq=3 ttl=64 time=0.234 ms

# ping6 2607:5000:1:1::a
PING 2607:5000:1:1::a(2607:5000:1:1::a) 56 data bytes
64 bytes from 2607:5000:1:1::a: icmp_seq=1 ttl=64 time=0.424 ms
64 bytes from 2607:5000:1:1::a: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 2607:5000:1:1::a: icmp_seq=3 ttl=64 time=0.156 ms

This does seem to wake up the neighbor table, sort of:

# ip -6 n
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router STALE
fe80::1ff:fe34:970 dev br0 lladdr 02:00:01:34:09:70 REACHABLE
2607:5000:1:1::a dev br0 lladdr 02:00:01:34:09:70 REACHABLE

I note that subsequent runs of ip -6 n now show fe80::1 as REACHABLE at times - the table is “awake” again…

Every 0.3s: ip -6 neigh show                                ovh0: Fri Apr 18 09:03:41 2025

fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router REACHABLE
fe80::1ff:fe34:970 dev br0 lladdr 02:00:01:34:09:70 STALE

However, it does not restore IPv6 service to the host:

# ping6 cnn.com
PING cnn.com(2a04:4e42:e00::773 (2a04:4e42:e00::773)) 56 data bytes
^C
--- cnn.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1006ms

I will next try the same thing on the guest. While I was making those changes on the host, the guest’s state did not change; its neighbor table remained in STALE for everything, as shown above. So I use the same process:

# ip -6 n flush dev eth0

# ip -6 n

# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1%eth0) 56 data bytes
64 bytes from fe80::1%eth0: icmp_seq=1 ttl=64 time=0.570 ms
64 bytes from fe80::1%eth0: icmp_seq=2 ttl=64 time=0.264 ms
64 bytes from fe80::1%eth0: icmp_seq=3 ttl=64 time=0.209 ms

# ping6 2607:5000:1:1::5
PING 2607:5000:1:1::5(2607:5000:1:1::5) 56 data bytes
64 bytes from 2607:5000:1:1::5: icmp_seq=1 ttl=64 time=0.452 ms
64 bytes from 2607:5000:1:1::5: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 2607:5000:1:1::5: icmp_seq=3 ttl=64 time=0.190 ms

This does restore the neighbor table as well:

# ip -6 n
2607:5000:1:1::5 dev eth0 lladdr c4:70:bd:89:6b:20 REACHABLE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router REACHABLE
fe80::c001:85ff:fe35:4c40 dev eth0 lladdr c4:70:bd:89:6b:20 REACHABLE

But it does not restore actual IPv6 service through the host to the outside world.

# ping6 cnn.com
PING cnn.com(2a04:4e42:800::773 (2a04:4e42:800::773)) 56 data bytes
^C
--- cnn.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2047ms

The guests seem dependent on the host, in that no matter when I start them - at host boot, or 5 minutes before the host happens to fail - they all fail at the same moment the host does. If I completely shut down and reboot a guest while the host is in the failed state, IPv6 service does not work for that guest at all.

The only thing that fixes the problem - so far - is a full reboot of the host, and that only fixes it for 1-4 hours, until the failure happens again.
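Until there is a real fix, my plan is to capture the next failure at the exact moment it happens by logging a snapshot from cron every minute. This is only a sketch - the snapshot_v6 name, the log path, and the choice of pinging the OVH gateway are my own:

```shell
# Append a timestamped IPv6 snapshot (neighbor table, routes, and a
# one-shot ping to the upstream router) to the given log file. Meant
# to be called from a one-line cron script every minute.
snapshot_v6() {
  log=$1
  {
    date
    # Guard the ip calls so the function still logs something useful
    # even where the tool is unavailable.
    command -v ip >/dev/null 2>&1 && { ip -6 neigh show; ip -6 route show; }
    # Record whether the gateway still answers a single quick ping.
    if ping6 -c 1 -w 2 fe80::1%br0 >/dev/null 2>&1; then
      echo "gateway: OK"
    else
      echo "gateway: FAIL"
    fi
  } >>"$log"
}
```

Comparing the last OK snapshot against the first FAIL one should show whether the routes or neighbor states actually change at the moment of failure, or whether everything simply freezes in place as described above.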

I hope this answers your questions. Please let me know if you need anything else, and thank you again for reading!