IPV6 failing after 4 hours on OpenSuse 15.6 server at OVH

I’ve been running OpenSuse in lots of environments for many years, and IPV4 and IPV6 have always worked well. A few of my servers are at OVH locations, and although OVH has a rather strange way of dealing with networking, I’ve still had stable IPV4 and IPV6 service there.

Two weeks ago I leased a new server at a new OVH data center in Toronto. Their way of dealing with networking (what they refer to as their “version 3”) is different even from the other OVH data centers. At this location, on this server, I am encountering a problem: IPV6 stops working a few hours after a reboot, and nothing short of a full reboot can bring it back again.

Server Hardware: OVH “Scale-i1” server
Operating System: OpenSuse Leap 15.6, freshly loaded, server configuration
Network Engine: wicked

IPV4 SETUP (Works)

On the IPV4 side, they use basically /32 netmasks for everything. I’ll use example addresses here for the time being, but the server has:

IPV4 address (sample): 1.2.3.4/32
IPV4 gateway (sample): 5.6.7.8/32

So to make this work, I use /etc/sysconfig/network files as shown:

ifcfg-br0:
IPADDR='1.2.3.4/32'

ifroute-br0:
5.6.7.0/24 - - br0
5.6.7.8/32 - - br0

routes:
default 5.6.7.8 - br0

Although this feels ugly to me, it works, and IPV4 is online and fine.
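
For the record, my understanding is that the wicked files above boil down to roughly these ip commands (a sketch, using the same sample addresses):

ip addr add 1.2.3.4/32 dev br0
ip route add 5.6.7.0/24 dev br0            # on-link route, no gateway
ip route add 5.6.7.8/32 dev br0            # host route puts the gateway on-link
ip route add default via 5.6.7.8 dev br0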

With IPV6, however, things are even more problematic.

IPV6 SETUP (Fails)

On the IPV6 side, they issue a /56 netblock, but the gateway is just bizarre:

IPV6 netblock (sample): 2607:5000:1:1::/56
IPV6 address (sample): 2607:5000:1:1::5/56
IPV6 gateway (ACTUAL): fe80::1/128

Yes, that is the actual default gateway I’m being told to use.

Some of you may say that using a link-local address, fe80::1, as a default gateway goes against the RFCs, or is otherwise technically problematic. I thought so as well. However, there are articles posted online claiming that a link-local default gateway is perfectly legitimate (and indeed, default routes learned from Router Advertisements use the router’s link-local address as the next hop). I initially thought that my IPV6 block had just not been provisioned correctly, and pushed back on OVH, but they insisted that it is.

Some of you may point out that this is not what the posted OVH documentation says. You would be correct. In their response to my ticket, OVH support wrote, in part:

–snip–
IPV6 gateway: fe80:0000:0000:0000:0000:0000:0000:0001
The information you see on your control panel is not a mistake, and it is in fact the same for all of our 3rd generation Advance, scales and High grade servers.
–snip–

They have also acknowledged that there is, as yet, no published documentation on this configuration anywhere. But this is how they are rolling now, at least at their new data center, so it’s what I’m forced into.

They have essentially stated that the following three commands should get IPV6 working:

ip addr add 2607:5000:1:1::5/56 dev br0
ip -6 route add fe80:0000:0000:0000:0000:0000:0000:0001 dev br0
ip -6 route add default via fe80:0000:0000:0000:0000:0000:0000:0001 dev br0

So, to make this work, I disabled autoconf and accept_ra in sysctl as per OVH recommendations, and went with the following in my /etc/sysconfig/network:

ifcfg-br0:
IPADDR_1='2607:5000:1:1::5/56'

ifroute-br0:
fe80::1/128 - - br0

routes:
default fe80::1 - br0

Although this also feels crazy to me, it does actually work, and IPV6 service does come up and is functional.
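
For completeness, the autoconf/accept_ra part lives in sysctl rather than in these files. A minimal drop-in (the file name here is just my own choice) would be:

# /etc/sysctl.d/90-ovh-ipv6.conf - disable SLAAC/RA per OVH's advice
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.autoconf = 0
net.ipv6.conf.default.accept_ra = 0

loaded with sysctl --system or a reboot.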

However, and this is the issue, after about 4 hours, IPV6 service halts. Nothing short of a physical server reboot will restore it.

systemctl restart network
systemctl restart wickedd
flushing the routes and rebuilding
removing the address and re-adding

None of it works. The routing tables are unchanged, and look correct; however, running the mtr command as they ask:

mtr -6 -r -c 10 some:outside:ipv6::address

produces only the headers, and no further output. That same command run from the outside world against my address shows a loss somewhere close to my server:

mtr -6 -r -c 10 2607:5000:1:1::5
Start: 2025-04-15T19:52:58-0700
HOST: ovh Loss% Snt Last Avg Best Wrst StDev
1.|-- 2603:5000:2:2bff:ff:ff: 0.0% 10 0.7 0.9 0.7 1.2 0.1
2.|-- 2001:41d0:0:50::2:5348 0.0% 10 1.0 1.1 1.0 1.2 0.1
3.|-- 2001:41d0:0:50::6:892 0.0% 10 0.3 0.3 0.3 0.4 0.0
4.|-- be100-100.bhs-g2-nc5.qc.c 0.0% 10 1.1 0.9 0.8 1.1 0.1
5.|-- be101.yto-tr1-sbb2-8k.on. 0.0% 10 8.5 9.6 8.5 11.4 0.9
6.|-- 2607:5300:50::4 0.0% 10 10.8 10.9 9.9 12.1 0.6
7.|-- fdff:f003:400::17 0.0% 10 8.7 8.8 8.6 8.8 0.0
8.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0

It’s not clear what that hop 8 is, but when IPV6 is working just after an initial reboot, that hop 8 is still there, still showing 100% loss, and my server shows up as hop 9.

9.|-- 2607:5000:1:1::5 0.0% 10 8.4 8.4 8.4 8.4 0.0

The only difference is that, after a few hours, when IPV6 fails, hop 9 vanishes, and the mtr stops at hop 8.

As you can probably guess from the use of “br0”, my machine is running in Xen mode. When IPV6 fails after about 4 hours, the guests are also impacted. However, IPV6 on the server itself still works. Guests can ping each other and the host, and the host can ping the guests. Critically (I think this is critical anyway), both the host and the guests CAN PING THE GATEWAY:

ping6 fe80::1%br0 (from the host)
ping6 fe80::1%eth0 (from the guests)

all show packet traffic and all show responses as normal. This makes me want to believe that the server itself is doing just fine, and that the problem exists somewhere in OVH, outside of my server.

But I don’t want to get into a “contest” with them, so I am hoping that experts here might see something I’m missing, or suggest things I can try, or give me an opinion of what might be happening here. Any insights would be gratefully appreciated! Thank you!

Glen

Show

ip -6 r

when it works and when it stops working.

Use preformatted text when posting computer output (and input) to make it readable.

Thank you so much for the reply!

I’ve included the (preformatted) output below. Some additional things. OVH has a thing on their bare metal servers called “rescue mode” which, as I understand it, basically swaps in a ramdisk-based preconfigured instance of Debian 12 specially set up to just allow for hardware and network testing when their staff needs to access a server. In my most recent support ticket, they asked me to put the server into rescue mode to see if the problem happens there. I did so, and left it for 7 hours, and IPV6 did not fail. As I said I really don’t want to get into a “contest” with them - but I fear they will just say, “Well, this is your OS, not our problem.”

While I was in there, I took the opportunity to run not only the ip -6 r command but a number of other commands on the rescue OS. I then rebooted into Leap, and ran those same commands on the Leap OS.

I am including the output of these commands below. I absolutely did note the request for ip -6 r after IPV6 fails, and I will post another reply here, hopefully within 4 hours, but wanted to get this comparison out first. The one thing I note in the ip -6 r output, apart from the two extra fe80 entries (which seem to be autocreated even if I remove them from the config files and reboot), is the presence of the keyword ONLINK on the (working) Debian route. From the ip-route(8) manual page, I see:

onlink - pretend that the nexthop is directly attached to this link, even if it does not match any interface prefix.

I never would have seen that had I not looked on their live “rescue” config. Could that be the issue here?
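
If it is, it should be easy to test without touching the config files; something like this (a sketch, same sample addresses as before) ought to swap the default route in place:

<pre>
ip -6 route replace default via fe80::1 dev br0 onlink
</pre>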

Here is the comparative output for everything. IP addresses that matter were modified as previously advised. Thank you again for any insight anyone might have!

uname -a (Debian)

<pre>
Linux rescue12-customer-ca 6.1.51-mod-std #3813959 SMP PREEMPT_DYNAMIC Tue Apr 15 07:59:44 UTC 2025 x86_64 GNU/Linux
</pre>

uname -a (Leap)

<pre>
Linux ovh0 6.4.0-150600.23.42-default #1 SMP PREEMPT_DYNAMIC Fri Mar  7 09:53:00 UTC 2025 (7bf6ecd) x86_64 x86_64 x86_64 GNU/Linux
</pre>

lsmod (Debian, full list)

<pre>
Module                  Size  Used by
zfs                  3616768  0
zunicode              335872  1 zfs
zzstd                 561152  1 zfs
zlua                  172032  1 zfs
zavl                   16384  1 zfs
icp                   299008  1 zfs
zcommon                81920  2 zfs,icp
znvpair                73728  2 zfs,zcommon
spl                    86016  6 zfs,icp,zzstd,znvpair,zcommon,zavl
cdc_ether              20480  0
usbnet                 36864  1 cdc_ether
wmi                    24576  0
backlight              16384  0
</pre>

lsmod (Leap, there are 150 modules, I'm including only those that reference "ip6" or "ipv6" or seem relevant):

<pre>
Module                  Size  Used by
ip6table_raw           12288  1
ip6table_nat           12288  0
ip6table_filter        12288  1
ip6_tables             36864  3 ip6table_filter,ip6table_raw,ip6table_nat
nf_conntrack_ftp       24576  2
xt_CT                  12288  2
xt_tcpudp              20480  9
xt_pkttype             12288  1
xt_state               12288  0
xt_conntrack           12288  3
iptable_raw            12288  1
iptable_nat            12288  0
nf_nat                 61440  2 ip6table_nat,iptable_nat
nf_conntrack          204800  5 xt_conntrack,nf_nat,xt_state,nf_conntrack_ftp,xt_CT
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         12288  1 nf_conntrack
iptable_filter         12288  1
bridge                450560  0
stp                    12288  1 bridge
llc                    16384  2 bridge,stp
ip_tables              36864  3 iptable_filter,iptable_raw,iptable_nat
x_tables               65536  13 ip6table_filter,xt_conntrack,ip6table_raw,iptable_filter,ip6table_nat,xt_state,xt_tcpudp,ip6_tables,xt_CT,xt_pkttype,iptable_raw,ip_tables,iptable_nat
libcrc32c              12288  3 nf_conntrack,nf_nat,xfs
</pre>

Debian network configuration: (I manually changed the IPV6 address to end in ::5 for uniformity)

/etc/network/interfaces.d/55-rescue:

<pre>
auto eth0
allow-hotplug eth0
iface eth0 inet dhcp
accept_ra 0
</pre>

/etc/network/interfaces.d/60-rescue-ipv6:

<pre>
iface eth0 inet6 static
    address 2607:5000:1:1::5/56
    gateway fe80::1
</pre>

Leap network configuration:

/etc/sysconfig/network/ifcfg-br0:

<pre>
IPADDR='1.2.3.4/32'
IPADDR_1='2607:5000:1:1::5/56'
BOOTPROTO='static'
STARTMODE='auto'
BRIDGE='yes'
BRIDGE_PORTS='eth0'
BRIDGE_STP='off'
BRIDGE_FORWARDDELAY='0'
</pre>

/etc/sysconfig/network/ifroute-br0:

<pre>
5.6.7.8/32 - - br0
5.6.7.0/24 - - br0
fe80::1/128 - - br0
</pre>

/etc/sysconfig/network/routes:

<pre>
default 5.6.7.8 - br0
default fe80::1 - br0
</pre>


ip -6 a (Debian)

<pre>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2607:5000:1:1::5/56 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::c670:bdff:fe89:6b20/64 scope link
       valid_lft forever preferred_lft forever
</pre>

ip -6 a (Leap)

<pre>
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2607:5000:1:1::5/56 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::4483:f2ff:fe8a:f6dd/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
</pre>

ip -6 n (Debian)

<pre>
fe80::560f:2cff:fe50:de05 dev eth0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::560f:2cff:fe4d:850d dev eth0 lladdr 54:0f:2c:4d:85:0d router STALE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router DELAY
</pre>

ip -6 n (Leap)

<pre>
fe80::560f:2cff:fe50:de05 dev br0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::c670:bdff:fe89:6b20 dev br0 FAILED
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router REACHABLE
</pre>

ip -6 r (Debian)

<pre>
2607:5000:1:1::/56 dev eth0 proto kernel metric 256 pref medium
default via fe80::1 dev eth0 metric 1024 onlink pref medium
</pre>

ip -6 r (Leap)

<pre>
2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
fe80::1 dev br0 metric 1024 pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
default via fe80::1 dev br0 metric 1024 pref medium
</pre>

mtr -6 -r -n -c 10 2607:5000:2:2b26::26 (From Debian to the outside world)

<pre>
Start: 2025-04-17T23:09:11+0000
HOST: rescue12-customer-ca        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- fe80::560f:2cff:fe4d:850d  0.0%    10    0.4   0.4   0.3   0.4   0.0
  2.|-- fdff:f003:400::27          0.0%    10    0.4   0.4   0.4   0.5   0.0
  3.|-- fdff:f003:400::22          0.0%    10    1.7   3.0   1.7   4.4   0.8
  4.|-- 2607:5300:50::7            0.0%    10    3.1   2.9   1.8   4.1   0.8
  5.|-- 2607:5300::2629           20.0%    10    3.3   4.2   3.3   5.5   0.7
  6.|-- 2607:5300::252f           90.0%    10   14.9  14.9  14.9  14.9   0.0
  7.|-- 2607:5300::1c3             0.0%    10    9.0   9.0   8.9   9.1   0.1
  8.|-- 2001:41d0:0:50::6:8a5      0.0%    10    9.6   9.7   9.5   9.8   0.1
  9.|-- 2001:41d0:0:50::2:533b     0.0%    10    9.2   9.2   9.0   9.5   0.1
 10.|-- 2607:5000:2:2b26::26       0.0%    10    8.7   8.8   8.7   8.8   0.0
</pre>

mtr -6 -r -n -c 10 2607:5000:2:2b26::26 (From Leap to the outside world)

<pre>
Start: 2025-04-17T16:32:59-0700
HOST: ovh0                        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- fe80::560f:2cff:fe4d:850d  0.0%    10    0.3   0.3   0.2   0.3   0.0
  2.|-- fdff:f003:400::27          0.0%    10    0.3   0.3   0.3   0.3   0.0
  3.|-- fdff:f003:400::26          0.0%    10    2.7   2.6   1.6   4.1   0.7
  4.|-- 2607:5300:50::7            0.0%    10    2.6   2.8   2.0   3.2   0.4
  5.|-- 2607:5300::2629            0.0%    10    4.1   4.5   3.4   5.0   0.5
  6.|-- 2607:5300::252f           50.0%    10   15.0  11.0   9.7  15.0   2.2
  7.|-- 2607:5300::1c3             0.0%    10    8.8   8.8   8.8   8.9   0.0
  8.|-- 2001:41d0:0:50::6:895      0.0%    10    9.5   9.5   9.4   9.5   0.1
  9.|-- 2001:41d0:0:50::2:533b     0.0%    10    9.0   9.0   9.0   9.2   0.1
 10.|-- 2607:5000:2:2b26::26       0.0%    10    8.6   8.6   8.6   8.6   0.0
</pre>

mtr -6 -r -n -c 10 2607:5000:1:1::5 (From the outside world to Debian)

<pre>
Start: 2025-04-17T16:11:08-0700
HOST: ovh1                        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2607:5000:2:2bff:ff:ff:    0.0%    10    0.7   0.8   0.7   1.2   0.1
  2.|-- 2001:41d0:0:50::2:5348     0.0%    10    1.2   1.1   1.0   1.2   0.1
  3.|-- 2001:41d0:0:50::6:892      0.0%    10    0.5   0.3   0.3   0.5   0.1
  4.|-- 2607:5300::1cc            30.0%    10    1.1   0.9   0.8   1.1   0.1
  5.|-- 2607:5300::ef              0.0%    10    8.6   9.2   8.3   9.7   0.5
  6.|-- 2607:5300:50::4            0.0%    10   10.8  11.5  10.4  12.6   0.8
  7.|-- fdff:f003:400::17          0.0%    10    8.8   8.7   8.7   8.8   0.0
  8.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  9.|-- 2607:5000:1:1::5           0.0%    10    8.5   8.5   8.5   8.6   0.0
</pre>

mtr -6 -r -n -c 10 2607:5000:1:1::5 (From the outside world to Leap)

<pre>
Start: 2025-04-17T16:33:32-0700
HOST: ovh1                        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2607:5000:2:2bff:ff:ff:    0.0%    10    0.8   1.3   0.8   4.5   1.1
  2.|-- 2001:41d0:0:50::2:5348     0.0%    10    1.0   1.1   1.0   1.5   0.1
  3.|-- 2001:41d0:0:50::6:892      0.0%    10    0.3   0.3   0.2   0.3   0.0
  4.|-- 2607:5300::1cc            70.0%    10    0.7   0.8   0.7   1.0   0.1
  5.|-- 2607:5300::ef              0.0%    10    8.8   9.4   8.8  10.8   0.6
  6.|-- 2607:5300:50::4            0.0%    10   10.9  10.9  10.0  11.8   0.5
  7.|-- fdff:f003:400::17          0.0%    10    8.7   8.7   8.7   8.8   0.0
  8.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  9.|-- 2607:5000:1:1::5           0.0%    10    8.4   8.4   8.3   8.5   0.0
</pre>

I forgot one other thing that I did. I basically ran:
sysctl -a | grep ipv6 | grep all
on both operating systems to determine any differences there. I found several.

In their rescue mode, I have:

<pre>
net.ipv6.conf.all.accept_ra = 1
net.ipv6.conf.all.accept_redirects = 1
net.ipv6.conf.all.autoconf = 1
net.ipv6.conf.all.forwarding = 0
</pre>

Ironic, I think, since OVH states that we should disable autoconf and accept_ra “to avoid known issues.”

On the Leap configuration, these are reversed:

<pre>
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.all.forwarding = 1
</pre>

Drilling down further and looking at the actual interfaces, I see the same differences, plus one more:

Their rescue system shows:

<pre>
net.ipv6.conf.eth0.use_tempaddr = 0
</pre>

whereas Leap shows:

<pre>
net.ipv6.conf.br0.use_tempaddr = 1
</pre>

Finally, I note that there are four parameters in Leap which are not in their Debian rescue system:

<pre>
net.ipv6.conf.all.accept_ra_rt_info_max_plen = 0
net.ipv6.conf.all.accept_ra_rt_info_min_plen = 0
net.ipv6.conf.all.mc_forwarding = 0
net.ipv6.conf.all.seg6_require_hmac = 0
</pre>

These parameters are set the same at the “br0” interface level, but are not present anywhere in the Debian output.
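
If it helps anyone reproduce this, all of these can be flipped at runtime, e.g. to mirror the rescue values on the Leap side (a sketch for bracketing the problem, not a recommendation):

<pre>
sysctl -w net.ipv6.conf.all.accept_ra=1
sysctl -w net.ipv6.conf.all.accept_redirects=1
sysctl -w net.ipv6.conf.all.autoconf=1
sysctl -w net.ipv6.conf.all.forwarding=0
sysctl -w net.ipv6.conf.br0.use_tempaddr=0
</pre>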

Just thought I should mention this in case any of it might be helpful. Thank you again for reading and for any insights!

@glen use code tags, not pre, or just use markdown

@malcolmlewis - Thank you so much, will do! Apologies for my ineptness, and thank you for fixing my posts!

All - Well, this time IPV6 didn’t last an hour, and it is now down.

I have now re-run all the commands against Leap in its down condition. Obviously some, like uname, lsmod (I checked!), and the network config files, are not changed. ip -6 a is also unchanged; it is identical. Here is what did change:

ip -6 n (Leap-Up)

fe80::560f:2cff:fe50:de05 dev br0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::c670:bdff:fe89:6b20 dev br0 FAILED
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router REACHABLE

ip -6 n (Leap-Down)

fe80::560f:2cff:fe50:de05 dev br0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::c670:bdff:fe89:6b20 dev br0 FAILED
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router STALE
fe80::560f:2cff:fe4d:850d dev br0 lladdr 54:0f:2c:4d:85:0d router STALE

ip -6 r did not change either; it is identical. Here is the down version:

2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
fe80::1 dev br0 metric 1024 pref medium
fe80::/64 dev br0 proto kernel metric 256 pref medium
default via fe80::1 dev br0 metric 1024 pref medium

mtr -6 -r -n -c 10 2607:5000:2:2b26::26 (From Leap to the outside world, offline)
This command literally generates no output at all, other than headers:

Start: 2025-04-17T17:41:19-0700
HOST: ovh0                        Loss%   Snt   Last   Avg  Best  Wrst StDev

mtr -6 -r -n -c 10 2607:5000:1:1::5 (From the outside world to Leap, offline)
The output here is roughly the same as before, except that it now stops at hop 8, the “???” hop, and goes no further.

Start: 2025-04-17T17:42:26-0700
HOST: ovh1                        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2607:5000:2:2bff:ff:ff:    0.0%    10    1.0   1.4   0.8   4.6   1.2
  2.|-- 2001:41d0:0:50::2:5348     0.0%    10    1.0   1.0   1.0   1.1   0.1
  3.|-- 2001:41d0:0:50::6:892      0.0%    10    0.3   0.3   0.2   0.3   0.0
  4.|-- 2607:5300::1cc            10.0%    10    0.9   0.9   0.8   1.2   0.1
  5.|-- 2607:5300::ef              0.0%    10   10.5  10.4   9.9  11.5   0.5
  6.|-- 2607:5300:50::4            0.0%    10   12.0  11.3  10.1  12.5   0.8
  7.|-- fdff:f003:400::17          0.0%    10    8.7   8.7   8.7   8.7   0.0
  8.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0

I hope this answers, please let me know if anyone sees anything or needs more information. Thank you so much for reading through this!

Glen

Earlier I floated the idea of onlink. With the server now offline for IPV6, I did this:

ip -6 route flush table main
ip -6 route add 2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
ip -6 route add default via fe80::1 dev br0 onlink

My routing table now looks exactly like the Debian rescue mode routing table did:

2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
default via fe80::1 dev br0 metric 1024 onlink pref medium

But alas that did not restore IPV6 traffic flow.

Glen

Does the Unix & Linux Stack Exchange question “Why does an IPv6 neighbour router status become STALE? How can I avoid it?” apply here?

@arvidjaar : Thank you for your reply and the pointer.

I’ve read the article, the answers and comments, and reviewed the (rather long) linked Cisco article. The problem of course is that I can’t see or access OVH’s router/switch configuration, so I don’t know what they’re doing.

That said, during the time the server is up, the fe80::1 neighbor entry moves repeatedly between STALE and REACHABLE. Rarely I see it in DELAY, but usually it is either STALE or REACHABLE. This is the case even after IPV6 external access goes down and I can’t connect outside of the physical host. Even in that state, fe80::1 does show REACHABLE from time to time… and I can always ping it. ping6 fe80::1%br0 always works, even when no other external addresses work.

I will definitely try the author’s sequence of steps when IPV6 goes down again (overnight my time) and report back.

As for other items in the comments/answers, I do not have any kind of IPV6 firewall running. In addition, the route does not vanish from my routing table (nor does the neighbor vanish from the neighbor table.) It just… stops routing. It “feels” like the router just forgets my server somehow. Or vice versa.

What really caught my attention about the article is the virtual host/guest aspect. IPV6 forwarding is turned on on that person’s host (of necessity), and I discovered that when forwarding is on, the sysctl settings accept_ra and autoconf are apparently disabled (the latter through the chain accept_ra -> accept_ra_pinfo -> autoconf). Since their “rescue” machine is the exact opposite of their documentation, I’m trying to figure out if that’s relevant.

What I’m doing right now: I’ve forcibly reconfigured all the sysctl parameters to match their Debian rescue machine, and, noting that accept_ra and autoconf depend on forwarding being disabled, I’ve disabled forwarding too. Of course I can’t run Xen with forwarding disabled; at this point, I’m just trying to bracket the problem.
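
One wrinkle I found while reading the kernel’s ip-sysctl documentation (if I’m reading it right): accept_ra also takes the value 2, which means “accept Router Advertisements even if forwarding is enabled.” So a forwarding (Xen) host that still wants RA processing would presumably use something like:

sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.br0.accept_ra=2

That might let me keep forwarding on for Xen while still testing the RA side of things.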

What I would really like to know at this point, I guess, is: why does this get fixed - and ONLY get fixed - on a server reboot? Stopping and restarting wickedd.service, network.service, and doing all kinds of other things does NOT fix the problem. Yet if I reboot the physical host, it’s suddenly fixed again (until it’s not).

If I knew what that extra on-boot step(s) was, maybe I could just trigger it somehow. Which is suboptimal, but would at least help me understand.
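
In the meantime, to at least timestamp the exact failure moment, I’ve thrown together a tiny watchdog (a rough sketch; the probe target is just Google’s public resolver, picked arbitrarily):

#!/bin/sh
# Log the first moment external IPV6 dies while fe80::1 still answers.
while sleep 60; do
    if ! ping -6 -c 1 -W 2 2001:4860:4860::8888 >/dev/null 2>&1; then
        if ping -6 -c 1 -W 2 fe80::1%br0 >/dev/null 2>&1; then
            logger "IPV6 upstream unreachable but fe80::1 still pingable"
        fi
    fi
done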

If my sysctl changes fix this, I’ll turn forwarding back on. If that breaks it, I’ll try to duplicate on their “rescue” server to “prove” the problem to them. But otherwise I’m completely in the dark here.

Anyway, my answer, I guess, is “maybe” it’s related? In that example the router never gets past STALE, which is “proven” to be okay, yet he’s still having problems every 20 minutes to a few hours.

In this case I am afraid it is something on your provider side.

Sorry, I did not pay attention (using preformatted text does help interpreting computer output). The %br0 implies your host, but you said from the very beginning that the host worked:

Your guest does not see br0, it sees (or should be seeing) a normal interface. You need to check and show information from your guest, not host. Looking back, all output appears from the host.

@arvidjaar Thanks for bearing with me on this, I’m very sorry for the confusion.

So I’m dealing with a physical machine, running OpenSuse Leap 15.6 in Xen Hypervisor mode. Its primary interface is br0. It is configured with IPV4 and IPV6, and it is experiencing the loss of IPV6 access. This loss of access happens whether there are guests running or not, so I’ve been focusing on just debugging the host, which is why my output has been using br0.

When guests are running, they experience the same thing. The moment the host loses IPV6 access, so do the guests. However, after the access is lost, the guests and the host can still communicate with each other over IPV6… and the guests and the host can still ping fe80::1.

Despite my efforts at sysctl tweaking, IPV6 access again got lost last night. So, right now, I have the host and two guests running on the server. All are accessible to the entire world via IPV4. Over IPV6, the guests can communicate with each other, and with the host, and vice versa. None of them can reach, or be reached by, the outside world… except fe80::1.

All of them can still ping fe80::1, which is the OVH-controlled upstream router. So, for example, from the host:

# ping6 fe80::1%br0
PING fe80::1%br0(fe80::1%br0) 56 data bytes
64 bytes from fe80::1%br0: icmp_seq=1 ttl=64 time=0.258 ms
64 bytes from fe80::1%br0: icmp_seq=2 ttl=64 time=0.203 ms
64 bytes from fe80::1%br0: icmp_seq=3 ttl=64 time=0.203 ms

and from the guests:

# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1%eth0) 56 data bytes
64 bytes from fe80::1%eth0: icmp_seq=1 ttl=64 time=0.350 ms
64 bytes from fe80::1%eth0: icmp_seq=2 ttl=64 time=0.299 ms
64 bytes from fe80::1%eth0: icmp_seq=3 ttl=64 time=0.269 ms

The routing tables are still in place. On the host:

# ip -6 r
2607:5000:1:1::/56 dev br0 proto kernel metric 256 pref medium
fe80::/64 dev vif2.0 proto kernel metric 256 pref medium
fe80::/64 dev vif3.0 proto kernel metric 256 pref medium
default via fe80::1 dev br0 metric 1024 onlink pref medium

And on the guest:

# ip -6 r
2607:5000:1:1::/56 dev eth0 proto kernel metric 256 pref medium
default via fe80::1 dev eth0 metric 1024 onlink pref medium

And the neighbor tables from the host:

# ip -6 n
fe80::560f:2cff:fe4d:850d dev br0 lladdr 54:0f:2c:4d:85:0d router STALE
fe80::560f:2cff:fe50:de05 dev br0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router STALE
2607:5000:1:1::b dev br0 lladdr 02:00:01:34:09:71 STALE
2607:5000:1:1::a dev br0 lladdr 02:00:01:34:09:70 STALE
fe80::1ff:fe34:971 dev br0 lladdr 02:00:01:34:09:71 STALE

and the guest:

# ip -6 n
fe80::560f:2cff:fe4d:850d dev eth0 lladdr 54:0f:2c:4d:85:0d router STALE
2607:5000:1:1::5 dev eth0 lladdr c4:70:bd:89:6b:20 STALE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router STALE
fe80::560f:2cff:fe50:de05 dev eth0 lladdr 54:0f:2c:50:de:05 router STALE

As I’ve been typing this, the neighbor tables do not seem to be changing. Nothing is in DELAY or FAILED; everything on both host and guest seems stuck on STALE. In fact, it’s almost like the neighbor tables are just frozen. Yet IPV6 between the host and the guests still works. From the article you linked:

watch -n0.3 ip -6 neigh show

Nothing seems to be changing at all. This is true for both host and guest. So something is “shut down” somewhere.

Now, working again from that article, I will follow the steps to try to unstick the host:

# ip -6 n flush dev br0

# ip -6 n

# ping6 fe80::1%br0
PING fe80::1%br0(fe80::1%br0) 56 data bytes
64 bytes from fe80::1%br0: icmp_seq=1 ttl=64 time=0.482 ms
64 bytes from fe80::1%br0: icmp_seq=2 ttl=64 time=0.209 ms
64 bytes from fe80::1%br0: icmp_seq=3 ttl=64 time=0.234 ms

# ping6 2607:5000:1:1::a
PING 2607:5000:1:1::a(2607:5000:1:1::a) 56 data bytes
64 bytes from 2607:5000:1:1::a: icmp_seq=1 ttl=64 time=0.424 ms
64 bytes from 2607:5000:1:1::a: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 2607:5000:1:1::a: icmp_seq=3 ttl=64 time=0.156 ms

This does seem to wake up the neighbor table, sort of:

# ip -6 n
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router STALE
fe80::1ff:fe34:970 dev br0 lladdr 02:00:01:34:09:70 REACHABLE
2607:5000:1:1::a dev br0 lladdr 02:00:01:34:09:70 REACHABLE

I note that subsequent runs of ip -6 n now show the fe80::1 as being REACHABLE at times - the table is “awake” again…

Every 0.3s: ip -6 neigh show                                ovh0: Fri Apr 18 09:03:41 2025

fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router REACHABLE
fe80::1ff:fe34:970 dev br0 lladdr 02:00:01:34:09:70 STALE

However, it does not restore IPV6 service to the host:

# ping6 cnn.com
PING cnn.com(2a04:4e42:e00::773 (2a04:4e42:e00::773)) 56 data bytes
^C
--- cnn.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1006ms

I will next try the same thing on the guest. During my changes to the host, the guest state did not change; the neighbor table remains STALE for everything, as shown above. So I use the same process:

# ip -6 n flush dev eth0

# ip -6 n

# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1%eth0) 56 data bytes
64 bytes from fe80::1%eth0: icmp_seq=1 ttl=64 time=0.570 ms
64 bytes from fe80::1%eth0: icmp_seq=2 ttl=64 time=0.264 ms
64 bytes from fe80::1%eth0: icmp_seq=3 ttl=64 time=0.209 ms

# ping6 2607:5000:1:1::5
PING 2607:5000:1:1::5(2607:5000:1:1::5) 56 data bytes
64 bytes from 2607:5000:1:1::5: icmp_seq=1 ttl=64 time=0.452 ms
64 bytes from 2607:5000:1:1::5: icmp_seq=2 ttl=64 time=0.176 ms
64 bytes from 2607:5000:1:1::5: icmp_seq=3 ttl=64 time=0.190 ms

This does restore the neighbor table as well:

# ip -6 n
2607:5000:1:1::5 dev eth0 lladdr c4:70:bd:89:6b:20 REACHABLE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router REACHABLE
fe80::c001:85ff:fe35:4c40 dev eth0 lladdr c4:70:bd:89:6b:20 REACHABLE

But not actual IPV6 service through the host to the outside world.

# ping6 cnn.com
PING cnn.com(2a04:4e42:800::773 (2a04:4e42:800::773)) 56 data bytes
^C
--- cnn.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2047ms

The guests seem dependent upon the host, in that no matter when I start the guests, whether at host boot, or 5 minutes before the host happens to fail, they all fail at the same moment the host does. If I completely shut down and reboot a guest with the host in failed state, IPV6 service does not work for the guest at all.

The only thing that fixes the problem - so far - is a full reboot of the host - and that only fixes it for 1-4 hours, until the fail happens again.

I hope this answers, please let me know, and thank you again for reading!

There is a new interesting wrinkle here.

IPV6 failed again. I came across a mention of radvd and tried starting it, but that didn’t work. I then tried rebooting the server back to “normal” mode (instead of Xen hypervisor mode), and that didn’t work either - service still went down 4 hours later.

I had previously mentioned that nothing I could do - not even restarting the network - could restore service. My next step in debugging was to eliminate the bridge configuration and just configure eth0 on the host directly, using yast2 lan to make the changes. When I came out of yast2 lan and re-applied the ip -6 addr and ip -6 route commands… IPV6 service came back online immediately, without a reboot.

Some additional addresses show up now, too.

ip -6 a (Bridging On)

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2607:5000:1:1::5/56 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::4483:f2ff:fe8a:f6dd/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

ip -6 a (Bridging Off)

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2607:5000:1:1::5/56 scope global
       valid_lft forever preferred_lft forever
    inet6 fdd5:21f9:f5e:1:3f50:f36a:f042:5151/64 scope global temporary dynamic
       valid_lft 86395sec preferred_lft 14395sec
    inet6 fdd5:21f9:f5e:1:c670:bdff:fe89:6b20/64 scope global dynamic mngtmpaddr proto kernel_ra
       valid_lft 86395sec preferred_lft 14395sec
    inet6 fe80::c670:bdff:fe89:6b20/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

It’s too early to tell if IPV6 will last more than 4 hours now, but, to quote from Groundhog Day… Anything different is good.

So could the bridging subsystem be the cause of the problem, or a contributing factor? If so, any recommendations on where to look? Of course I can’t use the host without Xen and bridging; this is just about bracketing the problem, and it may go nowhere.

I’ll report back if it fails again, or lasts for more than 8 hours.

Thank you to everyone for reading! Thoughts welcome!
Glen

@glen So on the guest systems you select the bridge device as the interface eg br0, then use say a e1000 device, not virtio/nat etc?

On the guest systems, there is only one choice, “Virtual Ethernet Card 0 (eth0)” and that’s what I’m selecting for the guests.

On the host system, I see all the physical network cards. The first one is configured to be “Included in br0”. I also see the “Bridge (br0)” and that’s what I’m using for the host.

So the physical host is just br0 → (attached to physical eth0) → cable
And the guests are eth0 → (Xen bridges them onto the host br0)

So on the physical host I had something like this:

# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.ac1f6b40491a       no              eth0
                                                        vif2.0
                                                        vif3.0

where vif2.0 and vif3.0 are the manifestations of the “eth0” on each guest, respectively.

At this point I have the guests shut down and disabled, and am just trying to solve for the host itself. I had switched out of Xen mode, and that didn’t help, so I then deleted the bridge adapter and just set up the host’s “Ethernet Controller 10-Gigabit X540-AT2 (eth0)” directly.

That’s when I noticed that for the first time IPV6 came back without a reboot. I am now waiting to see if it fails or not over time.

I hope I’m answering you correctly, please let me know if I need to clarify anything else, and thank you!

And… there it is.

I removed bridging from the physical host, and IPV6 comes up and stays up reliably. It’s been 8 hours now, and it’s super-stable.

Capturing a few details for the record:

ip -6 addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2607:5000:1:1::5/56 scope global
       valid_lft forever preferred_lft forever
    inet6 fdd5:21f9:f5e:1:3f50:f36a:f042:5151/64 scope global temporary deprecated dynamic
       valid_lft 60168sec preferred_lft 0sec
    inet6 fdd5:21f9:f5e:1:c670:bdff:fe89:6b20/64 scope global deprecated dynamic mngtmpaddr proto kernel_ra
       valid_lft 60168sec preferred_lft 0sec
    inet6 fe80::c670:bdff:fe89:6b20/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

ip -6 n

fe80::560f:2cff:fe50:de05 dev eth0 lladdr 54:0f:2c:50:de:05 router STALE
fe80::c670:bdff:fe89:6b20 dev eth0 lladdr c4:70:bd:89:6b:20 router STALE
fe80::560f:2cff:fe4d:850d dev eth0 lladdr 54:0f:2c:4d:85:0d router STALE
fe80::1 dev eth0 lladdr fe:ed:de:ad:be:ef router DELAY

ip -6 r

2607:5000:1:1::/56 dev eth0 proto kernel metric 256 pref medium
fdd5:21f9:f5e:1::/64 dev eth0 proto kernel metric 256 expires 60097sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::1 dev eth0 metric 1024 onlink pref medium

This is with no IPV6 configuration on the server itself, and only the running of these two commands to activate it:

ip -6 addr add 2607:5000:1:1::5/56 dev eth0
ip -6 route add default via fe80::1 dev eth0 onlink

This is, again, on the physical host, which has been taken out of Xen mode and had bridging removed, and is otherwise a stock installation of Leap 15.6. No firewall, no sysctl modifications, nothing. All of that was removed for this test.

So the incompatibility seems to be between OVH’s network and bridging mode on Leap 15.6.

OVH has been completely unhelpful, and is just giving me copypasted “Go ask an expert” responses. There’s been enough frustration on that side that I’m mentioning it here as a matter of conscience. (Contrast that with the people on this forum, who don’t know me, have nothing to gain, and yet have stepped up with helpful feedback consistently. To you all, I am very grateful.)

My next step now is to change the server to Tumbleweed and see if I get lucky there. If Tumbleweed solves this, then I assume I should consider the matter closed, since Leap 15.6 is already released and 16.0 is on the way. If Tumbleweed does not solve this, and the problem is relating to bridging, then I assume I should open a new thread on the forum with that focus.

Please let me know if any of this is not right, or if anyone has any thoughts or guidance on anything for me. Thank you!

Glen

Try

echo 0 > /sys/devices/virtual/net/br0/bridge/multicast_snooping
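
Note that this sysfs setting does not persist; it must be re-applied whenever the bridge is recreated, so run it from something that fires after the bridge comes up (a post-up hook or boot script is one way).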

This is probably relevant as well:

https://bugzilla.kernel.org/show_bug.cgi?id=99081

Thank you so much for this!

Well, Tumbleweed did NOT want to go on this server - NVME disk problems, power problems, tons of kernel complaints, and it ate itself after the third boot - so I have now fresh-loaded Leap 15.6 back onto the server again. The network is bridged, IPV6 is activated and your command is in the activation chain. Everything is up, armed and ready. Tumbleweed is great, but we can’t take Leap for granted!

It’s midnight here, so I’m going to sleep for a bit, but after 8-9 hours of sleep I’ll report back and let you know the status!

Thank you also for the bugzilla report. It was interesting - and I am cautiously hopeful!

Glen

Argh. I am sorry to report that this did not work. IPV6 still stopped routing.

Thanks. I read through it, and I tried the suggestions I could find in there, including things like:

echo 1 > /sys/devices/virtual/net/br0/bridge/multicast_querier
ifconfig br0 promisc

and trying to use tcpdump to trigger a recovery. None of those things worked.

As for the tcpdump, I got the following output:

# tcpdump -i br0 icmp6
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:28:29.136353 IP6 fe80::1 > ipv6-allnodes: ICMP6, neighbor advertisement, tgt is fe80::1, length 32
10:28:29.136406 IP6 fe80::fced:deff:fead:beef > ipv6-allnodes: ICMP6, neighbor advertisement, tgt is fe80::fced:deff:fead:beef, length 32
10:28:29.902041 IP6 fe80::1 > ipv6-allnodes: ICMP6, neighbor advertisement, tgt is fe80::1, length 32
10:28:29.902096 IP6 fe80::fced:deff:fead:beef > ipv6-allnodes: ICMP6, neighbor advertisement, tgt is fe80::fced:deff:fead:beef, length 32
10:28:30.358965 IP6 fe80::560f:2cff:fe4d:850d > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32
10:28:31.370460 IP6 fe80::560f:2cff:fe4d:850d > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32
10:28:32.394354 IP6 fe80::560f:2cff:fe4d:850d > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32
10:28:32.823083 IP6 fe80::560f:2cff:fe50:de05 > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32
10:28:33.827136 IP6 fe80::560f:2cff:fe50:de05 > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32
10:28:34.851077 IP6 fe80::560f:2cff:fe50:de05 > ff02::1:ff89:6b20: ICMP6, neighbor solicitation, who has fe80::c670:bdff:fe89:6b20, length 32

That cycle repeats every 10 seconds or so.

As for the bridge mdb show command mentioned in that article, my output is currently empty.

However, if I restart the network, the command does generate output:

# systemctl restart network
# bridge mdb show
dev br0 port br0 grp ff02::1:ff00:0 temp
dev br0 port br0 grp ff02::2 temp
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffa8:67 temp

At this point IPV6 is not configured. When I run the config commands, the output changes:

# ip -6 addr add 2607:5000:1:1::5/56 dev br0
# ip -6 route add default via fe80::1 dev br0 onlink
# bridge mdb show                                
dev br0 port br0 grp ff02::1:ff00:5 temp
dev br0 port br0 grp ff02::1:ff00:0 temp
dev br0 port br0 grp ff02::2 temp
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffa8:67 temp

But of course IPV6 still doesn’t work at this point.
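
Since those mdb entries are all marked “temp”, my next idea is to leave a monitor running and see whether the group memberships expire at the same moment service stops (bridge monitor is part of iproute2):

bridge monitor mdb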

The one suggestion in the article that seemed closest to my situation, alas, did not restore service here either.

It feels like this should be so simple, like there’s something that can be kicked, or flushed, or something, to restore service… but this is deeper than I’ve ever had to dig before, and I’m just not finding anything so far.

Thank you for your responses and I hope something in this output suggests something else!

Glen