IPV6 failing after 4 hours on OpenSuse 15.6 server at OVH

Sorry, I was sloppy here. I had said “normally” but edited to “currently”. Running the setup sequence step-by-step, restarting the network causes that command to start producing output. Adding the IPV6 configuration commands modifies the output slightly. It is only when I run the

echo 0 > /sys/devices/virtual/net/br0/bridge/multicast_snooping

command that the output from “bridge mdb show” becomes empty. I suspect that’s expected, but am just trying to mention everything since I’m stuck.
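
For completeness, this is how I was checking the state at each step (just the sysfs path already shown above plus the iproute2 bridge tool):

# confirm snooping is now off on br0
cat /sys/devices/virtual/net/br0/bridge/multicast_snooping

# multicast group database for the bridge - empty once snooping is disabled
bridge mdb show dev br0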

Glen

Just in case it helps…

ovh0:/sys/devices/virtual/net/br0/bridge # for i in *; do echo $i: `cat $i`; done

ageing_time: 30000
bridge_id: 8000.c470bd896b20
default_pvid: 1
flush: [write only]
forward_delay: 1500
gc_timer: 1390
group_addr: 01:80:c2:00:00:00
group_fwd_mask: 0x0
hash_elasticity: 16
hash_max: 4096
hello_time: 200
hello_timer: 0
max_age: 2000
multicast_igmp_version: 2
multicast_last_member_count: 2
multicast_last_member_interval: 100
multicast_membership_interval: 26000
multicast_mld_version: 1
multicast_querier: 0
multicast_querier_interval: 25500
multicast_query_interval: 12500
multicast_query_response_interval: 1000
multicast_query_use_ifaddr: 0
multicast_router: 1
multicast_snooping: 0
multicast_startup_query_count: 2
multicast_startup_query_interval: 3124
multicast_stats_enabled: 0
nf_call_arptables: 0
nf_call_ip6tables: 0
nf_call_iptables: 0
no_linklocal_learn: 0
priority: 32768
root_id: 8000.c470bd896b20
root_path_cost: 0
root_port: 0
stp_state: 0
tcn_timer: 0
topology_change: 0
topology_change_detected: 0
topology_change_timer: 0
vlan_filtering: 0
vlan_protocol: 0x8100
vlan_stats_enabled: 0
vlan_stats_per_port: 0

Having nothing to lose, I did try:

echo 1 > /sys/devices/virtual/net/br0/bridge/flush

That did nothing.

Sorry for the caps, but I MAY HAVE FOUND IT…

Having nothing to lose, and with the machine in the broken state, I went poking around /proc/sys/net/ipv6/conf/all. I tried “cycling” IPV6 by writing a “1” to “disable_ipv6” (which did disable it) and then a “0” which enabled it. I reconfigured the network, and that did nothing.

However, I then saw something intriguing:

-rw-r--r-- 1 root root 0 Apr 19 10:50 ignore_routes_with_linkdown

So I did this:

echo 1 > ignore_routes_with_linkdown

AND IPV6 CAME BACK IMMEDIATELY:

# ping6 cnn.com
PING cnn.com(2a04:4e42:e00::773 (2a04:4e42:e00::773)) 56 data bytes
64 bytes from 2a04:4e42:e00::773 (2a04:4e42:e00::773): icmp_seq=1 ttl=54 time=27.7 ms
64 bytes from 2a04:4e42:e00::773 (2a04:4e42:e00::773): icmp_seq=2 ttl=54 time=27.7 ms

I would be interested in any thoughts on this.

I’m now going to put this command in sysctl, reboot, and see how long it lasts.
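
For the record, making it persistent is just a one-line sysctl drop-in along these lines (the file name here is my own choice):

# /etc/sysctl.d/90-ipv6-linkdown.conf
net.ipv6.conf.all.ignore_routes_with_linkdown = 1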

I will report back!

Glen

Ugh.

Nope, that didn’t work. Must have been a fluke. IPV6 lasted 3.5 hours and stopped. Tried toggling that new flag, no result.

ovh0:/proc/sys/net/ipv6/conf/all # for i in *; do echo $i: `cat $i`; done
accept_dad: 0
accept_ra: 1
accept_ra_defrtr: 1
accept_ra_from_local: 0
accept_ra_min_hop_limit: 1
accept_ra_mtu: 1
accept_ra_pinfo: 1
accept_ra_rt_info_max_plen: 0
accept_ra_rt_info_min_plen: 0
accept_ra_rtr_pref: 1
accept_redirects: 0
accept_source_route: 0
accept_untracked_na: 0
addr_gen_mode: 0
autoconf: 1
dad_transmits: 1
disable_ipv6: 0
disable_policy: 0
drop_unicast_in_l2_multicast: 0
drop_unsolicited_na: 0
enhanced_dad: 1
force_mld_version: 0
force_tllao: 0
forwarding: 1
hop_limit: 64
ignore_routes_with_linkdown: 1
ioam6_enabled: 0
ioam6_id: 65535
ioam6_id_wide: 4294967295
keep_addr_on_down: 0
max_addresses: 16
max_desync_factor: 600
mc_forwarding: 0
mldv1_unsolicited_report_interval: 10000
mldv2_unsolicited_report_interval: 1000
mtu: 1280
ndisc_evict_nocarrier: 1
ndisc_notify: 0
ndisc_tclass: 0
proxy_ndp: 0
ra_defrtr_metric: 1024
regen_max_retry: 3
router_probe_interval: 60
router_solicitation_delay: 1
router_solicitation_interval: 4
router_solicitation_max_interval: 3600
router_solicitations: -1
rpl_seg_enabled: 0
seg6_enabled: 0
seg6_require_hmac: 0
suppress_frag_ndisc: 1
temp_prefered_lft: 86400
temp_valid_lft: 604800
use_oif_addrs_only: 0
use_tempaddr: 0

I’m still poking, and hoping anyone has any more ideas here!

Glen

Randomly, after pushing another set of random buttons and checking after every step, IPV6 came back, without a server reboot, following:

echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6
ip -6 addr add 2607:5000:1:1::5/56 dev br0
ip -6 route add default via fe80::1 dev br0 onlink

This didn’t work in the past (though back then ignore_routes_with_linkdown was not activated), so I speculate that maybe it does work in combination with linkdown. But I’m not sure whether this is “reliable”, or whether other random buttons impacted it. So I’m rebooting again, with linkdown still in sysctl, but with the intent of resetting everything else to stock defaults. Then I’ll observe and try this again.

Worst case I can check ipv6 and run a reset sequence under cron control if I have to. Ugh though.
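
To sketch what I mean, something roughly like this under cron (the ping target is just a public DNS address I picked; the interface, address, and gateway are the same examples used throughout this thread):

#!/bin/bash
# Rough watchdog sketch: if IPV6 routing looks dead, rerun the reset sequence.
# Intended to be called from cron every few minutes.
if ! ping6 -c 3 -W 2 2001:4860:4860::8888 >/dev/null 2>&1; then
    echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
    sleep 1
    echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6
    /sbin/ip -6 addr add 2607:5000:1:1::5/56 dev br0
    /sbin/ip -6 route add default via fe80::1 dev br0 onlink
fi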

Glen

Fresh reboot. The only change from stock is

net.ipv6.conf.all.ignore_routes_with_linkdown = 1

All previous changes mentioned anywhere in this thread are reverted by the fresh load I did last night.

IPV6 went down at four hours. Logged in and this is what happened. The following commands were issued one right after the other. The whole sequence took 10 seconds:

# echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
# echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6
# ip -6 addr add 2607:5000:1:1::5/56 dev br0
# ip -6 route add default via fe80::1 dev br0 onlink

# ping6 cnn.com
3 packets transmitted, 0 received, 100% packet loss

# ping6 cnn.com
3 packets transmitted, 0 received, 100% packet loss

# ping6 cnn.com
3 packets transmitted, 0 received, 100% packet loss

# ip -6 n
fe80::560f:2cff:fe4d:850d dev br0 lladdr 54:0f:2c:4d:85:0d DELAY
fe80::1 dev br0 lladdr fe:ed:de:ad:be:ef router REACHABLE

# ping6 cnn.com
PING cnn.com(2a04:4e42:c00::773 (2a04:4e42:c00::773)) 56 data bytes
64 bytes from 2a04:4e42:c00::773 (2a04:4e42:c00::773): icmp_seq=1 ttl=54 time=27.7 ms
64 bytes from 2a04:4e42:c00::773 (2a04:4e42:c00::773): icmp_seq=2 ttl=54 time=27.7 ms
^C
--- cnn.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms

I’m at a total loss now.

Glen

Your interfaces are set to act as a router (forwarding is 1). Try setting it to 0 to switch them to host mode. And do not rely on conf/all - it is just a one-time change and each individual interface is free to modify any value. Check the values for the interface in question.
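
For example, to check the per-interface values rather than the “all” aggregate (br0 and eth0 being the interfaces in your setup):

sysctl net.ipv6.conf.br0.forwarding net.ipv6.conf.br0.disable_ipv6
sysctl net.ipv6.conf.eth0.forwarding net.ipv6.conf.eth0.disable_ipv6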

Thank you for this! I have a question: If I set them to host, that will cut my Xen guests off from the outside world, I think yes? Is there another way to connect the guests to the outside world if I set forwarding to zero?

Mysteriously - and I have changed nothing; all I did was that “resurrect” sequence in my last reply - IPV6 has now been up and reliable for host and guests for almost 24 hours. If and/or when it goes down next I’ll keep trying to test further, but this is just insane; I have no idea what is happening. I wish I could see things from the OVH side and test from both ends somehow.

No. First, your guests are not routed (at least, to the extent you described your configuration); second, the ipv6/conf/X/forwarding has nothing to do with actual packet forwarding. It just sets whether the interface behaves as a host or as a router.

Thank you so much for this explanation. I will try this next.

At the moment IPV6 has remained up constantly since that last restart, and I have absolutely no idea why. But I can’t be satisfied with that. My plan is to give it the rest of this week. If it stays up, I’m going to reboot and monitor and see if it fails again. I will apply your fix at the next breakdown and report back.

Thank you!
Glen

All -

I just wanted to close this out with a final report. Although no solution has been found, a workaround has, and unless someone wants to push on this more, I’m just going with what I have.

PROBLEM SUMMARY

To recap the problem:

  1. IPv6 external routing on Leap 15.6 is failing on one machine 1-4 hours after server boot.
  2. The problem can be reproduced reliably, even on a fresh load of Leap 15.6, server profile only.
  3. It only happens on IPV6, IPV4 is unaffected.
  4. It only happens when the network interface is in bridged mode (br0), regardless of whether the Xen hypervisor is active or not.
  5. It only happens at OVH locations using the new, undocumented, “V3” networking mode for IPV6, characterized by a default gateway of fe80::1. It is not happening at other OVH or non-OVH locations.
  6. Changes to /sys, /proc, and other sysctl structures did not solve the problem (many were tried, including forwarding, multicast_snooping, ignore_routes_with_linkdown, etc.; none of them solved the issue).
  7. No clues, anomalies, or changes are noted in the ip -6 addr, neighbor, or route tables (see the commands sketched just after this list). Clearing or resetting those things does not work.
  8. It happens whether IPV6 is configured in Yast2, via NetworkManager, via Wicked, or through the use of manual setup commands.
  9. When external routing fails, the host can still communicate over IPV6 with itself, with the default gateway at OVH, and with any guests on the host. IPV6 itself is not failing, only routing to and from the outside world.
  10. When external routing fails, any running guests are also cut off for IPV6.
  11. systemctl restart network, or similar attempts to reset the network, do not restore connectivity.
  12. Only a full reboot of the server (physical host) will bring back connectivity, until it fails again 1-4 hours later.
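
For reference, these are the kinds of checks items 7 and 9 refer to; nothing beyond standard iproute2 commands and ping:

# state compared before and after the failure - no differences observed
ip -6 addr show dev br0
ip -6 route show
ip -6 neigh show dev br0

# external routing test
ping6 -c 3 cnn.com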

STEPS TO REPRODUCE

This process can be reproduced with the following steps:

  1. Lease a new server from OVH which uses their new V3 networking (expensive, given their setup fees)
  2. Do a fresh load of OpenSuse Leap 15.6, using just the basic server profile.
  3. Enable Xen, which will create a bridged network interface. It is not necessary to boot into Xen Hypervisor mode, just having the br0 interface in use is enough.
  4. Leave IPV4 as is (DHCP) or configure it manually, it makes no difference.
  5. Configure IPV6.
  6. Wait 1-4 hours.

IPV6 routing connectivity will fail reliably at this point, and only a full reboot appears to bring it back.

The simplest way to configure IPV6 in this case is using manual commands. I used:

ip -6 addr add 2607:5000:1:1::5/56 dev br0
ip -6 route add default via fe80::1 dev br0 onlink

to bring up IPV6 service on boot. As noted, the method doesn’t matter; the failure will happen regardless of configuration method. The 2607 address shown in this thread is an example only. The fe80::1 address is the actual default gateway for all IPV6 servers on the new V3 OVH networks.

As noted, no solution has been found for this issue as of the time of this writing. However, by random chance through ongoing testing, I have found a workaround:

WORKAROUND:

After the server boots, and IPV6 has come up, execute the following command sequence:

#!/bin/bash
# /usr/local/bin/recycleipv6
#
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
sleep 1
echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6
sleep 1
/sbin/ip -6 addr add 2607:5000:1:1::5/56 dev br0
/sbin/ip -6 route add default via fe80::1 dev br0 onlink
exit 0

Substitute in your own IPV6 address of course.

Running these commands will cause IPV6 to come up and stay up permanently. It is not necessary to wait for IPV6 to fail before running them (although they will also reliably restore service after a failure); you can just run them 5 seconds after the server boots. Making a new systemd service that runs “After=network.service” and then calls a script with these commands is sufficient. For example:

# /usr/lib/systemd/system/recycleipv6.service
#

[Unit]
Description=Corrects OVH IPV6 routing issue at system startup.
After=network.service
After=mysql.service

[Service]
Type=oneshot
User=root
ExecStart=/usr/local/bin/recycleipv6
KillSignal=SIGHUP

[Install]
WantedBy=multi-user.target

Make sure you have “exit 0” in the script to keep systemd happy.
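
Then enable it in the usual way (assuming the script and unit file paths shown above):

systemctl daemon-reload
systemctl enable recycleipv6.service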

This is not a solution to the problem; it’s a workaround. I still don’t know why this is happening. I cannot, of course, get access to the OVH routers, or any useful help from OVH at all; they are not interested in this. I cannot say why the disable_ipv6 cycle works, and corrects the problem permanently, when restarting the whole network does not. I surmise that there is something about the initial initialization of IPV6 in the Leap stack that is being “triggered” by OVH’s crazy network setup. Perhaps something is being done out of order, or in an unexpected order, that is only exposed in bridge mode, and only at OVH. But such things are beyond my ability (or resources) to debug further or understand.

Regardless, resetting just the IPV6 stack as illustrated here after each reboot does return stability to IPV6, for the host, and for any guests running on the host. Indeed, it is not necessary to run this on Xen guests. You need only run it on the physical host, and the problem is prevented for host and guests.

So, while this is just a hack rather than a solution, I will nevertheless attempt to mark it as the solution so that anyone who trips over this in the future might find this thread. OVH has informed me that they’re rolling out this new network to all new servers in all data centers, so unless we find something on our end, other people wanting to run Xen on Leap at OVH will probably trip over this.

@arvidjaar - Thank you for your patience and help, I am very grateful.
@malcolmlewis - Thank you for overseeing this and educating me on this forum. I am very grateful to you also.

Thank you,
Glen


Wow, @glen thanks for the big report.

Output of

grep -r . /proc/sys/net/ipv6

before and after this script would be interesting.

@knurpht It was my pleasure. I love OpenSuSE, have for decades, happy to try to help others on the rare occasions that I can.

@arvidjaar More interesting than I could possibly have imagined!!!

reboot
# wait for server to reboot
grep -r . /proc/sys/net/ipv6 > proc-1
/usr/local/bin/recycleipv6
grep -r . /proc/sys/net/ipv6 > proc-2
diff proc-1 proc-2

187c187
< /proc/sys/net/ipv6/conf/eth0/disable_ipv6:1
---
> /proc/sys/net/ipv6/conf/eth0/disable_ipv6:0

I have NO idea what to make of that! I was literally connected to the server over IPV6 after the reboot. And disable_ipv6 was set to 1!

It’s NOT in /etc/sysctl.conf… and there’s only one file in /etc/sysctl.d:

cat /etc/sysctl.d/70-yast.conf

net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.all.disable_ipv6 = 0

And this is an absolutely fresh load with nothing added or touched, just Xen enabled and the network configured: IPV4 during install, and IPV6 with the two commands.

Unfortunately this machine is now more or less “in production” so I can’t do another reload, but during testing I observed that removing the bridge configuration solved the problem, and re-adding it broke it again.

Could the bridge adapter be somehow forcing IPV6 off after sysctl is loaded, but before my recycle script runs? Bizarre!

Glen


What does sysctl --system show?

Nice! I didn’t know about that option to sysctl! Something new - thanks for that!

Here’s the output:

# sysctl --system
* Applying /boot/sysctl.conf-6.4.0-150600.23.47-default ...
kernel.hung_task_timeout_secs = 0
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.shmmax = 0xffffffffffffffff
kernel.shmall = 0x0fffffffffffff00
vm.dirty_ratio = 20
* Applying /usr/lib/sysctl.d/50-default.conf ...
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
net.ipv6.conf.default.use_tempaddr = 1
net.ipv4.ping_group_range = 0 2147483647
fs.inotify.max_user_watches = 65536
kernel.sysrq = 184
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
kernel.kptr_restrict = 1
* Applying /usr/lib/sysctl.d/51-network.conf ...
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0
* Applying /etc/sysctl.d/70-yast.conf ...
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.all.disable_ipv6 = 0
* Applying /usr/lib/sysctl.d/99-sysctl.conf ...
kernel.panic = 5
vm.panic_on_oom = 2
vm.min_free_kbytes = 524288
vm.swappiness = 0
* Applying /etc/sysctl.conf ...
kernel.panic = 5
vm.panic_on_oom = 2
vm.min_free_kbytes = 524288
vm.swappiness = 0

Just for interest -

I tried removing the above-documented workaround script from the post-boot sequence and, based on the discovery that disable_ipv6 was somehow setting itself to 1, replacing it with just

echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6

prior to the initial setup and configuration of IPV6.

The routing died 3 hours later. The reset script brought it back online.

So it seems the method documented in the solution is necessary, at least in this case.

Glen

With the knowledge that /proc/sys/net/ipv6/conf/all/disable_ipv6 is the culprit I would go one step further and try to find out what/who is changing the value to 1.

That can be done using auditd functionality:

> sudo auditctl -w /proc/sys/net/ipv6/conf/all/disable_ipv6  -p war -k monitor-ipv6
> sudo systemctl restart auditd
> cat /proc/sys/net/ipv6/conf/all/disable_ipv6
> sudo ausearch -ts today -k monitor-ipv6
----
time->Wed Apr 30 20:36:13 2025
type=CONFIG_CHANGE msg=audit(1746038173.105:350): auid=1000 ses=2 subj=unconfined op=add_rule key="monitor-ipv6" list=4 res=1

Not sure if this will give the definitive answer, but at least it will give clues.

For understanding the log line, see 7.6. Understanding Audit Log Files | Red Hat Product Documentation


Hi, and thank you for this. I am so sorry for my ignorance here, but I’m struggling with this.

I ran the commands you mentioned, and did get the output you cited when the ausearch command was run:

----
time->Wed Apr 30 15:41:56 2025
type=CONFIG_CHANGE msg=audit(1746052916.050:85670): auid=0 ses=2808 op=add_rule key="monitor-ipv6" list=4 res=1

However, after I rebooted, there was no additional output.

I then looked up the references you gave me and found that I needed to put the rules in /etc/audit/audit.rules, which I did, and then rebooted again. After more debugging I discovered that the reboot wiped them out again, so I read the file comments and made a new file, /etc/audit/rules.d/glen.rules. That seemed to make the rules persistent, but the output makes me think it’s still not working:

# ausearch -ts today -k monitor-ipv6
----
time->Wed Apr 30 16:53:27 2025
type=PROCTITLE msg=audit(1746057207.848:18): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742E72756C6573
type=PATH msg=audit(1746057207.848:18): item=0 name="/proc/sys/net/ipv6/conf/all/" inode=18993 dev=00:27 mode=040555 ouid=0 ogid=0 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1746057207.848:18): cwd="/"
type=SOCKADDR msg=audit(1746057207.848:18): saddr=100000000000000000000000
type=SYSCALL msg=audit(1746057207.848:18): arch=c000003e syscall=44 success=yes exit=1108 a0=3 a1=7fff3fd02cc0 a2=454 a3=0 items=1 ppid=1124 pid=1198 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" key=(null)
type=CONFIG_CHANGE msg=audit(1746057207.848:18): auid=4294967295 ses=4294967295 op=add_rule key="monitor-ipv6" list=4 res=1
----
time->Wed Apr 30 16:53:27 2025
type=PROCTITLE msg=audit(1746057207.848:19): proctitle=2F7362696E2F617564697463746C002D52002F6574632F61756469742F61756469742E72756C6573
type=PATH msg=audit(1746057207.848:19): item=0 name="/proc/sys/net/ipv6/conf/br0/disable_ipv6" nametype=UNKNOWN cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1746057207.848:19): cwd="/"
type=SOCKADDR msg=audit(1746057207.848:19): saddr=100000000000000000000000
type=SYSCALL msg=audit(1746057207.848:19): arch=c000003e syscall=44 success=yes exit=1108 a0=3 a1=7fff3fd02cc0 a2=454 a3=0 items=1 ppid=1124 pid=1198 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditctl" exe="/usr/sbin/auditctl" key=(null)
type=CONFIG_CHANGE msg=audit(1746057207.848:19): auid=4294967295 ses=4294967295 op=add_rule key="monitor-ipv6" list=4 res=0
----
time->Wed Apr 30 16:53:27 2025
type=CONFIG_CHANGE msg=audit(1746057207.848:20): auid=4294967295 ses=4294967295 op=remove_rule path="/proc/sys/net/ipv6/conf/all/disable_ipv6" key="monitor-ipv6" list=4 res=1

As I said I have no experience with this, but it looks like the rule is being removed right after it’s being added?

And, crucially, if I run my recycle script, which writes twice to /proc/sys/net/ipv6/conf/all/disable_ipv6, no further output is generated. After much experimenting I’m left with:

# cat /etc/audit/rules.d/glen.rules
-w /proc/sys/net/ipv6/conf/all/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/br0/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/default/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/eth0/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/lo/disable_ipv6 -p warx -k monitor-ipv6

And:

# cat /etc/audit/audit.rules
## This file is automatically generated from /etc/audit/rules.d
-D

-a task,never
-w /proc/sys/net/ipv6/conf/all/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/br0/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/default/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/eth0/disable_ipv6 -p warx -k monitor-ipv6
-w /proc/sys/net/ipv6/conf/lo/disable_ipv6 -p warx -k monitor-ipv6

But:

# auditctl -l
-a never,task

I’m afraid that to dig this out I’m going to need a little more guidance please.

Thank you and sorry!

Glen

Good to read that the command I gave produces the same output for you; that is a good start.

I never needed to have the rules in place after a reboot; this is a debug facility. After a reboot I either use the command history or write a small script to re-apply the settings.

So instead of debugging why things do not work after a reboot (could it be that every rule needs a unique key?), could you undo the recycleipv6 at boot, run the auditctl command that is working, and wait until the problem happens?

On the other hand, the fact that recycleipv6 at boot fixes the problem likely makes that capturing the event after days will not give too much valuable information, it will likely indicate the problem is triggered from within the kernel without value other clues.