Default Route on Leap 15.4 Inconsistent when Toggling Wicked

alex.gooch · June 13, 2024, 7:51pm

Hello All!

Up Front Info
I have a bare metal system with Leap 15.4 installed. Hardware has 4 NICs – of which I have eth0 and eth1 cabled. The eth0 network is the primary network. It gets its IP from DHCP. The eth1 interface is what we consider the camera/accessory network. It also receives its IP from DHCP.

The eth0 interface is a part of a different network segment than the eth1. We can call the eth0 segment “facility” and the eth1 segment “management.”

I have a couple of hundred of these machines in the field. Of these, maybe 10 or so have had this same issue. Each location has its own DHCP server.

My Issue and Troubleshooting
The issue that I am having with my system is that when I toggle wicked.service (or reboot the box), it is a crapshoot as to whether eth0 or eth1 will come up as the primary interface. When the primary interface is switched to eth1, we lose the ability to ssh to the machine, even though both eth0 and eth1 show a status of UP. We were experiencing this issue before changing any configuration files or default routes.

e.g.
While my machine was pinging, I bounced wicked.service (instead of rebooting – which is where we also see this issue manifest). After that first service restart, the machine only dropped off of the network for a few seconds and then began pinging again. When I run the ‘ip route’ command, I see that eth0 is showing as the default.

I restarted the service again (5 more times) and each time, the box stopped pinging indefinitely (I am consoled in) and the default route showed the 192 (eth1) IP address when running ‘ip route’ again.

Once I restarted the service again, we were back to working condition with ‘ip route’ showing the expected eth0 interface and IP.

As I have been troubleshooting this, I have become more familiar with some of the wicked configuration files (but I am by no means an expert – so please show some grace). My ifcfg-eth* files are shown below. The additions I made to troubleshoot are followed by an asterisk.

eth0:
STARTMODE=auto
BOOTPROTO=dhcp
DEFROUTE=yes
ZONE=public
DHCLIENT_SET_HOSTNAME=yes
DEBUG=yes*
PERSISTENT=yes*

eth1:
STARTMODE=auto
BOOTPROTO=dhcp
DEFROUTE=no
DEBUG=yes*

My /etc/udev/rules.d/70-persistent-net.rules look like this (scrubbed for personalized info):

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<some_mac>:b7", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<some_mac>3:b5", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<some_mac>:b6", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="<some_mac>:b4", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

There were no ‘route’ files in the path /etc/sysconfig/network until I manually created “routes.” Once I went into YaST and configured eth0 as the default route, the other files were created (and routes was zeroed).

$ ls -l *route*
-rw-r--r-- 1 root root 28 Jun 13 14:48 ifroute-eth0
-rw-r--r-- 1 root root 29 Jun 13 14:48 ifroute-eth0.YaST2save
-rw-r--r-- 1 root root 0 Jun 13 14:48 routes
-rw-r--r-- 1 root root 0 Jun 13 14:48 routes.YaST2save

I have since removed all route files except for ifroute-eth0, which contains the following line:

default <router_ip> - eth0

*Note: I have also tried the segment IP (ending in .1) and this does not work either.
*Another Note: We do not have route files configured in ANY of the ~200 machines in the field, so I am not sure how impactful this is.

From the console, I had one session redirecting dmesg output, while in another session I restarted wicked. On restarts that resulted in the default route being switched to eth1, nothing stood out to me in dmesg (or journalctl -u wicked.service) that would indicate why/how this primary route flip occurred. Admittedly, this could be because I don't know exactly what to look for.

When a box is working properly, we see this line in the output from the ‘wicked show all’ command under the eth0 interface:

route: ipv4 default via <segment_ip> 1 proto dhcp

What I am struggling to find is the root cause of the primary/default route changing once the machine is rebooted or wicked.service is restarted. I have run as many checks as I know of, but we are still in the same state on this machine, where a reboot could make the machine inaccessible via ssh.

I read through many similar issues on this forum, but nothing resolved my issue. Any help would be greatly appreciated. Thank you all in advance! -Alex

sboehringer · June 13, 2024, 8:33pm

Do you have udev rules in place to make device naming persistent?

arvidjaar · June 14, 2024, 4:03am

You forgot to show us the result of this command. Show

ip a
ip -4 r
ip -6 r

both in “good” and “bad” case.
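
One way to collect those outputs for both states might look like the sketch below. The function name snapshot_net and the net-<label>.txt file naming are made up for illustration; only the three ip commands come from the request above.

```shell
# Hypothetical helper: save the output of the three requested commands
# under a label ("good" or "bad") so the two states can be compared later.
snapshot_net() {
    label=$1
    outdir=${2:-.}
    for args in "a" "-4 r" "-6 r"; do
        printf '### ip %s\n' "$args"
        # $args is left unquoted on purpose so "-4 r" splits into two arguments
        ip $args
    done > "$outdir/net-$label.txt"
}
```

Running snapshot_net good while eth0 holds the default route, then snapshot_net bad after a restart that flips it, leaves two files that can be compared with diff net-good.txt net-bad.txt.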

hcvv · June 14, 2024, 7:12am

Leap 15.4 is a bit old-fashioned. It has been End-of-Life for about half a year. It will get more problematic for others to check things or re-create your problem on their system because they will not have a 15.4 one anymore. Also, I think I am correct in warning that Wicked is deprecated. Maybe not on 15.4, but the knowledge to help you will dwindle.

alex.gooch · June 14, 2024, 12:23pm

Good morning and thank you for the reply! I looked through that link and the only udev rules that I have in place are the “/etc/udev/rules.d/70-persistent-net.rules” that I showed in my original post.

I believe this is what you are asking. If not, please let me know!

alex.gooch · June 14, 2024, 12:26pm

I appreciate the info! This was the OS that was decided on for the machines a year or two ago, so I have no say in that. As far as wicked goes, it was the default for a minimal install. We will be looking to do a product migration at some point, but right now I am trying to get these problem boxes stable.

alex.gooch · June 14, 2024, 12:47pm

Good call!

In a good state (eth0 as primary), here are the returns for those commands:

**ip a**
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b4 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f0
    altname ens13f0
    inet 10.xx:xx.187/24 brd 10.xx:xx.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 xxxx::xxxx:xxxx:xxxx:43b4/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b5 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f1
    altname ens13f1
    inet 192.xx:xx.204/23 brd 192.xx:xx.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 xxxx::xxxx:xxxx:xxxx:43b5/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b6 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f2
    altname ens13f2
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b7 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f3
    altname ens13f3
6: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 76:5d:22:d8:d0:47 brd ff:ff:ff:ff:ff:ff
    altname enp0s20f0u13u6

**ip -4 r**
default via 10.xx:xx.1 dev eth0 proto dhcp
10.xx:xx.0/24 dev eth0 proto kernel scope link src 10.xx:xx.187
192.xx:xx.0/23 dev eth1 proto kernel scope link src 192.xx:xx.204

**ip -6 r**
::1 dev lo proto kernel metric 256 pref medium
xxxx::/64 dev eth0 proto kernel metric 256 pref medium
xxxx::/64 dev eth1 proto kernel metric 256 pref medium

Here are the returns for those commands when in a bad (eth1 primary) state:

**ip a**
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b4 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f0
    altname ens13f0
    inet 10.xx:xx.187/24 brd 10.xx:xx.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 xxxx::xxxx:xxxx:xxxx:43b4/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b5 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f1
    altname ens13f1
    inet 192.xx:xx.204/23 brd 192.xx:xx.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 xxxx::xxxx:xxxx:xxxx:43b5/64 scope link
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b6 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f2
    altname ens13f2
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether xx:xx:xx:xx:xx:b7 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f3
    altname ens13f3
6: usb0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 76:5d:22:d8:d0:47 brd ff:ff:ff:ff:ff:ff
    altname enp0s20f0u13u6

**ip -4 r**
default via 192.xx:xx.1 dev eth1 proto dhcp
10.xx:xx.0/24 dev eth0 proto kernel scope link src 10.xx:xx.187
192.xx:xx.0/23 dev eth1 proto kernel scope link src 192.xx:xx.204

**ip -6 r**
::1 dev lo proto kernel metric 256 pref medium
xxxx::/64 dev eth0 proto kernel metric 256 pref medium
xxxx::/64 dev eth1 proto kernel metric 256 pref medium
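
The only difference between the two states above is the default line in the ‘ip -4 r’ output. A rough sketch of a check for that, which could be run after each wicked restart, is below. Both function names are hypothetical, and the check assumes exactly one IPv4 default route is expected.

```shell
# Hypothetical helper: print the device of every default route found in
# `ip -4 route` style text read from stdin (one device per line).
default_route_devs() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}

# Hypothetical check: succeed only if the sole default route uses the
# expected device (for example eth0).
default_route_ok() {
    expected=$1
    devs=$(default_route_devs)
    [ "$devs" = "$expected" ]
}
```

For example: ip -4 route | default_route_ok eth0 || echo "default route is not on eth0"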

Please let me know if I can provide anything else that may help.

arvidjaar · June 15, 2024, 5:47am

So on both networks the DHCP servers return a default route. Pragmatic answer: do not do that. If a network does not have external connectivity, why does it claim to have a default route?

The order in which the servers respond is indeterminate. I do not know whether it is wicked or the kernel that drops the second default route. If you were not so paranoid as to obfuscate private addresses, I would suggest enabling the wicked debug log, but I am not interested in reading a redacted version.

DEFROUTE is an RH compatibility option. On SUSE use DHCLIENT_SET_DEFAULT_ROUTE.
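
As a rough sketch only, the two ifcfg files might then look something like this. It assumes the per-interface value takes precedence over the global default in /etc/sysconfig/network/dhcp, and it leaves out the other settings from the original files (ZONE, DHCLIENT_SET_HOSTNAME, and so on), which would stay as they are.

```shell
# /etc/sysconfig/network/ifcfg-eth0 (facility network, should own the default route)
STARTMODE='auto'
BOOTPROTO='dhcp'
DHCLIENT_SET_DEFAULT_ROUTE='yes'

# /etc/sysconfig/network/ifcfg-eth1 (management network, no default route wanted)
STARTMODE='auto'
BOOTPROTO='dhcp'
DHCLIENT_SET_DEFAULT_ROUTE='no'
```

After editing, restart wicked.service a few times and check that ‘ip -4 r’ keeps showing the default route on eth0.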

alex.gooch · June 19, 2024, 2:38pm

Using the proper option resolved my issue. Thanks!
