I manage several labs of computers that dual-boot Windows 10 and OpenSUSE Leap 15.1. These computers sometimes fail to pull DHCP IP addresses when booting into OpenSUSE.
When these computers have been idle for a time, overnight or over a weekend, and they are rebooted from Windows (which they usually run) to Linux, some percentage (roughly a quarter to half) of the machines fail to get an IP address. If I log in to the computers, I can see that the interface is up, and I can run tcpdump on the interface and see packets, but nothing I do once the interface has failed to pull an IP address seems to correct the problem. I have restarted the network with “service network restart” and wicked with “service wicked restart”, but neither results in obtaining an IP address. I can assign an address manually.
If I reboot the computer into Windows, wait for it to obtain an IP address, and reboot into Linux, the Linux computer will pull a DHCP IP. After booting into Linux and successfully pulling a DHCP IP address, the computer will continue to work and get an IP address for some indeterminate amount of time.
When booting Windows or the previous Linux version (OpenSUSE Leap 42.3), we never experienced this problem. I have also installed the wicked* packages from Leap 42.3 on our Leap 15.1 computers, and that seems to resolve the problem, at least until an auto-update script that I run on boot updates the system and replaces the old wicked* packages with the new ones.
I realize that I could simply install and lock the old version of wicked*, but I would prefer to figure out what is happening and why.
This also happens when I disable wicked and use NetworkManager instead.
The inconsistent nature of the problem, and the fact that once a machine pulls an IP address in Linux it keeps working for some time, has made troubleshooting this quite difficult.
Does anyone know what might be going on or have suggestions for debugging this behavior? I have turned on debug logging in wicked, but so far haven’t found the cause.
Thanks! I’ll look at the bug report in just a moment and see if it holds the key to fixing this.
Yes, we tried NetworkManager instead of Wicked and had essentially the same results. We narrowed this down to the DHCP helper application in NetworkManager, ran that by itself, and it basically just entered a loop of trying, and failing, to get an address.
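One way to take both wicked and the NetworkManager helper out of the picture is to hand-build a DHCPDISCOVER and watch with tcpdump whether any server answers it. The following is only a rough sketch of the packet layout from RFC 2131; the function name build_dhcp_discover and the example MAC and transaction ID are made up for illustration, and the code only builds the bytes, it does not send them.

```python
import struct

# Fixed 4-byte marker that precedes the DHCP options (RFC 2131).
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])


def build_dhcp_discover(mac: str, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload (hypothetical helper, not part of wicked)."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    if len(chaddr) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")

    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC address length
        0,            # hops
        xid,          # transaction ID chosen by the client
        0,            # secs
        0x8000,       # flags: ask the server to reply by broadcast
        b"\x00" * 4,  # ciaddr: client has no address yet
        b"\x00" * 4,  # yiaddr
        b"\x00" * 4,  # siaddr
        b"\x00" * 4,  # giaddr
        chaddr,       # chaddr: struct pads this to 16 bytes with zeros
        b"",          # sname: unused, zero-filled
        b"",          # file: unused, zero-filled
    )
    # Option 53 (message type) = 1 (DISCOVER), then option 255 (end).
    options = bytes([53, 1, 1, 255])
    return header + DHCP_MAGIC_COOKIE + options


packet = build_dhcp_discover("00:11:22:33:44:55", 0x12345678)
print(len(packet), "bytes")
```

Actually sending it would need root, a UDP socket bound to port 68 with broadcast enabled, and the regular DHCP client stopped so the port is free. If a tcpdump on ports 67 and 68 shows the DISCOVER leaving and no OFFER coming back, the problem is likely upstream of the client software.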
Your problem is not a bug or anything particularly surprising if you understand how DHCP works…
When a machine configured as a DHCP client boots up and contacts the DHCP server for the first time, the DHCP server will look in its records, find that this is a new machine, and hand out an IP address to this machine, which until then is identified only by its MAC address.
The DHCP lease will have a TTL (time to live), and for as long as that lease is valid, that machine and only that machine (identified by its MAC address) will be allowed to use that IP address. When the lease life is half over, the DHCP server expects the machine assigned that IP address to come online and renew it. If the machine connects, the lease is renewed automatically; if it doesn’t, then when the lease expires completely the address is returned to the pool of assignable addresses.
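For reference, the timing can be put into numbers. In RFC 2131 it is the client that starts renewal, by default at 50% of the lease (T1), and falls back to rebinding with any server at 87.5% (T2). A minimal sketch, assuming those defaults are not overridden by the server; the function name lease_timers is only illustrative:

```python
def lease_timers(lease_seconds: int) -> dict:
    """Return the default RFC 2131 timer offsets, in seconds from lease start."""
    return {
        "t1_renew": lease_seconds * 0.5,     # client starts renewing with its server
        "t2_rebind": lease_seconds * 0.875,  # client starts asking any server
        "expires": float(lease_seconds),     # address goes back to the pool
    }


# Example: a 24-hour lease.
print(lease_timers(86400))
```

With a 24-hour lease, a machine that was last online on Friday evening will have let its lease lapse by Monday morning, which fits the "idle over a weekend" pattern in the original report.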
When you boot only one OS assigned that DHCP lease, then everything works reliably.
Now, consider what happens when you dual-boot and boot using the other OS.
Unless you’ve artificially overridden the MAC address burned into your NIC by the manufacturer, you’re using the same MAC address as the other OS, but your second OS won’t know that an IP address has already been assigned. Your OS will ask for a DHCP address to be assigned to it, but your DHCP server, seeing that an IP address has already been assigned to the machine with that MAC address, will be silent and unresponsive.
If you understand the above, there are some obvious solutions…
In your second OS, override the MAC address so that the DHCP server sees the machine as a different one from when it is running the first OS.
Use a different NIC
Connect using a different network connection that is using a different NetworkID because DHCP servers are rarely configured to serve different address ranges.
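For the first option, the override address should be a locally administered unicast MAC so it cannot collide with a real vendor-assigned one. A small sketch of generating one; the function name random_local_mac is hypothetical:

```python
import random


def random_local_mac(rng=None) -> str:
    """Generate a random locally administered, unicast MAC address."""
    rng = rng or random.Random()
    octets = [rng.randrange(256) for _ in range(6)]
    # Set the "locally administered" bit (0x02) and clear the
    # multicast bit (0x01) in the first octet.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{o:02x}" for o in octets)


print(random_local_mac())
```

The result could then be applied on the Linux side with something like "ip link set dev <interface> address <mac>" while the interface is down, with the interface name filled in for your hardware. Generate it once and keep it fixed, otherwise every boot would consume a fresh lease.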