PXE installation, suddenly DHCP configuration fails

Hi all,

For a few weeks I have been having problems installing openSUSE 15.6 via PXE (older versions behave the same).

I have been installing openSUSE for years using opsi (a software deployment system), but this suddenly stopped working. I have already spent a lot of time searching for a solution.

The workflow is as follows:

  1. opsi-linux-bootimage boots via PXE
  2. opsi-linux-bootimage fetches the openSUSE kernel and initrd from the server
  3. opsi-linux-bootimage executes the new kernel
  4. linuxrc runs and starts the GUI with the automated openSUSE installation using an autoyast.xml

Everything works fine until step 4. There linuxrc stops and shows the message: “Please make sure your installation medium is available”.
The pop-up window contains the correct HTTPS URL to the isocontent folder. If I try again, it says: “DHCP configuration failed”.

I have already discovered the following during troubleshooting:

  • the problem is that linuxrc does not get an IP address and therefore cannot connect to the isocontent folder
  • earlier in the installation process, PXE and the opsi-linux-bootimage both received the right IP address, so the network connection basically works
  • I am aware of this openSUSE topic (SLES 15 SP5 | Deployment Guide | Preparing network boot environment), and the client is configured to send the DHCP client ID in RFC 2132 format through this line of Python code run from the opsi-linux-bootimage:
execute(f"{which('kexec')} --load {kernel_file} --append='install=https://{fqdn}:{pckey}@{depotAddress}:4447/depot/{productId}/isocontent/ {append_line} ifcfg={hardwareAddress}=dhcp,DHCLIENT_CREATE_CID=rfc2132' --initrd={inird_file}")

  • I tried to restart the network interface using wicked ifdown/ifup or ip link set down/up from the linuxrc shell after the error occurs; neither gets me an IP address
  • the only way to get an IP address is to unplug the Ethernet cable and plug it in again; after that linuxrc automatically gets the IP address, and wicked ifdown/ifup also works correctly
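
To double-check what the installer kernel actually receives, the kexec call above can be sketched as an argument list that is printed before it is executed. This is only a sketch: the function name, the file paths, the host name and the product ID are placeholders I made up for illustration, not the real opsi code; only the kexec options and the ifcfg parameter are taken from the line above.

```python
import shlex


def build_kexec_args(kernel_file, initrd_file, install_url, hardware_address,
                     extra_append=""):
    """Return the kexec argument list for loading the installer kernel.

    Hypothetical helper; it only mirrors the structure of the command above.
    """
    # ifcfg=<MAC>=dhcp,DHCLIENT_CREATE_CID=rfc2132 is copied from the post.
    ifcfg = f"ifcfg={hardware_address}=dhcp,DHCLIENT_CREATE_CID=rfc2132"
    # Join the non-empty kernel parameters with single spaces.
    parts = (f"install={install_url}", extra_append, ifcfg)
    append = " ".join(p for p in parts if p)
    return [
        "kexec",
        "--load", kernel_file,
        f"--append={append}",
        f"--initrd={initrd_file}",
    ]


# Placeholder values, not the real depot or client.
args = build_kexec_args(
    "/tmp/linux",
    "/tmp/initrd",
    "https://depot.example.org:4447/depot/opensuse15-6/isocontent/",
    "aa:bb:cc:dd:ee:ff",
)
# Print the command with shell quoting so it can be inspected or logged.
print(shlex.join(args))
```

The printed parameters can then be compared with /proc/cmdline in the linuxrc shell to confirm that the ifcfg option arrived unchanged.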

I need to automate the installation process, so unplugging the cable is not a satisfactory option. I don’t understand why wicked and ip link set do not have the same effect as unplugging and replugging the cable. If they did, I could try to run wicked ifdown/ifup in the startup process of linuxrc.

I have run out of ideas and really hope someone has experienced something similar and can help me with this problem.

Thanks a lot in advance.

Best regards,

segler

Capture network traffic, check whether client attempts any DHCP transaction at all.

Hi @arvidjaar,

thanks for your post. I set up a port mirror and used Wireshark to check the DHCP traffic when the client boots and starts the installation.

As I understand it, a DHCP request produces these three packets: one DHCP discover packet, which contains the client ID of the client requesting an IP address, and two DHCP offer packets, one from each DHCP server in our network.

I found that, at the step where the installation fails, the client does send a DHCP request and the DHCP servers do offer the correct IP address. The client ID (option 61) in the DHCP discover packets also looks right according to RFC 2132.

The request is also repeated every few seconds. But the client somehow does not get the IP address.
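
For comparing the capture against what is expected, here is a small sketch of how option 61 and the message type can be read from the DHCP options field. It assumes an Ethernet client whose RFC 2132 style client ID is the hardware type byte 0x01 followed by the MAC address; the function names and the example MAC are made up for illustration. Per RFC 2131, a complete exchange continues after the offer with a DHCPREQUEST from the client and a DHCPACK from the server, so if the capture only ever shows discover and offer packets, those two steps are the ones missing.

```python
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
EXPECTED = ("DISCOVER", "OFFER", "REQUEST", "ACK")


def client_id_rfc2132(mac):
    """Option 61 value: hardware type 0x01 (Ethernet) followed by the MAC."""
    return b"\x01" + bytes.fromhex(mac.replace(":", ""))


def parse_dhcp_options(options):
    """Parse the options field (bytes after the magic cookie) into {code: value}."""
    parsed = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == 255:   # End option terminates the list
            break
        if code == 0:     # Pad option is a single byte without length
            i += 1
            continue
        length = options[i + 1]
        parsed[code] = options[i + 2:i + 2 + length]
        i += 2 + length
    return parsed


def missing_steps(seen):
    """Return the steps of a full exchange that do not appear in the capture."""
    return [step for step in EXPECTED if step not in seen]


# Example options field of a discover packet (made-up MAC address).
mac = "aa:bb:cc:dd:ee:ff"
cid = client_id_rfc2132(mac)
options = bytes([53, 1, 1]) + bytes([61, len(cid)]) + cid + bytes([255])

parsed = parse_dhcp_options(options)
print("message type:", MESSAGE_TYPES[parsed[53][0]])
print("client ID matches MAC:", parsed[61] == cid)
print("missing:", missing_steps(["DISCOVER", "OFFER", "OFFER"]))
```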

Can you narrow down the error with that?

Thanks a lot

Best regards

If you were

and then

then it does not sound like an openSUSE problem. Something changed or happened in your environment. If unplugging and replugging the cable makes it work, it may be that the switch port gets reset on replugging.

We already checked the ports. The problem happens on every port of this switch. I also tested a different switch and different computers.

There is one strange behavior with linuxrc, which is part of openSUSE.

When I run wicked ifdown and wicked ifup in linuxrc, it does not really turn off the port. On the switch we saw that the port goes down and immediately comes back up after wicked ifdown is entered, while ip a shows the interface as down.

Is there a command that really turns down the port?
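
To see what the kernel itself reports for the link, independently of wicked, the sysfs files operstate and carrier under /sys/class/net/<interface> can be read. Below is a minimal sketch, assuming a Python interpreter is available where it runs (which may not be the case in the installer shell, where the same files can be read with cat). The function name and the sysfs_root parameter are my own additions for illustration. This only reports the state as seen by the client; it does not force the physical link down.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's view of an interface as {"operstate": ..., "carrier": ...}.

    A value of None means the file could not be read. Reading "carrier"
    typically fails with an error while the interface is administratively down.
    """
    base = Path(sysfs_root) / iface
    state = {}
    for name in ("operstate", "carrier"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


# "lo" is used here only because it exists on most Linux systems;
# replace it with the real interface name.
print(link_state("lo"))
```

Calling this before and after wicked ifdown, and again after replugging the cable, would show whether the client side ever sees the carrier drop.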

I found a workaround for the main problem. It seems to be related to the current opsi-linux-bootimage in my configuration.

You can find more information in this opsi forum thread:

openSuse Netboot DHCP Fehler seit OPSI 4.3 stable - August 2024 - Seite 2 - forum.opsi.org

Best regards
