dhcpcd not sending DHCPDISCOVER/-REQUEST at boot, timing out

Hello,

we have a problem with dhcpcd at boot time on every openSUSE version from 11.0 to 11.3. A number of workstations never send out DHCPDISCOVER or DHCPREQUEST at boot time; we have verified this with packet dumps. The DHCP client progress bar is displayed on the console but eventually times out and goes into the background, and the system continues booting. This is a problem because the timeout takes a long time and users have to wait. Sometimes the display manager has even started, but users cannot log in yet because we’re using LDAP authentication. Eventually these systems just continue to use their old lease and networking works. Curiously, when we restart the network after boot, the clients send DHCPDISCOVER/-REQUEST normally; we only have this problem at boot time.

On the server side we’re using ISC dhcpcd-1.3.22pl4-223.13 on SLES 10 SP2. I have read about others who had the same problem and switched from dhcpcd to dhclient. I have tried this as well, but for us dhclient is not an option for a number of other reasons. I have also tried setting DHCLIENT_SLEEP (“Some interfaces need time to initialize. Add the latency time in seconds”) to as much as two minutes to give the interface time to initialize. Unfortunately, this didn’t change anything.
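
One way to check whether interface initialization is a factor at all, independently of DHCLIENT_SLEEP, is to measure how long the link takes to report a carrier. The following is only a minimal Python sketch under stated assumptions: the interface is called eth0, the kernel exposes the usual sysfs file /sys/class/net/<interface>/carrier, and `wait_for_carrier` is a hypothetical helper, not part of dhcpcd or the openSUSE network scripts.

```python
import os
import time


def wait_for_carrier(iface="eth0", timeout=30.0, poll=0.5,
                     sysfs_root="/sys/class/net"):
    """Poll the sysfs carrier flag of `iface`.

    Returns the number of seconds waited until the flag reads "1",
    or None if that did not happen within `timeout` seconds.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    start = time.monotonic()
    while True:
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return time.monotonic() - start
        except OSError:
            # Interface does not exist (yet) or is administratively down;
            # keep polling until the timeout expires.
            pass
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(poll)
```

Run early during boot (for example from a temporary init script), `wait_for_carrier("eth0", timeout=120)` would show whether the link comes up well within the configured sleep time or not at all.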

Does anyone have any other ideas that we could try or research?

Regards

How are you doing the LAN trace, meaning with which command and on which
box? What do you see when you do so? If you are doing it from another
box are you using a hub or a switch? Does the LAN trace show what you
expect on a box that is working? Have you tried tracing when just
restarting the network service? Does it work when using dhclient (even
though that’s apparently not an option in the long term for you)? Do the
logs on your system show anything?

Good luck.

On 08/02/2010 04:36 AM, gundelgauk wrote:
>
> we have a problem with dhcpcd at boot time on any openSUSE version from
> 11.0 to 11.3. It seems that a number of workstations never send out
> DHCPDISCOVER or DHCPREQUEST at boot time, we have verified this with
> packet dumps. [...]

Thank you for your reply,

this is a switched network. I’m doing the packet dump on the SLES 10 box that is running dhcpd, using tcpdump -s 65535 -w dumpfile, and later inspect it with Wireshark. Usually (with machines that work normally, or after boot on problematic machines) I see the normal DHCP traffic. When one of the problematic machines runs dhcpcd during boot, I see nothing at all. It’s as if dhcpcd is trying to send DHCPDISCOVER or -REQUEST (as indicated by the progress bar) but for some reason never actually does.

Again, if I do a network restart after boot, I see the normal DHCP traffic in the packet dumps. The same is the case with dhclient.
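
When sifting through such a dump, what distinguishes a DHCPDISCOVER from a DHCPREQUEST is DHCP option 53 (message type), which follows the fixed 236-byte BOOTP header and the 4-byte magic cookie. The following is a minimal Python sketch of that check, assuming the raw UDP payload of each packet has already been extracted from the capture; the function names are hypothetical and not taken from any tool mentioned in this thread.

```python
from typing import Optional

# Magic cookie that marks the start of the DHCP options field.
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])

# Values of DHCP option 53 (message type).
MESSAGE_TYPES = {
    1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
    5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM",
}


def dhcp_message_type(payload: bytes) -> Optional[str]:
    """Return the DHCP message type of a BOOTP/DHCP payload, or None."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        return None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option, has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option
        length = payload[i + 1]
        if code == 53 and length == 1 and i + 2 < len(payload):
            return MESSAGE_TYPES.get(payload[i + 2])
        i += 2 + length          # skip code, length and data
    return None


def client_mac(payload: bytes) -> Optional[str]:
    """Return the client hardware address (chaddr field) as a string."""
    if len(payload) < 44:
        return None
    hlen = min(payload[2], 16)   # hlen field, chaddr is at most 16 bytes
    return ":".join(f"{b:02x}" for b in payload[28:28 + hlen])
```

Grouping the results by `client_mac` would make it easy to see which workstations never show up with a DHCPDISCOVER or DHCPREQUEST during their boot window.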

What’s really interesting is that at home, in my private network, I have seen the same effect with both openSUSE 11.1 and 11.3. There the DHCP server is also ISC dhcpd (I don’t know which version right now), running on a Linux-based embedded IPFire firewall/router. So I’m guessing this might not be caused by our network equipment.

I have looked through the log files, but found nothing that seemed related to this problem.

The network interfaces of affected machines are “Intel Corporation 82566DM Gigabit Network Connection (rev 02)”, some others have “Intel Corporation 82578DC Gigabit Network Connection (rev 05)” (both on-board, e1000e kernel module). I have also tried downloading and compiling the latest Intel drivers instead of using the ones that ship with openSUSE, to no avail.

I’d be glad for any thoughts or ideas; I have been on this for months now. :)

Regards

On a switched network I would be suspicious of the switch, just in case it
is being clever somehow. Also, are the machines requesting the IP addresses
only running a single NIC with a single (once assigned) IP, or do you have
multiple NICs, or are you doing any aliasing on the cards? I’d try to use
a hub between the requesting machine and the switch if possible, or see if
you can set your switch to forward all data to a single connection where
you have a box dedicated to tracing. I’m not trying to say this isn’t a
problem with your machine, but I’ve had switches do quirky things in the
past along these lines, and hubs have usually shown what really happened
because of their lack of intelligence.

As a note just for fun, tcpdump can take a ‘0’ (zero) as the parameter
after ‘-s’ to capture all information, which saves you the trouble of typing
65535 over and over (plus teaching others what 2^16-1 is and why that’s
relevant to tcpdump currently).
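
For repeated test runs the capture could be scripted. The following is just a minimal Python sketch: it assumes the interface is named eth0, the helper names are made up for illustration, and the capture filter limiting the dump to the DHCP ports (UDP 67 and 68) is an addition that was not part of the original command.

```python
import subprocess

# 2^16-1: the largest value the 16-bit IPv4 total-length field can hold.
MAX_IP_PACKET = 2 ** 16 - 1

# pcap filter expression matching DHCP server (67) and client (68) ports.
DHCP_FILTER = "udp port 67 or udp port 68"


def tcpdump_command(outfile, interface="eth0", snaplen=0):
    """Build the tcpdump argument list; snaplen 0 means 'whole packet'."""
    return ["tcpdump", "-i", interface, "-s", str(snaplen),
            "-w", outfile, DHCP_FILTER]


def capture_dhcp(outfile, interface="eth0"):
    """Run tcpdump until it is interrupted (needs root privileges)."""
    return subprocess.call(tcpdump_command(outfile, interface))
```

Calling `capture_dhcp("dumpfile")` as root writes a dump that can be opened in Wireshark in the usual way.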

Good luck.

On 08/02/2010 08:06 AM, gundelgauk wrote:
>
> this is a switched network. I’m doing the packet dump on the SLES 10
> box that is running dhcpd with tcpdump -s 65535 -w dumpfile and later
> inspect it with Wireshark. [...]

I checked with the comms gurus, and they said to check for a switch feature
called “portfast”. Google finds quite a bit on it if you are not familiar
with it already.

http://articles.techrepublic.com.com/5100-10878_11-6073805.html

Good luck.

On 08/02/2010 08:21 AM, ab@novell.com wrote:
> On a switched network I would be suspect of the switch just in case it is
> being clever somehow. [...]

On 02/08/2010 15:27, ab@novell.com wrote:

> Checking with the comms gurus they said to check for a switch feature
> called “portfast”. Google finds quite a bit on it if you are not familiar
> with it already.
>
> http://articles.techrepublic.com.com/5100-10878_11-6073805.html

“portfast” is usually only a feature found on Cisco and some HP switches.

What make/model switches are you using?

HTH.

Simon
Novell Knowledge Partner (NKP)


Hello,

I have been suspecting the switches as well, of course. In the next few days I will try to set up a small test environment as you suggested: either put the client and a trace box on the same hub in front of the actual switch, or use a mirror port. I will also try some other NICs.

As for “portfast”, we do have HP edge switches that support Spanning Tree Protocol; however, we have it globally disabled. Still, I will try to do some tests with this in the next few days as well.

As for the other questions:

  • All of the client machines use only this single on-board interface.
  • Usually their IP address stays the same, unless the lease is not renewed for a very long time and another machine grabs the old address.
  • We’re not doing any aliasing, bonding or VLAN tagging on the client machines.

Thank you all for the suggestions so far, I appreciate it. Also thanks for the tip about tcpdump; I didn’t know that. :)

Regards