NIC resetting after sys brd replacement

Theo11 · September 16, 2011, 5:30pm

I’m having an issue where the NIC is not recognized as eth0 after a system board is replaced, with the same HDD installed. eth0 is required for our setup, so this breaks DHCP and cuts off all network access to the machine.

These systems are remote, and it is cumbersome to walk a non-technical end user (or technician) through the commands to manually reset it, not to mention giving that individual a password. I have a shortcut workaround in place, but I want to determine the root cause and fix the issue at the source.

Any thoughts? Please let me know what further information would be necessary to diagnose.

DenverD · September 16, 2011, 6:03pm

On 09/16/2011 05:36 PM, Theo11 wrote:
> Any thoughts?

sorry i can’t actually help, but:

-was NIC damaged during MB swap?

-does the MB have a jumper-switch locking out the NIC?

-is nic locked out in BIOS?

-does the MB need a BIOS update?

-is the nic correctly seated?

-does the correctly seated NIC have greasy fingerprints on contacts?

-is the MB correctly grounded?

-is the NIC in different slot? (hmmmm…could that make a difference?)

-was external cable plugged into nic damaged during MB swap?

-is the external cable plugged into (the correct) nic?

i wonder:
-does this machine have multiple NICs
-if yes, did they swap places during MB swap
-if yes, do(es) the other(s) work
-what operating system and version is the machine running
-what is the error seen when the NIC is “not being recognized as eth0”
-is the NIC being recognized as another (fill in blank eth__)
-is the system seeing the NIC at all?
-was the exact same MB swapped in? (with the same chips)?

–
DD
Caveat
openSUSE®, the “German Automobiles” of operating systems

Theo11 · September 16, 2011, 8:44pm

The NIC is on the new board, so this would be a new MAC address that SUSE cannot assimilate. The current workaround copies the config from eth0 to an incremented interface (eth1, eth2, etc.) and restarts network services. The problem is this creates “one-offs” and a preferred environment would have consistency across all systems for simplicity of administration/support.

I’ll answer these in order…

NIC undamaged, this is actually a recurring problem every time a technician swaps the motherboard. This has happened each (and every) time.
no MB jumper
NIC not locked out in BIOS, and this is a SUSE Linux supported version of the BIOS on the IBM hardware it’s running on
MOBO has known working and supported version of BIOS for platform/OS.
NIC is correctly seated; it’s on the new MB
MOBO is correctly grounded, and installed by a certified IBM technician
NIC resides on MB, sorry I should have specified: new NIC with new MB as replacement system board
no damage (again, not an isolated issue)
cable is coming from same switch port, same cable, going into (new) NIC/MB.

The machine does not have multiple NICs on board, but as stated this is a new NIC and therefore a new MAC that SUSE will not automatically associate with eth0, even though the previous NIC no longer exists on the system. Machine is running SLES 11 SP1. The NIC is not recognized at all, that’s the problem. The system simply does not see the NIC until I (or the script workaround) copy the configs from eth0 to eth1 and then restart network services.

Again, I have a temporary fix, I’m looking for root cause so i can isolate the issue and fix it at the source.

knurpht · September 16, 2011, 10:56pm

Remove the “old” card in YaST, then configure the new one. Since the new one has a different MAC address, this is necessary.

deano_ferrari · September 17, 2011, 1:05am

Further to Knurpht’s advice, /etc/udev/rules.d/70-persistent-net.rules contains rules for every ethernet card previously installed. Use YaST to reconfigure. It should also be possible to simply remove the old rule and edit the one pertaining to the new hardware, adjusting it to associate as ‘eth0’, then reboot. In fact, removing all uncommented rules and then rebooting should be enough to have the required rule automatically created at boot time (udev).
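
As an illustration of the "remove all uncommented rules" approach, here is a minimal sketch. The function name clear_net_rules and the .bak backup are hypothetical choices, and it assumes each rule sits on a single line (which is how the generated file normally looks). It would need to run as root, followed by a reboot so udev can write a fresh rule.

```shell
#!/bin/sh
# Sketch only: drop every active rule from the persistent-net rules file,
# keeping comments and blank lines, so udev regenerates the rule at next boot.
# The default path is the one named in this thread; pass another path to test.

clear_net_rules() {
    rules="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    [ -f "$rules" ] || return 1

    # Keep a backup next to the original (udev only reads *.rules files).
    cp -p "$rules" "$rules.bak" || return 1

    # Print comment lines and blank lines unchanged, delete everything else.
    sed -e '/^[[:space:]]*#/b' -e '/^[[:space:]]*$/b' -e d "$rules.bak" > "$rules"
}

# Example (as root):  clear_net_rules && reboot
```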

ken_yap · September 17, 2011, 2:50am

> The NIC is on the new board, so this would be a new MAC address that SUSE cannot assimilate.

The MAC address is on the NIC so changing the mobo wouldn’t change that. It’s when you put in a new NIC that the MAC address changes.

As deano_ferrari explains, this is due to a udev rule. If you have access to the console, e.g. remote console, delete the old rule and the next boot will create a new one. Otherwise do it manually.
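
For the manual route, one possible sketch: keep only the rule that matches the new NIC's MAC address and rename it to eth0. The function name pin_mac_to_eth0 is hypothetical, and the code assumes the usual generated layout (one rule per line, containing ATTR{address}=="..." and NAME="ethN").

```shell
#!/bin/sh
# Sketch only: keep the rule for one MAC address, name it eth0, drop other rules.
# Usage: pin_mac_to_eth0 RULES_FILE MAC
# If no rule matches the MAC, all active rules are dropped and udev will
# generate a new one at the next boot.

pin_mac_to_eth0() {
    rules="$1"
    # MAC addresses in the rules file are lowercase, so normalise the argument.
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -f "$rules" ] && [ -n "$mac" ] || return 1

    awk -v mac="$mac" '
        # Comments and blank lines pass through untouched.
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }
        # The rule that mentions the new MAC is renamed to eth0 and kept.
        index(tolower($0), "\"" mac "\"") {
            sub(/NAME="[^"]*"/, "NAME=\"eth0\"")
            print
        }
        # Any other active rule is not printed, i.e. removed.
    ' "$rules" > "$rules.new" && mv "$rules.new" "$rules"
}

# Example (as root, MAC is a placeholder):
#   pin_mac_to_eth0 /etc/udev/rules.d/70-persistent-net.rules 00:11:22:33:44:55
```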

DenverD · September 17, 2011, 10:28am

i’d bet a dime to a doughnut that the OP is not running openSUSE, and
ask Theo to please show us the terminal output from


cat /etc/SuSE-release

the correct forum for SUSE Linux Enterprise Server or Desktop (SLES/D)
is over here: http://tinyurl.com/422mrnu (sign in ID/Pass here, works
there)

which is not to say you can’t get help here, just be advised that most
(almost all?) here have never run SLE …and, while there is kinship
between the two, SLES/D and openSUSE are absolutely NOT identical twins…

and, the folks with the most knowledge of SLE don’t routinely read here…

ps: luckily, the answer givers you have attracted are conversant in both
the enterprise and open versions…

–
DD
openSUSE®, the “German Automobiles” of operating systems

Theo11 · September 19, 2011, 3:13pm

@Knurpht - it was thought best to remove YaST from the OS for this particular KIWI’d image of SLES, as many of these machines are somewhat public facing. Though password and employee protected, most GUI tools were removed for added security. I will do some research to see what commands are run by YaST for those functions and implement that on a test box.

@deano_ferrari - I will implement your proposed solution as well and replace the MB on a test system, find out if that solution works, and report back.

@ken_yap - the NIC is integrated on the sys brd being replaced, so it is in fact a new MAC address.

@DenverD - I am in fact running SLES, I will make a point to direct future inquiries correctly, thank you.

???@??? :~> cat /etc/SuSE-release
Linux Enterprise Server 11 (i586)
VERSION = 11
PATCHLEVEL = 1

Thank you to all, it is quite nice to have such a response to a problem that has befuddled me. Hopefully your advice will guide me to a solution =)

DenverD · September 19, 2011, 4:45pm

On 09/19/2011 03:16 PM, Theo11 wrote:
> Thank you to all, it is quite nice to have such a response to a problem
> that has befuddled me. Hopefully your advice will guide me to a solution
> =)

i think we all enjoyed it also…so, keep in touch. (just note that
sometimes a new user drops in on Tuesday, and is answering questions on
Thursday…between games…)

(it’s kinda nice every once in a while to get something harder to deal
with than: “How do i change the background”? or “Linux ate my RAM!!!”)

–
DD
openSUSE®, the “German Automobiles” of operating systems

please_try_again · September 20, 2011, 5:04am

Why is there a lsb_release command (normally on all lsb compliant distros) … and why did I (well … indirectly) submit a bug report to help the openSUSE team fixing the missing codename (Celadon) if people are advised to use cat /etc/SuSE-release instead?

# lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description:    openSUSE 11.4 (x86_64)
Release:        11.4
Codename:       Celadon

# zypper info lsb-release
Repository: @System
Name: lsb-release
Version: 2.0-9.1
Arch: noarch
Vendor: openSUSE
Installed: Yes
Status: up-to-date
Installed Size: 17.0 KiB
Summary: Linux Standard Base Release Tools
Description: 
Tools from the Linux Standard Base project to determine the used distribution

Notice that the command is lsb_release and the package lsb-release. I think the package is installed by default on 11.4 (it wasn’t in earlier versions).

please_try_again · September 20, 2011, 5:21am

It will work for sure because it is the solution. (Most Linux systems, including openSUSE, though not all, write this static udev rule during the installation.) Every time you replace a netcard (whether it is integrated on the MB or on an expansion card), the eth number gets incremented. Thus, if you replace your netcard 99 times, you’ll end up with device eth99. Your setup is wrong if it defaults to eth0. You might use something like this to find out the device name used for the NIC (the one which carries the default route):

route | awk '/default/ { print $NF }'

That’s what I’m using in scripts. Editing the udev rule will fix the problem in your case.
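
A variant of the same idea, as a sketch: read the kernel routing table from /proc/net/route instead of parsing route output. This assumes a Linux system where that file exists; the function name default_iface is hypothetical. The default route is the entry whose Destination column is 00000000.

```shell
#!/bin/sh
# Sketch only: print the interface that carries the default route.
# Reads /proc/net/route by default; a different file can be passed for testing.

default_iface() {
    table="${1:-/proc/net/route}"
    # Skip the header line, then pick the first row with an all-zero destination.
    awk 'NR > 1 && $2 == "00000000" { print $1; exit }' "$table"
}

# Example: use the detected name instead of a hard-coded eth0.
#   nic=$(default_iface)
#   [ -n "$nic" ] && echo "default route goes via $nic"
```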

Fedora since version 15 uses a different method: Features/ConsistentNetworkDeviceNaming - FedoraProject

DenverD · September 20, 2011, 5:58am

On 09/20/2011 05:06 AM, please try again wrote:
> Why is there a lsb_release command (normally on all lsb compliant
> distros) … and why did I (well … indirectly) submit a bug report to
> help the openSUSE team fixing the missing codename (Celadon) if people
> are advised to use cat /etc/SuSE-release instead?

i am confused by your statement which seems to say that i shouldn’t use
the cat, and should use the lsb_release ??

but, the ‘why’ here is because the ‘people’ sitting in my chair have
been using one of the following two for years (and, i didn’t get the
notice saying i needed to change to something else for some reason):


cat /etc/SuSE-release
cat /etc/issue

i just checked and see that both of those still work (and return an
easy/quick to read answer almost instantaneously)…and like so many
things in linux which have multiple ways of doing the same thing, i find
it hard to say that “cat /etc/SuSE-release” is bad and “lsb_release -a” is
good (or even better)…but, i’ll listen as you do…

however, i like the following better than just with the -a switch
(because it gives a clean easy to read output):


lsb_release -sd | cut -f2 -d "\""

do you think it would be ok if the people i ask are given any of
these that i might pick:


cat /etc/SuSE-release
cat /etc/issue
lsb_release -sd | cut -f2 -d "\""
lsb_release -a
zypper info lsb-release

unless you know a reason to not use some of those i’d like the freedom
to pick and choose according to how hard or easy i wanted to make it on
the other party…that is, “cat /etc/issue” is easier to type than
lsb_release -sd | cut -f2 -d "\"", but that is so much easier to read
than “lsb_release -a” (and of course most would copy/paste, if possible)
and the also hard to read “zypper info lsb-release” takes longer to
process, and . . . . . )

i noticed you used both of the last two as root, but that doesn’t seem
required here…

–
DD
openSUSE®, the “German Automobiles” of operating systems

please_try_again · September 20, 2011, 10:36am

What makes you think that? The “#” in the code I posted? I can type a $ next time or a pound if I happen to find one (no chance on my keyboard). Anyway, I didn’t mean root as I wrote ‘#’ in front of my command but I guess I should quit this bad habit.

zypper info lsb-release is not relevant. I just wanted to show that lsb-release was installed.

Actually lsb_release is intended to be a standard tool. There is no /etc/SuSE-release on Fedora or Mandriva, and those *release files have different names. Trying to find out the release when you don’t know which Linux is running is very tricky and would require long pipes without a standard tool like lsb_release.
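
To illustrate the point, a rough sketch of what a script might do: use lsb_release when it is installed and only then fall back to guessing at release files. The function names are hypothetical and the list of fallback files is an assumption, not an exhaustive one.

```shell
#!/bin/sh
# Sketch only: describe the running distribution.

# Fallback: print the first line of the first readable release file in DIR.
first_release_line() {
    dir="${1:-/etc}"
    for f in "$dir"/SuSE-release "$dir"/*-release "$dir"/issue; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            head -n 1 "$f"
            return 0
        fi
    done
    return 1
}

# Preferred: the standard tool, with the surrounding quotes stripped.
distro_description() {
    if command -v lsb_release >/dev/null 2>&1; then
        lsb_release -sd | tr -d '"'
    else
        first_release_line /etc
    fi
}

# Example:  distro_description
```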

But be my guest, if you prefer to use cat. You can even use dog (from the contrib repo) if you like.
That would work too:

$ dog /etc/SuSE-release

And notice the ‘$’ sign this time!

DenverD · September 21, 2011, 3:15pm

On 09/20/2011 10:46 AM, please try again wrote:

> Actually lsb_release is intended to be a standard tool. There is no
> /etc/SuSE-release on Fedora or Mandriva, and those *release files have
> different names. Trying to find out the release when you don’t know
> which Linux is running is very tricky and would require long pipes
> without a standard tool like lsb_release.

AH!!! of course…now i understand why you said what you did! so, if
i one day assume that a generic Linux (non-suse) version is running, i
will ask for a lsb_release output!!

thanks for the info/education…

> And notice the ‘$’ sign this time!

ah…i was kinda wondering if my system was wrongly accepting a simple
user to so interrogate exactly what is running, in such detail…

hmmmmm, if a cracker gains access to a user account s/he might then use
lsb_release (and uname, and others) to learn exactly what is running,
and then attack known distro-specific weaknesses…[makes my head hurt]

–
DD
openSUSE®, the “German Automobiles” of operating systems

Theo11 · September 21, 2011, 6:59pm

SOLUTION:

I created a script that runs on startup to check if /etc/udev/rules.d/70-persistent-net.rules contains the pattern “eth1” and deletes that file if so, then reboots (auto re-creating the file on reboot as stated in the beginning of the file).

I also tried to have the script delete the file, then run /lib/udev/write_net_rules to recreate the file, to avoid a complete system reboot, but this way only works sometimes, and generates the error “missing $INTERFACE” other times.
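
For anyone who wants to reproduce the reboot variant, a minimal sketch of such a startup check follows. It is not the exact script in use: the function names are hypothetical, and it matches any active rule naming eth1 or higher rather than only the literal pattern “eth1”.

```shell
#!/bin/sh
# Sketch only: if the NIC was renamed away from eth0, delete the rules file
# so that udev recreates it (with eth0) on the next boot.

RULES="${RULES:-/etc/udev/rules.d/70-persistent-net.rules}"

# True when an uncommented rule assigns eth1, eth2, ...
has_renamed_nic() {
    grep -v '^[[:space:]]*#' "$1" | grep -Eq 'NAME="eth[1-9][0-9]*"'
}

# Delete the file only when a renamed NIC is found; returns 0 if it was deleted.
reset_if_renamed() {
    if [ -f "$1" ] && has_renamed_nic "$1"; then
        rm -f "$1"
        return 0
    fi
    return 1
}

# In the boot script (as root), something like:
#   reset_if_renamed "$RULES" && reboot
```

After the reboot the regenerated file only names eth0, so the check does not fire again.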

Thanks again everyone.

please_try_again · September 22, 2011, 5:46am

Just make him/her believe the machine is a 486 running MS DOS!

please_try_again · September 22, 2011, 6:00am

IMO, you could simply delete (move) this rule! It is not required and is missing on some distros. If you have only one ethernet device, it should be OK. If there are more than one, you need a static rule or they might appear each time in a different order (and get a different name).