eth0 not (auto) enabled at boot time, no static ip addresses are assigned

Hi all,

I have a very ugly issue: I’ve been running an openSUSE Leap 15.0 VPS for years now. It is usually rock solid and stable. A few days ago the hosting company had multiple issues on the host system, so they had to “hard reboot” my VPS a couple of times. They’re using KVM.
Since these multiple reboots, which I thought shouldn’t really be a problem, my network config for the eth0 virtual NIC seems to be messed up for whatever reason.

What I’ve found out already:

/etc/sysconfig/network/ifcfg-eth0 has STARTMODE='auto'

Nevertheless journalctl -b shows eth0 as device-not-running, and systemctl status wickedd-nanny.service says

device eth0: call to org.opensuse.Firewall.firewallUp() failed: DBus method call timed out
eth: failed to bring up device, still continuing

It keeps “continuing” forever; the VPS never gets eth0 enabled anymore.

Connected via VNC to virtual console I’ve tried three things:

  • started yast, tried to start System/Network Settings: HANGS forever
  • tried ifup eth0: results in device-not-running
  • tried ip link set eth0 up: Works, eth0 is getting enabled. But NO static ip addresses are assigned. Not ipv4, not ipv6.

I compared /etc/sysconfig/network/ifcfg-eth0 from my VPS with the one on my Leap 15 box at home: except for the IP addresses they’re the same, so the VPS one is correct.
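
A quick sanity check at this stage is to read back the start mode the file actually declares, rather than comparing by eye. This is only a minimal sketch, assuming the ifcfg file uses the plain KEY='value' syntax shown above; the helper name get_startmode is made up for illustration and is not part of wicked or YaST.

```shell
# Hypothetical helper: print the STARTMODE value of an ifcfg-style file,
# with any surrounding single or double quotes removed.
# If the key appears more than once, the last occurrence is printed.
get_startmode() {
    sed -n "s/^STARTMODE=[\"']\{0,1\}\([^\"']*\)[\"']\{0,1\}[[:space:]]*$/\1/p" "$1" | tail -n 1
}
```

Called as get_startmode /etc/sysconfig/network/ifcfg-eth0, it would print auto for the file described above. It only confirms what the file says; it cannot tell whether wicked actually acts on it.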

Any hints on how to narrow down this issue? This is really painful, as there’s a postfix instance running on my VPS, and I may be losing relevant emails because of this :frowning:

Regards,
Michael

One step further, but still no solution:

As described before, no static ip addresses are assigned. And as I was getting DBUS timeouts related to firewalld, I’ve stopped firewalld.

Voilà, ifup eth0 works as expected, eth0 is now “really” up with assigned IPs, both v4 and v6. I restarted firewalld immediately after enabling eth0, of course.

For whatever reason, this manual enabling of eth0 does not survive the next reboot, so something seems to have got corrupted in my firewall settings, which I haven’t changed for months, I promise :slight_smile:

No idea why firewalld might be able to “block” enabling eth0, mysterious …

Any hints would be great. Would this be worth tracking in Bugzilla?
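
For reference, the manual workaround described in this post can be written down as one small function. This is a sketch of the steps as described, not a fix: the function name and the RUN dry-run hook are hypothetical additions, and the real commands need root.

```shell
# Workaround sketch: bring an interface up while firewalld is stopped,
# then start firewalld again whether or not ifup succeeded.
# RUN is a hypothetical dry-run hook: RUN=echo prints the commands
# instead of executing them.
ifup_with_firewalld_paused() {
    iface=$1
    ${RUN:-} systemctl stop firewalld || return 1
    ${RUN:-} ifup "$iface"
    status=$?
    ${RUN:-} systemctl start firewalld
    return "$status"
}
```

Running RUN=echo ifup_with_firewalld_paused eth0 first shows the three commands without touching the system. As noted above, the effect does not survive a reboot.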

opened bugzilla issue for this: https://bugzilla.opensuse.org/show_bug.cgi?id=1126094

First,
Although this is a networking issue, it’s more specifically networking in a virtualized environment so you’d probably get more targeted help in the Virtualization forum.

You may have to verify that your Cloud Provider has fixed all their problems, that no one else is having a similar problem.
From what you’ve posted you seem to suspect the problem is not that the networking service can’t connect to any other machines during bootup, it’s an internal DBus issue caused by timing.

Generally speaking, your description suggests to me that you have an image corruption problem; your boot log should be more descriptive about what dependency is missing. As for firewalld, it’s simply an iptables/ebtables manager, so the actual problem might be related to what firewalld does but not actually to firewalld itself.

The first thing I’d probably try is to go into YaST and flip your network management to NM and then back to Wicked, to see if that resets and fixes your bootup.

To address that kind of issue would probably require a “repair” but how that might be done depends greatly on how your Provider supports images… ie

  • Is the base image provided to you by the Provider or the result of a fairly normal install? If the image is provided by the Provider, you’ll have to ask them what their supported procedures are to do a repair, otherwise you can try various well known options…
  • If you have access to a virtual optical disk, point to an openSUSE LEAP 15 DVD and go through the repair process…Remember that this essentially returns your system to the day the DVD was launched, so you’ll have to follow up with an update to install all patches and feature updates released since then.
  • run a “zypper dup” relying on your online repos to provide the packages required to re-install your system.

HTH,
TSU

TSU, thank you so much for answering in that detail!

They will never disclose this :slight_smile: But as I said: stopping firewalld immediately allows eth0 to be brought up, and from then on I can also restart e.g. sshd and connect. But only until the next reboot…
Plus: in parallel to the VPS itself, the provider offers several “rescue systems” (more about them below), which start a separate VM with full access to your own VPS’s virtual disks, not mounted at first. And all “rescue systems” I’ve tested have no issue with network connectivity, using the same IPv4/IPv6 settings.

So I (now) strongly believe that my issue is internal to my VPS only.

Trying with journalctl -b and dmesg, I can’t really find any error msg or dependency missing msg, except the ones to be expected, like

systemd[1]: nss-lookup.target: Dependency Before=nss-lookup.target dropped
  • same msg on my local well working Leap 15 box.

I had the same idea, but unfortunately, while I can start YaST, the System / Network Settings module ( /sbin/yast lan ) hangs indefinitely after starting. Probably the same underlying issue.

My VPS was set up by me from a provider-preinstalled “basic as possible” system. It was Leap 42.1 at that time; as far as I remember, the “server only” install option was added to openSUSE somewhat later. So it is some sort of custom install, but very minimal. For whatever software I later added myself, I had to resolve many more dependencies than on my Leap box at home. Which is as it should be for a server, IMHO.

Distribution upgrades have been made by me all the way, from 42.1 to now Leap 15.0. Smaller issues, but solvable in short time. More zypper dup issues on my box at home :slight_smile:

The provider offers several different rescue systems: the probably widely known “SystemRescueCD” and a “Debian 9 Live” system, plus Clonezilla.

I did very frequent backups by rsnapshot(underlying rsync), triggered from my machine at home, fetching all data (files, along with Percona online backups of all mysql databases). So I have all relevant files before the corruption at hand.
I also once made, as a test, a Clonezilla-based backup, which worked fine, so I could also “take” the VPS, get it to my local Leap box, and run it in qemu.

But currently I still have the hope that this issue might be related to corrupted config files, caused by the “hard reboots”.

I’m currently trying to compare all (hopefully) relevant files (/etc/wicked, /etc/sysconfig/network, /etc/firewalld) by size and content between the latest backup and the current VPS.

Until now no differences found…
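
The comparison described above can be done in one pass with diff -r. The following is a minimal sketch, assuming the backup and the current system are both reachable as directory trees (for example, the rsnapshot copy and the VPS disk mounted from a rescue system); the function name and the two root arguments are hypothetical.

```shell
# Compare selected config directories between a backup copy and the
# current root. Both arguments are root directories of a file tree.
# Prints diff output and returns non-zero if any directory differs.
compare_config_trees() {
    backup_root=$1
    current_root=$2
    rc=0
    for dir in etc/wicked etc/sysconfig/network etc/firewalld; do
        # Skip directories that are missing on both sides
        [ -d "$backup_root/$dir" ] || [ -d "$current_root/$dir" ] || continue
        diff -r "$backup_root/$dir" "$current_root/$dir" || rc=1
    done
    return "$rc"
}
```

This only covers the three directories listed; state kept elsewhere on the system is not compared.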

Because you say that you can enable networking by executing an “ifup”, I suspect that whatever problem happens during boot is resolved by the time you can log in. That suggests to me it’s not a problem with a config file but a problem in the system boot parallelism, which is a bit of a mystery. From what I understand and conjecture, the boot process has various “waypoints” or critical nodes in the flow where, no matter in what order the components were loaded up to then, they have to be completed before moving beyond that point. Something has happened so that networking requirements cannot complete before networking starts, so networking can’t start during boot.

You may have to have an in-depth discussion with your Provider about your Repair options; you need to know how best to repair your image. This is not the same as the boot repair disks you describe, which are mainly used to recover when a system won’t boot at all.

You may want to think about rebuilding your system from scratch, and this is an example of why I am an advocate of creating a Build script that memorializes what is installed in your machine and how it’s configured, this was asked about in a recent Forum post

https://forums.opensuse.org/showthread.php/534959-IDEA-Configuration-scripts-for-rpm-s?p=2894435#post2894435

You can also take a look at your available backups, whether you can restore your system without also restoring your problems, in particular how usable are your backups that were made before your Provider experienced their disaster?

You should also ask your Provider if under the circumstances they can provide you with a VPS to recover or rebuild your system without destroying your existing VPS, then when you have a replacement to your satisfaction you can simply “flip a switch” to replace old with new.

TSU

SOLVED!

Documented in my bug report here: https://bugzilla.opensuse.org/show_bug.cgi?id=1126094.

Reason was a “stale” lock file, left over by a process called ebtables, when my VPS “crashed” the last time.

File is /var/lib/ebtables/lock.
Deleted.
Done.
Sometimes the most painful errors have such small reasons, a pity that they’re that hard to find :\
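
For anyone hitting the same symptom, a cautious way to check for such a leftover lock before deleting anything might look like the sketch below. The heuristic (lock file exists while no process with the given name is running) is an assumption made for illustration, not something taken from the bug report, and the function name is hypothetical. It relies on Linux /proc.

```shell
# Hypothetical helper: report whether a lock file looks stale, i.e. the
# file exists while no running process has the given command name.
# Returns 0 if it looks stale, 1 otherwise. Does not delete anything.
lock_looks_stale() {
    lock_file=$1
    owner_name=$2
    [ -e "$lock_file" ] || return 1            # no lock file at all
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        if [ "$(cat "$comm" 2>/dev/null)" = "$owner_name" ]; then
            return 1                            # a matching process is alive
        fi
    done
    return 0
}
```

Usage would be along the lines of lock_looks_stale /var/lib/ebtables/lock ebtables && echo "lock looks stale", leaving the actual removal as a deliberate manual step.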

@TSU: Thank you very, very much, again!

Congrats on finding your problem.

Although I don’t know for sure in your case,
lck files are typically found in virtual machines, not installations on metal and are more typical of memory backing files which have to be deleted before a virtual machine which had a dirty shutdown can be booted back up again.

TSU

After reading your referenced RHEL Bug report and thinking about it a bit…
I wonder if this lock method and file is because of firewalld’s runtime vs permanent settings modes so is a fundamental flaw in firewalld design.

Someone may want to evaluate whether this could be a show-stopper moving forward or something sufficiently addressed by better documentation. It does look like they tried to address the problem but still assumes that systemd would “behave properly.” I’d also suspect that they probably implemented their flock solution incorrectly, it’s probably invoked as a normal method instead of as a recovery, ie a default behavior when a lock is encountered… But of course that should be evaluated based on the real reason the lock is implemented (because I’m guessing on that point) so as not to possibly open a security hole.

Speculating,
TSU

A tad late to the action in this thread, I’d just like to mention a third option here: systemd-networkd.

Like most who install openSUSE, I started off five years ago with wicked by default, and I experienced 5 to 10 seconds of delay while booting into KDE/Plasma. Apparently, wicked and my DSL router misunderstood each other when it came to IPv6 and/or DHCP.

I then switched to NetworkManager (NM), and boot times improved instantly: NM only needed around 100 milliseconds to bring up eth0 with a DHCP-assigned dynamic IPv4 address. I also tried reserving a static IP address in my DSL router and bypassing DHCP, but this didn’t improve things noticeably. My impression was that without DHCP and with manually assigned DNS servers, name resolution even seemed a bit more sluggish than with DHCP enabled. Further experiments with IPv6 led me to the conclusion to leave IPv6 disabled system-wide. My NM config hadn’t changed for over 2 years.

Two weeks ago then, I tested out systemd-networkd for the first time, motivated by the verbosity of NM messages (and by curiosity; ok, and by the prospect of being able to free up a bit of disk space occupied by NM).
Here’s what I did:

  • systemctl stop NetworkManager.service (as root)
  • systemctl disable NetworkManager.service
  • systemctl mask NetworkManager.service
  • mv -i /etc/sysconfig/network/ifcfg-eth0 /root/off/etc_sysconfig_network_ifcfg-eth0 (back up any previous eth0 configs)
  • cp -i /etc/resolv.conf /root/off/etc_resolv.conf (back up DNS stuff for reference)
  • vi /etc/systemd/network/50-static.network (create a static entry for eth0, see contents below)
  • add to /etc/resolv.conf the nameservers my ISP recommends
  • make sure that any DHCP functionality is disabled in /etc/sysconfig/network/dhcp
  • add the names of my openSUSE Leap 15.0 main rig and of my DSL router to /etc/hosts, including their static ipv4 addresses, very old-school :wink:
  • sudo systemctl enable systemd-networkd
  • uninstall NetworkManager and ModemManager (agents, CLI and GUI tools as well as NM widgets)
  • create a new initrd with dracut --hostonly --force --no-compress --omit "img-lib cifs fcoe fcoe-uefi rdma multipath iscsi qemu lvm mdraid dm dmraid cdrom pollcdrom plymouth btrfs wacom convertfs wicked ipv6 mtp-probe"
    (YMMV, your mileage may vary), just to make sure that no artifacts of wickedd or NM mess up the start of the newly configured system
  • reboot

Contents of my above mentioned /etc/systemd/network/50-static.network (again, YMMV):

[Match]
Name=eth0

[Network]
Address=192.168.1.2/24
Gateway=192.168.1.1

Result: systemd now manages to bring up my eth0 in 19ms (for comparison: NM was around 100ms, wicked 5s to 10s), and it works quietly and flawlessly. Together with other optimizations (one single SSD-aligned ext4 partition for everything; MBR instead of GPT; minimal KDM instead of SDDM; exim for postfix; chronyd for ntpd; no Plymouth, no btrfs, no compressed initrd’s etc.), my boot times according to systemd-analyze are around 1.5 seconds now; personal best was 1.319s five days ago.

For completeness, add to that about 2s of BIOS initializations (testing RAM, building hardware tree, doing ACPI stuff) and about a second of KDM+KDE+Plasma startup. Not bad for a five-year-old Core-i5 rig.

Conclusion: for minimal static IP configs (usually found in virtualized guest operating systems, but apparently perfectly suited for home/desktop/workstation use as well), I can recommend networkd. Give it a try. Cheers!
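
The 50-static.network file from the recipe above can also be generated rather than typed in vi. A minimal sketch, assuming only the [Match] and [Network] keys shown in the post; the function name and argument order are made up for illustration.

```shell
# Write a minimal static systemd-networkd .network file.
# Arguments: target file, interface name, address with prefix, gateway.
write_static_network() {
    cat > "$1" <<EOF
[Match]
Name=$2

[Network]
Address=$3
Gateway=$4
EOF
}
```

With the values from the post, this would be write_static_network /etc/systemd/network/50-static.network eth0 192.168.1.2/24 192.168.1.1, run as root. Writing to a scratch path first makes it easy to inspect the result before installing it.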

@unix111:

Did you try the DHCP option? – the “systemd.network” man page:


       Example 2. DHCP on ethernet links

           # /etc/systemd/network/80-dhcp.network
           [Match]
           Name=en*

           [Network]
           DHCP=yes

       This will enable DHCPv4 and DHCPv6 on all interfaces with names starting with "en" (i.e. ethernet interfaces).

The man page also mentions “bridge” configurations – for VMs? …

Does anyone know anything about openSUSE’s position with respect to the systemd resolved service?

It’s not installed with Leap 15.0, and there doesn’t seem to be a “standard” package available.
It’s mentioned in the man pages, there’s an ArchWiki article about it, and there seem to be some issues around DNS resolution misbehaviour …

Not yet, but I have little doubt that it is functional; from what I’ve read, systemd is supposed to have basic networking functionality for most standard scenarios, including bonding/bridging/TUN/TAP/VLAN stuff for real servers and for VM guests. Look up the manpage for systemd.netdev (5) to get further examples.

I have been wondering that myself, and a small part of me suspects that SuSE nefariously want to set their SLES (enterprise server distro) apart from openSUSE by excluding the latter from certain server-savvy functionality. On the other hand, why would they? I’d really like to find out myself.

erlangen:~ # zypper if nss-resolve
Loading repository data...
Reading installed packages...


Information for package nss-resolve:
------------------------------------
Repository     : Haupt-Repository (OSS)                                   
Name           : nss-resolve                                              
Version        : 239-5.1                                                  
Arch           : x86_64                                                   
Vendor         : openSUSE                                                 
Installed Size : 321.3 KiB                                                
Installed      : Yes                                                      
Status         : up-to-date                                               
Source package : systemd-239-5.1.src                                      
Summary        : Plugin for local hostname resolution via systemd-resolved
Description    :                                                          
    This package contains a plug-in module for the Name Service Switch
    (NSS), which enables host name resolutions via the systemd-resolved(8)
    local network name resolution service. It replaces the nss-dns plug-in
    module that traditionally resolves hostnames via DNS.

    To activate this NSS module, you will need to include it in
    /etc/nsswitch.conf, see nss-resolve(8) manpage for more details.

erlangen:~ # systemctl status systemd-resolved 
● systemd-resolved.service - Network Name Resolution
   Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:systemd-resolved.service(8)
           https://www.freedesktop.org/wiki/Software/systemd/resolved
           https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
           https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
erlangen:~ #
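
The package description above says the module only takes effect once it is listed in /etc/nsswitch.conf. A small check for that could look like the following sketch; the function name is hypothetical, and it only inspects the hosts line of the file it is given.

```shell
# Return 0 if the "hosts:" line of an nsswitch.conf-style file lists
# the "resolve" module as a separate word, non-zero otherwise.
hosts_line_uses_resolve() {
    grep -Eq '^[[:space:]]*hosts:(.*[[:space:]])?resolve([[:space:]]|$)' "$1"
}
```

For example, hosts_line_uses_resolve /etc/nsswitch.conf && echo "nss-resolve is listed". A positive result says nothing about whether systemd-resolved itself is running, which the status output above shows it is not.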

It’s only offered for Tumbleweed – neither Leap 15.0 nor SLE-15 are listed – except for a couple of private builds …

AFAICS it ain’t offered, yet for SLE-15.
BTW, comparing SUSE to Red Hat, Red Hat’s CentOS is updated to a new version at least a month after the update was applied to RHEL, especially with respect to Kernel versions (CentOS still hasn’t got a Kernel which remotely supports the AMD Ryzen APUs) …

  • With SUSE it’s the other way around: first Tumbleweed, then openSUSE, then SLE …

They are trying hard to streamline development: Developing SLE, Factory and Leap distributions at the same time, impossible? https://www.youtube.com/watch?v=zpZ4HCP-VJ0

I considered setting up systemd-networkd several times, but postponed the change mostly due to lack of knowledge. The above recipe being pretty intimidating, I tried the following shorter version:

  1. Yast2 Network Settings > Global Options > Network Setup Method: Network Services Disabled
  2. Create /etc/systemd/network/20-wired.network
  3. Remove link /etc/resolv.conf and create file with the same name
  4. Run systemctl enable systemd-networkd.service

Contents of files:

### /etc/resolv.conf file autogenerate
nameserver 192.168.178.1
erlangen:~ # 
erlangen:~ # cat /etc/systemd/network/20-wired.network
[Match]
Name=enp0s31f6

[Network]
Address=192.168.178.30/24
Gateway=192.168.178.1

erlangen:~ # 
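
Step 3 of the short recipe (replace the /etc/resolv.conf link with a plain file) is the one most easily done wrong, because writing through the link would change the link target instead. A minimal sketch of that step, with a hypothetical function name and the path passed as an argument so it can be tried on a scratch copy first:

```shell
# Replace a resolv.conf symlink with a regular file naming one nameserver.
# Arguments: path to resolv.conf, nameserver address.
# An existing regular file at that path is overwritten.
make_static_resolv_conf() {
    target=$1
    if [ -L "$target" ]; then
        rm -- "$target"        # remove the link only, not the file it points to
    fi
    printf 'nameserver %s\n' "$2" > "$target"
}
```

With the values shown above this would be make_static_resolv_conf /etc/resolv.conf 192.168.178.1, run as root.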

Logs are pretty terse:

erlangen:~ # journalctl -b -u systemd-networkd.service --output short-precise
-- Logs begin at Mon 2019-02-11 16:20:00 CET, end at Fri 2019-03-15 07:57:12 CET. --
Mar 15 00:06:58.514429 erlangen systemd[1]: Starting Network Service...
Mar 15 00:06:58.550526 erlangen systemd-networkd[522]: Enumeration completed
Mar 15 00:06:58.550783 erlangen systemd[1]: Started Network Service.
Mar 15 00:06:58.904311 erlangen systemd-networkd[522]: wlan0: Interface name change detected, wlan0 has been renamed to wlp3s0.
Mar 15 00:06:59.104308 erlangen systemd-networkd[522]: eth0: Interface name change detected, eth0 has been renamed to enp0s31f6.
Mar 15 00:06:59.141949 erlangen systemd-networkd[522]: lo: Link is not managed by us
Mar 15 00:06:59.141956 erlangen systemd-networkd[522]: wlp3s0: Link is not managed by us
Mar 15 00:07:06.028149 erlangen systemd-networkd[522]: enp0s31f6: Gained carrier
Mar 15 00:07:07.896552 erlangen systemd-networkd[522]: enp0s31f6: Gained IPv6LL
Mar 15 00:07:09.304621 erlangen systemd-networkd[522]: enp0s31f6: Configured
erlangen:~ # 

erlangen:~ # systemd-analyze blame|grep network
            34ms systemd-networkd.service
erlangen:~ # 

Hehe, I know, I surely did some superfluous things there. In the case of wicked/NM/networkd/IPv6/DHCP, I tried quite a few variations over months and years, and while wanting to log possibly important changes, I also try to avoid the debris of those trial-and-error sessions cluttering up my system.

On one hand, I really do like how YaST manages things overall, and I know I can trust the tool to do its job (or, better, its many many jobs).
On the other hand, I enjoy micromanaging configs and settings, trying out things as close to the metal as possible, learning stuff about the guts of Linux, dracut, systemd — even of YaST2 itself since it’s largely re-implemented in Ruby, my programming language of choice. The approaches I end up taking are often those weird mixtures of manually hacking part of it while also relying on automatisms built into the tools mentioned. And, after reaching a satisfying resolution, I often wonder how much more easily I could have experimented within a virtual guest system, and with how much less risk, while learning possibly new tricks about virtualization.

One of the skills I need to improve a lot (a skill which I think you, Karl, and many other regular contributors here are really good at practicing) is boiling it all down to the steps actually needed in order to achieve any given task. Cheers!

With SSDs becoming very affordable, I bought a Samsung 850 EVO 500GB and use it for a backup version of Tumbleweed and other distributions. Thus I always have pristine instances of the latest versions.

YaST2 network configuration is easy and very reliable. However frequent updates and booting other OSs revealed some glitches. For my desktop machine directly connected to a FRITZ!Box it is overkill. As a physicist I always go for a simple and good enough solution. I was skeptical about networkd but I learned otherwise from your post.:wink: Thus I gave it a try. Wired connections worked on the first try. Wireless was a little more difficult. But a properly configured WPA Supplicant is little effort and also worked on the first try.

With UEFI, multi-booting is very easy. I use the additional systems for testing.

Sorry to come back to this. I hope some of the contributors above will pick this up.

What about IPv6 when switching from Wicked (configured using YaST) to systemd-networkd?

I am using Wicked with no DHCP. It seems that my IPv6 addresses are created automatically (I do not configure anything except that it is switched on in YaST):

boven:~ # ifconfig -a 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.154  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 2001:980:91a0:1:f123:fe20:4098:ffa8  prefixlen 64  scopeid 0x0<global>
        inet6 2001:980:91a0:1:ee8e:b5ff:feda:d0d  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::ee8e:b5ff:feda:d0d  prefixlen 64  scopeid 0x20<link>
        ether ec:8e:b5:da:0d:0d  txqueuelen 1000  (Ethernet)
        RX packets 20331  bytes 16911334 (16.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18457  bytes 2883747 (2.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 457  bytes 3332068 (3.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 457  bytes 3332068 (3.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        ether 94:e9:79:76:e6:cd  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

boven:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ec:8e:b5:da:0d:0d brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.154/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2001:980:91a0:1:f123:fe20:4098:ffa8/64 scope global temporary dynamic 
       valid_lft 6259sec preferred_lft 3230sec
    inet6 2001:980:91a0:1:ee8e:b5ff:feda:d0d/64 scope global mngtmpaddr dynamic 
       valid_lft 6259sec preferred_lft 3230sec
    inet6 fe80::ee8e:b5ff:feda:d0d/64 scope link 
       valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 94:e9:79:76:e6:cd brd ff:ff:ff:ff:ff:ff
boven:~ # 

Will this “automatic” creation of IPv6 addresses also happen while using systemd as described above?
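
One setting that looks relevant here: the systemd.network man page documents an IPv6AcceptRA= option in the [Network] section, which controls whether router advertisements are accepted on the link, and that is the mechanism behind the “dynamic” global addresses in the output above. The following is only a sketch of how that line could be added to one of the static .network files from earlier in the thread. Whether it is needed at all depends on the systemd version and its defaults, which has not been verified on Leap 15.0; the function name is hypothetical.

```shell
# Add IPv6AcceptRA=yes directly below the [Network] header of a
# .network file, unless the file already sets IPv6AcceptRA.
add_ipv6_accept_ra() {
    file=$1
    grep -q '^IPv6AcceptRA=' "$file" && return 0    # already set, leave as is
    awk '{ print } /^\[Network\]$/ { print "IPv6AcceptRA=yes" }' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

After changing the file, systemd-networkd has to be restarted for the setting to take effect. The temporary (privacy) address seen in the output is a separate matter; the same man page lists IPv6PrivacyExtensions= for that.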