I have a 1U server from 2008 that has been running various versions of openSUSE for 9 years. I recently did a fresh install of 42.3 from the 2017-07-05 .iso and it was working fine. Two days ago I did a zypper update and rebooted, and the ethernet disappeared. hwinfo lists the NICs as 80003ES2LAN. I ran “journalctl -k | egrep e1000e” and found “probe failed with error -2”. yast2 network shows the two NICs, but they are greyed out and I am unable to edit them. I tried “rmmod e1000e” followed by “modprobe e1000e”, but it gave the same error. Rebooting did not help.
I would post exact log snippets and the like, but without working ethernet I would have to use flash drives to move the data here. I would first like to see if someone has any simple fixes to try.
Thanks for the suggestion. I tried that, and it didn’t help. I tried a newer version of openSUSE 42.3, and that didn’t work. I then downgraded to openSUSE 42.2, and still neither of the two e1000e ports works. I am beginning to suspect that something like the firmware update that I think I saw in a recent zypper update may have been the problem. Does that sound plausible to anyone?
Did you try a cold restart (shut down; unplug the mains; if it is a notebook, remove the battery for several minutes)? I have experienced problems with a LAN port after a warm reboot at least once.
I do not think firmware is ever updated (meaning flashed to non-volatile memory) automatically. It is possible that newer firmware files got installed that are incompatible with your card and were picked up by the driver; check when the files were updated, and try removing the latest ones if they were installed around that time.
Of course it could simply be a coincidence and the port really died. Did you try booting any other live distro (knoppix, systemrescuecd) to verify whether the LAN works there?
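A minimal sketch of the "check when files were updated" step, assuming the firmware files live under /lib/firmware (the usual location on Linux) and that a 7-day window covers the zypper update in question. The function name and the window are illustrative choices, not something from this thread.

```python
import os
import time


def recently_modified(root, max_age_days, now=None):
    """Return (mtime, path) pairs for files under root that were
    modified within the last max_age_days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that dangling symlinks do not raise
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue
            if mtime >= cutoff:
                hits.append((mtime, path))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Assumption: firmware directory and 7-day window
    for mtime, path in recently_modified("/lib/firmware", 7):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(stamp, path)
```

Any file listed here was touched around the time of the update and would be a candidate to look at more closely.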
I’ve hit this twice on the same machine and never found out what actually went wrong. My solution came from the Gentoo Forums topic “e1000e Problem [SOLVED]”: I did what arvidjaar describes and the NIC was working again. It happened one more time, and never again after that. There were no traces in the logs regarding the device.
Thank you for the suggestion. This is a 1U server, so no battery. The problem still occurs after a cold restart.
Yes, I considered that, but the server has two Ethernet ports, and both are dead in the same way. Of course the failure could be in an ASIC that provides both ports.
I tried an openSUSE NET install CD (e1000e fails in the net installer) and my old openSUSE 42.2 DVD. I may try another live distro, as you suggest.
I used a USB flash drive to transfer the “journalctl -k” output to a machine with internet access. It is below:
Jul 24 20:01:37 maple kernel: i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
Jul 24 20:01:37 maple kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Jul 24 20:01:37 maple kernel: EDAC MC: Ver: 3.0.0
Jul 24 20:01:37 maple kernel: intel_powerclamp: No package C-state available
Jul 24 20:01:37 maple kernel: intel_powerclamp: No package C-state available
Jul 24 20:01:38 maple kernel: EDAC MC0: Giving out device to module i5000_edac.c controller I5000: DEV 0000:00:10.0 (POLLED)
Jul 24 20:01:38 maple kernel: EDAC PCI0: Giving out device to module i5000_edac controller EDAC PCI controller: DEV 0000:00:10.0 (POLLED)
Jul 24 20:01:38 maple kernel: pps_core: LinuxPPS API ver. 1 registered
Jul 24 20:01:38 maple kernel: pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
Jul 24 20:01:38 maple kernel: PTP clock support registered
Jul 24 20:01:38 maple kernel: gpio_ich: ACPI BAR is busy, GPI 0 - 15 unavailable
Jul 24 20:01:38 maple kernel: gpio_ich: GPIO from 462 to 511 on gpio_ich
Jul 24 20:01:38 maple kernel: e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
Jul 24 20:01:38 maple kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jul 24 20:01:38 maple kernel: iTCO_vendor_support: vendor-support=0
Jul 24 20:01:38 maple kernel: iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
Jul 24 20:01:38 maple kernel: iTCO_wdt: Found a 631xESB/632xESB TCO device (Version=2, TCOBASE=0x1060)
Jul 24 20:01:38 maple kernel: iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x4) 00:30:48:62:9c:18
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.0 eth0: MAC: 5, PHY: 5, PBA No: 2050FF-0FF
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.1: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.1 eth1: (PCI Express:2.5GT/s:Width x4) 00:30:48:62:9c:19
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.1 eth1: Intel(R) PRO/1000 Network Connection
Jul 24 20:01:38 maple kernel: e1000e 0000:04:00.1 eth1: MAC: 5, PHY: 5, PBA No: 2050FF-0FF
My log contains very similar messages to the excerpt at the link you cite (“Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode” and “error -2”). Unfortunately shutdown -h, unplugging, replugging, and poweron doesn’t fix it in my case.
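For reference, the number in a kernel "failed with error -N" message is normally a negated errno value. A small sketch for decoding it, assuming it is run on a Linux machine where Python's errno table matches the kernel's numbering (the helper name is made up for illustration):

```python
import errno
import os


def describe_kernel_error(code):
    """Map a (negative) kernel error code to its errno name and text."""
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)


if __name__ == "__main__":
    print(describe_kernel_error(-2))
```

On Linux, -2 decodes to ENOENT. That only names the error class; it does not say what the driver failed to find.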
Not a solution as such, but I wonder if a warm reboot of the device (removing the device and re-scanning the bus) might change the behaviour here?
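A rough sketch of that remove-and-rescan sequence, using the standard sysfs files /sys/bus/pci/devices/<address>/remove and /sys/bus/pci/rescan. The PCI addresses are the ones from the log in this thread; the function name and the sysfs_root parameter are only there for illustration and dry runs. It has to run as root on the real /sys.

```python
import os


def pci_remove_and_rescan(addresses, sysfs_root="/sys"):
    """Detach each PCI function, then ask the kernel to re-enumerate."""
    for addr in addresses:
        remove_path = os.path.join(
            sysfs_root, "bus/pci/devices", addr, "remove")
        # Writing "1" removes the device from the kernel's device tree
        with open(remove_path, "w") as f:
            f.write("1")
    # Writing "1" triggers a rescan of all PCI buses, which should
    # rediscover the functions and probe the driver again
    with open(os.path.join(sysfs_root, "bus/pci/rescan"), "w") as f:
        f.write("1")


if __name__ == "__main__":
    # Addresses taken from the journalctl output in this thread
    pci_remove_and_rescan(["0000:04:00.0", "0000:04:00.1"])
```

Afterwards, “journalctl -k” should show the devices being enumerated and probed again.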
Thank you for the suggestion! After doing a warm-reboot of 0000:04:00.0 and 0000:04:00.1, eth1 exists and works (I didn’t try eth0). See below, where the “error -2” is gone. This is helpful, but for a colocated server, this isn’t a complete solution, as you said. Any thoughts on what I could try to effect a complete solution? Any idea what might have changed to cause this to start to happen in openSUSE?
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: [8086:1096] type 00 class 0x020000
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: reg 0x10: [mem 0xd8020000-0xd803ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: reg 0x14: [mem 0xd8000000-0xd801ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: reg 0x18: [io 0x2000-0x201f]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: [8086:1096] type 00 class 0x020000
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: reg 0x10: [mem 0xd8060000-0xd807ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: reg 0x14: [mem 0xd8040000-0xd805ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: reg 0x18: [io 0x2020-0x203f]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: reg 0x30: [mem 0x00000000-0x0000ffff pref]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: PME# supported from D0 D3hot D3cold
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: BAR 0: assigned [mem 0xd8000000-0xd801ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: BAR 1: assigned [mem 0xd8020000-0xd803ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: BAR 0: assigned [mem 0xd8040000-0xd805ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: BAR 1: assigned [mem 0xd8060000-0xd807ffff]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: BAR 6: assigned [mem 0xd8080000-0xd808ffff pref]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: BAR 6: assigned [mem 0xd8090000-0xd809ffff pref]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.0: BAR 2: assigned [io 0x2000-0x201f]
Jul 26 16:42:27 maple kernel: pci 0000:04:00.1: BAR 2: assigned [io 0x2020-0x203f]
Jul 26 16:42:27 maple kernel: pci 0000:01:00.3: PCI bridge to [bus 05]
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x4) 00:30:48:62:9c:18
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.0 eth0: MAC: 5, PHY: 5, PBA No: 2050FF-0FF
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.1: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.1 eth1: (PCI Express:2.5GT/s:Width x4) 00:30:48:62:9c:19
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.1 eth1: Intel(R) PRO/1000 Network Connection
Jul 26 16:42:27 maple kernel: e1000e 0000:04:00.1 eth1: MAC: 5, PHY: 5, PBA No: 2050FF-0FF
Yes, it is a workaround, and I’m not sure what the root cause might be. A bug report is the best course of action so it can be investigated and fixed. Some more rigorous testing (e.g. kernel bisecting) may be required.
Anyway, for now you could automate this workaround by including the commands in a boot script. Adding them to /etc/init.d/boot.local might be ok for this purpose. Make sure that rc-local.service is enabled and started as well.
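If the workaround goes into a boot script, it may be worth applying it only when the driver actually failed to bind. A minimal sketch of such a check, assuming that a PCI function with a bound driver has a "driver" link in its sysfs directory (standard sysfs behaviour); the function name is hypothetical and the addresses are the ones from the log above.

```python
import os


def functions_without_driver(addresses, sysfs_root="/sys"):
    """Return the PCI addresses that currently have no driver bound."""
    missing = []
    for addr in addresses:
        driver_link = os.path.join(
            sysfs_root, "bus/pci/devices", addr, "driver")
        # The "driver" entry only exists while a driver is bound
        if not os.path.lexists(driver_link):
            missing.append(addr)
    return missing


if __name__ == "__main__":
    nics = ["0000:04:00.0", "0000:04:00.1"]
    print(functions_without_driver(nics))
```

A boot script could then run the workaround only for the addresses this returns, and leave a healthy NIC alone.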