"e1000e: Detected Hardware Unit Hang" shows repeatedly in logs - temporary loss of connectivity

Hello!
I’ve updated a machine today from an ancient CentOS distribution to the latest openSUSE, 13.2.
Everything worked fine except that I got regular network failures (every minute or so, depending on network traffic). The network would go down and then come back up again in about two seconds.
Upon checking journalctl, I found the following:

Dec 03 22:48:16 gw1 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
TDH <23>
TDT <35>
next_to_use <35>
next_to_clean <21>
buffer_info[next_to_clean]:
time_stamp <10047c8ff>
next_to_watch <23>
jiffies <10047cbb5>
next_to_watch.status <0>
MAC Status <802a3>
PHY Status <792d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 03 22:48:18 gw1 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
TDH <23>
TDT <35>
next_to_use <35>
next_to_clean <21>
buffer_info[next_to_clean]:
time_stamp <10047c8ff>
next_to_watch <23>
jiffies <10047cda9>
next_to_watch.status <0>
MAC Status <802a3>
PHY Status <792d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 03 22:48:20 gw1 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
TDH <23>
TDT <35>
next_to_use <35>
next_to_clean <21>
buffer_info[next_to_clean]:
time_stamp <10047c8ff>
next_to_watch <23>
jiffies <10047cf9d>
next_to_watch.status <0>
MAC Status <802a3>
PHY Status <792d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 03 22:48:22 gw1 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
TDH <23>
TDT <35>
next_to_use <35>
next_to_clean <21>
buffer_info[next_to_clean]:
time_stamp <10047c8ff>
next_to_watch <23>
jiffies <10047d191>
next_to_watch.status <0>
MAC Status <802a3>
PHY Status <792d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 03 22:48:22 gw1 kernel: e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
Dec 03 22:48:25 gw1 kernel: e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

This keeps repeating over and over, at intervals of about two minutes.
The system is based on an Intel DP65LT motherboard that has an Intel 82566DC integrated Gbit network controller (according to the manual). It has the latest BIOS update.
Searching for the error message, I found this solution and it helped.
I ran

ethtool -K enp0s25 tso off

and the journalctl messages disappeared and the network connection ran fine, as before.
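
To confirm that the change actually took effect, the lowercase -k option of ethtool lists the current offload settings. Below is a small sketch that pulls out just the TSO state; the helper name tso_state is made up for illustration, and the interface name is the one from my log above.

```shell
#!/bin/sh
# Print the state of TCP segmentation offload, reading the output of
# `ethtool -k <interface>` from standard input.
# The result is normally "on" or "off"; ethtool may append "[fixed]"
# when the driver does not allow the feature to be changed.
# tso_state is a hypothetical helper name, not part of ethtool.
tso_state() {
    awk -F': *' '$1 == "tcp-segmentation-offload" { print $2; exit }'
}

# Usage on the affected machine:
#   ethtool -k enp0s25 | tso_state
```

Note that a setting changed with ethtool -K is not persistent, so it has to be reapplied after every reboot or driver reload.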
Why is this happening?
I never had problems with the older Linux distro that was on this machine, but I never checked ethtool to see whether the older driver/kernel used the TSO feature… (I didn’t bother to back it up before erasing it, since openSUSE ran flawlessly on all of my other machines.)
Is TSO broken in this chip or in the driver?
I understand that there may be a performance penalty if I disable the TSO feature of the chip, so is there anything I can do to fix this?

Regards,
Andy.

From what I can remember, it’s a problem with the card’s EEPROM, which has some broken settings that cause issues if TSO is enabled. You may be able to find an update for it, but just disabling TSO should work for you. You’ll run into this issue on all distributions, since it’s kernel/driver related.

Most likely the old kernel versions didn’t enable TSO, or the software vendor disabled it manually, so the issue never popped up.

Hello and thanks for your reply!
Could this be what you were thinking of?
Intel says:

82573(V/L/E) TX Unit Hang Messages

Several adapters with the 82573 chipset display “TX unit hang” messages
during normal operation with the e1000 driver. The issue appears both with
TSO enabled and disabled, and is caused by a power management function that
is enabled in the EEPROM. Early releases of the chipsets to vendors had the
EEPROM bit that enabled the feature. After the issue was discovered newer
adapters were released with the feature disabled in the EEPROM.

If you encounter the problem in an adapter, and the chipset is an 82573-based
one, you can verify that your adapter needs the fix by using ethtool:

ethtool -e eth0

Offset          Values
------          ------
0x0000          00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
0x0010          ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83

My “ethtool -e” EEPROM dump does not look anything like that…
I’ve even found a script here that checks it for me, but it also says it does not apply to my card.
If I can’t fix it, how can I make the driver disable TSO at the earliest possible stage of booting?

Thanks!
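
One possible way to make this persistent on openSUSE, as far as I know, is the ETHTOOL_OPTIONS variable in the interface's ifcfg file: the network scripts call ethtool with those options when the interface is brought up, and the word "iface" is replaced by the real interface name. This is only a sketch under assumptions: the file path is assumed from the interface name in the log above, the helper name add_tso_off is made up, and the setting is applied at interface setup time rather than at driver load.

```shell
#!/bin/sh
# Add an ETHTOOL_OPTIONS line that turns TSO off to an ifcfg file.
# If the file already defines ETHTOOL_OPTIONS, it is left untouched so that
# existing settings are not overwritten; merge them by hand in that case.
# add_tso_off is a hypothetical helper name used only for this sketch.
add_tso_off() {
    cfg=$1
    grep -q '^ETHTOOL_OPTIONS=' "$cfg" 2>/dev/null && return 0
    printf '%s\n' "ETHTOOL_OPTIONS='-K iface tso off'" >> "$cfg"
}

# Usage as root (path assumed from the interface name enp0s25):
#   add_tso_off /etc/sysconfig/network/ifcfg-enp0s25
```

After the next reboot, "ethtool -k enp0s25" should report tcp-segmentation-offload as off if the option was picked up.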