LEAP 15.3: odd zypper/wget throughput behavior after changing mainboard

(Apologies for the long post…I’m including a lot of test results here in hopes that someone recognizes a config problem)

I’m seeing zypper/wget/curl slowness on a Leap 15.3 machine that I cannot reproduce on a Debian machine that resides on the same network. I don’t remember having this problem until after I swapped mainboards a couple weeks ago. At first, I figured the problem was the Realtek NICs on the new mainboard so I grabbed an Intel gigabit NIC from the cabinet but the problem persists.

Current machine details:
CPU: Ryzen 5950x
Mainboard: MSI B550 Tomahawk
Network card: Intel dual gigabit card (Intel 82571)
OS: Leap 15.3
Current kernel: 5.14.15-lp153.2.g3416a5a-default (from the Kernel:/stable:/Backport/standard repo)

One wrinkle is that my NIC is bridged so that I can run VMs on the bridged interface. So eth5 is bound to br0.


> ethtool eth5
Settings for eth5:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Speed: 1000Mb/s
    Duplex: Full
    Auto-negotiation: on
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    MDI-X: off (auto)
netlink error: Operation not permitted
        Current message level: 0x00000007 (7)
                               drv probe link
    Link detected: yes

For the following tests, I’ll show throughput measurements from this machine (‘slowbox’) and a Debian machine connected to the same local network (‘server’)…

Test #1: Use ‘wget’ to grab http://download.opensuse.org/repositories/mozilla/openSUSE_Leap_15.3/x86_64/MozillaThunderbird-translations-common-91.3.0-lp153.3.1.x86_64.rpm (though I see similar behavior with files from other repositories):


slowbox:                          server:
2021-11-04 13:54:40 (2.14 MB/s)   2021-11-04 13:57:35 (4.12 MB/s)
2021-11-04 13:55:19 (635 KB/s)    2021-11-04 13:57:47 (4.08 MB/s)
2021-11-04 13:55:59 (461 KB/s)    2021-11-04 13:58:00 (5.05 MB/s)

Notes:  slowbox speed was erratic ranging from 48KB/sec to over 5MB/sec briefly

So despite being on the same local network, ‘server’ grabbed the files much quicker.
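For anyone wanting to repeat this kind of measurement, here is a rough sketch of a loop that runs wget several times and keeps only the timestamp and average rate. The function names `extract_rate` and `repeat_wget` are made up for illustration, and the parsing assumes wget's final status line looks like the results quoted above.

```shell
# Extract "<timestamp> (<rate>)" from wget's final status line.
# The line format is assumed from the results quoted in this thread;
# other wget versions or locales may print something different.
extract_rate() {
    sed -n 's/^\([0-9-]\{10\} [0-9:]\{8\} ([^)]*)\).*/\1/p'
}

# Download a URL several times, discard the data, and print one rate per run.
# Arguments: URL, number of runs (default 3).
repeat_wget() {
    url=$1
    runs=${2:-3}
    i=0
    while [ "$i" -lt "$runs" ]; do
        # wget writes its status output to stderr, so merge it into stdout
        wget -O /dev/null "$url" 2>&1 | extract_rate
        i=$((i + 1))
    done
}
```

Running `repeat_wget <url> 3` on both machines back to back should give numbers that can be compared directly.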

Test #2: Copy that RPM to a co-located virtual machine on the US east coast and then use ‘wget’ to download it from there to my local machines:


slowbox:                          server:
2021-11-04 14:11:23 (720 KB/s)    2021-11-04 14:13:01 (6.38 MB/s)
2021-11-04 14:11:55 (709 KB/s)    2021-11-04 14:13:04 (12.1 MB/s)
2021-11-04 14:12:34 (1.30 MB/s)   2021-11-04 14:13:06 (12.8 MB/s)

Again, ‘server’ is about 10x faster than ‘slowbox’ despite being on the same local network

Test #3: Use ‘scp’ (with AES, no compression) to download the same file from the co-located VM:


         slowbox:    server:
Test 1:  8.2MB/s     8.8MB/s
Test 2:  5.8MB/s     12.0MB/s
Test 3:  6.5MB/s     12.7MB/s

So, ‘scp’ speeds are comparable between the two machines. The RPM is small enough that the diff between 6MB/sec and 12MB/sec gets lost in the noise. What is interesting is that ‘slowbox’ appears to be able to ‘scp’ about 10x faster than plain ‘wget’ from the remote host.

Test #4: To rule out local network issues as well as ‘wget’ weirdness, I copied the RPM to a webserver hosted on ‘server’ and then used ‘wget’ to download that file to ‘slowbox’:

slowbox:


2021-11-04 14:03:58 (112 MB/s)
2021-11-04 14:04:53 (112 MB/s)
2021-11-04 14:05:04 (112 MB/s)

So this suggests that there’s nothing wrong with the network interface or local network: we’re seeing the expected gigabit speeds. This is also confirmed by measuring NFSv4 throughput between ‘server’ and ‘slowbox’.
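As a sanity check on why 112 MB/s counts as "expected gigabit speed", here is a back-of-the-envelope calculation. It assumes a 1500-byte MTU, IPv4, TCP timestamps enabled, no VLAN tag, and that wget's "MB/s" is 1024-based; none of that is confirmed by the measurements themselves.

```shell
# Estimate the best-case TCP payload rate on gigabit Ethernet, in MiB/s.
# Assumptions: 1500-byte MTU, IPv4, TCP timestamps on, no VLAN tag.
gigabit_goodput_mib() {
    LC_ALL=C awk 'BEGIN {
        link_bytes_per_s = 1000000000 / 8   # 1 Gb/s line rate in bytes per second
        payload = 1500 - 20 - 20 - 12       # MTU minus IPv4 header, TCP header, timestamp option
        on_wire = 1500 + 14 + 4 + 8 + 12    # MTU plus Ethernet header, FCS, preamble+SFD, inter-frame gap
        printf "%.0f\n", link_bytes_per_s * payload / on_wire / 1048576
    }'
}

gigabit_goodput_mib   # prints 112
```

Under those assumptions the 112 MB/s figure is essentially line rate, so the local path really was running flat out.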

In the tests above, I used ‘wget’ but I see similar performance with ‘curl’. And, as mentioned at the beginning, I see similar performance using ‘zypper’.

So…

Test #4 suggests that the network config is okay. We get the expected throughput on the local network.

Test #3 suggests that we can use ‘scp’ to fetch files from the internet with no performance issues.

Tests #1 and #2 suggest that fetching files using wget (and curl and zypper) is for some reason much slower on this Leap 15.3 machine.

Any ideas what I might have misconfigured?

You write that curl is also slow, but curl can also do scp, so by giving that a try we can more or less find if it is a tool or protocol issue.

Okay. Here are the curl results for tests #2 and #3:

Test #5: using curl to download the RPM from co-located machine using HTTP:


slowbox:      server:
703k          10.5M
1179k         11.1M
1312k         11.7M

Test #6: using curl in scp-mode to download the RPM from the same co-located machine:


slowbox:     server:
1057k        4506k
522k         10.8M
625k         11.0M

So ‘server’ is about 10-20x faster than ‘slowbox’ for curl operations despite having a slower processor (Ryzen 2700x vs Ryzen 5950x).
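To turn the curl figures into ratios, something like the following works. It is only a sketch: `to_bytes` and `speedup` are made-up helper names, and it assumes curl's "k" and "M" suffixes are 1024-based.

```shell
# Convert a curl-style rate ("703k", "10.5M", or plain bytes) to bytes per second.
# Assumes the suffixes are 1024-based.
to_bytes() {
    LC_ALL=C awk -v v="$1" 'BEGIN {
        n = v + 0                          # numeric prefix; the suffix is ignored here
        if (v ~ /[kK]$/) n *= 1024
        else if (v ~ /M$/) n *= 1048576
        else if (v ~ /G$/) n *= 1073741824
        printf "%.0f\n", n
    }'
}

# Print how many times faster the second rate is than the first.
speedup() {
    LC_ALL=C awk -v a="$(to_bytes "$1")" -v b="$(to_bytes "$2")" \
        'BEGIN { printf "%.1f\n", b / a }'
}

speedup 703k 10.5M   # first HTTP run: prints 15.3
speedup 522k 10.8M   # second scp-mode run: prints 21.2
```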

Honestly, though, if slowbox’s zypper transfer rates were even this high, I might not have noticed. What started me down this path of investigation was repeated kernel updates that would sometimes drop below 100KB/sec.

Maybe a damaged Ethernet cable - I have seen some that will no longer do 1000 Mb/s even though they did when new. Corrosion on cable ends can do that.

Could also be a defective port in the router/hub - try a different port.

In general changing a Mainboard/Motherboard doesn’t need any changes to a Linux system, except for –

  1. UEFI – you’ll need to run ‘/usr/sbin/efibootmgr’ to sort out the EFI records.
  2. Network interface – you’ll need to check ‘/etc/udev/rules.d/70-persistent-net.rules’ for any changes to the “eth0” / “eth1” devices – old board “eth0” – new board “eth1” …
  3. Changes to the board’s micro-code – check that the correct micro-code packages have been installed.
  4. Devices plugged into the board’s slots – check that the correct driver packages have been installed.

Probably also good to check the statistics for your Ethernet interface for errors. As an example, from my system:

$ sudo ethtool -S enp3s0    
NIC statistics: 
     tx_packets: 26685 
     rx_packets: 40520 
     tx_errors: 0 
     rx_errors: 0 
     rx_missed: 0 
     align_errors: 0 
     tx_single_collisions: 0 
     tx_multi_collisions: 0 
     unicast: 40484 
     broadcast: 8 
     multicast: 28 
     tx_aborted: 0 
     tx_underrun: 0
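Since some drivers report a very long list of counters, a small filter can help spot the ones that matter. This is only a sketch: `nonzero_errors` is a made-up name, and the pattern of "suspicious" counter names is a guess, because the names differ from driver to driver.

```shell
# Print only counters whose name hints at a problem and whose value is non-zero.
# Counter names vary by driver; the pattern below is a guess, not a complete list.
nonzero_errors() {
    LC_ALL=C awk -F': *' '
        /err|drop|miss|collision|abort|underrun|fifo|crc/ && $2 + 0 > 0 {
            name = $1; sub(/^[ \t]+/, "", name)   # strip leading indentation
            val = $2;  sub(/[ \t]+$/, "", val)    # strip trailing whitespace
            print name ": " val
        }'
}

# Usage: sudo ethtool -S eth5 | nonzero_errors
```

No output means none of the matched counters is above zero, which would agree with a clean link.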

Perhaps analyze the data transfer with tcpdump (or similar). That might show what’s going on. I’m wondering if packet fragmentation might be an issue here?
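One way to check the fragmentation idea is to capture only IPv4 fragments exchanged with the remote host. The sketch below uses made-up function names and a placeholder address, and it assumes the traffic is IPv4 and visible on the bridge interface.

```shell
# Build a BPF filter matching IPv4 fragments to or from one host.
# ip[6:2] & 0x3fff is non-zero when the "more fragments" bit is set
# or the fragment offset is non-zero.
fragment_filter() {
    printf 'host %s and (ip[6:2] & 0x3fff != 0)\n' "$1"
}

# Capture up to 50 fragmented packets on an interface (needs root).
# Arguments: interface, remote host.
capture_fragments() {
    tcpdump -n -i "$1" -c 50 "$(fragment_filter "$2")"
}

# Example (placeholder address): capture_fragments br0 192.0.2.10
```

If nothing shows up during a slow download, fragmentation is probably not the cause, and a plain capture of the TCP stream (looking for retransmissions) would be the next thing to try.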

Though I can’t explain it, the problem seemed to be somehow tied to the ethernet switch. I’d tried to plug ‘slowbox’ into each of the unused ports on the switch and didn’t see any change in behavior. However on a whim I power-cycled the switch and the problem went away on all ports. ‘slowbox’ speeds are now comparable to ‘server’ in every test. I’ve even gone back to the Realtek NIC and speeds are still normal.

This is just a dumb layer-2 switch so maybe there was a problem with its internal forwarding/ARP table. Strange that it only affected this machine (even when I changed NICs) but not ‘server’ which is plugged into port 1. So maybe the switch is slowly dying? Dunno. Dumb switches are cheap enough that I pre-emptively replaced it with one I had on the shelf and the problem hasn’t recurred.

I think we can mark this one as resolved even though I’m not sure what the problem was.

Possibly, but probably not – if it’s “just a dumb layer-2 switch” that’s been in service for more than 12 months, then a power cycle is often “something which needs to be done” –

  • If the micro-code is “simple” and not error free, then as you point out a power cycle will forcibly clear all the internal buffers and counters – exactly how these things handle counter overflows after months in operation is often a good question …