USB ethernet interface name inconsistency

Hello,

I have a pretty weird issue that showed up between kernels 6.4.0-150600.23.25 and 6.4.0-150600.23.30. That is, it was working as expected in .25 and broke/changed in .30.

For some background, I have Leap running on a bunch of POS systems with payment terminals connected. The terminals and POS systems communicate via Ethernet over USB using ECM, basically just a /30 network between them. This is all set up using some udev rules to ensure the interface gets the correct IP config and so on, and it has been working flawlessly for quite some time.

Anyways, when doing the usual patching/updating song and dance in February the networking side of things broke entirely. As it turns out this is because the kernel started naming the ECM interface “eth0” by default instead of “usb0” as I would expect and as has been the case previously. This in turn causes a conflict with the actual “main” Ethernet interface which then gets renamed, the system thus loses its connection to the outside world and can’t finish its installation and configuration (which ironically would have solved the issue by installing the aforementioned udev rules for the USB interface).

As an example of what it actually looks like, here’s the relevant part of the dmesg:
Kernel <= 6.4.0-150600.23.25:

Feb 26 15:30:35 hostname kernel: cdc_ether 3-11.1:1.0 usb0: register 'cdc_ether' at usb-0000:00:14.0-11.1, CDC Ethernet Device, 52:14:32:fb:9e:5d

Kernel >= 6.4.0-150600.23.30:

Feb 26 15:23:52 hostname kernel: cdc_ether 3-11.1:1.0 eth0: register 'cdc_ether' at usb-0000:00:14.0-11.1, CDC Ethernet Device, 52:14:32:fb:9e:5d

These are from the same system using the same terminal; I can just up/downgrade the kernel and it will cause/fix the issue. It’s the same on different systems and terminals.

I’ve ruled out some basics, like checking that the L/A bit in the MAC address is set correctly, which as far as I know tells the kernel the interface should be called usbX rather than ethX. I’ve also tried using RNDIS instead of ECM but it behaves the same way.
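For reference, the L/A check boils down to testing bit 0x02 of the first octet of the MAC address. Below is a minimal userspace sketch of that test; the function name is hypothetical and this is not the kernel's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Bit 0x02 of the first octet marks a locally administered address (LAA).
 */
static bool mac_is_locally_administered(const uint8_t mac[6])
{
    return (mac[0] & 0x02) != 0;
}
```

For the terminal above, the first octet is 0x52 (binary 0101 0010), so the bit is set and the function returns true for 52:14:32:fb:9e:5d.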

I guess my next step will be to compile my own kernel and see where that takes me. Having some udev rules in place from the start to rename the USB interface to something else entirely should also work, I imagine. But I figured I’d ask if anyone has any clue what might have caused this, seeing as it’s hardly a major kernel update.
Speaking of, I also sat down with a pkgdiff between kernels .25 and .30 and couldn’t find anything that seemed related, just some crypto stuff with Poly1305 and some NVMe changes.

Anyways I hope some of that made sense, it’s kind of a hard problem to condense into one digestible post, and English isn’t my native language. :slight_smile:

Examine the entries in /etc/udev/rules.d/70-persistent-net.rules, and adjust as required.

My bad, I should have mentioned it’s not related to ye olde 70-persistent-net. Udev shouldn’t really matter at all other than being a potential solution if I can’t figure out what changed with the way the kernel initially names the interface.

It is this commit:

commit a9af20a1d3507c1a6d9fbaa36f63b7e21147cdd8
Author: Oliver Neukum <oneukum@suse.com>
Date:   Fri Oct 25 10:01:04 2024 +0200

    net: usb: usbnet: fix name regression (get-fixes).
    
    suse-commit: 05e377865f263102308cd246ad4a470a4c2dbc11

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 4f5a3a4aac89..9f66c47dc58b 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1771,7 +1771,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
                // can rename the link if it knows better.
                if ((dev->driver_info->flags & FLAG_ETHER) != 0 &&
                    ((dev->driver_info->flags & FLAG_POINTTOPOINT) == 0 ||
-                    (net->dev_addr [0] & 0x02) == 0))
+                    /* somebody touched it*/
+                    !is_zero_ether_addr(net->dev_addr)))
                        strscpy(net->name, "eth%d", sizeof(net->name));

Open a bug report. But be prepared for the answer to be that you relied on incorrect behavior in the past. That is what is now upstream, so if you disagree, you need to raise it upstream.

And upstream has a more detailed explanation:

commit 8a7d12d674ac6f2147c18f36d1e15f1a48060edf
Author: Oliver Neukum <oneukum@suse.com>
Date:   Thu Oct 17 09:18:37 2024 +0200

    net: usb: usbnet: fix name regression
    
    The fix for MAC addresses broke detection of the naming convention
    because it gave network devices no random MAC before bind()
    was called. This means that the check for the local assignment bit
    was always negative as the address was zeroed from allocation,
    instead of from overwriting the MAC with a unique hardware address.

P.S. and it has really very little to do with hardware. Everything that happens on computers happens on some hardware; but we have specific categories for a reason.

Hmm … but it sounds fishy. The “MAC is not locally-administered” is most certainly not equivalent to the “MAC is not zero”. I suspect the right condition should be

!is_zero_ether_addr(net->dev_addr) && (net->dev_addr [0] & 0x02) == 0
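To make the difference concrete, here is a small userspace sketch (not kernel code) of the inner MAC test in its three variants. It assumes a device with both FLAG_ETHER and FLAG_POINTTOPOINT set, so that the MAC test alone decides between "eth%d" and "usb%d". All function names are hypothetical, and mac_is_zero() is only a stand-in for the kernel's is_zero_ether_addr().

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_zero_ether_addr(). */
static bool mac_is_zero(const uint8_t mac[6])
{
    for (int i = 0; i < 6; i++) {
        if (mac[i] != 0)
            return false;
    }
    return true;
}

/* Test removed by the commit: "eth%d" when the LAA bit is clear. */
static bool eth_name_before(const uint8_t mac[6])
{
    return (mac[0] & 0x02) == 0;
}

/* Test added by the commit: "eth%d" for any non-zero address. */
static bool eth_name_after(const uint8_t mac[6])
{
    return !mac_is_zero(mac);
}

/* Suggested test: "eth%d" only for a non-zero address without the LAA bit. */
static bool eth_name_suggested(const uint8_t mac[6])
{
    return !mac_is_zero(mac) && (mac[0] & 0x02) == 0;
}
```

For 52:14:32:fb:9e:5d, eth_name_before() and eth_name_suggested() return false (the name stays "usb%d"), while eth_name_after() returns true (the name becomes "eth%d"). For an all-zero address, only eth_name_before() returns true, which is the case the upstream commit message describes.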

Open a bug report. This is a regression and it comes from SUSE anyway, so the openSUSE bugzilla is the right place, at least for a start.

I’ll trust you on the code part seeing as I wrote my last C code some 30’ish years ago. :slight_smile:

I’ll go ahead and open a bug, thanks for the help.

Update the Linux kernel before you open a bug report!

SLED15 SP6 with Kernel 6.4.0-150600.23.38-default works fine:

# uname -rsm
Linux 6.4.0-150600.23.38-default x86_64

# more /etc/udev/rules.d/70-persistent-net.rules |grep -i eth1
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:22:44:4a:93", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"

# ip link show |grep -i eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP mode DEFAULT group default qlen 1000

# dmesg |grep -i cdc
[    7.521114] i915 display info: has_cdclk_crawl: no
[    7.521115] i915 display info: has_cdclk_squash: no
[ 2362.742391] cdc_acm 1-2:1.3: ttyACM0: USB ACM device
[ 2362.742428] usbcore: registered new interface driver cdc_acm
[ 2362.742430] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ 2362.744173] usbcore: registered new interface driver cdc_ether
[ 2362.768116] cdc_ncm 1-2:1.0: MAC-Address: 00:e0:22:44:4a:93
[ 2362.768337] cdc_ncm 1-2:1.0 eth1: register 'cdc_ncm' at usb-0000:00:14.0-2, CDC NCM (NO ZLP), 00:e0:22:44:4a:93
[ 2362.768391] usbcore: registered new interface driver cdc_ncm
[ 2362.773850] usbcore: registered new interface driver cdc_wdm
[ 2362.777384] usbcore: registered new interface driver cdc_mbim

Hello,

I tried .38 in my testing and it behaves the same. From your dmesg you’re using an ACM device (unless I’m missing something), which wouldn’t be affected by this.

It is a USB Gadget with ACM and NCM function (PlutoSDR).
Your POS system could be a USB Gadget with ECM function.
https://developer.toradex.com/linux-bsp/application-development/peripheral-access/usb-device-mode-linux

https://www.kernel.org/doc/html/latest/usb/gadget-testing.html

NCM and EEM are faster and more advanced “Ethernet over USB” standards from USB-IF than ECM.
https://en.wikipedia.org/wiki/Ethernet_over_USB

https://usblan.belcarra.com/2011/02/cdc-eem-vs-cdc-ecm-protocols.html

https://bugzilla.suse.com/show_bug.cgi?id=1238028

We are indeed using ECM, things in this world don’t move very fast so I’m sure we’ll get more modern protocols somewhere in the next decade. Or century. :wink:
Let’s just say we still use RS232 an awful lot.

About your device: going by the MAC address in your dmesg, it doesn’t have the LAA bit set, so it wouldn’t be affected by the regression arvidjaar found anyway.
The cause seems fairly clear from the code arvidjaar posted, since the condition that checks that bit was removed.

@Svyatko yes that’s my bug report. Looking at it now I think I should stop trying to write stuff late in the evenings…

Why do you need this name?
I ask because of this comment in drivers/net/usb/usbnet.c:

  // heuristic:  "usb%d" for links we know are two-host,
  // else "eth%d" when there's reasonable doubt.  userspace
  // can rename the link if it knows better.

Very long story very short, the USB interface initializes before the regular NIC, which causes a naming conflict with the latter, which causes provisioning to fail. Amusingly, provisioning actually does rename the USB interface later on via udev, so simply provisioning without the terminal connected avoids the issue. Unfortunately that’s not a viable solution.

Well, that is why you are expected to post the link here after you have created the bug report: for cross-reference, documentation and to avoid confusion …

My bad, I shut down my brain and took the evening off after realizing I wrote LLA instead of LAA all over that bug report.
