Excessive out-of-order UDP packets with recent Leap 15.2 and 15.3 kernels.

Hi everyone,

I have observed that Leap 15.2 kernels from 5.3.18-lp152.72.1 onwards frequently deliver UDP packets out of order.
The current Leap 15.3 kernel 5.3.18-59.19 has the same issue, but the older Leap 15.3 kernel 5.3.18-57.3 from http://download.opensuse.org/distribution/leap/15.3/repo/oss/ is fine.

The issue is 100% reproducible on Haswell hardware (for example an Intel(R) Core™ i3-4130 CPU @ 3.40GHz machine) and on Broadwell (for example an Intel NUC with an Intel(R) Core™ i5-5300U CPU @ 2.30GHz).

I know UDP makes no guarantee of packet delivery order, but previous kernels always seemed to deliver packets in order. Later kernels seem to suffer from excessive out-of-order packet delivery.
Did something change in kernel 5.3.18-lp152.72.1 that would affect this?

We noticed this affecting video streaming, but it is easily reproducible using iperf.

Steps to Reproduce:

  • Run an iperf server on the Haswell box.
      iperf3 -s
  • Now run a set of 10 tests from a remote client, asking it to receive UDP data from the server at a target rate of 300 Mbit/s (-b 300M, with -R so that the server is the sender).
      for iter in 0 1 2 3 4 5 6 7 8 9; do echo "iter: $iter"; iperf3 -c 192.168.1.10 -u -b 300M -R | grep "datagrams received out-of-order"; done

Using kernel 5.3.18-lp152.60 I see no out-of-order datagrams.
Using kernel 5.3.18-lp152.87 I see > 10k out-of-order datagrams on each run.
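
To compare kernels run by run, it may help to reduce each iperf3 report to a single number. The following is a minimal sketch, not part of the original test: the helper name count_ooo is made up here, and it assumes the count is the integer printed immediately before the phrase "datagrams received out-of-order" (the same phrase the grep above matches).

```shell
#!/bin/sh
# Sum the out-of-order datagram counts found in iperf3 output read from stdin.
# Assumption: each relevant line contains "<N> datagrams received out-of-order".
# count_ooo is a hypothetical helper name, not an iperf3 feature.
count_ooo() {
    grep -oE '[0-9]+ datagrams received out-of-order' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Possible usage with the same invocation as in the steps above:
#   iperf3 -c 192.168.1.10 -u -b 300M -R | count_ooo
```

If no matching line is present, the helper prints 0, so a clean run and a run with reordering can be told apart at a glance.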

I created a bugzilla ticket for this a while back. Is anyone able to direct me to the maintainers best placed to investigate this?

Thanks,
Simon

@SimonLogan:

Is this something related to “Transmit Packet Steering” (XPS)?

  • AFAICS, XPS is enabled – “/usr/src/linux-5.3.18-59.19-obj/x86_64/default/.config:CONFIG_XPS=y”.

<https://www.kernel.org/doc/html/latest/networking/scaling.html#xps-transmit-packet-steering>
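
For anyone wanting to repeat that check on another kernel, here is a small sketch. The helper name xps_config_state is invented for illustration, and the /boot/config-$(uname -r) path in the usage comment is an assumption about where the running kernel's config is installed; substitute the .config path quoted above if it differs.

```shell
#!/bin/sh
# Print the CONFIG_XPS line from a kernel config file, whether it is set
# ("CONFIG_XPS=y") or explicitly unset ("# CONFIG_XPS is not set").
# xps_config_state is a hypothetical helper name.
xps_config_state() {
    grep -E '^(# )?CONFIG_XPS[= ]' "$1" || echo "CONFIG_XPS not mentioned in $1"
}

# Possible usage (path is an assumption):
#   xps_config_state "/boot/config-$(uname -r)"
```

This only shows whether XPS support is compiled in, not whether it is active on a particular network device.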

@dcurtisfra Thanks for the suggestion.

This link says XPS was incorporated into kernel 2.6.38. Since we’re on kernel 5.3 I think it’s unlikely that the introduction of this feature is to blame.
Maybe some default behaviour has changed recently though.

XPS appears to be hardware-specific (only active on network devices with multiple transmit queues) and I am unable to write to /sys/class/net/eth0/queues/tx-0/xps_cpus on my NUC.
I don’t know whether this means my network device doesn’t have multiple tx queues.
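
One way to check the queue count is to list the tx-* directories under the device's sysfs entry; as far as I understand, a lone tx-0 suggests the driver exposes a single transmit queue. The sketch below is only an illustration: list_tx_queues is a made-up helper name, and it takes the device directory as an argument rather than assuming eth0.

```shell
#!/bin/sh
# Print each transmit queue of a network device with its XPS CPU mask,
# followed by the number of transmit queues found.
# $1: device directory, e.g. /sys/class/net/eth0
# list_tx_queues is a hypothetical helper name.
list_tx_queues() {
    dev_dir=$1
    count=0
    for q in "$dev_dir"/queues/tx-*; do
        [ -d "$q" ] || continue
        count=$((count + 1))
        # Reading xps_cpus may fail on some devices; report that
        # instead of aborting.
        mask=$(cat "$q/xps_cpus" 2>/dev/null) || mask="unreadable"
        echo "${q##*/} xps_cpus=$mask"
    done
    echo "tx queues: $count"
}

# Possible usage:
#   list_tx_queues /sys/class/net/eth0
```

If the tool is installed, "ethtool -l eth0" should report the channel counts the driver supports as a cross-check.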

Thanks,
Simon

@SimonLogan:

You’ll have to inspect the Kernel sources – ‘/lib/modules/5.3.18-59.19-default/source/Documentation/networking/scaling.rst’.

  • “get_xps_queue” is being used by the Kernel, for example in ‘/lib/modules/5.3.18-59.19-default/source/net/core/dev.c’.

On the other hand, ‘/lib/modules/5.3.18-59.19-default/source/drivers/net/hyperv/netvsc_drv.c’ isn’t using XPS yet – to do (some day, one day, over the rainbow) …