Update on 19 November 2024 broke docker container networking (docker 26.1.5_ce-9.1)

The last working version had been docker 26.1.5_ce-8.1, which got installed on 3 November 2024.

On 2024-11-19 at around 23:16 CET I ran my usual zypper dup routine. During this update, docker networking broke.

Snapshot 851 - pre zypper - works. Snapshot 852 - post zypper - is broken:

851  │ pre    │       │ Tue 19 Nov 2024 23:16:34 CET  │ root │ number  │ zypp(zypper)                                                    │ important=yes
852- │ post   │   851 │ Tue 19 Nov 2024 23:20:58 CET  │ root │ number  │                                                                 │ important=yes

/var/log/zypp/history excerpt:

$ rg  '2024-11-19.*docker' /var/log/zypp/history      
46340:2024-11-19 23:18:39|install|docker-buildx|0.17.1-9.1|x86_64||download.opensuse.org-oss|d30e9c409af9526f9ebb31185b626d2bdee96cd4b2ccd1c7f6501c24c249eda4d1599cbf85eda0ff4f45c9a890b99cbf79c06ba381ae929850c21fde5c8d663b|
46341:# 2024-11-19 23:18:40 docker-26.1.5_ce-9.1.x86_64.rpm installed ok
46346:2024-11-19 23:18:40|install|docker|26.1.5_ce-9.1|x86_64||download.opensuse.org-oss|3e2d3671ea0f5ba71ac47a9bd17c7d5e3ecccd2f71b1b7f48023a1f1081fc9fe257d250c9615014c8412a9b0f421fa0d00e3a997292c9810a020562ec2cf1044|
46347:2024-11-19 23:18:40|install|docker-rootless-extras|26.1.5_ce-9.1|noarch||download.opensuse.org-oss|4dba9288a3b3997580e9de96325008e12195e2c9c0227f7ea9f53fc17b608b5b725820ef8a635998ccde9e8bbb85b2c974db02f20847aa0ca3948621e639ba5a|
46365:2024-11-19 23:19:05|install|docker-zsh-completion|26.1.5_ce-9.1|noarch||download.opensuse.org-oss|695d594d67baac3054292189310c92f46029251730942d0230dc4cc3463c4e43311dd2155e5081e8236748764a2b419d733452d3bd87dbd4764d000532e220a1|
46366:2024-11-19 23:19:05|install|docker-bash-completion|26.1.5_ce-9.1|noarch||download.opensuse.org-oss|6d36d57d719094a298a1eed68ade5cad25384cf7980e34026bf5afd9ff72719e9ee42e110a074883753ca691499754b92fc70f96aa9e1c49ace88ffa15d01bb2|

Full zypper history of this run: openSUSE Paste

Symptoms:

Since 2024-11-19, i.e. since installing docker 26.1.5_ce-9.1, container networking times out for almost all containers; a few still work. Pinging from the host to all containers in all networks works, and pinging between containers (whose traffic otherwise times out) also works.
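
For reference, the kind of checks behind these observations (a sketch only; app1/app2 and the IP are placeholders, and the images are assumed to ship ping/wget):

# ICMP from the host into a container network -- works
ping -c 3 172.18.0.2          # placeholder container IP

# ICMP between two containers on the same network -- works
docker exec app1 ping -c 3 app2

# application-level traffic between the same containers -- times out
docker exec app1 wget -q -O- -T 5 http://app2:8080/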

Rolling back to the zypper pre snapshot before docker 26.1.5_ce-9.1 (which is docker 26.1.5_ce-8.1) makes container networking function normally again.
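
For reference, rolling back to the pre snapshot can be done with snapper (a sketch; 851 is the pre-update snapshot listed above):

# make the pre-update snapshot the new default subvolume, then boot into it
snapper rollback 851
reboot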

Someone else (@MNeugebauer) also seems to be affected: Can't rollback via snapper (Loading Linux xxx-default ...) - #3 by MNeugebauer

Docker package changelog: File docker.changes of Package docker - openSUSE Build Service

Snapshot diff of docker service:

# snapper diff 851..852 /usr/lib/systemd/system/docker.service
--- /.snapshots/851/snapshot/usr/lib/systemd/system/docker.service      2024-10-17 00:24:53.000000000 +0200
+++ /.snapshots/852/snapshot/usr/lib/systemd/system/docker.service      2024-11-12 07:34:29.000000000 +0100
@@ -16,7 +16,7 @@
 # enabled by default because enabling socket activation means that on boot your
 # containers won't start until someone tries to administer the Docker daemon.
 Type=notify
-ExecStart=/usr/bin/dockerd --add-runtime oci=/usr/sbin/runc $DOCKER_NETWORK_OPTIONS $DOCKER_OPTS
+ExecStart=/usr/bin/dockerd --add-runtime oci=/usr/sbin/runc $DOCKER_OPTS
 ExecReload=/bin/kill -s HUP $MAINPID
 
 # Having non-zero Limit*s causes performance problems due to accounting overhead

$DOCKER_NETWORK_OPTIONS is not set in /etc/sysconfig/docker, and I do not see how removing an empty environment variable from the daemon invocation could have caused the issue. But of course I might be overlooking something.

Can there be any other cause?

After rolling back (so that container networking works again), the following indeed has no negative effect (a drop-in variant of the same test is sketched after the list):

  1. edit /usr/lib/systemd/system/docker.service
  2. remove $DOCKER_NETWORK_OPTIONS
  3. save
  4. run systemctl daemon-reload
  5. run systemctl restart docker
  6. start the docker compose environment which had been broken by docker 26.1.5_ce-9.1
  7. Container networking works.
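
For completeness, the same test can also be done without editing the packaged unit file, using a systemd drop-in override instead (a sketch, not what was done above):

# override ExecStart via a drop-in instead of editing /usr/lib/systemd/system/docker.service
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --add-runtime oci=/usr/sbin/runc $DOCKER_OPTS
EOF
systemctl daemon-reload
systemctl restart docker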

The zypper history from 2024-11-19 at openSUSE Paste also shows an update of runc to 1.2.2-1.1.
The previous version had been runc 1.2.1-1.1, installed on 2024-11-04.

Might that be the cause? Notably containerd was not updated.

runc changelog: File runc.changes of Package runc - openSUSE Build Service
runc upstream changelog: Release runc v1.2.2 -- "Specialization is for insects." · opencontainers/runc · GitHub

@Labyrinth Thanks for the ping. I have the same issues with docker after the upgrade. The last version without issues is

Docker version 26.1.5-ce, build 411e817ddf71

I did not investigate any further so far because those network issues are way over my head. Let me know if any tests on my system can be of help.

I can confirm the problems. I found that networks with a UUID-like name (generated by Testcontainers) did not work, while networks with a manually chosen name worked fine under the affected docker version. There were no other significant configuration differences between the working and non-working networks apart from the name.
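
A quick way to compare the two cases (a sketch; uuidgen just produces a UUID-like name similar to what Testcontainers generates):

# network with a UUID-like name, as Testcontainers would create it
docker network create "$(uuidgen)"

# network with a manually chosen name
docker network create my-manual-net

# compare the FORWARD rules created for the two bridge interfaces
iptables -S FORWARD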

I locked the docker package to 26.1.5_ce-8.1 and retried the upgrade. It still broke networking.

runc was still updated from 1.2.1-1.1 to 1.2.2-1.1, and other docker-related packages such as docker-rootless-extras were not locked and were therefore updated to 26.1.5_ce-9.1.

I will roll back again and retry the upgrade while locking runc and docker-* as well.
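
Setting the locks is straightforward (a sketch of the zypper commands; addlock accepts wildcard patterns):

# lock the container runtime stack before the next zypper dup
zypper addlock containerd docker 'docker-*' runc

# verify the locks
zypper ll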

I rolled back, locked multiple docker-related package patterns and re-upgraded. Docker networking still broke.

The locked patterns:

zypper ll

# | Name       | Type    | Repository | Comment
--+------------+---------+------------+--------
1 | containerd | package | (any)      | 
2 | docker     | package | (any)      | 
3 | docker-*   | package | (any)      | 
4 | runc       | package | (any)      | 

Here is the zypper log with all updates; one of these updated packages breaks docker networking: openSUSE Paste

I now wonder whether the update of nftables to 1.1.1-1.2 might be causing the problem. It is part of the firewall stack, and docker has historically had issues with firewalls.

PS: Some promising discussion indeed: Native support for nftables · Issue #1472 · docker/for-linux · GitHub

PPS: Also this Reddit thread.

Alright: after locking the following packages and then running zypper dup, docker networking still works:

> zypper ll                

#  | Name            | Type    | Repository | Comment
---+-----------------+---------+------------+--------
1  | containerd      | package | (any)      | 
2  | docker          | package | (any)      | 
3  | docker-*        | package | (any)      | 
4  | iptables        | package | (any)      | 
5  | iptables-*      | package | (any)      | 
6  | libnftables1    | package | (any)      | 
7  | libxtables12    | package | (any)      | 
8  | nftables        | package | (any)      | 
9  | runc            | package | (any)      | 
10 | xtables-plugins | package | (any)      | 

I strongly suspect one of the following to cause the problems:

  • iptables + iptables-*
  • nftables + libnftables1
  • xtables-plugins + libxtables12

These also seem to be related to each other.

What I will try next is to unlock containerd, docker, docker-* and runc, run zypper dup, and reboot. I expect docker networking to keep working after those updates.
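
The unlock step, roughly (a sketch):

# drop the locks on the container stack, keep the iptables/nftables locks in place
zypper removelock containerd docker 'docker-*' runc
zypper dup
reboot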

The following 6 items are locked and will not be changed by any action:
 Installed:
  iptables              1.8.10-3.1  x86_64  @System  openSUSE
  iptables-backend-nft  1.8.10-3.1  x86_64  @System  openSUSE
  libnftables1          1.1.1-1.1   x86_64  @System  openSUSE
  libxtables12          1.8.10-3.1  x86_64  @System  openSUSE
  nftables              1.1.1-1.1   x86_64  @System  openSUSE
  xtables-plugins       1.8.10-3.1  x86_64  @System  openSUSE

The following 7 packages are going to be upgraded:
  containerd              1.7.23-2.1 -> 1.7.23-3.1        x86_64  Main Repository (OSS)  openSUSE
  docker                  26.1.5_ce-8.1 -> 26.1.5_ce-9.1  x86_64  Main Repository (OSS)  openSUSE
  docker-bash-completion  26.1.5_ce-8.1 -> 26.1.5_ce-9.1  noarch  Main Repository (OSS)  openSUSE
  docker-buildx           0.17.1-8.1 -> 0.17.1-9.1        x86_64  Main Repository (OSS)  openSUSE
  docker-rootless-extras  26.1.5_ce-8.1 -> 26.1.5_ce-9.1  noarch  Main Repository (OSS)  openSUSE
  docker-zsh-completion   26.1.5_ce-8.1 -> 26.1.5_ce-9.1  noarch  Main Repository (OSS)  openSUSE
  runc                    1.2.1-1.1 -> 1.2.2-1.1          x86_64  Main Repository (OSS)  openSUSE

7 packages to upgrade.

Edit: After updating, docker networking still works.

This means the breakage is now narrowed down to one (or all) of:

  • iptables + iptables-*
  • nftables + libnftables1
  • xtables-plugins + libxtables12

I opened a bug report about this issue: 1233980 – Recent update to uptables/nftables/xtables breaks docker inter-container networking

I was also able to narrow it down to four packages, which all depend on each other and must be updated in tandem:

  • iptables + iptables-*
  • xtables-plugins + libxtables12

Updating nftables + libnftables1 does not break docker networking.

But updating these four packages does, and reproducibly so.
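
The narrowing-down was essentially the following loop (a sketch; the snapshot number is a placeholder for whichever pre-update snapshot applies):

# roll back to the known-good snapshot and boot into it
snapper rollback <pre-update snapshot>
reboot

# unlock one group at a time, run the update, reboot, test container networking
zypper removelock nftables libnftables1                                # this group alone: still works
# zypper removelock iptables 'iptables-*' xtables-plugins libxtables12 # this group: breaks networking
zypper dup
reboot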

Could you share a reliable way to reproduce this issue?

Using the same package versions, I was not able to reproduce it; the Docker network works with both the iptables-nftables and the iptables-legacy backends.

If you can share a compose file or a set of container images where this is reproducible, it will help with debugging.

hi,

just bumped into this thread after opening this github issue

long story short: fresh-start the docker service, create a network, delete that network, and everything breaks. copying the steps from the github issue:

  1. restart daemon: systemctl restart docker
  2. observe iptables: iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
  3. create a network: docker network create my-net
  4. remove network: docker network rm my-net
  5. observe iptables: iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

The entries for the default network (docker0) have disappeared.
  6. try to access the internet from within a container attached to the default network:

 docker run --rm alpine ping -c 4 172.217.17.228
PING 172.217.17.228 (172.217.17.228): 56 data bytes

--- 172.217.17.228 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

(pinging www.google.com)

what seems to be the root cause is that iptables no longer correctly matches rules containing an output interface (-o). this causes the initial rules not to be added when creating a new docker network and, even worse, when deleting that network, the rules for the default docker0 bridge are removed instead.
check my later comment in the github issue for more details.

docker aside, manually issuing iptables commands also highlights the problem. for example, consider the following “initial” state of the filter / FORWARD rules, as reported by iptables -L -n -v:

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0

checking (-C) for a FORWARD rule with a different output interface name, even a clearly non-existent one, exits with status 0 ("rule exists") instead of the expected "iptables: Bad rule" error:
iptables --wait -t filter -C FORWARD -o br-xxxxxxxxxxxx -j DOCKER
(verbatim interface name, as shown)

trying to delete the non-existent rule somehow deletes the similar rule for docker0 instead:
iptables --wait -t filter -D FORWARD -o br-xxxxxxxxxxxx -j DOCKER

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0

(note that the 4th rule, DOCKER with out-interface docker0, is now missing)

when removing a docker network, dockerd will issue a series of -D commands that end up deleting the default docker network configuration.
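
presumably something along these lines, mirroring the per-bridge rules shown above (illustrative only; the exact invocations are generated by dockerd/libnetwork, and br-xxxxxxxxxxxx stands for the removed network's bridge):

iptables --wait -t filter -D FORWARD -o br-xxxxxxxxxxxx -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables --wait -t filter -D FORWARD -o br-xxxxxxxxxxxx -j DOCKER
iptables --wait -t filter -D FORWARD -i br-xxxxxxxxxxxx ! -o br-xxxxxxxxxxxx -j ACCEPT
iptables --wait -t filter -D FORWARD -i br-xxxxxxxxxxxx -o br-xxxxxxxxxxxx -j ACCEPT

with the broken -o matching, these deletions can hit the corresponding docker0 rules instead.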

hope this helps a bit with the investigation; my iptables knowledge stops around here :)

my opensuse version:

> cat /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20241126"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20241126"
PRETTY_NAME="openSUSE Tumbleweed"

also what appears to be the same issue from the debian tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087210

there is a fix mentioned there for the -C commands. i just hope the -D commands will also be fixed as part of it…

I managed to reproduce a scenario where the container does not have internet.

It only happens when using iptables v1.8.11 (nf_tables), not with iptables v1.8.11 (legacy).

The current workaround is to remove iptables-backend-nft, reset iptables rules, and restart Docker.
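
In concrete terms, something like the following (a sketch; flushing also removes any custom firewall rules, so adapt with care):

# switch back to the legacy iptables backend
zypper remove iptables-backend-nft

# reset the existing rules, including those left behind by the nft backend
iptables -F && iptables -X
iptables -t nat -F && iptables -t nat -X
nft flush ruleset

# let Docker recreate its rules with the legacy backend
systemctl restart docker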

I’ll see if we can send a package update soon.
