Rootless docker: i/o timeout with docker pull

Dear openSUSE Community,

I have set up rootless docker following this documentation: Run the Docker daemon as a non-root user (Rootless mode) | Docker Docs, on openSUSE Tumbleweed (currently version 20230917), but every docker pull gives an i/o timeout. The same applies to, e.g., docker login.
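
For reference, the setup from that page boils down to roughly these steps (a sketch; the exact socket path may differ on your system):

dockerd-rootless-setuptool.sh install
systemctl --user enable --now docker
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock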

There are no proxy settings in ~/.docker/config.json.

someuser@somehost:~> docker pull hello-world
Using default tag: latest
Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:48971->10.0.2.3:53: i/o timeout

On the same host, there is a docker installation for root, where pulling docker images works fine. I have tried adding additional DNS servers. No change. I also tried reinstalling rootless docker. No change. Furthermore, I have read practically every forum post one can find by googling “docker pull i/o timeout”, but nothing helped.

The current workaround: pull the images with root docker, save them, change permissions and ownership of the saved archives, and load them with rootless docker. It works, but it is an annoying workaround.
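
Roughly, with hello-world as an example image (user and group are placeholders):

sudo docker pull hello-world
sudo docker save -o hello-world.tar hello-world
sudo chown someuser:users hello-world.tar
docker load -i hello-world.tar    # as someuser, against the rootless daemon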

Do you have further hints on that? Any hint highly appreciated!

Output of docker version:

someuser@somehost:~> docker version
Client:
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:30:51 2023
 OS/Arch:           linux/amd64
 Context:           rootless

Server: Docker Engine - Community
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:32:17 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.3
  GitCommit:        7880925980b188f4c97b462f709d0db8e8962aff
 runc:
  Version:          1.1.9
  GitCommit:        v1.1.9-0-gccaecfc
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
 rootlesskit:
  Version:          1.1.0
  ApiVersion:       1.1.1
  NetworkDriver:    slirp4netns
  PortDriver:       builtin
  StateDir:         /tmp/rootlesskit878318901
 slirp4netns:
  Version:          1.2.1
  GitCommit:        unknown
root@somehost:~> docker version
Client:
 Version:           24.0.6-ce
 API version:       1.43
 Go version:        go1.20.8
 Git commit:        1a7969545d73
 Built:             Thu Sep 14 00:00:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.6-ce
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.8
  Git commit:       1a7969545d73
  Built:            Thu Sep 14 00:00:00 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.6
  GitCommit:        091922f03c2762540fd057fba91260237ff86acb
 runc:
  Version:          1.1.9
  GitCommit:        v1.1.9-0-gccaecfcbc907
 docker-init:
  Version:          0.1.7_catatonit
  GitCommit:

Output of docker info:

someuser@somehost:~> docker info
Client:
 Version:    24.0.6
 Context:    rootless
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.11.2
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.21.0
    Path:     /usr/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 18
  Running: 6
  Paused: 0
  Stopped: 12
 Images: 6
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7880925980b188f4c97b462f709d0db8e8962aff
 runc version: v1.1.9-0-gccaecfc
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  rootless
  cgroupns
 Kernel Version: 6.4.11-1-default
 Operating System: openSUSE Tumbleweed
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.747GiB
 Name: somename
 ID: someid
 Docker Root Dir: /somefolder/.local/share/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No cpuset support
WARNING: No io.weight support
WARNING: No io.weight (per device) support
WARNING: No io.max (rbps) support
WARNING: No io.max (wbps) support
WARNING: No io.max (riops) support
WARNING: No io.max (wiops) support
root@somehost:~> docker info
Client:
 Version:    24.0.6-ce
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.11.2
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.21.0
    Path:     /usr/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 24.0.6-ce
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 oci runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 091922f03c2762540fd057fba91260237ff86acb
 runc version: v1.1.9-0-gccaecfcbc907
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.4.11-1-default
 Operating System: openSUSE Tumbleweed
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.747GiB
 Name: somename
 ID: someid
 Docker Root Dir: /somedir/dockered
 Debug Mode: false
 Username: someuser
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Thanks in advance!

I’ve just tried this myself on a clean install of Tumbleweed (20230922), and it worked OK - so that means that the issue is probably related to something in your configuration.

What are the contents of ~/.docker/config.json? You note that there are no proxy settings in there, but are there proxy settings for the host?

The timeout appears to be on the DNS lookup - so what happens on the host if you run:

dig @10.0.2.3 registry-1.docker.io

That’s where I’d be inclined to start - looks like a DNS server issue.

Thank you for the very prompt reply!

There are no proxy settings for the host. The content of ~/.docker/config.json:

{
        "auths": {},
        "currentContext": "rootless",
        "mtu": 1454
}

I also previously added "dns": ["8.8.8.8", "8.8.4.4", "10.0.0.2"] to this JSON, but that didn’t help.
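
As an aside, the daemon-level "dns" option normally lives in daemon.json rather than in the CLI’s config.json; for a rootless daemon that is typically ~/.config/docker/daemon.json, roughly:

{
        "dns": ["8.8.8.8", "8.8.4.4", "10.0.0.2"]
}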

Output of “dig @10.0.2.3 registry-1.docker.io”:

someuser@somehost:~> dig @10.0.2.3 registry-1.docker.io
;; communications error to 10.0.2.3#53: timed out
;; communications error to 10.0.2.3#53: timed out
;; communications error to 10.0.2.3#53: timed out

; <<>> DiG 9.18.18 <<>> @10.0.2.3 registry-1.docker.io
; (1 server found)
;; global options: +cmd
;; no servers could be reached

Best,

That your host is getting timeouts trying to run the dig command means that the issue isn’t a Docker issue, but a DNS resolution issue for the host.

Can you ping 10.0.2.3? Given that the entries you tried in the JSON configuration file don’t include it, I’m wondering where that address is coming from - it’s within the private network ranges, but not a subnet you usually see used in most small networks.

What is in your /etc/resolv.conf file?

I cannot ping 10.0.2.3. I guess it’s coming from slirp4netns, but that didn’t provide a way to solve the problem for me. See --disable-dns in:

someuser@somehost:~> slirp4netns
Usage: slirp4netns [OPTION]... PID|PATH [TAPNAME]
User-mode networking for unprivileged network namespaces.

-c, --configure          bring up the interface
-e, --exit-fd=FD         specify the FD for terminating slirp4netns
-r, --ready-fd=FD        specify the FD to write to when the network is configured
-m, --mtu=MTU            specify MTU (default=1500, max=65521)
-6, --enable-ipv6        enable IPv6 (experimental)
-a, --api-socket=PATH    specify API socket path
--cidr=CIDR              specify network address CIDR (default=10.0.2.0/24)
--disable-host-loopback  prohibit connecting to 127.0.0.1:* on the host namespace
--netns-type=TYPE 	 specify network namespace type ([path|pid], default=pid)
--userns-path=PATH	 specify user namespace path
--enable-sandbox         create a new mount namespace (and drop all caps except CAP_NET_BIND_SERVICE if running as the root)
--enable-seccomp         enable seccomp to limit syscalls (experimental)
--outbound-addr=IPv4     sets outbound ipv4 address to bound to (experimental)
--outbound-addr6=IPv6    sets outbound ipv6 address to bound to (experimental)
--disable-dns            disables 10.0.2.3 (or configured internal ip) to host dns redirect (experimental)
--macaddress=MAC         specify the MAC address of the TAP (only valid with -c)
--target-type=TYPE       specify the target type ([netns|bess], default=netns)
-h, --help               show this help and exit
-v, --version            show version and exit

The /etc/resolv.conf has 8.8.8.8 and a company-run DNS server, which I set in /etc/sysconfig/network/config:

### /etc/resolv.conf is a symlink to /run/netconfig/resolv.conf
### autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
#     NETCONFIG_DNS_STATIC_SEARCHLIST
#     NETCONFIG_DNS_STATIC_SERVERS
#     NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
#     NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
### Call "netconfig update -f" to force adjusting of /etc/resolv.conf.
nameserver 8.8.8.8
nameserver some-ip
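
For reference, the static servers above are set roughly like this in /etc/sysconfig/network/config (the second value is redacted, as above):

NETCONFIG_DNS_STATIC_SERVERS="8.8.8.8 some-ip"

followed by sudo netconfig update -f to regenerate the file.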

Thank you again for your reply. I hope my reply provides more hints on that.

Best

I hadn’t noticed that slirp4netns was used as part of the setup - the non-response from 10.0.2.3 makes sense with that information.

Did you perform any configuration at all for slirp4netns? I didn’t in my system - which is part of why I didn’t even notice it was there.

What do you see when you run ps ax | grep slirp4netns on the host? I’d like to compare to what I see on mine:

jhenderson@localhost:~> ps ax | grep -i slirp
  3939 ?        Ssl    0:00 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh
  3950 ?        Sl     0:00 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /usr/bin/dockerd-rootless.sh
  3975 ?        S      0:00 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 3950 tap0
  7784 pts/1    S+     0:00 grep --color=auto -i slirp

I wonder… is your host running IPv4 or IPv6? I could see perhaps an issue if the host were IPv6-only and slirp4netns were only running IPv4. I’ve not worked with it at all, but if it doesn’t handle the two protocols, I could see that leading to a failure like the one you’re seeing.

I did not perform any configuration of slirp4netns. The host only uses IPv4. I have actually thought the same before, but that, unfortunately, did not lead me to a solution.

Output of ps ax | grep slirp4netns:

someuser@somehost:~> ps ax | grep slirp4netns
20242 ?        Ssl    0:00 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave ~/bin/dockerd-rootless.sh
20252 ?        Sl     0:00 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave ~/bin/dockerd-rootless.sh
20272 ?        S      0:00 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 20252 tap0
26711 pts/3    S+     0:00 grep --color=auto slirp4netns

Best

I’m starting to run out of ideas here, in full transparency. I’ve configured my setup in a virtual machine running Tumbleweed and manually configured my DNS settings in NetworkManager so that my /etc/resolv.conf file is similar to yours (using my own internal DNS server as a ‘local’ DNS server).

From the host, name lookups using 8.8.8.8 and your local DNS server work, yes?

If you stop/restart docker using systemctl --user restart docker, I assume that makes no difference?

I did run across your question on the Docker forums as well - I wonder if they might have ideas if you pointed them at the slirp4netns layer rather than Docker itself. Finding troubleshooting information about that layer has, I’m sure you have found, been particularly challenging as well.

From a networking standpoint, are you using NetworkManager, Wicked, or something else? (I’m using NM - just trying to eliminate differences in our configurations.)

Does the corporate DNS server forward unresolved requests upstream, or does it just resolve local names?

For anyone else following along, I’ve joined the discussion on the Docker forums to see about additional troubleshooting ideas.

Here’s something that might help with diagnosis - modify ~/.config/systemd/user/docker.service to add -D to the ExecStart line:

ExecStart=/usr/bin/dockerd-rootless.sh -D

Then run systemctl --user daemon-reload and systemctl --user restart docker. Open another terminal window and run journalctl --user -f and then go back to the first window and try pulling an image.

The -D enables debug logging. Since I can’t reproduce the issue, I don’t know if that will shed any additional light on what’s going on, but the worst case is that we learn nothing new.
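
Putting the steps together (the same commands as above, in order):

systemctl --user daemon-reload
systemctl --user restart docker
journalctl --user -f    # in the second terminal, then pull an image in the first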

I don’t know if this will tell us anything useful either, but if you use the non-rootless docker installation to pull the busybox image, then run:

docker run -it --rm busybox sh

Then within that busybox instance, try running traceroute -n 8.8.8.8 and see if that actually gives you a full traceroute. I noticed in my test environment that there is a hop through the 10.0.2.x network that isn’t present in a non-rootless installation.

Also, while I’m thinking about it - what is your firewall configuration? I don’t really think a firewall is going to be the problem here, but it would be good to rule it out. (Firewalld is an ingress firewall, so unless there’s something really wonky with the configuration, it shouldn’t cause any issues for outbound traffic.)

I use Wicked for networking. Lookups work for the user and also as root (not shown):

someuser@somehost:~> nslookup forums.opensuse.org
Server:		8.8.8.8
Address:	8.8.8.8#53

Non-authoritative answer:
Name:	forums.opensuse.org
Address: 195.135.221.140
Name:	forums.opensuse.org
Address: 2001:67c:2178:8::16

Restarting the rootless docker service didn’t help. I did this many times before while trying to troubleshoot the problem. Yes, it makes no difference. As far as I know, the corporate DNS server does not just resolve local names. The corporate DNS server (and another corporate DNS server) were the only ones in /etc/resolv.conf for years. To eliminate any problems with that, I put 8.8.8.8 on top.

I followed your recommendations, but journalctl gave:

someuser@somehost:~> journalctl --user --user-unit=docker.service
Hint: You are currently not seeing messages from the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
No journal files were opened due to insufficient permissions.

After someuser@somehost:~> sudo usermod -a -G systemd-journal someuser and a restart:

someuser@somehost:~> id -a someuser 
uid=1001(someuser) gid=100(users) groups=481(systemd-journal),447(docker),100(users)

Then, journalctl --user -f:

someuser@somehost:~> journalctl --user -f
No journal files were found.

I decided to stop all running docker containers, ran the command above with sudo, and did docker pull hello-world:

Sep 29 11:50:26 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:26.400178183+02:00" level=debug msg="Calling HEAD /_ping"
Sep 29 11:50:26 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:26.400883351+02:00" level=debug msg="Calling POST /v1.43/images/create?fromImage=hello-world&tag=latest"
Sep 29 11:50:33 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:33.340314847+02:00" level=debug msg="clean 208 unused exec commands"
Sep 29 11:50:36 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:36.403794839+02:00" level=debug msg="Trying to pull hello-world from https://registry-1.docker.io"
Sep 29 11:50:46 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:46.410291747+02:00" level=warning msg="Error getting v2 registry: Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:43257->10.0.2.3:53: i/o timeout"
Sep 29 11:50:46 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:46.410328258+02:00" level=info msg="Attempting next endpoint for pull after error: Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:43257->10.0.2.3:53: i/o timeout"
Sep 29 11:50:46 somehost dockerd-rootless.sh[1520]: time="2023-09-29T11:50:46.411754710+02:00" level=error msg="Handler for POST /v1.43/images/create returned error: Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.100:43257->10.0.2.3:53: i/o timeout"
Sep 29 11:50:46 somehost dockerd-rootless.sh[1556]: time="2023-09-29T11:50:46.444904153+02:00" level=debug msg="garbage collected" d="554.536µs"

With the non-root docker, I ran the command above and then the traceroute, which gave:

/ # traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
 1  172.17.0.1  0.007 ms  0.002 ms  0.001 ms
 2  *  *  *
 3  *  *  *
...

I don’t think this is firewall related. I use firewalld, but (I was brave) I disabled it and tried to pull hello-world as the user. That did not work, and I enabled firewalld again. There are corporate firewalls as well, which also dynamically block connections in some cases, and I have access to a UI to see if anything is getting blocked. However, this was and is not the case.
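
Concretely, what I tried was roughly:

sudo systemctl stop firewalld
docker pull hello-world    # still times out
sudo systemctl start firewalld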

I hope this helps. Thank you again for the time you invested in this issue, and it is interesting to hear from the Docker forum that user rimelek also got timeouts.

Thank you again!

Not a problem, this is a really strange issue.

It seems that this is being narrowed down to the virtual networking that rootless docker uses. I wonder if the behavior changes if you switch to NetworkManager - I’m going to try switching my VM to Wicked and see if I can reproduce the issue that way. I am running a GNOME desktop in my VM - you wouldn’t happen to be running a different desktop by chance?

Indeed, this also seems to be a strange issue to me. Something popped into my mind today that I want to report: I actually installed rootless docker some time ago - if I remember correctly, in May this year. I could find a docker image I must have pulled rootlessly back then, hence it did work before. There have been no major changes to the server since that day besides upgrades of openSUSE Tumbleweed.

I am not totally sure, but it also seems to me that this might be worth a bug report to the openSUSE maintainers, also because user rimelek in the Docker forums experiences the timeout issue in his installation as well. Nevertheless, a procedure to reproduce this behaviour is required first.

Best

Out of curiosity, have you tried uninstalling and reinstalling rootless?

ie, run:

/usr/bin/dockerd-rootless-setuptool.sh uninstall

And then rerun it with ‘install’.
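
That is, using the setuptool’s install subcommand:

/usr/bin/dockerd-rootless-setuptool.sh install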

Also, you’re not by chance using a proxy server, are you?

I did that before, and did it again. No change. I am not using any proxies.


For anyone following along, @tilfischer found a resolution (I’m just full of puns today). slirp4netns specifically documents that /etc/resolv.conf can’t be a symlink, but openSUSE makes /etc/resolv.conf a symlink to /run/netconfig/resolv.conf. Replacing that with a real file seems to have fixed it.
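
One way to make that change (as root; note that netconfig autogenerates this file, so it may recreate the symlink on a later update):

cp --remove-destination /run/netconfig/resolv.conf /etc/resolv.conf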

Interesting, but I cannot explain this to myself:

someuser@somehost:/> cat /etc/resolv.conf 
### /etc/resolv.conf is a symlink to /run/netconfig/resolv.conf
### autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
#     NETCONFIG_DNS_STATIC_SEARCHLIST
#     NETCONFIG_DNS_STATIC_SERVERS
#     NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
#     NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
### Call "netconfig update -f" to force adjusting of /etc/resolv.conf.
nameserver someip
nameserver someotherip

It says that /etc/resolv.conf is a symlink to /run/netconfig/resolv.conf. In my case, it was a symlink to /var/run/netconfig/resolv.conf. Why was that the case?

Following the issue to slirp4netns, adding --copy-up=/var/run might help. If I simply symlink /etc/resolv.conf to what is explained in the file, it works seamlessly.
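
A sketch of that symlink fix (as root):

ln -sfn /run/netconfig/resolv.conf /etc/resolv.conf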

/var/run and /run are the same location. netconfig was changed (it probably does not deserve to be called “fixed”) to explicitly use /run/netconfig instead of /var/run/netconfig more than a year ago. Are you really using Tumbleweed (as is implied by the topic tag)?

Yes, I do use Tumbleweed. The symlink of /etc/resolv.conf to /var/run/netconfig/resolv.conf instead of /run/netconfig/resolv.conf caused all the trouble. It took some (…) time to find this.

Could that be related to my Tumbleweed installation running for many years already? I cannot remember exactly when I initially installed it, but it could have been more than 8 years ago.

I can confirm that it’s an issue with Tumbleweed. I was able to reproduce it using the lxd image for Tumbleweed, updated to match the version I was running in VMware. Both were running the same version, but the lxd image failed for me, whereas the DVD-based install (updated to the most recent release as of last week) worked fine.

Moreover, cat /etc/resolv.conf inside the userspace network namespace (ie, using nsenter) actually reports the same as outside the namespace, but dig inside the namespace says it can’t read the file (specifically, it says the /etc/resolv.conf file is invalid), even though it is the same file as outside the namespace.
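
For anyone who wants to reproduce this check, a sketch of entering the rootless namespaces (assuming, per the rootless docs, that the daemon writes its PID to $XDG_RUNTIME_DIR/docker.pid):

nsenter -U --preserve-credentials -n -m -t $(cat $XDG_RUNTIME_DIR/docker.pid) cat /etc/resolv.conf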