I’ve been having issues with my Docker Compose setup, specifically with containers that require network access (e.g., Nginx Proxy Manager). After every reboot, I needed to restart those containers manually, otherwise they wouldn’t work.
After some investigation, I found that the service uses the following After target:
If I replace network.target with network-online.target (see NetworkTarget on freedesktop.org), all containers start automatically without any errors after reboot.
Good investigation. Yes, it makes sense for units that require network access to wait until the network is online, and it would make sense to me to make that the default.
I am not running persistent containers so I am not seeing this problem.
I checked /usr/lib/systemd/system/docker.service and yes that has:
But maybe that is fine: it is not Docker that needs the network to be online, it is the Docker applications you are running. So the best solution would be to have those applications wait until the network is online or, even better, to have those containers fixed so that they see the network coming online and reconfigure themselves.
Is it that all containers that need network access do not work, or only some? Can you list some containers you use?
First, I did some more testing. I rebooted five times without network-online.target, and every time my containers had errors. When I rebooted five times with network-online.target, three of those boots completed without any issues. So while this certainly helps, it’s not a perfect solution. I’ll investigate further. Do you have any suggestions on what to look for?
About the applications that I use, these start with errors that repeat in an infinite loop until I restart them manually:
searxng: httpx.ConnectError: [Errno -3] Temporary failure in name resolution
nginx-proxy-manager: nginx: [emerg] no name servers defined in /etc/nginx/conf.d/include/resolvers.conf:1
litellm: litellm.proxy.proxy_server._handle_llm_api_exception(): Exception occured - litellm.InternalServerError: AnthropicException - Cannot connect to host api.anthropic.com:443 ssl:<ssl.SSLContext object at 0x7f0b7316e670> [Temporary failure in name resolution]
cloudflare-ddns: IPv4 not detected via primary, trying fallback
These ones start without problems:
valkey
prometheus
postgres
open-webui: while it starts it cannot connect to online models until I restart litellm
As far as I understand, Docker containers don’t have built-in mechanisms to detect or react to network up/down events. Based on the error messages you listed, these applications have some retry mechanism implemented: the errors look temporary rather than fatal, as they should.
You could try to define HEALTHCHECK in your Dockerfile or use Docker Compose’s healthcheck option. Note that, as far as I know, a restart policy only reacts to a container exiting, and Docker replaces unhealthy containers on its own only in Swarm mode, so you will probably need to experiment a bit with this.
For dependencies between services (such as open-webui needing litellm), use depends_on with condition: service_healthy so that dependent services only start after their dependencies are actually ready, not just running.
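To make the healthcheck and depends_on suggestion concrete, here is a minimal sketch of what the Compose file could look like. The image references are placeholders, the timing values are guesses to tune, and the check assumes that getent is available inside the image, which I have not verified for these containers.

```yaml
services:
  litellm:
    image: example.invalid/litellm:latest      # placeholder, use your real image
    restart: unless-stopped
    healthcheck:
      # Healthy only once the container can resolve an outside name.
      # Assumption: getent exists in the image; swap in another DNS check if not.
      test: ["CMD", "getent", "hosts", "api.anthropic.com"]
      interval: 30s      # how often the check runs
      timeout: 5s        # a slower check counts as a failure
      retries: 3         # consecutive failures before "unhealthy"
      start_period: 20s  # failures during startup are not counted

  open-webui:
    image: example.invalid/open-webui:latest   # placeholder, use your real image
    restart: unless-stopped
    depends_on:
      litellm:
        # Wait for litellm's healthcheck to pass, not just for the container to run.
        condition: service_healthy
```

One caveat: as far as I know, depends_on is evaluated by docker compose up, not by the Docker daemon when it brings containers back after a reboot through their restart policy. So this mainly helps when the stack is started through Compose itself.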
It would also make sense to report this to the provider of the container: a temporary failure should resolve itself once the underlying problem is gone.