Updated: February 25, 2019
Hopefully, you will never really need to read this article, and you only happen to be here because you are bored or took a wrong turn in your search. Or you might actually be facing an issue where your Docker containers no longer have Internet access, even though they worked well just recently, and you've made no changes to your environment.
This sounds like a very vague problem statement, but this is exactly what I was facing all of a sudden. My containers did not have network access, with errors like: Temporary failure resolving. It looked like an issue with name resolution, and it bugged me extra, because it was not supposed to happen. But we're getting ahead of ourselves. Test host: Ubuntu with systemd - important for later. Let's proceed slowly.
Problem in more detail
You have configured Docker on your system. It's working well. You have multiple images, multiple containers, and you even used advanced network rules, and everything seems to be in perfect order. Then, you notice that you can no longer do certain activities in your containers, like updates or package installations. The best way to debug this is to attach a shell to a running container instance, as I've explained in the intro guide. Indeed, inside a running container, you see something like:
# apt-get update
Err:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Temporary failure resolving 'archive.ubuntu.com'
0% [Connecting to security.ubuntu.com]
This looks like a DNS problem (name resolution). The container is unable to resolve domain names to IP addresses, and therefore it cannot connect to the servers to grab data, like updates. There are two issues here. One, why did the problem appear all of a sudden? Two, we need to actually fix the network. Now, I'd like to show you how to troubleshoot this in an elegant way.
First, let's understand why the issue occurred. There's no immediate answer here, but some things to look at may include more than just your immediate environment. For example, your host may not have changed, but the network could have - routers, network policies, DNS servers themselves. Since you cannot normally control what's happening outside your immediate setup, the best way to figure out where the issue resides is by doing a step-by-step isolation of the problem.
- If your host system has network and cannot resolve URLs, you will need to sort that out first. There's most likely a problem between your host and the destination, and whatever network infrastructure exists in between.
- If your host system has network and can correctly resolve URLs, the issue is in specifically how Docker containers resolve URLs. This is where we need to focus next.
- Check if the Docker network interface is up and running (with a command like ip or ifconfig). If not, you will need to fix that first before moving on to the next step.
- Check if the container instance has an IP address. If not, you will need to fix that.
docker inspect <container name or ID> | grep -i "ipaddr"
- Check if you get the same results inside the container if possible (ip or ifconfig). If the result does not match what you have seen in the previous command, you will need to fix that.
- Check if you can ping the container from the outside (and vice versa, if the ping command is available inside the container). If ping works, this is a good indication that the network is configured correctly and that there are no firewall rules blocking the traffic.
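The checks above can be sketched as a handful of commands. The container name webtest is a made-up example, and the grep pattern relies on the JSON layout of docker inspect output:

```shell
# Hypothetical container name - use your own:
CONTAINER=webtest

# Is the Docker bridge interface up on the host?
ip addr show docker0

# Does the container have an IP address?
docker inspect "$CONTAINER" | grep -i '"ipaddress"'

# Does the view from inside the container agree?
# (the ip tool may not be installed in minimal images)
docker exec "$CONTAINER" ip addr show eth0

# Can the host ping the container? Extract the first address and try:
CIP=$(docker inspect "$CONTAINER" | grep -i '"ipaddress"' | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1)
ping -c 3 "$CIP"
```

If the ping fails while everything else looks sane, look at firewall rules on the host before blaming DNS.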
If all these checks return no strange problems or errors, the next step is to focus on DNS resolution. The configuration for that will be available in the /etc/resolv.conf configuration file. This is true both for physical instances of Linux as well as virtual machines and containers. You will probably notice that the container uses something like:
# See man:systemd-resolved.service(8) for details about the supported modes
# of operation for /etc/resolv.conf.
The nameserver IP addresses will most likely be an external address (something like your ISP's DNS server) or a localhost address (127.0.0.X). The question is: Do these match your host's /etc/resolv.conf file?
The answer is, you will most probably have a localhost address defined in your host's /etc/resolv.conf file. Now, try that in your container. Edit the container's resolv.conf file and replace the IP address in the nameserver line with the host's value, i.e. the localhost address. If that solves your issue, great. But most likely, it won't.
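A quick way to do the comparison, again assuming a hypothetical container named webtest:

```shell
CONTAINER=webtest   # hypothetical container name - use your own

# The container's resolver configuration...
docker exec "$CONTAINER" cat /etc/resolv.conf

# ...versus the host's:
cat /etc/resolv.conf

# Just the nameserver values, for a side-by-side look:
docker exec "$CONTAINER" cat /etc/resolv.conf | awk '/^nameserver/ {print $2}'
awk '/^nameserver/ {print $2}' /etc/resolv.conf
```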
It won't, because a localhost address inside the container refers to the container itself, where no DNS service is listening - not to the host, where systemd-resolved answers on that address. And at this point, you have no problem with network connectivity on your host anyway. So we need to figure out the REAL address of the DNS server in your environment. And this is further complicated by the fact that most modern Linux distributions use systemd. The plot thickens.
Find out DNS with systemd
So yes. We need to figure out what the nameserver is, and we will need to use a systemd command for that. Most likely, if you have systemd on your system, you are also using systemd-resolved, the network name resolution manager and service. The configuration is stored under /etc/systemd/resolved.conf. But you can also obtain the results on the command line with the systemd-resolve --status command:
Link 3 (wlp59s0)
Current Scopes: DNS
LLMNR setting: yes
MulticastDNS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNS Servers: 10.50.34.1
DNS Domain: dedoimedo
What do we have here? A lot of interesting stuff. But what really matters is the line that reads DNS Servers. This is what we want. Place this IP address into the container /etc/resolv.conf file and try again. You should have the network working again.
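If you need to do this more than once, the copy-paste can be scripted. A minimal sketch, assuming the systemd-resolve output layout shown above and, again, a made-up container name webtest:

```shell
# Extract the upstream DNS server from systemd-resolve output
# (third whitespace-separated field on the "DNS Servers:" line):
DNS_IP=$(systemd-resolve --status | awk '/DNS Servers:/ {print $3; exit}')
echo "Using DNS server: $DNS_IP"

# Write it into the running container (webtest is a hypothetical name):
docker exec webtest sh -c "echo nameserver $DNS_IP > /etc/resolv.conf"

# Verify:
docker exec webtest cat /etc/resolv.conf
```

Keep in mind this edits the live file only, and the change is not guaranteed to survive a container restart.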
Why this problem then?
Now, we can discuss the why again. If you look at the systemd-resolve documentation, it is possible that there was some change in your system (possibly even due to a regular update) whereby the selected mode of operation causes some conflicts with the name resolution. In particular, if we look at the way /etc/resolv.conf is handled, then the first mode states:
systemd-resolved maintains the /run/systemd/resolve/stub-resolv.conf file for compatibility with traditional Linux programs. This file may be symlinked from /etc/resolv.conf. This file lists the 127.0.0.53 DNS stub (see above) as the only DNS server. It also contains a list of search domains that are in use by systemd-resolved. The list of search domains is always kept up-to-date. Note that /run/systemd/resolve/stub-resolv.conf should not be used directly by applications, but only through a symlink from /etc/resolv.conf. This file may be symlinked from /etc/resolv.conf in order to connect all local clients that bypass local DNS APIs to systemd-resolved with correct search domains settings. This mode of operation is recommended.
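You can check which of these modes your host is actually in by looking at what /etc/resolv.conf points to:

```shell
# Is /etc/resolv.conf a symlink, and where does it point?
ls -l /etc/resolv.conf
readlink -f /etc/resolv.conf

# In the recommended mode it resolves to the stub file, which lists 127.0.0.53:
grep nameserver /run/systemd/resolve/stub-resolv.conf

# The upstream servers the stub forwards to are listed here instead:
grep nameserver /run/systemd/resolve/resolv.conf
```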
If one of the links in this equation is broken, or something changed, it's possible that the Docker service can't really determine what gives, and you end up with no name resolution. So the resolution [sic] is to provide the actual network DNS address, which the containers can understand and use. This is a bit speculative on my side, but I think it's quite correct.
Here we go, another mystery demisted, another windscreen demystified. Or something. I don't like half-magic solutions, but when you have a super-complicated, layered system infrastructure, sometimes the solutions are just as bad as the problem. Not in that they don't fix the issue - in that you have less visibility and control than you rightfully should. But that's the future of Linux - and everything IT - endless abstraction.
On topic, hopefully this little guide should provide you with a relatively quick and painless fix for your container adventures. If you're using Docker, and the name resolution no longer works inside the container instances, perhaps you should test the tips and tricks written above - and then build your own images with the right fix in place, so you don't need to fight this all over again. And there you go, the end.
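One way to bake the fix in, so you don't fight this all over again, is to tell the Docker daemon which DNS servers containers should use. The dns key in /etc/docker/daemon.json is a standard daemon option; 10.50.34.1 below is just the example value found earlier, so substitute your own:

```shell
# Persist the DNS server for all containers via the daemon config.
# Warning: this overwrites any existing /etc/docker/daemon.json -
# merge by hand if you already have one.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["10.50.34.1"]
}
EOF
sudo systemctl restart docker
```

Alternatively, a single container can be started with docker run --dns 10.50.34.1 ... to override name resolution just for that instance.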