Updated: May 10, 2015
Let me start with a big promise. You will absolutely LOVE this article today. It's going to be long, detailed and highly useful. Think GRUB, GRUB2. The same thing here. Only we will tackle Docker, a nice distribution platform that wraps the Linux Containers (LXC) technology in a simple, convenient way.
I will show you how to get started, and then we will create our own containers with SSH and Apache, learn how to use Dockerfiles, expose service ports, and solve an immense number of little bugs and problems that normally never get addressed in public forums. Please, without further ado, follow me.
I have given a brief overview of the technology in a Gizmo's Freeware article sometime last year. Now, we are going to get serious about using Docker. First, it is important to remember that this framework allows you to use LXC in a convenient manner, without having to worry about all the little details. It is the next step in this world, the same way OpenStack is the next evolutionary step in the virtualization world. Let me give you some history and analogies.
Virtualization began with software that lets you abstract your hardware. Then, to make things speedier, virtualization programs began using hardware acceleration, and then you also got paravirtualization. In the end, hypervisors began popping up like mushrooms after rain, and it became somewhat difficult to provision and manage them all. This is the core reason for concepts like OpenStack, which hide different platforms under a unified API.
The containers began their way in a similar manner. First, we had the chroot, but processes running inside the jailed environment shared the same namespace and fought for the same resources. Then, we got the kexec system call, which let us boot into the context of another kernel without going through the BIOS. Then, control groups came about, allowing us to partition system resources like CPU, memory and others into subgroups, thus allowing better control, hence the name, of processes running on the system.
Later on, the Linux kernel began offering full isolation of resources, using cgroups as the basic partitioning mechanism. Technically, this is a system-level virtualization technology, allowing you to run multiple isolated user-space instances on top of the single running kernel of the control host, inside self-contained environments, with the added bonus of very little performance penalty and overhead.
Several competing technologies tried to offer similar solutions, like OpenVZ, but the community eventually narrowed down its focus to the native enablement inside the mainline kernel, and this seems to be the future direction. Still, LXC remains somewhat difficult to use, as a fair amount of technical knowledge and scripting is required to get the containers running.
This is where Docker comes into place. It tries to take away the gritty pieces and offer a simple method of spawning new container instances without worrying about the infrastructure backend. Well, almost. But the level of difficulty is much less.
Another strong advantage of Docker is widespread community acceptance, as well as the emphasis on integration with cloud services. Here we go full buzzword, and this means naming some of the big players like AWS, Hadoop, Azure, Jenkins and others. Then we can also talk about Platform as a Service (PaaS), and you can imagine how much money and focus this is going to get in the coming years. The technological landscape is huge and confusing, and it's definitely going to keep on changing and evolving, with more and more concepts and wrapper technologies coming into life and building on top of Docker.
But we want to focus on the technological side. Once we master the basics, we will slowly expand and begin utilizing the strong integration capabilities, the flexibility of the solution, and work on making our cloud ecosystem expertise varied, automated and just pure rad. That won't happen right now, but I want to help you navigate the first few miles, or should we say kilometers, of the muddy startup waters, so you can begin using Docker in a sensible, efficient way. Since this is a young technology, it's Wild West out there, and most of the online documentation, tips, tutorials and whatnot are outdated, copy & paste versions that do not help anyone, and largely incomplete. I want to fix that today.
A bit more boring stuff before we do some cool things. Anyhow, Docker is mostly about LXC, but not just. It's been designed to be extensible, and it can also interface with libvirt and systemd. In a way, this makes it almost like a hyper-hypervisor, as there's potential for future growth, and when additional modules are added, it could effectively replace classic hypervisors like Xen or KVM or anything using libvirt and friends.
We will demonstrate using CentOS 7. Not Ubuntu. Most of the online stuff focuses on Ubuntu, but I want to show you how it's done using as-near-as-enterprise flavor of Linux as possible, because if you're going to be using Docker, it's gonna be somewhere business like. The first thing is to install docker:
yum install docker-io
Once the software is installed, you can start using it. However, you may encounter the following two issues the first time you attempt to run docker commands:
docker <any one command>
FATA Get http:///var/run/docker.sock/v1.18/images/json: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?
And the other error is:
docker <any one command>
FATA Get http:///var/run/docker.sock/v1.18/containers/json: dial unix /var/run/docker.sock: permission denied. Are you trying to connect to a TLS-enabled daemon without TLS?
The first error means the daemon is not running, so you need to start the Docker service first. The second means you are not running as root. You must run the technology as root, because Docker needs access to some rather sensitive pieces of the system and interacts with the kernel. That's how it works.
systemctl start docker
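If you want to tell the two errors apart before running anything, a tiny check on the socket file does the trick. This is only a sketch: the function name diagnose_docker_socket is made up for this guide, and the default path is simply the one quoted in the error messages above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): look at the daemon socket and
# report which of the two startup errors you are about to hit.
diagnose_docker_socket() {
    sock="${1:-/var/run/docker.sock}"   # path taken from the error text above
    if [ ! -e "$sock" ]; then
        # Socket absent: the daemon has not been started yet.
        echo "missing: start the daemon with systemctl start docker"
        return 1
    elif [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
        # Socket present but not accessible: you are not root.
        echo "permission denied: run docker as root"
        return 2
    fi
    echo "ok"
}
```

Call it as diagnose_docker_socket with no arguments on the host. Anything other than "ok" tells you which of the two fixes you still need.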
Now we can go crazy and begin using Docker.
The basic thing is to run docker help to get the available list of commands. I will not go through all the options. We will learn more about them as we go along. In general, if you're ever in doubt, you should consult the pretty decent online documentation. The complete CLI reference also kicks ass. And then, there's also an excellent cheat sheet on GitHub. But our first mission will be to download a new Docker image and then run our first instance.
There are many available images. We want to practice with CentOS. This is a good starting point. An official repository is available, and it lists all the supported images and tags. Indeed, at this point, we need to understand how Docker images are labeled.
The naming convention is repository:tag, for example centos:latest. In other words, we want the latest CentOS image. But the required image might as well be centos:6.6. All right, let's do it.
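To make the repository:tag convention concrete, here is a small sketch that splits an image name into its two parts and falls back to latest when no tag is given, which is what Docker itself assumes. The function name split_image_ref is hypothetical, and the sketch deliberately ignores registry prefixes that carry a port number.

```shell
#!/bin/sh
# Hypothetical helper: split "repository:tag" into "repository tag".
# A reference without a tag is treated as "latest".
# Simplification: registry prefixes with a port (host:5000/name) are not handled.
split_image_ref() {
    ref="$1"
    case "$ref" in
        *:*) repo="${ref%%:*}"; tag="${ref#*:}" ;;
        *)   repo="$ref"; tag="latest" ;;
    esac
    printf '%s %s\n' "$repo" "$tag"
}

split_image_ref centos:6.6   # prints: centos 6.6
split_image_ref centos       # prints: centos latest
```

The image itself is then fetched with the pull command, for example docker pull centos:latest.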
Now let's list the images by running the docker images command:
As we've seen in my original tutorial, the simplest example is to run a shell:
docker run -ti centos:centos7 /bin/bash
So what do we have here? We are running a new container instance with its own TTY (-t) and STDIN (-i), from the CentOS 7 image, with a BASH shell. Within a few seconds, you will get a new shell inside the container. Now, it's a very basic, very stripped-down operating system, but you can start building things inside it.
Let's set up a Web server, which will also have SSH access. To this end, we will need to do some rather basic installations. Grab Apache (httpd) and SSHD (openssh-server), and configure them. This has nothing to do with Docker, per se, but it's a useful exercise.
Now, some of you may clamor, wait, you don't need SSH inside a container, it's a security risk and whatnot. Well, maybe, yes and no, depending on what you need and what you intend to use the container for. But let's leave the security considerations aside. The purpose of the exercise is to learn how to set up and run ANY service.
You might want to start your Apache using an init script or a systemd command. This will not quite work. Specifically for CentOS, it comes with systemd, but more importantly, the container does not have its own systemd. If you try, the commands will fail.
systemctl start httpd
Failed to get D-Bus connection: No connection to service manager.
There are hacks around this problem, and we will learn about some of these in a future tutorial. But in general, given the lightweight and simple nature of containers, you do not really need a fully fledged startup service to run your processes. This does add some complexity.
To run Apache (HTTPD), just execute /usr/sbin/httpd - or an equivalent command in your distro. The service should start, most likely with a warning that you have not configured your ServerName directive in httpd.conf. We have learned how to do this in my rather extensive Apache guide.
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.4. Set the 'ServerName' directive globally to suppress this message
With SSHD, run /usr/sbin/sshd.
/usr/sbin/sshd -f /etc/ssh/sshd_config
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_dsa_key
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Could not load host key: /etc/ssh/ssh_host_ed25519_key
You will also fail, because you won't have all the keys. Normally, startup scripts take care of this, so you will need to run the ssh-keygen command once before the service starts correctly. Either one of the two commands will work:
/usr/bin/ssh-keygen -t rsa -f <path to file>
/usr/bin/ssh-keygen -A
ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519
Now, inside the container, we can see that Apache is indeed running.
ps -ef|grep apache
apache 87 86 0 10:47 ? 00:00:00 /usr/sbin/httpd
apache 88 86 0 10:47 ? 00:00:00 /usr/sbin/httpd
apache 89 86 0 10:47 ? 00:00:00 /usr/sbin/httpd
apache 90 86 0 10:47 ? 00:00:00 /usr/sbin/httpd
apache 91 86 0 10:47 ? 00:00:00 /usr/sbin/httpd
But what if we want to check external connectivity? At this point, we have a couple of problems on our hands. One, we have not set up any open ports, so to speak. Two, we do not know what the IP address of our container is. Now, if you try to run ifconfig inside the BASH shell, you won't get anywhere, because the necessary package containing the basic networking commands is not installed. Good, because it makes our container slim and secure.
Like with any Web server, we will need to allow incoming connections. We will use the default port 80. This is no different than port forwarding in your router, allowing firewall policies and whatnot. With Docker, there are several ways you can achieve the desired result.
When starting a new container with the run command, you can use -p option to specify which ports to open. You can choose a single port or a range of ports, and you can also map both the host port (hostPort) and container port (containerPort). For instance:
docker run -ti -p 22:22 -p 80:80 image-1:latest
FATA Error response from daemon: Cannot start container 64bd520e2d95a699156f5d40331d1aba972039c3c201a97268d61c6ed17e1619: Bind for 0.0.0.0:80 failed: port is already allocated
There are many additional considerations. IP forwarding, bridged networks, public and private networks, subnet ranges, firewall rules, load balancing, and more. At the moment, we do not need to worry about these.
There is also an additional method of exposing ports, but we will discuss that later on, when we touch on the topic of Dockerfiles, which are templates for building new images. For now, we need to remember to run our images with the -p option.
If you want to leave your host ports free, then you can omit the hostPort piece. In that case, you can connect to the container directly, using its IP address and Web server port. To do that, we need to figure out the container details:
docker inspect <container name or ID>
This will give a very long list of details, much like the KVM XML config, except this one is written in JSON, which is another modern and ugly format for data. Readable but extremely ugly.
docker inspect distracted_euclid
We can narrow it down to just the IP address.
docker inspect <container name or ID> | grep -i "ipaddr"
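The grep above still leaves you with the whole JSON line, quotes and commas included. If you only want the bare address, say for use in a script, a short filter can peel it out. This is a sketch, and the function name extract_ip is made up. It assumes the inspect output carries the address in a field spelled "IPAddress", as in the listings I got on this setup.

```shell
#!/bin/sh
# Hypothetical helper: read `docker inspect` JSON on stdin and print the
# first non-decorated value of the "IPAddress" field.
# Assumption: the field is written as "IPAddress": "a.b.c.d".
extract_ip() {
    sed -n 's/.*"IPAddress": *"\([^"]*\)".*/\1/p' | head -n 1
}
```

Use it as docker inspect <container name or ID> | extract_ip. The inspect command also has a --format option that takes a Go template, such as '{{ .NetworkSettings.IPAddress }}', which should give the same answer without any text parsing.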
Let's start fresh. Launch a new instance, set up Apache, start it. Open a Web browser and test. If it works, then you have properly configured your Web server. Exactly what we wanted.
docker run -it -p 80:80 centos:centos7 /bin/bash
If we check the running container, we can see the port mapping - the output is split over multiple lines for brevity, so please excuse that. Normally, the all-uppercase titles will show as the row header, and then, you will get all the rest printed below, one container per line.
# docker ps
CONTAINER ID IMAGE COMMAND
43b179c5aec7 centos:centos7 "/bin/bash"
CREATED STATUS PORTS
2 hours ago Up 2 hours 0.0.0.0:80->80/tcp
And in the browser, we get:
Optional: Now, the internal IP address range will only be accessible on the host. If you want to make it accessible from other machines, you will need NAT and IP forwarding. And if you want to use names, then you will need to properly configure /etc/hosts as well as DNS. For containers, this can be done using the --add-host="host:IP" directive when running a new instance.
Another note: Remember that Docker has its own internal networking, much like VirtualBox and KVM, as we've seen in my other tutorials. It's a fairly extensive /16 network, so you have quite a lot of freedom. On the host:
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
inet6 fe80::5484:7aff:fefe:9799 prefixlen 64 scopeid 0x20<link>
ether 56:84:7a:fe:97:99 txqueuelen 0 (Ethernet)
RX packets 6199 bytes 333408 (325.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11037 bytes 32736299 (31.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
We need to do the same exercise with SSH. Again, this means exposing port 22, and we have several options available. To make it more interesting, let's try with a random port assignment:
docker run -ti -p 22 -p 80 centos:centos7 /bin/bash
And if we check with docker ps, specifically for ports:
0.0.0.0:49176->22/tcp, 0.0.0.0:49177->80/tcp boring_mcclintock
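With random assignment, you need to find out which host port landed on which service before you can connect. The sketch below pulls that number out of the PORTS text shown by docker ps. The function name host_port_for is hypothetical, and it only handles TCP mappings written in the hostIP:hostPort->containerPort/tcp form you see above.

```shell
#!/bin/sh
# Hypothetical helper: given the PORTS column from `docker ps` and a
# container port, print the host port mapped to it (TCP only).
host_port_for() {
    ports="$1"      # e.g. "0.0.0.0:49176->22/tcp, 0.0.0.0:49177->80/tcp"
    cport="$2"      # e.g. 22
    printf '%s\n' "$ports" |
        tr ',' '\n' |
        sed -n "s/.*:\([0-9][0-9]*\)->$cport\/tcp.*/\1/p" |
        head -n 1
}

host_port_for '0.0.0.0:49176->22/tcp, 0.0.0.0:49177->80/tcp' 22   # prints: 49176
```

Docker also ships a port command, docker port <container name or ID> 22, which reports the mapping directly, so treat the function as an illustration of how to read the output rather than a replacement.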
This means you can connect to the docker0 IP address, on the ports specified above in the docker ps command output, and this is equivalent to actually connecting to the container IP directly, on the service port. This can be useful, because you do not need to worry about the internal IP address that your container uses, and it can simplify forwarding. Now, let's try to connect. We can use the host port, or we can use the container IP directly.
ssh 172.17.42.1 -p 49176
Either way, we will get what we need, for instance:
The authenticity of host '172.17.0.5 (172.17.0.5)' can't be established. ECDSA key fingerprint is 00:4b:de:91:60:e5:22:cc:f7:89:01:19:3e:61:cb:ea.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.17.0.5' (ECDSA) to the list of known hosts.
We will fail because we do not have the root password. So what do we do now? Again, we have several options. First, try to change the root password inside the container using the passwd command. But this won't work, because the passwd utility is not installed. We can then grab the necessary RPM and set it up inside the container. On the host, check the dependencies:
rpm -q --whatprovides /usr/bin/passwd
But this is a security vulnerability. We want our containers to be lean. So we can just copy the password hash from /etc/shadow on the host into the container. Later, we will learn about a more streamlined way of doing it.
Another thing that strikes quite clearly is that we are repeating all our actions. This is not efficient, and this is why we want to preserve changes we have done to our container. The next section handles that.
After you've made changes to the container, you may want to commit it. In other words, when starting a new container later on, you will not need to repeat all the steps from scratch, you will be able to reuse your existing work and save time and bandwidth. You can commit a container based on its ID or its name:
docker commit <container name or ID> <new image>
For example, we get the following:
docker commit 43b179c5aec7 myapache3
Check the list of images again:
A more streamlined way of creating your images is to use Dockerfiles. In a way, it's like using Makefile for compilation, only in Docker format. Or an RPM specfile if you will. Basically, in any one "build" directory, create a Dockerfile. We will learn what things we can put inside one, and why we want it for our Apache + SSH exercise. Then, we will build a new image from it. We can combine it with our committed images to preserve changes already done inside the container, like the installation of software, to make it faster and save network utilization.
Before we go any further, let's take a look at a Dockerfile that we will be using for our exercise. At the moment, the commands may not make much sense, but they soon will.
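FROM myapache4:latest
EXPOSE 22 80
CMD ["/usr/sbin/sshd", "-D"]
RUN mkdir -p /run/httpd
CMD ["/usr/sbin/httpd", "-D", "FOREGROUND"]
What do we have here? FROM names the base image, in this case one of our committed images. EXPOSE declares the ports the container listens on. RUN executes a command while the image is being built, and CMD sets the default command that runs when the container starts. Note that when a Dockerfile lists more than one CMD, only the last one takes effect.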
As you can see, Dockerfiles aren't that complex or difficult to write, but they are highly useful. You can pretty much add anything you want. These templates form a basis for automation, and with conditional logic, you can create all sorts of scenarios and spawn containers that match your requirements.
Once you have a Dockerfile in place, it's time to build a new image. Dockerfiles must follow a strict convention, just like Makefiles. It's best to keep different image builds in separate sub-directories. For example:
docker build -t test5 .
Sending build context to Docker daemon 41.47 kB
Sending build context to Docker daemon
Step 0 : FROM myapache4:latest
Step 1 : EXPOSE 22 80
---> Using cache
Step 2 : CMD /usr/sbin/sshd -D
---> Using cache
Step 3 : RUN mkdir -p /run/httpd
---> Using cache
Step 4 : CMD /usr/sbin/httpd -D FOREGROUND
---> Using cache
Successfully built d892acd86198
The command tells us the following: -t sets the repository name (test5) for the image, which is built from a Dockerfile stored in the current directory (.). That's all. Very simple and elegant.
Run a new container from the created image. If everything went smoothly, you should have a running Web server in place, and SSH connectivity as well, provided sshd is also running inside the container. Again, all the usual network related rules apply.
Once you have the knowledge how to do it on your own, you can try one of the official Apache builds. Indeed, the Docker repository contains a lot of good stuff, so you should definitely invest time checking available templates. For Apache, you only need the following in your Dockerfile - the second line is optional.
COPY ./public-html/ /usr/local/apache2/htdocs/
What do we have above? Basically, in the Dockerfile, we have the declaration of which template to use. And then, we have a COPY instruction, which will look for a public-html directory in the current folder and copy it into the container during the build. In the same manner, you can also copy your httpd.conf file. Depending on your distribution, the paths and filenames might differ. Finally, after building the image and running the container:
docker run -ti -p 22 -p 80 image-1:latest
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.17. Set the 'ServerName' directive globally to suppress this message
[Thu Apr 16 21:08:35.967670 2015] [mpm_event:notice] [pid 1:tid 140302870259584] AH00489: Apache/2.4.12 (Unix) configured -- resuming normal operations
[Thu Apr 16 21:08:35.976879 2015] [core:notice] [pid 1:tid 140302870259584] AH00094: Command line: 'httpd -D FOREGROUND'
There are many good reasons why you want to use this technology. But let's just briefly focus on what we gain by running these tiny, isolated instances. Sure, there's a lot happening under the hood, in the kernel, but in general, the memory footprint of spawned containers is fairly small. In our case, the SSH + Apache containers use a tiny fraction of extra memory. Compare this to any virtualization technology.
Let's go back to the Apache example, and now you will also learn why so many online tutorials sin the sin of copy & pasting information without checking, and why most of the advice is not correct, unfortunately. It has to do with, what do you do if your Apache server seems to die within a second or two after launching the container? Indeed, if this happens, you want to step into the container and troubleshoot. To that end, you can use the docker exec command to attach a shell to the instance.
docker exec -ti boring_mcclintock /bin/bash
Then, it comes down to reading logs and trying to figure out what might have gone wrong. If your httpd.conf is configured correctly, you will have access and error logs under /var/log/httpd:
[auth_digest:error] [pid 25] (2)No such file or directory: AH01762: Failed to create shared memory segment on file /run/httpd/authdigest_shm.25
A typical problem is that you may be missing the /run/httpd directory. If this one does not exist in your container, httpd will start and die. Sounds so simple, but few if any references mention this.
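Since the log line already names the file httpd could not create, you can let a few lines of shell point you at the missing directory. This is a rough sketch, with a made-up function name missing_dirs, and it assumes the errors follow the "No such file or directory ... on file <path>" wording shown in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read an httpd error log on stdin and print each
# directory that a "No such file or directory" error points at.
# Assumption: the failing path follows the words "on file", as in the
# auth_digest message quoted above.
missing_dirs() {
    grep 'No such file or directory' |
        sed -n 's/.* on file \(\/[^ ]*\).*/\1/p' |
        while IFS= read -r path; do
            dirname "$path"
        done |
        sort -u
}
```

Inside the container, feed it the error log found under /var/log/httpd. For the message above it prints /run/httpd, which is exactly the directory to create with mkdir -p.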
While initially playing with containers, I did encounter this issue. Reading online, I found several suggestions, none of which really helped. But I do want to elaborate on them, and how you can make progress in your problem solving, even if intermediate steps aren't really useful.
Suggestion 1: You must use -D FOREGROUND to run Apache, and you must also use ENTRYPOINT rather than CMD. The difference between the two instructions is very subtle. And it does not solve our problem in any way.
CMD ["-D", "FOREGROUND"]
Suggestion 2: Use a separate startup script, which could work around any issues with the starting or restarting of the httpd service. In other words, the Dockerfile becomes something like this:
COPY ./run-httpd.sh /run-httpd.sh
RUN chmod -v +x /run-httpd.sh
And the contents of the run-httpd.sh script are along the lines of:
rm -rf /run/httpd/*
exec /usr/sbin/apachectl -D FOREGROUND
Almost there. Remove any old leftover PID files, but these are normally not stored under /run/httpd. Instead, you will find them under /var/run/httpd. Moreover, we are not certain that this directory exists.
Finally, the idea is to work around any problems with the execution of a separate shell inside which the httpd process is spawned. While it does provide us with additional, useful lessons on how to manage the container, with COPY and RUN instructions, it's not what we need to fix the issue.
Step 3 : EXPOSE 80
---> Using cache
Step 4 : COPY ./run-httpd.sh /run-httpd.sh
Removing intermediate container 7ff5b58b40bf
Step 5 : RUN chmod -v +x /run-httpd.sh
---> Running in 56fadf4dd2d4
mode of '/run-httpd.sh' changed from 0644 (rw-r--r--) to 0755 (rwxr-xr-x)
Removing intermediate container 56fadf4dd2d4
Step 6 : CMD /run-httpd.sh
---> Running in f9c6b30795e2
Removing intermediate container f9c6b30795e2
Successfully built b2dcc2818a27
This won't work, because apachectl is an unsupported command for managing httpd, plus we have seen problems using startup scripts and utilities earlier, and we will work on fixing this in a separate tutorial.
docker run -ti -p 80 image-2:latest
Passing arguments to httpd using apachectl is no longer supported. You can only start/stop/restart httpd using this script. If you want to pass extra arguments to httpd, edit the /etc/sysconfig/httpd config file.
But it is useful to try these different things, to get the hang of it. Unfortunately, it also highlights the lack of maturity and the somewhat inadequate documentation for this technology out there.
There are many ways you can interact with your container. If you do not want to attach a new shell to a running instance, you can use a subset of docker commands directly against the container ID or name:
docker <command> <container name or ID>
For instance, to get the top output from the container:
docker top boring_stallman
If you have too many images, some of which have just been used for testing, then you can remove them to free up some of your disk space. This can be done using the docker rmi command.
# docker rmi -f test7
Then, you can also run your containers in the background. Using the -d flag will do exactly that, and you will get the shell prompt back. This is also useful if you do not mask signals: with a container running in the foreground, an accidental break in your shell might kill it.
docker run -d -ti -p 80 image-3:latest
You can also check events, examine changes inside a container's filesystem as well as check history, so you basically have a version control in place, export or import tarred images to and from remote locations, including over the Web, and more.
If you read through the documentation, you will notice you can connect to a running container using either exec or attach commands. So what's the difference, you may ask? If we look at the official documentation, then:
The docker exec command runs a new command in a running container. The command started using docker exec only runs while the container's primary process (PID 1) is running, and it is not restarted if the container is restarted.
On the other hand, attach gives you the following:
The docker attach command allows you to attach to a running container using the container's ID or name, either to view its ongoing output or to control it interactively. You can attach to the same contained process multiple times simultaneously, screen sharing style, or quickly view the progress of your daemonized process. You can detach from the container (and leave it running) with CTRL-p CTRL-q (for a quiet exit) or CTRL-c which will send a SIGKILL to the container. When you are attached to a container, and exit its main process, the process's exit code will be returned to the client.
In other words, with attach, you connect to the container's main process, so you will get a shell if that process is a shell, and be able to do whatever you need. With exec, you can issue commands that do not require any interaction, but if you use a shell in combination with exec, you will achieve much the same result as if you used attach.
Start is used to resume the execution of a stopped container. It is not used to start a fresh instance. For that, you have the run command. The choice of words could have been better.
The build command is used to create a new image from a Dockerfile. On the other hand, the create command is used to create a new container using command line options and arguments. Create lets you specify container settings, too, like network configurations, resource limitations and other settings, which affect the container from the outside, whereas the changes implemented by the build command will be reflected inside it, once you start an instance. And by start, I mean run. Get it?
There are a million more things we can do: using systemd enabled containers, policies, security, resource constraints, proxying, signals, other networking and storage options including the super-critical question of how to mount data volumes inside containers so that data does not get destroyed when containers die, additional pure LXC commands, and more. We've barely scratched the surface. But now, we know what to do. And we'll get there. Slowly but surely.
I recommend you allocate a few hours and then spend some honest time reading all of the below, in detail. Then practice. This is the only way you will really fully understand and embrace the concepts.
My entire virtualization section
Dockerizing an SSH Daemon Service
Differences between save and export in Docker
Docker Explained: Using Dockerfiles to Automate Building of Images
We're done with this tutorial for today. Hopefully, you've found it useful. In a nutshell, it does explain quite a few things, including how to get started with Docker, how to pull new images, run basic containers, add services like SSH and Apache, commit changes to an image, expose incoming ports, build new images with Dockerfiles, lots of troubleshooting of problems, additional commands, and more. Eventful and colorful, I'd dare say.
In the future, we will expand significantly on what we learned here, and focus on various helper technologies like supervisord for instance, we will learn how to mount filesystems, work on administration and orchestration, and many other cool things. Docker is a very nice concept, and if used correctly, it can make your virtual world easier and more elegant. The initial few steps are rough, but with some luck, this guide will have provided you with the right dose of karma to get happily and confidently underway. Ping me if you have any requests or desires. Technology related, of course. We're done.