Docker & data volumes - Tutorial

Updated: September 25, 2015

For a change, today we will have a relatively short and uncomplicated article on the Docker container automation framework. We will learn how to mount data inside containers, how to share data between them, and all the bits and pieces needed to turn our instances into useful systems.

So far, you've enjoyed (and that's a liberal term) a very detailed intro guide, then we dabbled in services and networking, and solved some tough errors. This tutorial continues the tradition, step by step, so it should be simple and crystal clear. Let us begin.


The basics

The process for exposing host data inside a container is relatively simple, and it resembles the mount command. You provide a source directory or file and point it to a target directory or file. Much like port mapping, the specification consists of a host part and a container part.

docker run -it -v /root/testing:/dedoimedo centos:latest /bin/bash

We are running a new container with the Bash shell, and we are mounting the host's /root/testing directory inside the container under /dedoimedo. On the host, the source can be any directory, including an NFS mount, which effectively turns this into a lovely remote data sharing thingie. Strictly speaking, NFS does not allow re-exporting a mounted share, but this is cheating really, as you're still accessing the data on the same host, sort of.
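If you find yourself typing this command often, it can be wrapped in a small script. The sketch below is only an illustration: the `run_with_bind` helper and the `DOCKER` variable are hypothetical names of my own, and the only real Docker syntax involved is `docker run -v HOST:CONTAINER`, optionally with the `:ro` suffix for a read-only mount. Setting `DOCKER=echo` prints the command instead of running it, which is handy for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bind-mount a host directory into a new container.
# DOCKER defaults to the real binary; set DOCKER=echo to print the command instead.
DOCKER="${DOCKER:-docker}"

# run_with_bind HOST_DIR CONTAINER_DIR IMAGE [ro]
run_with_bind() {
  local host_dir="$1" target="$2" image="$3" mode="${4:-}"
  # Create the host directory ourselves, so it exists and is owned by the caller.
  mkdir -p "$host_dir" || return 1
  # Turn the host path into an absolute one for the -v specification.
  host_dir="$(cd "$host_dir" && pwd)" || return 1
  local spec="$host_dir:$target"
  # Optional read-only flag, appended as the third field of the -v value.
  if [ "$mode" = "ro" ]; then
    spec="$spec:ro"
  fi
  "$DOCKER" run -it -v "$spec" "$image" /bin/bash
}
```

For example, `run_with_bind /root/testing /dedoimedo centos:latest` reproduces the command above, and adding `ro` as a fourth argument makes the mount read-only.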

Mount point inside the container

Inside the container, you can now check the contents of the /dedoimedo directory and see what's stored inside. However, on a standard, default CentOS build, which is our test case here, you will probably hit an error. This is worth covering, because you may run into a similar problem and should be ready and willing to debug it; plus, it shows that security features can sometimes be detrimental.

ls: cannot open directory .: Permission denied

Security features. Yes, of course, SELinux. Luckily, the issue can be fixed. You can switch SELinux into permissive mode, or you can relabel the mounted directory with a security context that allows containers to access it.

setenforce 0

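This puts SELinux into permissive mode for the entire host, not just for Docker, until the next reboot. The gentler way is to change the security context of the volume instead: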

chcon -Rt svirt_sandbox_file_t /<volume>

Then you will see the contents, and you can start playing, writing files, etc. Indeed, if we create a new file inside the mounted directory in the container and then look at the host, we will see something like below, excluding the actual contents of course:

Create new file inside the container

Directory contents on the host
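Besides setenforce and chcon, newer Docker releases (1.7 and later, as far as I know) accept a `:z` or `:Z` suffix on the `-v` option, which asks Docker to relabel the mounted content itself: `:z` applies a shared label that several containers can use, `:Z` a private one for a single container. The sketch below assumes that Docker version plus the `getenforce` tool from the SELinux utilities; the `selinux_mount_suffix` function is a hypothetical helper that picks the suffix only when SELinux is enforcing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose an SELinux relabel suffix for a -v mount.
# Assumes Docker 1.7+ (which understands :z and :Z) and getenforce on the host.

# selinux_mount_suffix [MODE]
# MODE defaults to the output of getenforce; without the tool, SELinux is treated as disabled.
selinux_mount_suffix() {
  local mode="${1:-}"
  if [ -z "$mode" ]; then
    if command -v getenforce >/dev/null 2>&1; then
      mode="$(getenforce)"
    else
      mode="Disabled"
    fi
  fi
  case "$mode" in
    # Enforcing: ask Docker to apply a shared label to the content.
    Enforcing) printf ':z\n' ;;
    # Permissive or Disabled: access is not denied, so print nothing.
    *) ;;
  esac
}
```

Usage would look like `docker run -it -v /root/testing:/dedoimedo$(selinux_mount_suffix) centos:latest /bin/bash`. Keep in mind that relabeling changes the labels on the host files, so it should not be pointed at system directories like /usr or /home.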

Sharing between containers

Much like what we did with networking, you may want to expose data inside one of the containers to others, so that it becomes a so-called data volume container. This can be very useful if you group your containers by functionality, plus you can save significant amounts of space. Moreover, you can also chain the volumes between containers, which allows you to narrow down how many different containers access the host data simultaneously. This can be useful if you don't have a robust write locking mechanism on the host, which is often the case on UNIX-like systems, where file locking is usually advisory and only works if every program cooperates.

Let's do a quick example. We will start a container named data-source, with a /data volume. Optionally, we could map /data to a directory on the host, but an unmapped volume is fine for now. Then, we will create a second container, which will use the volumes from the container named data-source.

docker run -ti --name data-source -v /data centos:latest /bin/bash
docker run -ti --volumes-from data-source centos:latest /bin/bash

Now, if we look at our two containers, both will have the /data directory, but data-source will be the so-called entry point for the data for all containers that have been run with the --volumes-from option. You can save a lot of disk space, control how your containers use and access the original data, and more easily test multiple software configurations. Very neat, overall. Let's create a file in the data-source container first:

[root@68d3ed5e243a data]# pwd
/data
[root@68d3ed5e243a data]# touch STUFF
[root@68d3ed5e243a data]# 

And the second container:

[root@7aed935cc156 /]# ls -la /data
total 4
drwxr-xr-x.  2 root root   18 May  1 11:46 .
drwxr-xr-x. 18 root root 4096 May  1 11:45 ..
-rw-r--r--.  1 root root    0 May  1 11:46 STUFF
[root@7aed935cc156 /]#

Two containers, same data
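The same setup can be scripted, including the chaining mentioned earlier. This is a sketch under a few assumptions: `make_data_container`, `run_consumer` and the `DOCKER` variable are hypothetical names, and it uses `docker create` rather than `docker run` for the data container, since a data volume container does not need to be running for --volumes-from to work. The container name middle is also just an example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a data volume container and its consumers.
# DOCKER defaults to the real binary; set DOCKER=echo to print commands instead.
DOCKER="${DOCKER:-docker}"

# make_data_container NAME VOLUME_PATH IMAGE
# Defines the container and its volume without starting it.
make_data_container() {
  "$DOCKER" create -v "$2" --name "$1" "$3" /bin/true
}

# run_consumer SOURCE IMAGE [NAME]
# Starts an interactive container with all volumes of SOURCE; an optional
# NAME lets other containers chain off this one in turn.
run_consumer() {
  local source="$1" image="$2" name="${3:-}"
  local args=(run -ti --volumes-from "$source")
  if [ -n "$name" ]; then
    args+=(--name "$name")
  fi
  "$DOCKER" "${args[@]}" "$image" /bin/bash
}
```

A chained setup would be `make_data_container data-source /data centos:latest`, then `run_consumer data-source centos:latest middle`, and finally `run_consumer middle centos:latest`. The last container sees /data through middle, without referring to data-source directly.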

What's next?

Believe it or not, we will stop at this point. There are some other steps we could do, like backups, data removal and such, but that's a separate topic. For the time being, we have enough to begin working with data volumes and expose them inside containers as mount points. As a rule of thumb, you might want to use read-only mounts for environment tools and configurations, data volume containers for large data sets, and small, simple directory mounts for testing.
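To make the read-only advice concrete, here is a small sketch. Both the `-v HOST:CONTAINER:ro` form and the `--volumes-from CONTAINER:ro` form are standard Docker syntax; the function names, the `DOCKER` variable, and the /etc/app and /scratch paths are hypothetical placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating read-only versus read-write mounts.
# DOCKER defaults to the real binary; set DOCKER=echo to print commands instead.
DOCKER="${DOCKER:-docker}"

# run_with_ro_config CONFIG_DIR SCRATCH_DIR IMAGE
# Configuration is mounted read-only; the scratch directory stays writable.
run_with_ro_config() {
  local config_dir="$1" scratch_dir="$2" image="$3"
  "$DOCKER" run -it \
    -v "$config_dir:/etc/app:ro" \
    -v "$scratch_dir:/scratch" \
    "$image" /bin/bash
}

# run_ro_consumer SOURCE IMAGE
# Mounts all volumes of a data volume container, but read-only.
run_ro_consumer() {
  "$DOCKER" run -it --volumes-from "$1:ro" "$2" /bin/bash
}
```

With `run_ro_consumer data-source centos:latest`, an attempt to touch a file under /data inside that container should fail, while the data-source container itself can still write to it.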


Docker volume handling is relatively simple, and I believe this aspect of the container technology still needs to grow before it reaches the same maturity as the other pieces of the framework. It's not very feature-rich, and there should be additional ways to control the data, including snapshots, built-in support for data distribution and parallelization, and more.

Well, at the very least, you've learned a few new things in this tutorial, including how to create mount points and mount data volumes, how to chain them, and how to fix some small niggles along the way. Perhaps this guide feels a bit naked compared to the rest, but we don't always need heavyweight, neverending articles. Sometimes, it's fine to keep it short and sweet. Indeed, on that bombshell, the end.

