Author: Tobit Flatscher (2021 - 2023)
Deploying a piece of software in a portable manner is a non-trivial task. Clearly there are different operating systems and processor architectures which require different binary code, but even if these match you have to make sure that the compiled code can be executed on another machine by supplying all of its dependencies.
Over the years several packaging systems for different operating systems have emerged that provide methods for installing new dependencies and managing existing ones in a coherent manner. The low-level package manager for Debian-based Linux operating systems is dpkg, while for high-level package management, fetching packages from remote locations and resolving complex package relations, generally apt is chosen. apt handles retrieving, configuring, installing as well as removing packages in an automated manner. When installing an apt package it checks the existing dependencies and installs only those that are not yet available on the system. The dependencies are shared, making the packages smaller but not allowing for multiple installations of the same library and potentially causing issues between applications that require different versions of the same library. Contrary to this the popular package manager snap uses self-contained packages which bundle all the dependencies that a program requires to run, allowing for multiple installations of the same library. Self-contained boxes like these are called containers, as they do not pollute the rest of the system and might only have limited access to the host system. The main advantage of containers is that they provide clean and consistent environments as well as isolation from the hardware.
Docker is another framework for working with containers. A Docker container - contrary to snap - is not integrated in terms of hardware and networking but instead has its own IP address, adding an extra layer of abstraction. A Docker container is similar to a virtual machine but the containers share the same kernel as the host system: Docker does not virtualise on a hardware level but on an app level (OS-level virtualisation). For this Docker builds on a virtualisation feature of the Linux kernel, namespaces, which allows processes to be selectively granted access to kernel resources. As such Docker has its own namespaces for mnt, pid, net, ipc as well as usr and its own root file system. As a Docker container uses the same kernel, and as a result also the same scheduler, one might achieve native performance. At the same time this results in issues with graphical user interfaces as these are not part of the kernel itself and thus not shared between the container and the host system. These problems can mostly be worked around though.
Using Docker brings a couple of advantages as it strongly leverages the decoupling of the kernel from the rest of the operating system:
- Portability: You can run code not intended for your particular Linux distribution (e.g. packages for Ubuntu 20.04 on Ubuntu 18.04 and vice versa) and you can mix them, launching several containers with different requirements on the same host system by means of dedicated orchestration tools such as Kubernetes or Docker Swarm. This is a huge advantage for robotics applications as one can run containers with different ROS distributions in parallel on the same computer, all using the kernel of the host operating system, governed by the same scheduler.
- Performance: Contrary to a virtual machine the performance penalty is very small and for most applications indistinguishable from running code on the host system: After all it uses the same kernel and scheduler as the host system.
- Furthermore one can also run a Linux container on a Windows or MacOS operating system. In this case, though, you lose a couple of advantages of Docker, such as being able to run real-time code, as there will be a light-weight virtual machine underneath emulating a Linux kernel. Furthermore you can also use it from the Windows Subsystem for Linux, which allows you to stream graphical user interfaces onto the host system through an X-Server.
This way one can guarantee a clean, consistent and standardised build environment while maintaining encapsulation and achieving native performance.
The core components of Docker are so-called images, immutable read-only templates that hold source code, libraries and dependencies. These can be layered over each other to form more complex images. Containers on the other hand are the writable layer on top of the read-only images. By starting an image you obtain a container: Images and containers are not opposing objects but should rather be seen as different phases of building a containerised application.
The Docker daemon software manages the different containers that are available on the system. The generation of an image can be described by a so-called Dockerfile. A Dockerfile is like a recipe describing how an image can be created from scratch. This file might also help somebody reconstruct the steps required to get the code up and running on a vanilla host system without Docker. It is so to speak self-documenting and does not result in an additional burden like a wiki. Similarly one can recover the steps performed to generate an image with $ docker history --no-trunc <image_id>. Dedicated servers, so-called Docker registries (such as the Docker Hub or Github's GHCR), allow you to store and distribute your Docker images. These image repositories hold different images so that one does not have to go through the build process but can instead upload and download them directly, speeding up deployment. Uploads might also be triggered by a continuous integration workflow like outlined here.
On top of this sit other toolchains for managing the lifetime of containers and orchestrating multiple of them, such as Docker-Compose, Swarm or Kubernetes.
This makes Docker particularly suitable for deploying source code in a replicable manner and will likely speed up your development workflow. Furthermore one can use the description to perform tests or compile the code on a remote machine as part of continuous integration. This means that for most people working professionally on code development it comes at virtually no cost.
Docker is installed pretty easily. The installation guide for Ubuntu can be found here. It basically boils down to five steps:
$ sudo mkdir -m 0755 -p /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
After installation you will want to make sure that Docker can be run without sudo as described here:
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Finally log out and log back in again.
Similarly the usage is quite simple: You only need a few commands to get a Docker container up and running, which are discussed below.
As discussed above you can pull an image from the Dockerhub and launch it as a container. This can be done by opening a console and typing:
$ docker run hello-world
hello-world is an image intended for testing purposes. After executing the command you should see some output on the console that does not contain any error messages. In case you are not able to run the command above, prepend it with sudo and retry. If this works, please go back to the previous section and enable sudo-less Docker as this will be crucial for e.g. the Visual Studio Code set-up.
If you want to find out what other images you could start from, just browse the Dockerhub, e.g. for Ubuntu. You will see that there are different versions with corresponding tags available. For example, to run a Docker container with Ubuntu 20.04 installed you could use the tag 20.04 or focal, resulting e.g. in the command
$ docker run ubuntu:focal
This should not output anything and should return immediately. The reason for this is that each container has an entrypoint: This script is run and as soon as it terminates the container returns to the command line. This is actually the basic idea of a Docker container: A container should be responsible for a single service, and once this service stops it should return.
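You can verify this behaviour yourself: The container is still listed as a stopped container and its output (in this case none) can be inspected. The commands below are a small sketch; the container id has to be taken from the output of the first command:
$ docker ps -a # List all containers, including stopped ones
$ docker logs <container_id> # Show the output the container produced before terminating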
If you want to keep the container open you have to run it in interactive mode by specifying the flag -i, and -t for opening a terminal:
$ docker run -it ubuntu:focal
In case you need to run another command in parallel you might have to open a second terminal and connect to this Docker container. In this case it is more convenient to relaunch the container with a specified name (or use the default one displayed by docker run):
$ docker run -it --name ubuntu_test ubuntu:focal
Now we can connect to it from another console with
$ docker exec -it ubuntu_test sh
The last argument sh corresponds to the command that is executed inside the container, in our case a shell. With the exit command the Docker container can be shut down.
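Stopped containers are not removed automatically. If you want to clean up after experimenting, the following sketch shows how one might stop and remove the container from above as well as prune all stopped containers at once:
$ docker stop ubuntu_test # Stop the running container
$ docker rm ubuntu_test # Remove the stopped container
$ docker container prune # Remove all stopped containers at once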
Now that we have seen how to start a container from an existing image, let us write a Dockerfile that defines the steps that should be executed on top of the base image:
# Base image
FROM ubuntu:focal
# Define the workspace folder (e.g. where to place your code)
# We define a variable so that we can re-use it
ENV WS_DIR="/code"
WORKDIR ${WS_DIR}
# Copy your code into the folder (see later for better alternatives!)
COPY . ${WS_DIR}
# Use Bourne Again Shell as default shell
SHELL ["/bin/bash", "-c"]
# Disable user dialogs in apt installation messages
ARG DEBIAN_FRONTEND=noninteractive
# Commands to perform on base image
RUN apt-get -y update \
&& apt-get -y install git cmake build-essential some_package \
&& git clone https://github.com/some_user/some_repository some_repo \
&& cd some_repo \
&& mkdir build \
&& cd build \
&& cmake .. \
&& make -j$(nproc) \
&& make install \
&& rm -rf /var/lib/apt/lists/*
# Enable apt user dialogs again
ARG DEBIAN_FRONTEND=dialog
# Define the script that should be launched upon start of the container
ENTRYPOINT ["/code/src/my_script.sh"]
When saving this as Dockerfile (without a file ending) you can build it by typing:
$ docker build -f Dockerfile .
Then list the available images with
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> latest <image_id> ... ...
You should see your newly created image.
You can launch it with
$ docker run -it <image_id>
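If you do not want to juggle raw image ids, you can also give the image a name and tag when building it; my_image below is of course just a placeholder:
$ docker build -f Dockerfile -t my_image:latest .
$ docker run -it my_image:latest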
As soon as you start building complex containers you will see that the compilation might be quite slow as a lot of data might have to be downloaded and installed. If you execute the build again though, or add a command to the Dockerfile, the image will build pretty quickly: Docker caches the compiled images and re-uses them if possible. In fact each individual RUN etc. command forms a layer of its own that might be re-used. It is therefore crucial to avoid conflicts between different layers, e.g. by introducing each apt-get -y install with an apt-get update. They have to be combined in the same RUN command though to be effective. Similarly you can benefit from this caching by ordering the different layers from least to most frequently changed. This way you might reduce the time you spend re-building significantly.
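As a small sketch of these two rules, the fragment below combines apt-get update, apt-get install and the clean-up in a single RUN layer, and copies the frequently changing source code only at the end so that the package layer can be served from the cache; the package selection and paths are just placeholders:
# Rarely changing layer: combine update, install and clean-up in one RUN command
RUN apt-get update \
 && apt-get -y install build-essential cmake \
 && rm -rf /var/lib/apt/lists/*
# Frequently changing layer: copy your own source code last so the layers above stay cached
COPY ./src ${WS_DIR}/src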
At the same time it is also important to make the images as slim as possible, removing all undesired artifacts that are not necessary. For apt this means deleting the apt lists again at the end of each layer:
rm -rf /var/lib/apt/lists/*
A simple solution to this are multi-stage builds, where multiple FROM statements are used within the same Dockerfile and artifacts are selectively copied between the different stages. This way everything that is not needed can be left behind.
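A minimal sketch of such a multi-stage build is given below, assuming a CMake project copied into /code; the application name and install location are placeholders. Only the installed artifacts are copied into the final image while the build tools and sources are left behind:
# First stage: build the code with all the heavy build dependencies
FROM ubuntu:focal AS builder
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
 && apt-get -y install build-essential cmake \
 && rm -rf /var/lib/apt/lists/*
COPY . /code
RUN cd /code \
 && mkdir -p build \
 && cd build \
 && cmake -DCMAKE_INSTALL_PREFIX=/opt/my_app .. \
 && make -j$(nproc) \
 && make install
# Second stage: copy only the installed artifacts into a slim runtime image
FROM ubuntu:focal
COPY --from=builder /opt/my_app /opt/my_app
ENTRYPOINT ["/opt/my_app/bin/my_app"]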
Additionally one should set the DEBIAN_FRONTEND environment variable to noninteractive before installing any packages with apt, else building the Dockerfile might fail!
As you have seen above we copied the contents of the current directory into the container with the COPY command. This means though that our changes inside the container will not affect the local code; instead we are working on a copy of it. This is often not desirable: Most of the time you actually want to mount your folders and share them between the host system and the container.
There are several approaches for managing data with a Docker container. Generally one stores the relevant code outside the container on the host system, mounting files and directories into the container, while leaving any built files inside it. This way the files relevant to both systems can be accessed inside and outside the container. This is typically done with volumes, which results in additional flags that have to be supplied when running the container:
$ docker run -it --volume=<host_directory>:<container_directory> <image_id>
As you can imagine, when specifying start-up options like -it, mounting volumes, setting up the network configuration, display settings for graphical user interfaces, passing a user name to be used inside the container etc., the commands for building and running Docker containers can get pretty lengthy and complicated, such as the one below:
$ docker run -it --volume=$(pwd)/../src:/code/src --name=my_container --env=DISPLAY \
--env=QT_X11_NO_MITSHM=1 --volume=/tmp/.X11-unix:/tmp/.X11-unix:rw \
--volume=/tmp/.docker.xauth:/tmp/.docker.xauth:rw --entrypoint='/bin/bash' <image_id>
To simplify this process people often create Bash scripts that store these arguments for them. Instead of typing a long command one then commonly just calls scripts like build_image.bash and container_start.bash. This can though be quite unintuitive for other users as there is no common convention for doing so. Therefore tools like Docker-Compose, which is discussed in the next section, try to simplify this process by providing standardised configuration files for it.
In any case try to avoid the privileged flag. If you run into an issue with not being able to do something, running the container as privileged will almost always solve it, but there will usually be a cleaner way. The privileged option breaks encapsulation and as such might pose a security risk.
At the same time it makes sense to pass crucial information into the container by means of environment variables.
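For instance, a value can be forwarded from the host or set explicitly when starting the container; MY_PARAMETER below is just a made-up example name:
$ docker run -it --env=DISPLAY --env=MY_PARAMETER=some_value <image_id> # Forward DISPLAY from the host and set MY_PARAMETER explicitly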
Instead of creating our Docker image from a Dockerfile we might want to use a more complex existing one from a Docker registry. In the following example we will use the official one, the Dockerhub. Let's begin by logging in and pulling an existing image from a repository that you might have found browsing the Dockerhub.
$ docker login --username=<user> # If the repository is public we can also pull without logging in
$ docker pull <repo>:<tag> # Pull an image from the server
This should give us a new image on our local computer that we can run:
$ docker images # List all locally available images
REPOSITORY TAG IMAGE ID CREATED SIZE
<repo> <tag> <image_id> ... ...
$ docker run -it <repo>:<tag> /bin/bash # Run the image as a container
<user>@<container_id>:/#
Now we can make changes to the container and finally exit it with exit. We should be able to see the stopped container with the following command:
$ docker ps -a # Show all containers
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<container_id> <image_id>:<tag> ... ... ... ... ...
Finally we can commit our changes to a new image and push this image to the Dockerhub as follows:
$ docker commit <container_id> <image_name>:<tag> # Commit container to new image
$ docker images # List all available images
REPOSITORY TAG IMAGE ID CREATED SIZE
<image_name> <tag> <image_id> ... ...
<repo> ... ... ... ...
$ docker tag <image_id> <user>/<repo>:<tag> # Tag the image
$ docker push <user>/<repo> # Push the image to the server
Accessing most Docker registries requires internet access. Therefore, when dealing with a slow network connection or an offline computer, it might sometimes be convenient to save a Docker image to disk, copy it to the machine without internet access and load it onto that system:
$ docker save <repo>:<tag> > <file.tar> # Save Docker to file
$ docker load --input <file.tar> # Load Docker on other computer without internet access
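As such archives can become quite large it might make sense to compress them on the fly, e.g. with gzip as sketched below:
$ docker save <repo>:<tag> | gzip > <file.tar.gz> # Save and compress the image
$ gunzip -c <file.tar.gz> | docker load # Decompress and load it on the target machine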
There are different tools available that simplify the management and orchestration of Docker containers, such as Docker-Compose. As mentioned, supplying arguments to Docker for building and running a Dockerfile often results in lengthy shell scripts. One way of decreasing complexity and tidying up the process is by using Docker-Compose. It is a tool for defining and running multi-container Docker applications, but it is also very useful for a single container. In a YAML file, such as docker-compose.yml, one describes the services that an app consists of (see here for the syntax) and which options they should be started with. There are though a few corner cases where Docker-Compose is not powerful enough: For example it cannot execute commands on the host system in order to obtain parameters that are then passed to the container; parameters have to be supplied in the form of text or an environment file.
Prior to Ubuntu 20.04 Docker-Compose had to be installed separately as described here. From Ubuntu 20.04 onwards it should come with Docker directly. Depending on the version you will have to call it with
$ docker compose --version
or
$ docker-compose --version
For example the rather complicated docker run command given before could be expressed in Docker-Compose with the following hierarchical yml file, generally named docker-compose.yml:
version: "3.9"
services:
  my_service:
    build:
      context: ..
      dockerfile: docker/Dockerfile
    container_name: my_container
    tty: true
    environment:
      - DISPLAY=${DISPLAY}
      - QT_X11_NO_MITSHM=1
    volumes:
      - ../src:/code/src
      - /tmp/.X11-unix:/tmp/.X11-unix:rw
      - /tmp/.docker.xauth:/tmp/.docker.xauth:rw
    command: '/bin/bash'
After having created both a Dockerfile as well as a docker-compose.yml you can build and launch them with:
$ docker compose -f docker-compose.yml build
$ docker compose -f docker-compose.yml up
where with the option -f a Docker-Compose file with a different filename can be provided. If not given it will default to docker-compose.yml.
More generally such a file might hold multiple services:
version: "3.9"
services:
  some_service:                  # Name of the particular service (equivalent to the Docker --name option)
    build:                       # Use a Dockerfile to build the image
      context: .                 # The folder that should be used as a reference for the Dockerfile and mounting volumes
      dockerfile: Dockerfile     # The name of the Dockerfile
    container_name: some_container
    stdin_open: true             # Equivalent to the Docker -i option
    tty: true                    # Equivalent to the Docker -t option
    volumes:                     # Source folder on the host : Destination folder inside the container
      - /a_folder_on_the_host:/a_folder_inside_the_container
      - ../yet_another_folder_on_host:/a_folder_inside_both_containers # A folder shared with the other container
  another_service:
    image: ubuntu:20.04          # Use a Docker image from the Dockerhub
    container_name: another_container
    volumes:
      - /another_folder_on_the_host:/another_folder_inside_the_container
      - ../yet_another_folder_on_host:/a_folder_inside_both_containers # The same folder accessed by both containers
If instead you wanted only to run a particular service you could do so with:
$ docker compose -f docker-compose.yml run my_service
Then, similarly to the previous section, one is able to connect to the container from another console with
$ docker compose exec <docker_name> sh
where <docker_name> is the service name specified in the docker-compose.yml file and sh is the command to be executed, in this case opening a shell.
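Finally, once you are done, the services can be stopped again and the corresponding containers and networks removed; a common follow-up, shown here as a sketch:
$ docker compose -f docker-compose.yml stop # Stop the running services
$ docker compose -f docker-compose.yml down # Stop and remove containers, networks etc.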