Docker is a fascinating concept that is potentially useful in many ways, especially in data science, where it helps make workflows and environments reproducible. There are several articles with great introductions and examples of using Docker in data science.
This is an evolving summary of my exploration with Docker. It should prove to be a handy refresher of commands and concepts.
TODO What is Docker
A brief summary of what Docker is all about.
- The main idea: disposable buckets of code that can do a specific task and either exit or run indefinitely.
- The task / purpose of the container could even be a single command, like pwd, whose output is piped into another container.
- In a way this is an extension of the Unix philosophy of small tools that can do a single task well (i.e reliably).
- These buckets of code can be connected with each other and also stacked on top of each other to form a pipeline.
- These buckets of code are self-contained: each one bundles the libraries it needs.
- The buckets consist of images which can be launched as containers.
- Docker images are stored in a registry. There are a number of registries, of which Docker Hub is a popular one.
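The single-command idea above can be sketched as follows. This is only a minimal sketch, assuming a working Docker installation and the public alpine image. The function name pipe_demo is made up for illustration and is not part of Docker; wrapping the command in a function means nothing runs until it is called.

```shell
# pipe_demo is a hypothetical helper name, not a Docker command.
pipe_demo() {
  # First container: runs only pwd, prints the working directory, exits.
  # Second container: reads that text on stdin (-i) and counts its bytes.
  # --rm deletes each container as soon as its command finishes.
  docker run --rm alpine pwd | docker run --rm -i alpine wc -c
}
```

Calling pipe_demo starts two short-lived containers connected by an ordinary shell pipe.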
These schematics provide a good refresher of the core concept of Docker:
Dive into Docker
This is an excellent course run by Nick Janetakis (link), which enabled me to tie together various bits and pieces of knowledge I had about Docker. I would recommend this course to anybody starting out with Docker. A lot of the notes in this document were gathered while going through the course.
Biggest wins of Docker
- isolate and manage applications.
- e.g. 12 apps with 12 dependency sets.
- VM: waste of resources.
- Vagrant: lets you manage VMs on the command line (including Docker)
- Disk space occupied for each app is very high.
- Overhead of system boot up and restart / killing is high.
- Docker can be used to manage common dependencies.
- Example of time frame: 2 seconds for loading 8 services.
- Spinning up an entire stack is very fast, compared to a VM.
- Docker: portability of applications and dev environment.
- Dozens of scenarios where something works for you but not for me.
- New dev environments can be discouraging. With all the libraries and dependencies already installed, it is possible to become aggressive with the actual development and experimenting with new technology.
- Multiple versions of a programming language can be installed within a single docker container.
- Smaller Microservices that talk to each other are not always good, but Docker enables this in a streamlined manner.
- LXC: raw Linux containers. Existed long before Docker.
- Docker was originally built on LXC; it now uses runC.
- LXC is a very complicated and brittle system.
- LXC runs only on Linux.
- LXCs are still better than VMs for rapid build and deploy.
- Ansible: defines what files and tools should be on a server (very basic definition)
Easy ways to get documentation help
- Just typing docker will provide a list of primary level commands that can be used.
- For further flags, append --help to the primary command, like
docker run --help
- The official documentation is a good resource.
- Image: Setup of the virtual computer.
- Container: Instance of an image. Many containers can run with the same image.
TODO Running Emacs on Docker
- Note taken on
Matrix DS offers a viable alternative as a platform. However, a customised docker container with all my tools is a good way to reproduce my working environment and also share my work with the community.
Note taken on
This needs to be evaluated. Today I have a vague idea : set up a docker container combining Rocker + data science at the command line + Scimax together. A separate layer could also cater to shiny apps.
Silex - github : Also contains references to other kinds of Emacs docker containers
TODO Good Online resources for Rocker
- Introducing Rocker: Docker for R | R-Bloggers
- Rocker: Using R on Docker - A Hands-On Introduction - useR2015_docker.Pdf
- Jessie Frazelle’s Blog: Using an R Container for Analytical Models
- Rocker Images - Wiki Github
- Introduction to Docker - Paper
TODO Introduction to Rocker - Technical paper link
Note on Docker Toolbox versus Native apps
The native Docker application uses a type 1 hypervisor (HyperKit for Mac OS and Hyper-V for Windows).
docker-machine uses a VirtualBox based hypervisor (type 2). The hypervisor can also be specified while creating docker machines.
In general, the native applications have a better user experience and commands can be directly typed into the terminal. The native apps (on Windows / Mac OS) are newer than the Docker Toolbox, and are being actively developed by the Docker company to reach performance on par with the original VirtualBox based Docker Toolbox approach.
Note that any performance lag depends on the application. As a rule of thumb, it may be better to start off with the native applications and switch to the toolbox when required.
Installing Docker on Debian
The docker repository has to be added before docker can be installed. Detailed instructions are available at https://docs.docker.com/install/linux/docker-ce/debian/.
A .deb package is also available, and is probably the easiest method to install. Choose the appropriate version at: https://download.docker.com/linux/debian/dists/
Manual version without using the package:
Adding Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
Verifying that the key has been installed:
sudo apt-key fingerprint 0EBFCD88
pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) email@example.com
sub   rsa4096 2017-02-22 [S]
Adding the stable Docker repository:
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"
Update the package lists and then search for docker-ce. It should be available since the repository has been added and the list updated.
sudo apt-get update
apt-cache search docker-ce
Installing docker and necessary components. Note that the manual recommends removing any older installations if they exist.
Note from the manual that a specific version of docker can be installed by appending the version string to the package name, as in sudo apt-get install docker-ce=<VERSION_STRING>. This selects which single version is installed; apt keeps only one version of a package at a time, so multiple versions do not exist side by side.
sudo apt-get install docker-ce docker-compose docker-ce-cli containerd.io
Creating a docker group and adding your user to it enables running docker commands without root privileges (sudo). A logout is necessary for the change to take effect.
Note: Sometimes the $USER variable does not seem to work. It can be replaced with your actual user name.
sudo groupadd docker
sudo usermod -aG docker $USER
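Whether the group change has taken effect can be checked from the shell. This is a minimal sketch, assuming a POSIX shell with the standard id, tr and grep utilities; the variable name group_status is illustrative only.

```shell
# id -nG lists the group names of the current session, so the result
# changes only after logging out and back in.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  group_status="member"
else
  group_status="not-member"
fi
echo "docker group: $group_status"
```

If this still reports not-member after running usermod, the session most likely predates the change.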
To configure docker to start on boot, enable it as a service. The need to do this depends on how frequently you use docker commands.
sudo systemctl enable docker
sudo systemctl start docker
Installing Docker on Antergos / Arch Linux
Installation can be done via Pacman
sudo pacman -S docker
Enable and start docker service.
sudo systemctl enable docker
sudo systemctl start docker
Add the user to the docker group using usermod. After adding this, a log-out is necessary. Note that $USER can be replaced with the output of whoami in the shell if desired. If this step is not performed, each docker command will have to be executed with sudo.
sudo usermod -a -G docker $USER
Installing Docker on Mac OS
Docker can be downloaded as an app from the docker store : https://hub.docker.com/editions/community/docker-ce-desktop-mac.
On the Mac, the docker app has to be launched first, and this will create a docker icon in the menu bar indicating the status of the docker machine. This launches the docker daemon, and then commands can be directly entered into the terminal.
Docker can also be installed using Brew:
brew install --cask docker
This creates an app in the Applications folder which has to be launched. However, it seems additional components are required to run Docker from the command line. These are available via brew.
brew install docker-compose docker-machine
Checking the installation
Trying the hello world container as an additional check. Note the steps listed in the output, which is the typical process.
cd ~/docker-test
docker run hello-world
General notes on containers and images
- images contain the entire filesystem and parameters needed to run the application.
- When an image is run, a container is created.
- containers are generally treated as disposable: changes made inside a container do not linger once it is removed, and they never alter the image.
- One image can spawn any number of containers, simultaneously. Each container will be separate.
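The points above can be tried out with two throwaway containers from the same image. This is a rough sketch, assuming a working Docker installation and the public alpine image; isolation_demo is a made-up helper name, and the commands are wrapped in a function so nothing runs until it is called.

```shell
# isolation_demo is a hypothetical helper name, not a Docker command.
isolation_demo() {
  # Container 1: create a file, then list /tmp to show that it exists.
  docker run --rm alpine sh -c 'touch /tmp/marker && ls /tmp'
  # Container 2: started from the same image, it gets a fresh filesystem,
  # so /tmp/marker from container 1 is not there.
  docker run --rm alpine ls /tmp
}
```

Both containers come from one image, yet the file written in the first is absent from the second.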
Default location of images
By default, on Antergos (Linux), the images are stored under /var/lib/docker. The directory can be listed with:
sudo ls -al /var/lib/docker
Docker version and info
docker --version
docker info
docker version
Listing Docker containers and images
List Docker Images
docker image ls
List running Docker Containers
docker container ls
List all docker containers (running and Stopped)
docker container ls -a
Obtain only container IDs (all). This is useful to extract the container ID alone. The -q argument stands for quiet.
docker container ls -aq
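The quiet output is handy as input to other docker commands. The following is a sketch only, assuming a working Docker installation; remove_exited_containers is a made-up helper name, and the status=exited filter is added here so that running containers are left alone.

```shell
# remove_exited_containers is a hypothetical helper name.
remove_exited_containers() {
  # -a includes stopped containers, -q prints IDs only, and the filter
  # keeps just the containers whose status is "exited".
  exited_ids="$(docker container ls -aq --filter status=exited)"
  if [ -n "$exited_ids" ]; then
    # Left unquoted on purpose so that each ID becomes its own argument.
    docker container rm $exited_ids
  else
    echo "No exited containers to remove"
  fi
}
```

Nothing is deleted until remove_exited_containers is actually called.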
If a local image is not found, docker will try to search for and download the image from Docker Hub.
It is better to create a folder wherein the docker container will reside.
mkdir ~/docker-test/
cd ~/docker-test
docker run --rm -p 8787:8787 rocker/tidyverse
The --rm flag indicates that the container will be deleted when it exits. The -p flag publishes a container port on the host, in the form host-port:container-port.
Note that the interim messages and download progress are not shown in eshell.
Different rocker images are available, depending on what is needed.
-t and Interactive containers
Example to run an ubuntu container and run bash interactively, by attaching a terminal to the container. This will login to Ubuntu and start bash.
An alternative option is to use Alpine Linux, which is a much smaller download. The alpine image does not include bash, so sh is used instead.
docker run -t -i ubuntu /bin/bash
docker run -ti alpine /bin/sh
Running a detached container
- use the -d flag to run the container in the background.
docker container ls -al
docker run -d ubuntu
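A detached container only stays up while its main process is running. The sketch below assumes a working Docker installation and the public ubuntu image; detached_demo is a made-up helper name, and sleep 60 is an arbitrary stand-in for a long-running task.

```shell
# detached_demo is a hypothetical helper name, not a Docker command.
detached_demo() {
  # sleep gives the container something to do; with -d and no terminal,
  # the default shell of the ubuntu image would exit straight away.
  container_id="$(docker run -d ubuntu sleep 60)"
  # docker run -d prints the new container's ID, captured above.
  docker container ls --filter "id=$container_id"
  # Clean up: stop the container, then remove it.
  docker container stop "$container_id"
  docker container rm "$container_id"
}
```

Calling detached_demo starts the container in the background, lists it, and then cleans it up.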
Build process of a docker image
- docker commit: used to commit changes to a new image layer. This is a manual process. Commit has little place in the real world; a Dockerfile is superior.
- Dockerfile: blueprint or recipe for creating a docker image. Each actionable step becomes a separate layer.
- Docker image: result of stacking up individual layers. Only the parts or layers that have changed are downloaded for a newer version of a specific image.
- Scratch image: docker image with no base operating system.
Working with dockerfiles
- sample or reference Dockerfiles can be saved as “dockerfile.finished” or with some other useful extension.
- Dockerfiles are read top to bottom.
- the first non-comment instruction should be FROM, which allows you to import a base docker image.
- RUN: executes the specified commands.
- WORKDIR: sets the desired working directory. This can be set or used multiple times in the same Dockerfile.
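The three instructions above can be seen together in a small example. This is a sketch under assumptions: the base image ubuntu:22.04, the /app paths and the tag dockerfile-demo are arbitrary choices for illustration. The script only writes the file to a scratch directory and checks its first instruction; the build itself is left as a commented line.

```shell
# Write a small example Dockerfile into a scratch directory.
build_dir="$(mktemp -d)"
cat > "$build_dir/Dockerfile" <<'EOF'
# Comment lines may come before FROM.
FROM ubuntu:22.04
# Each RUN executes a command and adds a layer to the image.
RUN mkdir -p /app/data
# WORKDIR can be set more than once; later instructions use the latest one.
WORKDIR /app
RUN pwd > where.txt
WORKDIR /app/data
RUN pwd > where.txt
EOF

# Print the first instruction that is neither a comment nor blank.
first_instruction="$(awk '!/^[[:space:]]*#/ && NF { print; exit }' "$build_dir/Dockerfile")"
echo "First instruction: $first_instruction"

# To build it (needs a running Docker daemon):
# docker build -t dockerfile-demo "$build_dir"
```

After a build, /app/where.txt and /app/data/where.txt would each hold the working directory that was active when their RUN step executed.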