Notes on Docker

Docker is a fascinating tool that is useful in many ways, especially in data science, where it helps build reproducible workflows and environments. Several articles offer good introductions and examples of using Docker in data science.

This is an evolving summary of my exploration with Docker. It should prove to be a handy refresher of commands and concepts.

TODO What is Docker

A brief summary of what Docker is all about.

  1. The main idea: disposable buckets of code that can do a specific task and either exit or run indefinitely.
    1. The task / purpose of the container could even be a single command, like pwd, whose output is piped into another container.
    2. In a way this is an extension of the Unix philosophy of small tools that each do a single task well (i.e. reliably).
  2. These buckets of code can be connected with each other and also stacked on top of each other to form a pipeline.
  3. Each bucket is self-contained: it carries the libraries and dependencies it needs.
  4. The buckets consist of images, which can be launched as containers.
  5. Docker images are stored in a registry. There are a number of registries, of which Docker Hub is a popular one.

These schematics provide a good refresher of the core concept of Docker:

[Figure: Containers versus VM]

[Figure: Docker Engine components]

Dive into Docker

This is an excellent course run by Nick Janetakis, which enabled me to tie together various bits and pieces of knowledge I had about Docker. I would recommend this course to anybody starting out with Docker. A lot of the notes in this document were gathered while going through the course.

Biggest wins of Docker

  • Isolate and manage applications.
  • Example: 12 apps with 12 different dependency sets.
  • VM: a waste of resources.
  • Vagrant: lets you manage VMs on the command line (including Docker as a provider).
    • Disk space occupied for each app is very high.
    • Overhead of system boot up, restart and shutdown is high.
  • Docker can be used to manage common dependencies.
    • Example of time frame: 2 seconds for loading 8 services.
    • Spinning up an entire stack is very fast compared to a VM.
  • Docker: portability of applications and dev environments.
  • It removes the dozens of scenarios where something works for you but not for me.
  • New dev environments can be discouraging to set up. With all the libraries and dependencies already installed, you can get straight to the actual development and experiment freely with new technology.
  • Multiple versions of a programming language can be installed within a single docker container.
  • Many small microservices that talk to each other are not always a good design, but Docker makes this approach streamlined.
  • LXC: raw Linux containers. Existed long before Docker.
    • Early versions of Docker were built on LXC; Docker now uses runC as its container runtime.
    • A very complicated and brittle system to use directly.
    • Runs only on Linux.
    • LXC containers are still better than VMs for rapid build and deploy.
  • Ansible: defines what files and tools should be on a server (very basic definition).

Easy ways to get documentation help

  • Just typing in docker will provide a list of primary level commands that can be used.
  • For the flags of a specific command, append --help to it, as in docker run --help.
  • The official documentation is a good resource.

Definitions

  1. Image: The setup of the virtual computer: a read-only template holding the filesystem and settings from which containers are started.
  2. Container: Instance of an image. Many containers can run with the same image.

TODO Running Emacs on Docker

  • Note taken on [2019-07-07 Sun 17:25]
    Matrix DS offers a viable alternative as a platform. However, a customised docker container with all my tools is a good way to reproduce my working environment and also share my work with the community.
  • Note taken on [2019-07-06 Sat 17:54]
    This needs to be evaluated. Today I have a vague idea : set up a docker container combining Rocker + data science at the command line + Scimax together. A separate layer could also cater to shiny apps.

  • https://www.christopherbiscardi.com/2014/10/17/emacs-in-docker/

  • Silex - github : Also contains references to other kinds of Emacs docker containers

TODO Good Online resources for Rocker

  • Introducing Rocker: Docker for R | R-Bloggers
  • Rocker: Using R on Docker - A Hands-On Introduction - useR2015_docker.Pdf
  • Jessie Frazelle’s Blog: Using an R Container for Analytical Models
  • Rocker Images - Wiki Github
  • Introduction to Docker - Paper

Installation

Note on Docker Toolbox versus Native apps

The native Docker application uses a type 1 hypervisor (HyperKit for Mac OS and Hyper-V for Windows). docker-machine uses a VirtualBox based hypervisor (type 2). The hypervisor can also be specified while creating docker machines.

In general, the native applications have a better user experience and commands can be typed directly into the terminal. The native apps (on Windows / Mac OS) are newer than the Docker Toolbox, and are being actively developed by the Docker company to reach performance on par with the original VirtualBox based Docker Toolbox approach.

Note that any performance lag depends on the application. As a rule of thumb, it may be better to start off with the native applications and switch to the Toolbox when required.

Installing Docker on Debian

The Docker repository has to be added before docker can be installed. Detailed instructions are available at https://docs.docker.com/install/linux/docker-ce/debian/.

A .deb package is also available, and is probably the easiest method to install. Choose the appropriate version at: https://download.docker.com/linux/debian/dists/

Manual version without using the package:

Adding Docker’s official GPG key:

curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -

Verifying that the key has been installed:

sudo apt-key fingerprint 0EBFCD88

pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]

Adding the stable Docker repository:

sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"

Update the package lists and then search for docker-ce. It should be available now that the repository has been added and the lists updated.

sudo apt-get update
apt-cache search docker-ce

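Installing docker and the necessary components. Note that the manual recommends removing any older installations first, if they exist.

Note from the manual that a specific version of docker can be installed by appending it to the package name, as in sudo apt-get install docker-ce=<VERSION_STRING>. The available version strings are listed by apt-cache madison docker-ce. Since apt keeps only one version of a package installed at a time, this pins a version rather than letting several versions exist side by side.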

sudo apt-get install docker-ce docker-compose docker-ce-cli containerd.io
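A hedged sketch of picking a version string to pin: apt-cache madison prints one line per available version, with the version in the second |-separated column. The helper name pick_first_version is made up for illustration, and the install command is only printed here, not run.

```shell
# Hypothetical helper: read `apt-cache madison` output on stdin and print
# the version string (second |-separated field) of the first line.
pick_first_version() {
  awk -F'|' 'NR == 1 { gsub(/ /, "", $2); print $2 }'
}

if command -v apt-cache >/dev/null 2>&1; then
  version="$(apt-cache madison docker-ce | pick_first_version)"
else
  version=""   # not a Debian-like system, so there is nothing to look up
fi

if [ -n "$version" ]; then
  # Print the command instead of running it, since it needs sudo.
  echo "sudo apt-get install docker-ce=$version docker-ce-cli=$version containerd.io"
else
  echo "no docker-ce versions found (has the Docker repository been added?)"
fi
```

Any other line of the madison output can be used the same way; the first line is simply the first version that apt lists.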

Creating a docker group and adding your user to it will enable running docker commands without using root privileges (sudo). A logout will be necessary for the change to take effect.

Note: Sometimes the $USER variable does not seem to work. It can be replaced with your actual user name.

sudo groupadd docker
sudo usermod -aG docker $USER
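After logging back in, the group change can be checked without touching docker itself. This is a minimal sketch; it uses id -un instead of $USER, which sidesteps the cases where the variable is not set.

```shell
# Resolve the current user name without relying on $USER.
user="$(id -un)"

# id -nG (no argument) prints the groups of the current login session,
# so a freshly added group only shows up after logging in again.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  in_docker_group="yes"
else
  in_docker_group="no"
fi
echo "$user in docker group for this session: $in_docker_group"
```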

To configure docker to start on boot, enable it as a service; start only launches it for the current session. The need to enable it depends on how frequently you use docker commands.

sudo systemctl enable docker
sudo systemctl start docker

Installing Docker on Antergos / Arch Linux

Installation can be done via pacman:

sudo pacman -S docker

Enable and start docker service.

sudo systemctl enable docker
sudo systemctl start docker

Add the user to the docker group using usermod. After adding this, a log-out is necessary. Note that $USER can be replaced with the output of whoami in the shell if desired. If this step is not performed, each docker command will have to be executed with sudo.

sudo usermod -a -G docker $USER

Installing Docker on Mac OS

Docker can be downloaded as an app from the docker store : https://hub.docker.com/editions/community/docker-ce-desktop-mac.

On the Mac, the Docker app has to be launched first, which places a Docker icon in the menu bar indicating the status of the Docker machine. Launching the app starts the Docker daemon, after which commands can be entered directly into the terminal.

Docker can also be installed using Homebrew:

brew install --cask docker

This creates an app in the Applications folder, which has to be launched. However, it seems additional components are required to run Docker from the command line. These are available via brew.

brew install docker-compose docker-machine

Checking the installation

docker info

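Trying the hello-world container as an additional check. Note the steps listed in its output, which describe the typical process: the client contacts the daemon, the daemon pulls the image, creates a container from it, and streams the output back to the terminal.

mkdir -p ~/docker-test && cd ~/docker-test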
docker run hello-world

Checking docker-compose version.

docker-compose --version

General notes on containers and images

  • Images contain the entire filesystem and the parameters needed to run the application.
  • When an image is run, a container is created.
  • Images are immutable. A container gets its own writable layer on top of the image, and changes made there do not linger once the container is removed.
  • One image can spawn any number of containers simultaneously. Each container is separate from the others.
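The points above can be tried out with two throwaway containers started from the same image. This is a sketch that assumes the small alpine image (my choice for the example, not something these notes depend on) and skips itself when no docker daemon is reachable.

```shell
first=""
second=""
if docker info >/dev/null 2>&1; then
  # Container 1 writes a file into its own writable layer and reads it back.
  first="$(docker run --rm alpine sh -c 'echo hello > /tmp/note && cat /tmp/note')"
  # Container 2 starts from the same image and checks for that file.
  second="$(docker run --rm alpine sh -c 'test -e /tmp/note && echo present || echo absent')"
  echo "first container read: $first"
  echo "second container sees the file as: $second"
else
  echo "docker daemon not reachable; skipping"
fi
```

Because of --rm, each container is removed as soon as its command finishes, taking its changes with it.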

Default location of images

By default, on Antergos (Linux), images are stored under /var/lib/docker/:

sudo ls -al /var/lib/docker

Docker version and info

docker --version
docker info
docker version

Listing Docker containers and images

List Docker Images

docker image ls

List running Docker Containers

docker container ls

List all docker containers (running and Stopped)

docker container ls -a

Obtain only the container IDs (all containers). This is useful for passing the IDs alone to another command. The -q flag stands for quiet.

docker container ls -aq
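One use of the quiet output is feeding the IDs to another command. The sketch below collects the IDs of exited containers and removes them only when REMOVE_STOPPED=yes is set; that variable is a hypothetical safety switch invented for this example, not a Docker setting.

```shell
stopped=""
if docker info >/dev/null 2>&1; then
  # -q prints bare IDs; the filter narrows the list to exited containers.
  stopped="$(docker container ls -aq --filter status=exited)"
fi

if [ -n "$stopped" ] && [ "${REMOVE_STOPPED:-no}" = "yes" ]; then
  # Word splitting is intended here: one argument per container ID.
  docker container rm $stopped
else
  echo "exited containers: ${stopped:-none}"
fi
```

Run as is, it only lists the IDs, which makes it safe to try before deleting anything.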

Getting started

Ropenscilabs has a basic introduction to Docker, and the Docker documentation is also a good place to start. A Rocker-specific introduction is also available.

If an image is not found locally, docker will search for it on Docker Hub and download it.

It is better to create a dedicated working folder from which the container is launched.

mkdir ~/docker-test/
cd ~/docker-test
docker run --rm -p 8787:8787 rocker/tidyverse

The --rm flag indicates that the container will be deleted when it exits. The -p flag publishes a port, mapping a port on the host to a port in the container (host:container). Note that the interim messages and download progress are not shown in eshell.

Different Rocker images are available, depending on the use case.

Attaching shells -t and Interactive containers -i

Example to run an Ubuntu container and run bash interactively by attaching a terminal to the container: -t allocates a terminal and -i keeps the input open. This will log in to Ubuntu and start bash.

An alternative option is to use Alpine Linux, which is a much smaller download. Alpine does not ship bash by default, so its shell is /bin/sh.

docker run -t -i ubuntu /bin/bash
docker run -ti alpine /bin/sh

Running a detached container

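  • Use the -d flag to start a container in the background. docker run prints the container ID and returns immediately.
  • The default command of the ubuntu image (bash) exits at once without a terminal, so this container stops right away. docker container ls -al shows the most recently created container even if it has stopped.
docker run -d ubuntu
docker container ls -al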
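To see a detached container that actually stays up, it needs a long-running command. A minimal sketch, assuming sleep 60 as a stand-in for a real service, which skips itself when no docker daemon is reachable:

```shell
cid=""
if docker info >/dev/null 2>&1; then
  # sleep keeps the container alive; -d prints its ID and returns at once.
  cid="$(docker run -d ubuntu sleep 60)"
  if [ -n "$cid" ]; then
    # The container shows up in the list of running containers.
    docker container ls --filter "id=$cid"
    # Force-remove the container instead of waiting for sleep to finish.
    docker container rm -f "$cid" >/dev/null
  fi
else
  echo "docker daemon not reachable; skipping"
fi
```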

Build process of a docker image

  • docker commit : used to commit changes to a new image layer. This is a manual process. Commit has little place in the real world. Dockerfile is superior.
  • Dockerfile : blue print or recipe for creating a docker image. Each actionable step becomes a separate layer.

Docker image : result of stacking up individual layers. Only the parts or layers that have changed are downloaded for a newer version of a specific image.

Scratch image: an empty docker image with no base operating system, used as a starting point via FROM scratch.
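The layers of an existing image can be inspected from the command line. This sketch assumes the alpine image purely as a small example and skips itself when no docker daemon is reachable.

```shell
layer_count=""
if docker info >/dev/null 2>&1; then
  docker pull alpine >/dev/null
  # One row per build step, newest first.
  docker image history alpine
  # The Go template counts the filesystem layers recorded for the image.
  layer_count="$(docker image inspect --format '{{len .RootFS.Layers}}' alpine)"
  echo "alpine has $layer_count filesystem layer(s)"
else
  echo "docker daemon not reachable; skipping"
fi
```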

Working with dockerfiles

  • Sample or reference Dockerfiles can be saved as “dockerfile.finished” or with some other useful extension.
  • Dockerfiles are read top to bottom.
  • The first non-comment instruction should be FROM (only ARG may precede it).
    • FROM allows you to import a docker image to build upon.
  • RUN: executes the specified commands while the image is being built.
  • WORKDIR: sets the desired working directory. It can be used multiple times in the same Dockerfile.
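A small Dockerfile using only the instructions listed above might look like the following sketch. The base image alpine:3.19, the paths under /app and the tag docker-notes-demo are all made up for illustration; the build and run steps are skipped when no docker daemon is reachable.

```shell
build_dir="$(mktemp -d)"

# The quoted 'EOF' keeps $(pwd) literal, so it is evaluated at build time
# inside the image rather than by the local shell.
cat > "$build_dir/Dockerfile" <<'EOF'
# Comments may come before FROM.
FROM alpine:3.19
WORKDIR /app
RUN echo "built in $(pwd)" > build-note.txt
WORKDIR /app/data
RUN pwd > where.txt
EOF

if docker info >/dev/null 2>&1; then
  docker build -t docker-notes-demo "$build_dir"
  # Print the two files written by the RUN steps.
  docker run --rm docker-notes-demo cat /app/build-note.txt /app/data/where.txt
else
  echo "docker daemon not reachable; Dockerfile written to $build_dir"
fi
```

Each RUN step runs in whichever WORKDIR was set most recently, which is why the second WORKDIR changes where where.txt ends up.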