Docker: Images and Containers
In this lesson, we are going to understand the process of creating Docker images and Docker containers. Since a Dockerfile is essential to creating Docker images, we will cover the semantics of the Dockerfile as well.
In the previous lesson, we learned about the anatomy of Docker containers and how containerization works in a nutshell. However, we didn’t actually create any applications using Docker or walk through the image-building process from scratch. In this lesson, we are going to do just that.
Before we move ahead, let’s understand some terminology first. A Docker container is an ephemeral (short-lived) process. This process is similar to a guest operating system with its private file system, network ports, memory space, etc. You can perform all sorts of operations inside it, such as starting an HTTP server, starting a database server, or running any process that performs I/O operations.
A Docker image is a blueprint of containers. Think of a Docker image as a class and a Docker container as an instance of that class. In a standard program, you would use new Image() to create an instance. Similarly, we use the $ docker run <image> command to create a container from an image, where image is the unique ID of the Docker image.
However, to create an image, we need a Dockerfile. This is a plain text file that contains the instructions to assemble an image. These instructions are called build instructions or simply the instructions. Once the image is created, we can make containers from it.
The Docker engine orchestrates the creation of images and the lifecycle of containers, among other things. The Docker daemon is a long-running process inside the Docker engine that handles these tasks. The Docker engine exposes the Docker daemon through a command-line interface (CLI) and a REST API. To install the Docker engine (and other resources), follow this documentation. It also comes with a GUI to manage containers.
In this lesson, we are going to create a Docker container that starts a simple HTTP server using ExpressJS. This HTTP server returns a JSON response with a list of users. And that’s about it, nothing more.
💡 You will need Node.js installed on your system to test the Docker application we are building. You can download it from the official website or use the NVM (Node Version Manager) CLI utility. However, this step is not strictly necessary, because we want to run this application inside a Docker container, which will have its own Node.js installation.
As we know, to create a container, we need a Docker image and to create a Docker image, we need a Dockerfile. Well, that’s not always true. Someone might have already created such an image that does the exact same thing. You can find public images on the Docker Hub registry. So you can just use the $ docker pull <image_name> command to download it from this registry.
But since we are trying to understand the process of creating Docker images, we are going to build an image on our own using a Dockerfile. You can find the source code of this project in this GitHub repository. So let’s create the project directory and install the dependencies.
$ mkdir docker-express-example && cd docker-express-example
$ npm init -y
$ npm install --save express
The commands above will generate package.json and package-lock.json files, as well as a node_modules directory, inside our project. So far, our project structure looks like this.
docker-express-example/
├── db.json
├── server.js
├── package.json
├── package-lock.json
├── node_modules/
└── images/
    ├── female.png
    └── male.png
We will talk about other files in our project, but for now, let’s focus on the server implementation. The db.json file contains a list of users. The server.js file serves the db.json from the / endpoint and images inside images/ directory from the /images endpoint.
The port at which this server listens is received from the SERVER_PORT environment variable. Therefore, when we start the server, we need to provide a value for this environment variable. I have done that through the start command in package.json.
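The port-resolution logic in server.js can be sketched as below. This is a minimal sketch, not the verbatim project code: the resolvePort helper name and the 8000 fallback (matching the default set later in the Dockerfile) are assumptions.

```javascript
// Sketch: resolve the listening port from an environment object.
// Falls back to 8000 when SERVER_PORT is missing or not a number
// (the fallback value is an assumption, not taken from the real server.js).
function resolvePort(env) {
  const port = parseInt(env.SERVER_PORT, 10);
  return Number.isNaN(port) ? 8000 : port;
}

const port = resolvePort(process.env);
console.log(`Server started at port ${port}.`);
// An Express app would then call something like: app.listen(port)
```

Running with SERVER_PORT=3000 (as the start script does) would make the server listen on port 3000.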
Now when we run the $ npm run start command from the project directory, we should see the Server started at port 3000. message in the terminal, and we should see the JSON response in the browser at the http://localhost:3000 endpoint.
Now that our HTTP server is working as expected, let’s discuss why we want to containerize it. The USP of Docker is the ability to run applications in an isolated environment without having to install their dependencies on the host. Also, we can spawn many replicas of an application really quickly, as containers are cheap.
For example, if you wanted to share this application with your colleagues or users, you would have to share the source code and ask them to install Node.js to be able to run it. This can lead to various problems, such as runtime issues (due to Node.js version differences), platform-related issues (macOS vs. Windows), or a thousand other things.
To solve this, we can create a Docker image and put all the contents of this project inside it. This image will also have all the necessary dependencies to run this application, such as a standard Node.js installation. Then we can publish the image publicly or privately so that other people can download it. Once they download the image, they can create a container from it and start the application (HTTP server) in a few seconds (or a fraction of a second).
Alright. So we just got the gist of how Docker can be useful. So let’s start building our Docker image for this application. As we discussed, to build an image, we need a Dockerfile. This file is usually placed in the root directory of the project but we can place it anywhere in the project. More on that later.
Before we start working on the Dockerfile, let’s understand the requirements to successfully run this application. First of all, we need an operating system with Node.js (and NPM) installed. Then we need the package.json and package-lock.json files to install the dependencies. Then we need all the source code of the project. And finally, we need the SERVER_PORT environment variable to launch the server. For this, the Dockerfile looks like the one below.
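Reconstructed from the instructions discussed in the rest of this lesson, the Dockerfile looks roughly like this (a sketch; the exact comments and the exec form of CMD are editorial choices, while the instructions and values come from the discussion that follows):

```dockerfile
# Parent image: Node.js 12.20.0 on Alpine Linux 3.10
FROM node:12.20.0-alpine3.10

# All relative paths below are resolved against /app
WORKDIR /app

# Copy the package manifests first so the npm install layer
# can be cached independently of source-code changes
ADD package.json package-lock.json ./

# Install the NPM dependencies (creates node_modules in /app)
RUN npm install

# Default port of the HTTP server; can be overridden at runtime
ENV SERVER_PORT=8000

# Document the port the container listens on
EXPOSE $SERVER_PORT

# Copy the rest of the project (honoring .dockerignore)
COPY . .

# Start the HTTP server when a container is created
CMD ["node", "server.js"]
```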
Dockerfile is a simple text file and it contains the instructions for the Docker daemon to assemble the image. Any text that starts with # is a comment but it can also be a parser directive. Let’s go through each instruction in our Dockerfile and understand their contribution to the image building process.
The FROM instruction sets a parent image for the new image we are about to create using this Dockerfile. The parent image includes all the necessary things we need such as an operating system environment (alpine) and Node.js installation (v12.20.0). You can look for official images on Docker Hub. Here, we are using node:12.20.0-alpine3.10 image as our parent image.
💡 In the node:12.20.0-alpine3.10 image name format, the node part is the image name while 12.20.0-alpine3.10 after the : is called a tag. If you do not provide the :tag part, :latest tag will be used by default. We will talk about this format in another lesson.
You might hear the term “base image” used interchangeably with the parent image but there is a slight difference. The base image is the last image in the parent-child hierarchy. For example, our new image is derived from the node image, the node image is derived from the alpine image (see here) and the alpine image is derived from the scratch image (see here).
Now the scratch image contains absolutely minimal things to work with the Docker engine and it’s the last image in the parent-child hierarchy which means it’s not based on any other image. You can use this image to install an operating system, software, tools, etc., and make a parent image. Therefore, any image that inherits scratch using the FROM scratch instruction is a base image. To create your own base image, follow this documentation.
💡 Instructions in the Dockerfile are case-insensitive which means you can also use from <image> instead of FROM <image>. But it is recommended to use uppercase letters for instruction names to increase the readability.
Ideally, the FROM instruction should be the first instruction in the Dockerfile. We can have more than one FROM instruction in the Dockerfile in the case of a multi-stage build, which we will cover in another lesson.
The second important instruction in the Dockerfile is WORKDIR. This instruction sets a working directory for the instructions in the Dockerfile (such as RUN, CMD, ENTRYPOINT, COPY, ADD, etc.) that are going to access or modify the filesystem of the image using a relative path. If this directory is not already present in the filesystem, it will be created while creating the image.
💡 We can have multiple WORKDIR instructions in the Dockerfile. Each instruction sets a new working directory for the instructions that follow. We can provide a relative path to the next WORKDIR instruction, which will be resolved relative to the previous WORKDIR path.
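For example (hypothetical paths, not from our project), successive WORKDIR instructions resolve like this:

```dockerfile
WORKDIR /app      # working directory is now /app
WORKDIR src       # relative path: resolves to /app/src
WORKDIR ../build  # relative path: resolves to /app/build
```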
In our case, we have set the /app directory (relative to the root of the filesystem of the image) as the working directory. Since this directory doesn’t exist in the filesystem of the node image, it will be created.
You can consider the working directory as the directory in the image where a terminal shell will be opened and all relative paths used in the commands to access or modify files will be relative to this directory.
Similarly, a build context is a directory on the host machine (on which we are creating the Docker image) that is sent to the Docker daemon while building an image. It contains the Dockerfile, .dockerignore, and application-related files such as package.json, images/, etc. This directory is provided via the $ docker build <path> command used to create the image, where path is the path of the build-context directory. In our case, it will be the path to the docker-express-example directory.
The ADD instruction copies files from the build context to the working directory (for relative destination paths) or any other directory (for absolute paths) of the image. We can copy one or more source files from the build context to a destination in the image. There are two ways to use this instruction, as represented below. The latter is useful for paths containing whitespace.
ADD <src>... <dest>
ADD ["<src>",... "<dest>"]
💡 The ADD instruction can also copy files from remote URLs such as a Git repository. If src is a local tar archive file, then it will be extracted at the dest during the copy operation. Read more about this behavior from this documentation. This functionality is not supported in the COPY instruction.
In our case, we are first copying the package.json and package-lock.json files to the working directory (specified by the ./ path) so that we can install the NPM dependencies. But doesn’t the build context already contain a node_modules directory with all the dependencies? Can’t we just copy it from the build context into the workspace directory?
First of all, we should avoid sending large, unnecessary, or sensitive files to the Docker daemon, as they could get copied into the image by ADD or COPY instructions. The .dockerignore file works just like a .gitignore file, but it excludes files from the build context before it is sent to the Docker daemon. We have excluded the node_modules directory by adding a node_modules line to the .dockerignore file.
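Our .dockerignore contains at least the node_modules entry; the other two lines below are common additions shown here as assumptions, not part of the original project:

```
node_modules
npm-debug.log
.git
```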
💡 If a src path doesn’t belong to the build context, such as ../file.txt, the ADD or COPY operation will fail.
The RUN instruction runs commands at build time. This could be a shell command such as rm -rf ./file.txt or a more complex command that downloads files, accesses or modifies the filesystem, etc. The results of RUN commands, such as file modifications, are committed to the image. There are two flavors of the RUN instruction, as illustrated below.
RUN <command>
RUN ["executable", "param1", "param2"]
The former format of the instruction is called the shell form because it is executed with the /bin/sh -c by default while the latter is called the exec form since you can pass a custom executable.
In our case, we are running npm install command using the default shell of the image which installs NPM dependencies in the workspace directory.
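For instance, our npm install step could be written in either form (the exec form is an alternative we are not using here):

```dockerfile
RUN npm install            # shell form: executed as /bin/sh -c "npm install"
RUN ["npm", "install"]     # exec form: runs the npm executable directly, without a shell
```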
The ENV instruction sets an environment variable in the image. The value of this instruction is a series of key=value pairs, and you can have multiple ENV instructions in the Dockerfile.
ENV <key>=<value> ...
We can use the value of an environment variable in other instructions. For example, we have used the value of SERVER_PORT environment variable in the EXPOSE instruction. You can do the same for RUN or CMD instructions but there are some caveats while using the exec form, read more.
If you want a variable to be present only at build time and not accessible in the container, consider the ARG instruction. You can also override the default value of an ARG instruction using the --build-arg flag, just like the --env flag.
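As a sketch (the APP_VERSION name is hypothetical, not part of our project), the difference looks like this:

```dockerfile
ARG APP_VERSION=1.0.0   # available only while the image is being built
ENV SERVER_PORT=8000    # baked into the image; visible inside running containers too
```

The build-time value could then be overridden with $ docker build --build-arg APP_VERSION=2.0.0 .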
A container is an isolated environment with its own filesystem and network, just like a virtual guest operating system. Therefore, a container does not use the ports of the host machine. All ports of the container are closed behind a firewall, and the host machine (the one running the container) cannot access them.
EXPOSE <port> [<port>/<protocol>...]
The EXPOSE instruction instructs the container to open a certain port and make it public to the host. You can have as many EXPOSE instructions as you want in the Dockerfile. While creating a container from the image, you can then choose to bind a port on the host machine to one of these open ports to send the traffic from the host_port to container_port using the --publish or -p flag of the $ docker run command.
In our case, since the default value of SERVER_PORT environment variable is 8000, only the port 8000 will be exposed to the host when the container is running. But this value can be changed by changing the value of SERVER_PORT environment variable using the --env flag which means we can control which port is exposed during the runtime.
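For example, assuming the image is tagged express:latest (we tag it later in this lesson), a sketch of changing the port at runtime looks like this; the port values are illustrative:

```shell
# Make the server listen on 9090 inside the container instead of the
# default 8000, and map host port 9000 to it
$ docker run --env SERVER_PORT=9090 -p "9000:9090" express:latest
```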
💡 You can use the --expose flag with the $ docker run command to expose a port from the container during runtime.
The COPY instruction works much like ADD: it copies one or more source files from the build context to a destination in the image, using either of the two forms below.
COPY <src>... <dest>
COPY ["<src>",... "<dest>"]
However, the COPY instruction does not support URLs and it does not extract sources that are tar archives. So it is recommended to use COPY instead of ADD to avoid mistakenly downloading files or extracting archives.
However, the place where COPY shines is the ability to copy files from another image using the --from=<name> flag with the COPY instruction. Here, the name is the name of the image that was generated in the multi-stage build process. We will learn more about this and the multi-stage build in a separate lesson.
In our case, we are copying from . to ., which means copying the contents of the build context recursively into the workspace directory while honoring the exclusions listed in the .dockerignore file.
The CMD instruction looks just like the RUN instruction, but there is a big difference between them. As discussed, the RUN instruction executes a command during the build and commits the results of the command to the image.
CMD ["executable", "param1", "param2"]
However, the CMD command doesn’t get executed during the build. It is executed by default when we create a container, which means CMD takes effect at runtime. So if you want to start a server or any other process as soon as the container is created, CMD is the right choice.
Unlike most instructions, only the last CMD instruction in a Dockerfile takes effect. If we have an ENTRYPOINT instruction in the Dockerfile, then a CMD instruction is not required; however, CMD can set default parameters for the ENTRYPOINT, and these can be overridden by the arguments provided to the $ docker run command. We will talk more about the ENTRYPOINT instruction and its relationship with CMD in another lesson.
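As a brief preview (this pattern is not used in our Dockerfile), CMD can supply default arguments to ENTRYPOINT like so:

```dockerfile
ENTRYPOINT ["node"]   # fixed executable
CMD ["server.js"]     # default argument; `docker run <image> other.js` overrides it
```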
Image Building Process
I hope you are still with me, because we are now about to discuss the cool part: building an image from the Dockerfile. Docker is shipped with a CLI to access the internals of the Docker engine. It enables us to build images, manage containers, and do pretty much anything the Docker engine can do.
You can check the Docker installation by running the docker --version command. To build an image, we use the docker build command.
$ docker build [options] PATH
Here, the PATH is the directory of build context we want to send to the Docker daemon. The options flags control how the image is created.
💡 PATH can also be a URL, such as a Git repository, or a tar archive file. If you do not want to pass a build context, use - as the PATH value.
Ideally, the Dockerfile should be in the root directory of the build context but it’s not a hard and fast rule. You can store Dockerfile anywhere in the filesystem and provide a path to it using the --file or -f flag. You can also provide Dockerfile through STDIN (read more).
First of all, since our image will be based on the node:12.20.0-alpine3.10 image, we would need that beforehand. To download a Docker image from Docker Hub, we use the docker pull command.
$ docker pull node:12.20.0-alpine3.10
However, the docker build command is smart enough to download the image automatically if not already present in our local repository. So the step above is not necessary. Let’s open the terminal in the root of the project directory and run the docker build command.
$ cd docker-express-example
$ docker build .
Here, . is the path to docker-express-example directory which would be the build context sent to Docker daemon and it contains the Dockerfile. Let’s see the result of this command execution.
The docker build command first analyzes the Dockerfile, then sends the build context to the Docker daemon. The daemon walks through the Dockerfile one instruction at a time. Since the first instruction is FROM, it looks for a local copy of the node:12.20.0-alpine3.10 image; if one isn’t present, it waits until the image is downloaded from Docker Hub.
As we discussed in the previous lesson, a Docker image is composed of multiple read-only layers stacked on top of each other. You can see these layers of the node image in the log output above as the Docker daemon downloads it from Docker Hub. Our image will be assembled in a similar manner.
Once the FROM instruction is done executing, the build context is transferred to the Docker daemon; its size is 58.53kB. This may seem unusually small at first, but remember that we ignored the node_modules directory in the .dockerignore file, so it isn’t included.
The next instruction sets the WORKDIR, which is considered a step, hence it is step 2 of the image-building process. From here onwards, any instruction that uses a relative path to access or modify the filesystem of the image will be resolved relative to the /app directory.
The next ADD instruction copies package.json and package-lock.json from the build context (same as ./package.json) to the workspace directory (since the . path is used as dest). After that, the RUN instruction executes the npm install command (shell form) and generates the node_modules directory.
Once the RUN instruction has successfully run, the ENV instruction sets the environment variable and the EXPOSE instruction exposes the port indicated by the SERVER_PORT value. These instructions are not considered build steps, but they are executed by the Docker daemon.
The COPY . . instruction copies the contents of the build context to the workspace directory. This step is necessary since we need server.js, db.json, and other files to run the application. However, this instruction also copies the package.json and package-lock.json files we already copied using the ADD instruction in the beginning. So why couldn’t we just use this COPY instruction in that place?
In the previous lesson, we discussed the anatomy of a Docker image and how a container uses it. A Docker image is made up of read-only layers. Each layer contains the files that were modified from the previous layer. Hence, every RUN, COPY, and ADD instruction creates a separate layer. These layers increase the size of the image depending on how many files were modified or added.
While building an image, the Docker daemon also temporarily creates intermediate images for each instruction in the Dockerfile (after the FROM instruction). Once the final image is built, these intermediate images are deleted; however, they remain part of the build cache.
Whenever we try to build an image using a Dockerfile, the Docker daemon checks the build cache for the images that were created from the same parent image as mentioned in the Dockerfile it is analyzing. Then it will process the next instruction in the current Dockerfile and check if an intermediate image with the same instruction exists in the build cache.
If an instruction is different, or the contents of the file(s) it mentions (such as ADD package.json) have changed compared to the intermediate image in the build cache, then the Docker daemon stops using the cache and builds the image by processing all remaining instructions from the point where the cache miss occurred. You can read more about the build cache from here.
💡 If you do not want to use the build cache while building an image, use the --no-cache=true flag with the docker build command. If you want to persist the intermediate images generated during the build, use the --rm=false flag.
Since the contents of the project’s source code are more likely to change, we add COPY . . instruction towards the end of the Dockerfile. Since package.json and package-lock.json files are less likely to change, while creating a new build, Docker can use the intermediate images from the build cache of the earlier builds for the ADD instruction. The cache miss may occur at the COPY instruction but by that time, we would have used most of the cache. This significantly decreases the build time.
💡 If you wish to delete all build cache, use the $ docker builder prune -a command. You could also use the $ docker system prune -a command, which deletes all stopped containers, unused images, and unused networks as well.
The last instruction in the Dockerfile is the CMD but it won’t be executed by the Docker daemon. It will be registered and executed when a container is created from this Docker image. Once this step is completed, our docker image will be created.
To see the Docker images present locally, we use the $ docker images or $ docker image ls command. It lists all the docker images we have either pulled from the Docker Hub or created locally. At the moment, we can see the Docker image we have created a few seconds ago.
The REPOSITORY column contains the name of the image we just created, and the TAG column contains the label used to identify different versions of the same image. We can provide these values using the --tag or -t flag with the $ docker build command. Since we haven’t used this flag, these values are empty (<none>). We will talk about these fields in another lesson.
Each Docker image will have a unique SHA-256 digest represented by the IMAGE ID column. When we list images using the $ docker images command, it only displays the first 12 characters. However, using the --no-trunc flag, you can see the full digest value.
💡 Docker generates this SHA-256 hash value for the image by looking at the contents of the image. If two Docker images contain the exact same files, their ID would be the same. However, you should not put all your trust in this logic.
Let’s create another build using the same Dockerfile but this time, we will name the image using the -t flag. The general format for the value of the --tag or -t is name:tag but we can drop the :tag part and Docker will use the :latest value by default. Let’s change the contents of .gitignore so that COPY . . instruction results in a cache miss.
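Assuming we name the image express (the name used in the rest of this lesson), the tagged build looks like this:

```shell
# Tag the new image as express:latest (the :latest tag is implied
# because we dropped the :tag part)
$ docker build -t express .
```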
This build took 3.7 seconds to finish, which is 10 seconds faster than the last build. The main reason for that is we already had a local copy of the node:12.20.0-alpine3.10 image from the previous build. So Docker daemon will not download it again from the Docker Hub.
If you inspect the output log above, all steps except COPY are using the build cache (they are marked with the CACHED label). If we hadn’t changed the .gitignore file, the COPY step (step 5/5) would have been picked up from the build cache as well. Hence, we should always put instructions that are likely to result in a cache miss towards the end of the Dockerfile.
Now if we list the images again using the $ docker images command, we see a new image with the express:latest tag and a unique SHA-256 digest. We can use this tag as well as the IMAGE ID to identify an image when performing actions such as creating a container from it or deleting it.
Let’s remove the untagged image with id 64cbafb8165a. To remove an image, we use the $ docker rmi command (rmi stands for remove image). You would need to use the --force or -f flag with this command if a container created from this image is still running.
If we were to remove the second image, we could use $ docker rmi 78de5f4b5437 command as well as $ docker rmi express:latest. If we use the $ docker rmi express command, it will be expanded to the $ docker rmi express:latest command by default, so it would work as well.
You can use the $ docker inspect <image_id|image_name> command to see the information about an image in the JSON format. In the above screenshot, we are only extracting the Config property of the JSON returned by the docker inspect and python -m json.tool is used to pretty-print the JSON.
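One way to reproduce that is shown below; the --format filter is an assumption here, and the command in the screenshot may differ slightly:

```shell
# Print only the Config section of the image metadata, pretty-printed
$ docker inspect --format '{{json .Config}}' express:latest | python -m json.tool
```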
Running and Managing Containers
I hope you are still with me. So far we have managed to create a Docker image that will act as a blueprint to create containers. A container is a running application. Our express image contains an HTTP server application and when we create a container from it, it will run the $ node server command as specified by the CMD instruction in the Dockerfile.
To create a container from the image, we use the $ docker run [options] <image> command where the image argument is the unique SHA256 ID of the image or the image name (repo:tag) and options are the optional flags to control the behavior of the running container.
First of all, when we create a container using docker run, Docker invokes the command associated with the CMD instruction to start the application, which is node server in our case. If this command exits (completes), the container is stopped. In our case, the node server command never exits on its own, so the container will keep running unless we send a stop signal (such as by pressing CTRL-C).
💡 You should use the --interactive or -i and -t or --tty flag to pass signals from the terminal opened on the host to the container. You can use the -it flag as the combined flag to start a container in an interactive mode.
We would need to use the --publish or -p flag to send traffic from the host to the container, since a running container does not use the host’s ports. The value for this flag is host_port:container_port. In our case, since we have exposed port 8000 (the default value of the SERVER_PORT env variable) for the HTTP server, we would use the -p "9000:8000" value to send traffic from port 9000 on the host machine to port 8000 of the container.
💡 If a container exposes multiple ports, we can have multiple -p flags in the docker run command. If you want to automatically assign random ports of the host machine to exposed ports of the container, then use the -P or --publish-all flag.
With the $ docker run -p "9000:8000" express:latest command, we create a container from express:latest (ID: 78de5f4b5437) image and send traffic from the 9000 port of the host to 8000 port of the container. So let’s open a browser and visit http://localhost:9000 URL.
The HTTP server running inside the container returns the list of users in the JSON format as designed in the application. This server will be running as long as the terminal above is opened. To stop the container, we just need to exit the process by terminating the HTTP server. Usually, we do that by pressing CTRL+C, but that won’t work this time. We need to use the --init flag while running a Node.js process. Read more from here.
Since we can’t exit the current terminal session, let’s open a new terminal window and look at the running containers. To list the running containers, we use the docker ps or docker container ls command.
From the above result, we can see that a container with ID 9b18ffbe6ed2 is up and running for 15 minutes. It was created from the express:latest image and its name is hopeful_euler. This unique name was automatically assigned by Docker but we can provide a custom name using the --name flag and it will be used to identify a container. The above result also displays the open ports and how it is connected to the host.
So let’s stop the container. To do that, we use the $ docker stop <container> command, where container is the CONTAINER ID or the unique name. Once the container is stopped, the first terminal will exit, and you won’t be able to access the http://localhost:9000 URL in the browser.
Once the container is no longer running, it won’t appear in the $ docker ps results. To see the entire list of running and stopped containers, we use the -a or --all flag with this command.
You have the ability to restart a stopped container using $ docker restart <container> command. If you want to pause the running container, you can use the $ docker pause <container> command. To unpause the paused container, we use the $ docker unpause <container> command.
Instead of restarting the stopped container, let’s create a new container from the express:latest image. This time, we will use the --init flag and give our new container a name.
Now we can use the CTRL-C to stop the running Node process which will stop the container and exit from the terminal. To remove the container, we use the $ docker rm <container> command. You may have to use the -f or --force flag to remove a running container.
💡 If you want to automatically remove the container when it exits, use the --rm flag with the $ docker run command.
Now you might think: if I had to use this container in production, would I need to keep the terminal open? That doesn’t sound ideal. Well, you are correct. We don’t want to keep the container process (the server) coupled with the terminal; rather, we want to run it as a background process. To run a container in the background, or more precisely in detached mode, we use the -d or --detach flag.
When we run a container in detached mode, the $ docker run command returns the SHA256 ID of the container and exits the terminal session. The container keeps running in the background, as you can see from the results of the $ docker ps command. You can use the $ docker stop command to stop the container, which will stop the HTTP server process inside it.
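A typical detached-mode session might look like the sketch below; the container name is hypothetical:

```shell
# Start the container in the background
$ docker run -d --init -p "9000:8000" --name express-server express:latest

# Verify it is running, then stop it
$ docker ps
$ docker stop express-server
```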
You can override the default CMD command by providing extra arguments in the $ docker run command. For this, we would need to use the -it flag to start the container in an interactive mode so that we can control the shell of the container from the terminal opened on the host machine.
Here, we are overriding the default CMD command (node server.js) with the sh command to start a shell session. The shell process starts in the WORKDIR directory of the container. There, we can manually start and exit the HTTP server. Once we exit the shell process (using exit), the container stops, as the only process keeping it alive is dead.
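Such a session might look like this sketch (the prompt and paths are illustrative):

```shell
# Override the default CMD with `sh` to get an interactive shell in /app
$ docker run -it --init express:latest sh
/app # node server.js   # start the server manually (CTRL-C to stop it)
/app # exit             # exiting the shell stops the container
```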
I have published this Docker image on the Docker Hub with the name thatisuday/express-example. Hence you can download it from the Docker Hub using $ docker pull thatisuday/express-example command or run it directly using the $ docker run -it --init thatisuday/express-example command. If you don’t want to start the server while creating the container, use $ docker run -it --init thatisuday/express-example sh instead.
You can find the Dockerfile and the source code of the examples used in this lesson in the following GitHub repository. Feel free to open a pull request if you want to improve this image or its documentation.
- Dockerfile best practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
- Docker CLI: https://docs.docker.com/engine/reference/commandline/cli/
- Docker Engine API: https://docs.docker.com/engine/api/