With Docker’s rising popularity, many people are building and publishing their own images. It is easy to get started, but it can feel like going back to the days before configuration management arrived, with lots of messy bash scripting. Unsurprisingly, Docker Hub has its fair share of poorly written Dockerfiles.
This article offers some tips for structuring Dockerfiles better, while keeping the resulting images as small as possible.
Layers
It is important to understand that Docker images are built from layers, and that each instruction in the Dockerfile produces one (in newer Docker releases only RUN, COPY and ADD add to the image size; the other instructions only record metadata). The rule of thumb is to create as few layers as possible and to separate the ones that rarely change from those that change frequently.
Structuring the Dockerfile
If your Docker image typically installs a package, adds a couple of files and then runs the app, it is good to adhere to what I call the “FMERAEEC” method, that is, using the FROM, MAINTAINER, ENV, RUN, ADD, ENTRYPOINT, EXPOSE and CMD instructions in that order. Of course, not every image will fit this pattern.
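As a rough sketch of that ordering, a Dockerfile for the nginx example used throughout this article might look like the following. The file names nginx.conf and entrypoint.sh are assumptions for illustration, and the instructions that rarely change sit above the ones that change often, so rebuilds can reuse the cached layers.

```dockerfile
# F: pinned base image rather than "latest"
FROM debian:jessie
# M: maintainer in RFC format
MAINTAINER Tom Murphy <tom@bluemalkin.net>
# E: version of the main package
ENV NGINX_VERSION 1.9.12
# R: one chained RUN for install and clean-up
RUN apt-get update && \
    apt-get install -qy nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/*
# A: configuration and entrypoint script (hypothetical file names)
ADD nginx.conf /etc/nginx/nginx.conf
ADD entrypoint.sh /entrypoint.sh
# E: script that runs first and then hands over to CMD
ENTRYPOINT ["/entrypoint.sh"]
# E: port the app listens on
EXPOSE 80
# C: the command itself, kept in the foreground
CMD ["nginx", "-g", "daemon off;"]
```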
FROM
Use your preferred base image; which one you use is entirely up to you. It is preferable not to use the latest tag, as you need to know when your base image changes so that you can verify that the app still runs OK.
debian:jessie tends to be a popular base image on the Docker Hub. We’ll discuss slim images later on.
MAINTAINER
Use the RFC-compliant format, e.g.
MAINTAINER Tom Murphy <tom@bluemalkin.net>
ENV
If you are installing packages, it is a good idea to specify which version of the main package is being installed, e.g.
ENV NGINX_VERSION 1.9.12
RUN
This layer will change frequently, so the golden rule is to chain all the shell commands into a single RUN wherever possible.
Typically you will update the list of packages, install the package(s) using the version specified in ENV (apt needs it to match the repository’s version string exactly), then clean up the package lists.
Then you may run some configuration commands such as sed, or create symbolic links from the log files to the standard output/error, etc., e.g.
RUN apt-get update && \
    apt-get install -qy nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/*
Building from source
If you frequently rebuild an image that depends on something built from source, it is preferable not to do that build inside the Dockerfile. Such builds take longer and, with the build dependencies required, you may end up with a large layer. Instead, you may want a separate process that builds the software, packages it and stores the packages somewhere. Your main app build can then fetch and install them.
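A minimal sketch of the fetch-and-install approach, assuming a separate pipeline has already published a .deb file. The package name myapp, its version and the artifacts.example.com URL are all hypothetical placeholders.

```dockerfile
FROM debian:jessie
# Hypothetical package version, built and published by a separate process
ENV MYAPP_VERSION 1.0.0
# Download the prebuilt package, install it, then remove the download and
# the package lists in the same layer. dpkg -i does not resolve
# dependencies, so this assumes the package needs nothing extra.
RUN apt-get update && \
    apt-get install -qy --no-install-recommends ca-certificates curl && \
    curl -fsSL -o /tmp/myapp.deb "https://artifacts.example.com/myapp_${MYAPP_VERSION}_amd64.deb" && \
    dpkg -i /tmp/myapp.deb && \
    rm -rf /tmp/myapp.deb /var/lib/apt/lists/*
```

The image build no longer needs a compiler or development headers, so it stays quick and the layer stays small.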
ADD
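Add your configuration files, artifacts built by your CI/CD, etc., plus your entrypoint.sh script.
ADD accepts more kinds of source than COPY: it can fetch remote URLs and it unpacks local tar archives automatically. Contrary to a common belief, COPY is not being deprecated; Docker’s best-practices guide actually prefers COPY when you only need to copy local files.
A little-known fact is that the destination path in the container is created automatically, along with any missing parent directories, so there’s no need for RUN mkdir commands.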
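To illustrate the automatic directory creation, here is a small sketch. The files app.conf and entrypoint.sh and the /opt/myapp path are hypothetical; for plain local files like these, COPY would behave the same way.

```dockerfile
FROM debian:jessie
# /opt/myapp/conf does not exist in the base image. ADD creates it,
# including the missing parents, so no "RUN mkdir -p" is needed.
ADD app.conf /opt/myapp/conf/app.conf
# A trailing slash marks the destination as a directory; the file keeps
# its name and ends up at /usr/local/bin/entrypoint.sh.
ADD entrypoint.sh /usr/local/bin/
```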
ENTRYPOINT
When you run your Docker container, the entrypoint script runs first. This is a good place to make configuration changes to the container based on any environment variables you pass in at run time. Finish the script with:
exec "$@"
which executes the CMD command.
A classic step many forget is to add the execute bit to the script. Ensure it’s set in your source control, rather than adding an extra RUN command to chmod the file.
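The wiring might look like the sketch below. The script body is shown as comments, and the WORKER_PROCESSES variable is a hypothetical example of a run-time setting, not something nginx or Docker defines. Note that ENTRYPOINT needs the JSON (exec) form here, since the shell form does not pass the CMD to the script.

```dockerfile
# entrypoint.sh lives next to the Dockerfile. A minimal version might be:
#
#   #!/bin/sh
#   set -e
#   # WORKER_PROCESSES is a hypothetical variable passed with "docker run -e"
#   sed -i "s/^worker_processes .*/worker_processes ${WORKER_PROCESSES:-1};/" /etc/nginx/nginx.conf
#   # Replace the shell with the CMD, which arrives as the script arguments
#   exec "$@"
#
# Set the execute bit once in source control, for example with
#   git update-index --chmod=+x entrypoint.sh
# ADD keeps the file mode, so no extra "RUN chmod" layer is needed.
ADD entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```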
EXPOSE
Expose the container port(s) regardless of whether you
will use Docker links or bind to the host.
CMD
Finally, CMD specifies the command (and options, such as running in the foreground) that the container runs.
It is preferable to write the command and options in the exec form:
CMD ["executable","param1","param2"]
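For the nginx example, the two forms compare roughly as follows. This is a sketch; the shell form is left commented out because a Dockerfile only honours the last CMD.

```dockerfile
# Exec form (preferred): the JSON array is run directly, without a shell,
# so nginx itself receives signals such as the SIGTERM from "docker stop".
CMD ["nginx", "-g", "daemon off;"]

# Shell form, for comparison: Docker wraps it as
# /bin/sh -c "nginx -g 'daemon off;'", so the shell is the main process.
# CMD nginx -g 'daemon off;'
```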
Notes on other commands
LABEL: many issue a LABEL command for the container description and another LABEL command for the version of the image produced. Unless you have a good reason to use them and your platform(s) will query the metadata, avoid them, especially the description label, which mostly remains static. The version of the image is what you tag the image with. Remember that each LABEL command produces another layer.
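If your platform does query the metadata, a single LABEL instruction can carry several key/value pairs, which avoids the extra layers. The keys and values below are hypothetical.

```dockerfile
# One instruction, several labels
LABEL description="nginx reverse proxy" \
      vendor="Example Ltd"
```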
Logging
Many mount a volume into the container so that the host can access the logs and send them somewhere. Whilst this approach works, it is a lot simpler for the container to send logs to standard output and standard error, then use a logging driver in Docker.
Running the container CMD in the foreground should produce at least the startup log on standard output. To get the full logs, redirect them to the outputs by creating symbolic links in a RUN command.
For example, with nginx again:
RUN ln -sf /dev/stdout /var/log/nginx/access.log && \
    ln -sf /dev/stderr /var/log/nginx/error.log
Ensure that the user the process runs as has permission to write to the outputs.
Keeping images the smallest size possible
As mentioned earlier, try to keep the number of layers as low as possible. Some other tips are:
- Remove package lists at the end of the RUN command:
  rm -rf /var/lib/apt/lists/*
  and optionally any other temporary files under /var/tmp/* and /tmp/*
- If you are downloading archives, remove them after extracting
- If you are building from source, remove the packages required for building and their dependencies
- Consider building from a slim base image
Do not flatten the image with docker export unless you have a good reason to. An export does not preserve the layers, so the image can no longer take advantage of cached layers.
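The clean-up tips above can be combined into one RUN, sketched here for a build from source. The tarball URL, the myapp name and the assumption that the project installs with "make install" are all hypothetical.

```dockerfile
FROM debian:jessie
# Everything happens in one layer: install the build tools, fetch and
# unpack the archive, build, then remove the tools, the archive, the
# source tree and the package lists before the layer is committed.
RUN buildDeps='build-essential ca-certificates curl' && \
    apt-get update && \
    apt-get install -qy --no-install-recommends $buildDeps && \
    curl -fsSL -o /tmp/myapp.tar.gz https://example.com/myapp-1.0.0.tar.gz && \
    mkdir -p /tmp/myapp && \
    tar -xzf /tmp/myapp.tar.gz -C /tmp/myapp --strip-components=1 && \
    make -C /tmp/myapp install && \
    apt-get purge -qy --auto-remove $buildDeps && \
    rm -rf /tmp/myapp /tmp/myapp.tar.gz /var/lib/apt/lists/*
```

Removing the files in a later RUN would not help, because the earlier layer would still contain them.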
Slim Images using Alpine Linux
Whenever possible it’s good to use a slim image to speed up builds and deployments.
The busybox image has been around for a while, but recently there has been an increasing trend towards adopting the Alpine Linux image.
It is a security-oriented, lightweight Linux distribution based on musl libc and busybox. It’s only 4.8MB!
Several official images are published to Docker Hub with a slim Alpine variant; keep an eye out for tags suffixed with -alpine.
Alpine Linux currently has a limited number of packages in its package repository. It still has some popular ones: nginx, squid, redis, openjdk7, etc. For comparison, openjdk based on Alpine is 100MB, which is over 5 times smaller than the Debian-based image (560MB).
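Installing a package on Alpine might look like the sketch below. The alpine:3.3 tag is an assumption; pin whichever release you have tested.

```dockerfile
FROM alpine:3.3
# --no-cache fetches the package index on the fly and does not store it,
# so there is no package list to remove afterwards.
RUN apk add --no-cache nginx
```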
Some may have security reservations about slim images versus a full OS image. In my opinion it’s reasonably secure as long as: 1. the upstream image is frequently updated, 2. you ensure you always pull the latest base image on all builds and 3. more importantly, you ensure that your host OS is regularly patched.
Final words
There are many, many topics to learn and cover in Docker.
Ensure you are familiar with the Docker build documentation at https://docs.docker.com/engine/reference/builder/
Read up on Dockerfile best practices at
https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
Check how others build their images on Docker Hub, GitHub, etc.; learn from them and enhance yours. Try to keep your build structure and naming simple and consistent.
The next Docker topic will cover the Docker platform: rancher.com. Watch this space and happy Dockering!