With Docker’s rising popularity, many people are building and publishing their own images. It’s easy to get started, and it feels like going back to the days before configuration management arrived, with lots of messy bash scripting. Unsurprisingly, Docker Hub has its fair share of poorly written Dockerfiles.
This article offers some tips for structuring Dockerfiles better while keeping the resulting images as small as possible.
Layers
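It is important to understand that Docker images are built from layers, and every instruction in the Dockerfile adds a step to the image (in current Docker versions only RUN, COPY and ADD add filesystem content; the other instructions only record metadata). The rule of thumb is to create as few layers as possible and to separate the ones that rarely change from those that change frequently, placing the rarely changing ones first so their cached layers can be reused on rebuilds.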
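As a minimal sketch of that ordering (the app/ directory and /opt/app path are hypothetical), the rarely changing package installation goes first and the frequently changing artifact copy goes last, so a new release only invalidates the cache from the COPY line down:

```dockerfile
FROM debian:jessie

# Rarely changes: package installation comes first, so its cached layer
# is reused on most rebuilds
RUN apt-get update && \
    apt-get install -qy --no-install-recommends ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Changes on every release (hypothetical artifact directory): copying it last
# means a change here only rebuilds from this instruction onwards
COPY app/ /opt/app/
```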
Structuring the Dockerfile
If your Docker image typically installs a package, adds a couple of files and then runs the app, it’s good to follow what I call the “FMERAEEC” method: using the FROM, MAINTAINER, ENV, RUN, ADD, ENTRYPOINT, EXPOSE and CMD instructions in that order. Of course, not every image will fit this pattern.
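Put together, a minimal sketch of an nginx image following this order might look like the following. It assumes that apt can already see a repository providing this nginx version (for example the nginx.org repository, which the sketch does not configure, since Debian jessie's own repositories ship an older nginx), that nginx.conf and entrypoint.sh sit next to the Dockerfile, and the version string is only an example. COPY is used in the ADD slot; for plain local files the two behave the same.

```dockerfile
FROM debian:jessie

MAINTAINER Tom Murphy <tom@bluemalkin.net>

# Example version string; apt matches the full package version
ENV NGINX_VERSION 1.9.12-1~jessie

# One chained RUN: refresh package lists, install the pinned version, clean up
RUN apt-get update && \
    apt-get install -qy nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/*

# Configuration file and entrypoint script from the build context
COPY nginx.conf /etc/nginx/nginx.conf
COPY entrypoint.sh /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

EXPOSE 80

# Run nginx in the foreground so the container keeps running
CMD ["nginx", "-g", "daemon off;"]
```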
FROM
Use your preferred base image; which one is entirely up to you. Avoid the latest tag: you need to know when your base image changes so you can verify that the app still runs correctly.
debian:jessie tends to be a popular base image on Docker Hub. We’ll discuss slim images later on.
MAINTAINER
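Use the RFC-compliant name and email format, e.g.:
MAINTAINER Tom Murphy <tom@bluemalkin.net>
Note that newer Docker releases deprecate MAINTAINER in favour of a label, e.g. LABEL maintainer="Tom Murphy <tom@bluemalkin.net>".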
ENV
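If you are installing packages, it is a good idea to specify which version of the main package is being installed. Note that apt expects the full package version string, including the Debian revision (as listed by apt-cache policy), e.g.:
ENV NGINX_VERSION 1.9.12-1~jessie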
RUN
This layer will change frequently, so the golden rule is to chain all the shell commands into a single RUN instruction wherever possible.
Typically you will update the package lists, install the package(s) (using the version specified in ENV), then clean up the package lists.
You may then run some configuration commands, such as sed, or create symbolic links from the log files to standard output/error, etc. For example:
RUN apt-get update && \
    apt-get install -qy nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/*
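To illustrate the configuration step as well, here is a sketch that appends a sed edit and the log symlinks to the same chained RUN. The worker_processes change is only an example setting, and the paths assume the standard nginx package layout.

```dockerfile
# Assumes NGINX_VERSION was set by an earlier ENV instruction
RUN apt-get update && \
    apt-get install -qy nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/* && \
    sed -i 's/^worker_processes.*/worker_processes auto;/' /etc/nginx/nginx.conf && \
    ln -sf /dev/stdout /var/log/nginx/access.log && \
    ln -sf /dev/stderr /var/log/nginx/error.log
```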
Building from source
If you build your image frequently and it needs something compiled from source, avoid compiling it as part of the image build. Source builds take longer and, with all the build dependencies they require, can leave you with a large layer. Instead, consider a separate process that builds, packages and stores the artifacts somewhere; your main image build can then fetch and install those packages.
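A minimal sketch of the fetching side, assuming a separate CI job has already built myapp and published it as a .deb package; the package name, version and artifact URL are hypothetical placeholders. Downloading, installing and deleting the package within one RUN keeps the downloaded file out of the image layers. dpkg -i does not resolve dependencies, so this assumes the package's dependencies are already satisfied.

```dockerfile
# Hypothetical package built and published by a separate CI job
ENV MYAPP_VERSION 1.0.0

# Download, install and remove the package in one RUN so the .deb never
# persists in a layer; curl and ca-certificates are needed for the HTTPS download
RUN apt-get update && \
    apt-get install -qy --no-install-recommends curl ca-certificates && \
    curl -fsSL -o /tmp/myapp.deb "https://artifacts.example.com/myapp_${MYAPP_VERSION}_amd64.deb" && \
    dpkg -i /tmp/myapp.deb && \
    rm -f /tmp/myapp.deb && \
    rm -rf /var/lib/apt/lists/*
```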
ADD
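Add your configuration files, artifacts built by your CI/CD pipeline, etc., plus your entrypoint.sh script.
For plain local files, prefer COPY over ADD. COPY is not deprecated, and Docker’s best-practices guide recommends it because its behaviour is more transparent; ADD’s extra features (fetching remote URLs and automatically extracting local tar archives) are worth using only when you actually need them.
A common misconception is that the destination directory must exist first. ADD and COPY automatically create any missing parent directories in the destination path, so there’s no need for extra RUN mkdir commands.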
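For example, the following copies files into a directory that does not exist in the base image, and Docker creates /etc/myapp/ and /etc/myapp/conf.d/ on the fly. The myapp paths are hypothetical, and for plain local files ADD and COPY behave the same here.

```dockerfile
FROM debian:jessie

# /etc/myapp/ does not exist in debian:jessie; no RUN mkdir -p is needed,
# the missing directories are created automatically
COPY myapp.conf /etc/myapp/myapp.conf
COPY conf.d/ /etc/myapp/conf.d/
```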
ENTRYPOINT
When you run your Docker container, the entrypoint script runs first. This is a good place to make configuration changes to the container based on any environment variables passed in at run time, then finish with:
exec "$@"
which replaces the script’s shell with the CMD command (Docker passes it to the entrypoint as arguments), so that command runs as PID 1 and receives signals such as the SIGTERM sent by docker stop.
A classic mistake is forgetting to set the execute bit on the script. Set it in your source control (ADD and COPY keep the file’s permissions) rather than adding an extra RUN chmod command, which costs another layer.
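Wiring the script up might look like this sketch, assuming entrypoint.sh ends with exec "$@". Docker hands the CMD array to the script as arguments, and anything given after the image name in docker run replaces CMD but still goes through the entrypoint.

```dockerfile
# Executable bit comes from source control; no extra RUN chmod layer
COPY entrypoint.sh /entrypoint.sh

# Exec form: the script runs directly and receives CMD as "$@"
ENTRYPOINT ["/entrypoint.sh"]

# Default arguments handed to the entrypoint; its exec "$@" runs them
CMD ["nginx", "-g", "daemon off;"]
```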
EXPOSE
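Expose the container port(s) regardless of whether you will use Docker links or bind to the host. EXPOSE only documents which ports the application listens on; it does not publish them, so you still need -p (or -P) with docker run to make them reachable from the host.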
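For an nginx image serving both HTTP and HTTPS, a sketch is a single instruction listing both ports (443 only applies if your configuration actually enables TLS):

```dockerfile
# Documents the ports nginx listens on inside the container
EXPOSE 80 443
```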
CMD
Finally, CMD sets the default command (and its options, such as running in the foreground) for the container.
Prefer the exec (JSON array) form for the command and its options:
CMD ["executable","param1","param2"]
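The difference matters for signal handling. In this sketch, the exec form runs nginx directly, while the shell form, left commented out, would wrap it in /bin/sh -c, so signals such as the SIGTERM sent by docker stop may reach the shell rather than nginx.

```dockerfile
# Exec form (preferred): nginx is started directly and receives signals itself
CMD ["nginx", "-g", "daemon off;"]

# Shell form (avoid): runs as /bin/sh -c 'nginx -g "daemon off;"'
# CMD nginx -g "daemon off;"
```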
Notes on other instructions
LABEL: many people add one LABEL instruction for the image description and another for the version of the image produced. Unless you have a good reason to use them and your platform(s) will query the metadata, avoid them, especially the description label, which mostly remains static. The version of the image is what you tag the image with. Remember that each LABEL instruction adds another entry (an empty layer) to the image history, so if you do use labels, set them all in a single instruction.
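If you do need labels, a sketch of setting both in one instruction (the key names and values are only examples):

```dockerfile
# One LABEL instruction with several key=value pairs instead of one per label
LABEL description="nginx web server" \
      version="1.0.0"
```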
Logging
Many people mount a volume into the container so that the host can access the logs and ship them elsewhere. While this approach works, it is much simpler for the container to write its logs to standard output and standard error, then use a Docker logging driver to forward them.
Running the container’s CMD in the foreground should produce at least the startup logs on standard output. To get the full logs, redirect the log files to these outputs by creating symbolic links in a RUN command.
For example with nginx again:
RUN ln -sf /dev/stdout /var/log/nginx/access.log && \
    ln -sf /dev/stderr /var/log/nginx/error.log
Ensure that the user the process runs as has permission to write to these outputs.
Keeping images the smallest size possible
As mentioned earlier, keep the number of layers as low as possible. Some other tips:
- Remove the package lists at the end of the same RUN command that downloaded them (files deleted in a later RUN still take up space in the earlier layer):
rm -rf /var/lib/apt/lists/*
and optionally any other temporary files under /var/tmp/* and /tmp/*
- If you are downloading archives, remove them after extracting, in the same RUN command
- If you are building from source, remove the packages required for building, and their dependencies, once the build is done
- Consider building from a slim base image
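The tips above can be combined in a single RUN when building from source. In this sketch, mytool, its version and its download URL are hypothetical, and it assumes an autotools-style ./configure build whose result does not need the build packages at run time (if it links against libraries pulled in with them, keep the corresponding runtime packages installed).

```dockerfile
FROM debian:jessie

# Hypothetical tool and version
ENV MYTOOL_VERSION 1.2.3

# Install build dependencies, download, build and install, then remove the archive,
# the build dependencies, the package lists and temporary files, all in one layer
RUN buildDeps='ca-certificates curl gcc libc6-dev make' && \
    apt-get update && \
    apt-get install -qy --no-install-recommends $buildDeps && \
    curl -fsSL -o /tmp/mytool.tar.gz "https://example.com/mytool-${MYTOOL_VERSION}.tar.gz" && \
    tar -xzf /tmp/mytool.tar.gz -C /tmp && \
    rm /tmp/mytool.tar.gz && \
    cd /tmp/mytool-${MYTOOL_VERSION} && \
    ./configure && make && make install && \
    cd / && \
    apt-get purge -y --auto-remove $buildDeps && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
```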
Do not flatten the image with docker export unless you have a good reason to. An export does not preserve the individual layers (or image metadata such as ENV and CMD), so you can no longer take advantage of cached layers.
Slim Images using Alpine Linux
Whenever possible it’s good to use a slim image to speed up builds and deployments.
The busybox image has been around for a while but recently there has been an increasing trend in adopting the Alpine Linux image.
It is a security-oriented, lightweight Linux distribution based on musl libc and BusyBox, and the base image is only 4.8 MB!
Several official images on Docker Hub are published with an Alpine-based slim variant; keep an eye out for tags suffixed with -alpine.
Alpine Linux currently has a limited number of packages in its repository, but it still has some popular ones: nginx, squid, redis, openjdk7, etc. For comparison, the Alpine-based openjdk image is 100 MB, over 5 times smaller than the Debian-based one (560 MB).
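On Alpine the package manager is apk rather than apt. A minimal sketch of the same install-and-clean-up pattern, assuming the alpine:3.3 tag (pick whichever release you standardise on): apk add --no-cache fetches the package index for this install only and does not store it in the image, so no separate clean-up step is needed.

```dockerfile
FROM alpine:3.3

# --no-cache: use a temporary package index, leaving no apk cache in the layer
RUN apk add --no-cache nginx
```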
Some people have security reservations about slim images compared with a full-featured OS base image. In my opinion they are reasonably secure as long as: 1. the upstream image is frequently updated, 2. you always pull the latest base image on every build, and 3. most importantly, your host OS is regularly patched.
Final words
There are many more topics to learn and cover in Docker. Make sure you are familiar with the Docker build documentation at https://docs.docker.com/engine/reference/builder/
Read the Dockerfile best practices guide at https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
Check how others build their images on Docker Hub, GitHub, etc., learn from them and improve yours. Try to keep your build structure and naming simple and consistent.
The next Docker topic will cover the Rancher platform (rancher.com). Watch this space and happy Dockering!