Docker - Reduce, Re-Use, Recycle

Containerization is just so useful, but can lead to sloppy practices

Containerization is one of those "hot topics" you see in software development. It's useful because it lets us package all of our code into a minimal, VM-like environment and hand it to users to run with a single command.

This solves a lot of problems:

  • Handles OS differences between clients
  • Removes dependency issues and the need to set up environments
  • Allows for simple scaling of software

Most of the time, when people create a Dockerfile, they approach it something like this:

FROM node:18 AS project
WORKDIR /src
COPY . ./
RUN npm install
RUN npm run build

A really basic Dockerfile building a node project

If you don't know anything about Docker, here is what this file instructs Docker to do:

  1. FROM node:18 AS project sets up an environment with all the dependencies needed to work with Node version 18
  2. WORKDIR /src says that all the remaining instructions will run in the /src directory
  3. COPY . ./ copies all the files in the current directory into the container (ignoring anything listed in the .dockerignore file; see the example after this list)
  4. The RUN npm install and RUN npm run build lines just build a hypothetical npm project
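
A .dockerignore keeps unneeded files out of that COPY. As a sketch, a typical one for a Node project might look like this (the exact entries depend on your project):

# Hypothetical .dockerignore for a Node project
node_modules
.git
*.log
Dockerfile
.dockerignore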

Great! So you've got a working environment: you can build the Docker image and run it. Most regular developers will stop here and just ship their code out.
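
For example, assuming the hypothetical image name my-app and an app listening on port 3000, building and running looks like this:

# Build the image from the Dockerfile in the current directory
docker build -t my-app .

# Run it, mapping container port 3000 to the host
docker run -p 3000:3000 my-app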

What's the issue?

There are actually a couple of issues:

  • Image Size - How big is your Docker image?
  • Image Components - What is included in your Docker image?
  • Security - How much do you really need in your image?

Image Size

The node:18 image, at the time of writing, was a whopping 378MB.

This means every time you build this and want to store it somewhere, it's going to cost over a third of a gigabyte, which doesn't sound like a lot until you start considering that:

  • You'll probably want to keep multiple versions of your application, not just delete the old versions
  • You'll probably want to back up your applications
  • You'll probably want to download these images

Each of these actions costs both storage and transfer of a 378MB file, which gets expensive at scale.
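
You can see what your images actually cost with the Docker CLI:

# List local images along with their sizes
docker image ls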

Now, if you are going to use all of the features of the node:18 environment, this is totally valid and you don't need to do anything else. You could also try a slimmed-down version of Node; for example, node:18-alpine is considerably smaller at 43MB. But maybe you need a bunch of packages to build and not to deploy! Suddenly you have to do all the work that Docker promised to eliminate!
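
That base swap is a one-line change; a sketch of the same Dockerfile on the slimmer image:

# Same build, just on the much smaller Alpine-based image
FROM node:18-alpine AS project
WORKDIR /src
COPY . ./
RUN npm install
RUN npm run build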

Image Components

Another issue: what if you want to build the application, then run it behind a load balancer like nginx?

You could install all of nginx's requirements inside the existing environment using a bunch of RUN statements with calls to apt, but that wouldn't be very clean.
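
For illustration, the messy version would look something like this (a sketch of what not to do):

FROM node:18 AS project
WORKDIR /src
COPY . ./
RUN npm install
RUN npm run build
# Bolting nginx onto the node image works, but bloats it even further
RUN apt-get update && apt-get install -y nginx && rm -rf /var/lib/apt/lists/*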

Security

We also need to consider the security of our container. Some of the core components of this are:

  • How many vulnerabilities exist in the images we are using
  • How many libraries / how much access does our app really need

As a concrete example, the node:18 base image had 104 known vulnerabilities at the time of writing.
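
You can check this yourself with an image scanner; for example, assuming you have Trivy installed:

# Scan the base image for known CVEs
trivy image node:18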

If you can avoid using this image entirely, great! If you only need certain libraries at build time and can live without them afterwards, I have a workaround for you. If you aren't using all the features of these environments, if you need functionality from a different image, or if you simply want to minimize your application's risk, you may want to optimize this.

This is where the benefit of multi-stage builds comes in.

Multi-Stage Builds

The core idea is that every time we specify FROM in our Dockerfile, it starts a new stage and replaces the previous base, BUT we can copy files from an earlier stage into the current one using COPY's --from flag.

Let's see this in action:

# Stage 1: use node:18 only to build
FROM node:18 AS build
WORKDIR /src
COPY package*.json ./
RUN npm install
COPY . ./
RUN npm run build

# Stage 2: a fresh image containing only nginx and the built assets
FROM nginx:latest
COPY --from=build /src/build /usr/share/nginx/html
# nginx's default entrypoint starts the server, so nothing else is needed

If you want to build a Node.js application, then expose it using nginx

This solves all our problems:

  1. It strips out all the node:18 dependencies we no longer need, minimizing the image size
  2. It lets us run our app with nginx as the proxy, without installing nginx into another environment
  3. It reduces the risk of running with the node image, since the final image no longer contains the libraries that were flagged as risky
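
As a bonus, multi-stage builds let you build just one stage when you need it, using Docker's --target flag:

# Build only the "build" stage, e.g. to debug the compile step
docker build --target build -t my-app-build .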