Docker - Reduce, Re-Use, Recycle
Containerization is one of those "hot topics" you see in software development. It's useful because it allows us to package all of our code into a minimal environment, similar to a VM, and provide it to users to run with a single command.
This solves a lot of problems:
- Handles OS differences between clients
- Removes dependency issues and the need to set up environments
- Allows for simple scaling of software
Most of the time, when people create a Dockerfile, they approach it something like this:
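Something along these lines (the exact `npm` scripts and start command will vary from project to project):

```dockerfile
# Use the full Node 18 image as the base
FROM node:18 as project

# Run all remaining instructions inside /src
WORKDIR /src

# Copy the project into the container (minus anything in .dockerignore)
COPY . ./

# Install dependencies and build the hypothetical npm project
RUN npm install
RUN npm run build

# Start the app (assuming a "start" script exists)
CMD ["npm", "start"]
```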
If you don't know anything about Docker, here's what this is instructing Docker to do:
- `FROM node:18 as project` sets the environment to have all the dependencies to work with Node version 18
- `WORKDIR /src` says that all the remaining instructions will be run in the `src` directory
- `COPY . ./` copies all the files in the current directory to the container (ignoring anything in the `.dockerignore` file)
- the two `RUN` lines (`npm install` and `npm run build`) are just building a hypothetical `npm` project
Great! So you've got a running environment: you can build the Docker image and run it. Most developers will stop here and just ship their code out.
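Building and running it is as simple as (the image name and port here are just examples):

```sh
# Build the image from the Dockerfile in the current directory
docker build -t my-node-app .

# Run it, mapping the container's port to the host
docker run -p 3000:3000 my-node-app
```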
What's the issue?
There are actually a couple of issues:
- Image Size - How big is your Docker image?
- Image Components - What is included in your Docker image?
- Security - How many things do you really need in your image?
Image Size
The `node:18` image, at the time of writing, was a whopping 378MB.
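You can check this yourself after pulling the image:

```sh
# Show the size of the node:18 image locally
docker images node:18
```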
This means every time you build this and want to store it somewhere, it's going to cost over a third of a gigabyte, which doesn't sound like a lot until you start considering that:
- You'll probably want to keep multiple versions of your application, not just delete the old versions
- You'll probably want to back up your applications
- You'll probably want to download these images
Each of these actions costs both storage and transfer of a 378MB file, which gets expensive at scale.
Now, if you are going to be using all of the features of the `node:18` environment, this is totally valid and you don't need to do anything else. You could also select a slimmed-down version of Node, for example `node:18-alpine`, which is considerably smaller at 43MB. But maybe you need a bunch of packages to build but not to deploy! Suddenly you have to do all the work that Docker promised to eliminate!
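If a slimmer base is enough for you, it's a one-line change at the top of the Dockerfile:

```dockerfile
# Swap the base image for the much smaller Alpine variant
FROM node:18-alpine as project
```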
Image Components
Now, another issue: what if you want to just build the application, then run it behind a load balancer like `nginx`?

You could install all the `nginx` requirements within the existing environment using a bunch of `RUN` statements with calls to `apt`, but that wouldn't be very clean.
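For example, you could bolt `nginx` onto the same image with something like this (a rough sketch; the `nginx.conf` is a hypothetical config for your app):

```dockerfile
FROM node:18 as project
WORKDIR /src
COPY . ./
RUN npm install
RUN npm run build

# Bolt nginx onto the same image via apt -- it works, but the image gets
# bigger and messier with every package you add
RUN apt-get update && apt-get install -y nginx && rm -rf /var/lib/apt/lists/*

# Hypothetical nginx config that proxies to the Node app
COPY nginx.conf /etc/nginx/nginx.conf
```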
Security
We also need to consider the security of our pod. Some of the core components of this are:
- How many vulnerabilities exist in the images we are using
- How many libraries / how much access does our app really need
When we look at a Docker image, for example the `node:18` base image, we can see it has 104 vulnerabilities.
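You can check this for yourself with an image scanner; for example, with Docker Scout installed:

```sh
# List known CVEs in the node:18 base image
docker scout cves node:18
```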
If you can avoid using this entirely, great! If you just need certain libraries to build, and you can live without them after, I have a workaround for you. If you aren't using all the features of these environments, if you need functionality in a different image, or if you generally want to minimize the risk of your application, you may want to optimize this.
This is where the benefit of multi-stage builds comes in.
Multi-Stage Builds
The core of this is that, in our Dockerfile, every time we specify `FROM` it starts a new stage that replaces the previous base, BUT we can copy files from a previous stage into the current one using the `--from` parameter of `COPY`.
Let's see this in action:
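A sketch of what that can look like for our hypothetical `npm` project (the build output directory and the `nginx` config are assumptions; they'll vary by project):

```dockerfile
# Stage 1: build the app using the full Node 18 image
FROM node:18 as project
WORKDIR /src
COPY . ./
RUN npm install
RUN npm run build

# Stage 2: serve only the built files with a small nginx image
FROM nginx:alpine

# Pull just the build output from the previous stage
COPY --from=project /src/build /usr/share/nginx/html

# Hypothetical nginx config for serving / proxying the app
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80
```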
This solves all our problems:
- It removes all the dependencies from `node:18` that we aren't using anymore and minimizes the size
- It allows us to run our app using `nginx` as the proxy, without installing it into another environment
- It removes the risks of running with the `node` image, as the final image doesn't have the libraries that were flagged as risky