Docker Swarm rolling updates
Docker Swarm rolling updates are an easy way to update services without downtime.
Some time ago I wrote that it would be best to move to a Kubernetes variant and now this post is about Docker Swarm. Yes, I still use Docker Swarm because I have a project that uses it. I recently moved development from Docker to Docker Swarm, mainly because with Docker Swarm you learn the basics of container orchestration, so why not learn this during development.
In this post, we'll look at two kinds of rolling update: changing an environment variable and changing the image. I assume you already have some hands-on experience with Docker Swarm. As always, I'm doing this on Ubuntu 22.04.
Our Docker-Compose project
We first create a Docker-Compose project. The project tree:
.
├── docker-compose.yml
├── .env
├── app
│   └── run.sh
The files:
# file: .env
COMPOSE_PROJECT_NAME=my-project
LOGGER_LEVEL=DEBUG
We use the busybox image and start with two replicas. We also make some Docker Swarm specific variables available:
- X_SERVICE_LABEL_STACK_IMAGE: information about the image.
- X_TASK_SLOT: the number of the (task) instance.
# file: docker-compose.yml
version: "3.7"

x-service_defaults: &service_defaults
  env_file:
    - ./.env
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "5"

services:
  busybox:
    << : *service_defaults
    deploy:
      mode: replicated
      replicas: 2
      restart_policy:
        condition: on-failure
    image: busybox:1.35.0
    environment:
      # swarm info
      X_COMPOSE_PROJECT_NAME: "${COMPOSE_PROJECT_NAME}"
      X_SERVICE_LABEL_STACK_IMAGE: '{{index .Service.Labels "com.docker.stack.image"}}'
      X_TASK_SLOT: "{{.Task.Slot}}"
    ports:
      - "127.0.0.1:8280:8280"
    volumes:
      - "./app:/app"
    command: /bin/sh /app/run.sh
    networks:
      - my-project-network

networks:
  my-project-network:
    external: true
    name: my-project-network
We use a script 'run.sh', called when the container starts, that does two things:
- It starts an httpd server in the background.
- It generates log lines in a loop, printed to stdout.
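# file: app/run.sh
# X_TASK_SLOT, X_SERVICE_LABEL_STACK_IMAGE and LOGGER_LEVEL come from the service environment.
echo "Starting httpd server instance ${X_TASK_SLOT} ..."
# Each task serves a page that contains its own slot number.
echo "Hello from httpd server instance ${X_TASK_SLOT}" > /var/www/index.html
# -f keeps httpd in the foreground; '&' puts it in the background of this script.
/bin/httpd -f -p 8280 -h /var/www/ &
echo "Starting output ..."
# Print one log line per second; this loop also keeps the container running.
while true; do echo "IMAGE: ${X_SERVICE_LABEL_STACK_IMAGE}, LOGGER_LEVEL = ${LOGGER_LEVEL}"; sleep 1; done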
With Docker Swarm, we typically do not create networks in the 'docker-compose.yml' file but use external networks, more specifically 'overlay' networks. When creating such a network, we can also specify a flag ('--attachable') that allows containers not managed by Docker Swarm to connect to this network.
To create the network:
docker network create -d overlay --attachable my-project-network
To see this network:
docker network ls
Result:
NETWORK ID NAME DRIVER SCOPE
...
qn7qwhpsooty my-project-network overlay swarm
...
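In a setup script, creating the network a second time will typically fail because the name is already taken. A minimal sketch of a guard, assuming a POSIX shell on a Swarm manager node; the function name 'ensure_overlay_network' is my own and not part of Docker:

```shell
# Hypothetical helper (not a Docker command): create the attachable overlay
# network only when no network with that name exists yet.
ensure_overlay_network() {
    network_name="$1"
    if docker network inspect "$network_name" >/dev/null 2>&1; then
        echo "network ${network_name} already exists"
    else
        docker network create -d overlay --attachable "$network_name"
    fi
}

# Usage, on a Swarm manager node:
# ensure_overlay_network my-project-network
```

The check only looks at the name, so it does not verify that an existing network really is an attachable overlay network.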
Some Docker Swarm commands
A note about the commands. Many times we use '--detach=false'. This means the command does not return immediately, but returns on completion. In the meantime, useful information is shown in the terminal.
Let's bring up our project. The ugly construction is needed because 'docker stack deploy' does not load the '.env' file by itself, so we pass the variables through the environment:
env $(cat .env | grep '^[A-Z]' | xargs) docker stack deploy --detach=false -c docker-compose.yml my-project
Result:
WARN[0000] ignoring IP-address (127.0.0.1:8280:8280/tcp) service will listen on '0.0.0.0'
Creating service my-project_busybox
overall progress: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service vflo2g4fiybtx0p9b596uk445 converged
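Note the warning here. This is unexpected and different from plain Docker: Docker Swarm ignores the host IP address in the port mapping and publishes the port on all interfaces (0.0.0.0). In other words, we are opening a port to the outside world, so be careful!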
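As for the 'env $(cat .env ...)' construction used in the deploy command: a somewhat more readable alternative is to let the shell export the variables itself. This is only a sketch. The function name 'load_env' is hypothetical, and it assumes the '.env' file is also valid shell syntax (no spaces around '=', values with spaces quoted), because the file is sourced rather than parsed:

```shell
# Hypothetical helper: export every KEY=VALUE line of an env file.
# Assumption: the file is valid shell, because it is sourced, not parsed.
load_env() {
    env_file="${1:-./.env}"
    set -a              # export every variable assigned from now on
    . "$env_file"
    set +a              # back to normal assignment behaviour
}

# Usage, on a Swarm manager node (the subshell keeps the variables out of
# the interactive session):
# ( load_env ./.env && docker stack deploy --detach=false -c docker-compose.yml my-project )
```

Comment lines in the '.env' file are skipped automatically, since they are comments in shell as well.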
To remove our project, we can use:
docker stack rm --detach=false my-project
To show the stack services:
docker stack services my-project
Result:
ID NAME MODE REPLICAS IMAGE PORTS
2oz3yg39zuvx my-project_busybox replicated 2/2 busybox:1.35.0 *:8280->8280/tcp
To show the tasks of the service 'my-project_busybox':
docker service ps my-project_busybox
Result:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
v1pc5p8yb2fx my-project_busybox.1 busybox:1.35.0 myra Running Running about a minute ago
6ozvx31c6isq my-project_busybox.2 busybox:1.35.0 myra Running Running about a minute ago
Check the logs. Every task writes a new log line every second:
docker service logs -t -f my-project_busybox
Result:
...
2024-07-07T15:42:20.805354434Z my-project_busybox.2.6ozvx31c6isq@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:21.808005147Z my-project_busybox.2.6ozvx31c6isq@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:22.807919531Z my-project_busybox.1.v1pc5p8yb2fx@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:22.809102067Z my-project_busybox.2.6ozvx31c6isq@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:23.808999822Z my-project_busybox.1.v1pc5p8yb2fx@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:23.809973729Z my-project_busybox.2.6ozvx31c6isq@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
In the log we see that both tasks are running.
To check the httpd server, we run on our host:
curl 127.0.0.1:8280
Result:
Hello from httpd server instance 1
If we repeat this a few times:
cmd="curl 127.0.0.1:8280"; for i in $(seq 1000); do $cmd; sleep 0.5; done
Result:
...
Hello from httpd server instance 2
Hello from httpd server instance 1
Hello from httpd server instance 2
Hello from httpd server instance 1
Here we see that the Docker Swarm load balancer alternates requests between both instances.
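With many requests it is easier to count the responses per instance than to read them. A small sketch, assuming the response text produced by our 'run.sh'; the function name 'tally_instances' is made up for this post:

```shell
# Hypothetical helper: read response lines on stdin and print
# "<slot> <count>" for every instance that answered.
tally_instances() {
    awk '/^Hello from httpd server instance/ { seen[$NF]++ }
         END { for (slot in seen) print slot, seen[slot] }' | sort -n
}

# Usage, with the stack running:
# for i in $(seq 20); do curl -s 127.0.0.1:8280; done | tally_instances
```

Each output line holds the task slot followed by the number of responses that came from it.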
Finally, let's inspect the service:
docker service inspect --pretty my-project_busybox
Result:
ID: 2oz3yg39zuvxyl2k4hc77qsic
Name: my-project_busybox
Labels:
 com.docker.stack.image=busybox:1.35.0
 com.docker.stack.namespace=my-project
Service Mode: Replicated
 Replicas: 2
Placement:
UpdateConfig:
 Parallelism: 1
 On failure: pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Update order: stop-first
RollbackConfig:
 Parallelism: 1
 On failure: pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Rollback order: stop-first
ContainerSpec:
...
Note the 'UpdateConfig' 'Parallelism' parameter. The value of 1 means that the update process updates a single task first and, once that update has completed, moves on to the next task. The same parameter is also present in the 'RollbackConfig'.
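These 'UpdateConfig' values can be changed on an existing service with flags of 'docker service update'. The wrapper below is a sketch only: the function name is hypothetical and the values are illustrative, not recommendations. With 'start-first', the new task is started before the old one is stopped.

```shell
# Hypothetical wrapper around 'docker service update'; the values are
# illustrative, not recommendations.
set_update_policy() {
    service_name="$1"
    docker service update \
        --update-parallelism 1 \
        --update-delay 10s \
        --update-order start-first \
        "$service_name"
}

# Usage, on a Swarm manager node:
# set_update_policy my-project_busybox
```

Afterwards, 'docker service inspect --pretty' should show the new values in the 'UpdateConfig' section.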
Scaling by adding replicas
So far our service has two replicas. If we perform a rolling update with only one task present, then, with the default 'stop-first' update order, our service will be temporarily unavailable. That is not what we want. With two tasks, Docker Swarm can update one task and, once that task has been updated, update the second task. This means that our service remains available all the time.
To add more replicas, for example 3:
docker service scale my-project_busybox=3
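Before starting an update, we may want to make sure that enough replicas are configured. A minimal sketch, assuming a replicated service; 'has_min_replicas' is a name I made up, the '--format' template reads the replica count from the service spec:

```shell
# Hypothetical guard: succeed only when a replicated service is configured
# with at least the given number of replicas (default: 2).
has_min_replicas() {
    service_name="$1"
    min_replicas="${2:-2}"
    replicas="$(docker service inspect \
        --format '{{.Spec.Mode.Replicated.Replicas}}' "$service_name")" || return 1
    [ "$replicas" -ge "$min_replicas" ]
}

# Usage, on a Swarm manager node:
# has_min_replicas my-project_busybox 2 || docker service scale my-project_busybox=2
```

Note that this checks the configured number of replicas, not the number of tasks that are actually running.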
Checking rolling updates
To check if our service is updated we can check the service log. This will show which tasks are running and when a task is restarted.
docker service logs -t -f my-project_busybox
To check that our service is not interrupted during the update process, we can check the httpd server in a separate terminal, in an "endless" loop as mentioned earlier. We should not see any interruptions:
cmd="curl 127.0.0.1:8280"; for i in $(seq 1000); do $cmd; sleep 0.5; done
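Watching the output works, but a failed request is easy to miss. The sketch below counts the failures instead. The function name and its arguments are my own; it assumes 'curl' is installed and treats HTTP errors and timeouts as failures:

```shell
# Hypothetical helper: request a URL a number of times and report how many
# requests failed. Arguments: url, attempts (default 20), pause in seconds.
count_failed_requests() {
    url="$1"
    attempts="${2:-20}"
    pause="${3:-0.5}"
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: HTTP errors count as failures, -s: silent, --max-time: do not hang
        curl -fs --max-time 2 -o /dev/null "$url" || failed=$((failed + 1))
        i=$((i + 1))
        sleep "$pause"
    done
    echo "failed ${failed} of ${attempts} requests"
    [ "$failed" -eq 0 ]
}

# Usage, in a separate terminal while the update is running:
# count_failed_requests 127.0.0.1:8280 120
```

The function returns a non-zero exit status when at least one request failed, so it can also be used in a script.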
Rolling updates and rollbacks
Why do we call it a 'rolling update'? Because we pass an update instruction with new data to Docker Swarm, and Docker Swarm then rolls the change out task by task ('Parallelism: 1') instead of replacing all tasks at once.
Below, there are two service update scenarios:
- Update an environment variable of the service
- Update the image of the service
The update command is:
docker service update <parameters> my-project_busybox
The type of the update is specified by the parameters.
Because an update can fail, we want to be able to return to the previous version. The rollback command in both cases is:
docker service rollback my-project_busybox
1. Rolling update: Environment variable
Here we change the 'LOGGER_LEVEL' of our application. The 'LOGGER_LEVEL' is initially loaded from the '.env' file and has the value 'DEBUG'. We change it to 'WARNING' using the following update command:
docker service update --env-add LOGGER_LEVEL=WARNING my-project_busybox
Result:
my-project_busybox
overall progress: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service my-project_busybox converged
The service log shows the following during the update:
...
2024-07-07T16:35:36.177525585Z my-project_busybox.1.5jwup67xwmbc@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:37.178678148Z my-project_busybox.1.5jwup67xwmbc@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:37.528564504Z my-project_busybox.2.yna9ftex6bau@myra | Starting httpd server instance 2 ...
2024-07-07T16:35:37.528847322Z my-project_busybox.2.yna9ftex6bau@myra | Starting output ...
2024-07-07T16:35:37.529281987Z my-project_busybox.2.yna9ftex6bau@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:38.180094076Z my-project_busybox.1.5jwup67xwmbc@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:49.542707071Z my-project_busybox.2.yna9ftex6bau@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:50.194103057Z my-project_busybox.1.5jwup67xwmbc@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:52.132182215Z my-project_busybox.1.qrqsbiowltle@myra | Starting httpd server instance 1 ...
2024-07-07T16:35:52.132401060Z my-project_busybox.1.qrqsbiowltle@myra | Starting output ...
2024-07-07T16:35:52.132788443Z my-project_busybox.1.qrqsbiowltle@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:52.546112046Z my-project_busybox.2.yna9ftex6bau@myra | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
Now let's roll back:
docker service rollback my-project_busybox
Result:
my-project_busybox
rollback: manually requested rollback
overall progress: rolling back update: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service my-project_busybox converged
After the rollback operation, the 'LOGGER_LEVEL' is back at 'DEBUG'. Check the service log to confirm this.
2. Rolling update: Image
In another scenario, we have a new image for our application. Here we move from busybox:1.35.0 to busybox:1.36.0. The update command is:
docker service update --image busybox:1.36.0 my-project_busybox
And, again, the rollback command is:
docker service rollback my-project_busybox
What if something goes wrong during the update?
Let's make a mistake and update with a non-existing image:
docker service update --image busybox:9.99.0 my-project_busybox
Result:
image busybox:9.99.0 could not be accessed on a registry to record
its digest. Each node will access busybox:9.99.0 independently,
possibly leading to different nodes running different
versions of the image.
my-project_busybox
overall progress: 0 out of 2 tasks
1/2: preparing [=================================> ]
2/2:
service update paused: update paused due to failure or early termination of task n371geu35a4u5xe9oclefv5j9
When the update of a task fails, the update process is paused (the 'On failure: pause' setting we saw in the 'UpdateConfig'). The other task keeps running, which means that our service is still available. We can also see this by inspecting the service:
docker service inspect --pretty my-project_busybox
Result:
ID: k4a0vy77wirk1fglso42qxx38
Name: my-project_busybox
Labels:
 com.docker.stack.image=busybox:1.35.0
 com.docker.stack.namespace=my-project
Service Mode: Replicated
 Replicas: 2
UpdateStatus:
 State: paused
 Started: 2 minutes ago
 Message: update paused due to failure or early termination of task n371geu35a4u5xe9oclefv5j9
Placement:
...
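In a script, we usually only need the state and not the full output. A sketch, with a hypothetical function name; the template uses 'with' so that a service that has never been updated does not cause a template error:

```shell
# Hypothetical helper: print the update state of a service (for example
# "paused" or "completed"), or "none" when there is no update status yet.
update_state() {
    state="$(docker service inspect \
        --format '{{with .UpdateStatus}}{{.State}}{{end}}' "$1")" || return 1
    echo "${state:-none}"
}

# Usage, on a Swarm manager node:
# update_state my-project_busybox
```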
As mentioned before, we can go back to the state before the update by issuing a rollback:
docker service rollback my-project_busybox
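Instead of rolling back by hand, we can ask Docker Swarm to roll back by itself when an update fails, using the '--update-failure-action' flag. A minimal sketch; the wrapper function and its name are my own, and I have not timed how long Docker Swarm takes to decide that the update has failed:

```shell
# Hypothetical wrapper: update the image and let Docker Swarm roll back
# automatically when the updated tasks fail.
update_image_with_auto_rollback() {
    service_name="$1"
    image="$2"
    docker service update \
        --detach=false \
        --update-failure-action rollback \
        --image "$image" \
        "$service_name"
}

# Usage, on a Swarm manager node:
# update_image_with_auto_rollback my-project_busybox busybox:1.36.0
```

The failure action becomes part of the service's 'UpdateConfig', so it also applies to later updates of this service until it is changed again.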
Service updates with 'docker stack deploy' and 'docker-compose.yml'
Now things get a bit ugly. So far we updated our services using 'docker service update', and we could revert to a previous version by issuing 'docker service rollback'.
But, we are using a 'docker-compose.yml' file here. It appears to be possible to change the 'docker-compose.yml' file and re-deploy again.
Let's see what happens. First, we edit the environment variable in the '.env' file and the image tag in the 'docker-compose.yml' file. Then we issue the deploy command again:
env $(cat .env | grep '^[A-Z]' | xargs) docker stack deploy --detach=false -c docker-compose.yml my-project
Result:
Updating service my-project_busybox (id: mi27j7jjsz146y4wqqre439io)
overall progress: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service mi27j7jjsz146y4wqqre439io converged
Note that the result now mentions 'Updating service', while the first time it mentioned 'Creating service'.
Anyway, this means Docker Swarm is performing an update (without downtime) for every service in the 'docker-compose.yml' file, in the same way we update individual services using 'docker service update'. This is great!
But things can go wrong, and there is no 'docker stack rollback' command. This means that, if things go wrong, we have to revert to the previous version ourselves, by restoring our previous 'docker-compose.yml' file and deploying again!
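Restoring the previous files is only possible if we kept them. Version control is the usual answer. For a quick experiment, a snapshot before every change also works. The helper below is a sketch; the function name and the backup directory are assumptions of mine:

```shell
# Hypothetical helper: copy the files that define the stack into a
# timestamped directory and print the path of that directory.
snapshot_stack_files() {
    backup_root="${1:-./stack-backups}"
    backup_dir="${backup_root}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    cp docker-compose.yml .env "$backup_dir"/ || return 1
    echo "$backup_dir"
}

# Usage, from the project directory, before editing and re-deploying:
# snapshot_stack_files
```

To revert, copy the two files back from the snapshot directory and run the deploy command again.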
Summary
Running updates without causing downtime has become very important. Docker Swarm makes managing and running updates a breeze. Docker Swarm was and remains an extremely powerful container orchestration tool, even though development seems to have stalled.
Links / credits
Docker - Apply rolling updates to a service
https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update
docker stack deploy in 1.13 doesn't load .env file as docker-compose up does #29133
https://github.com/moby/moby/issues/29133