
Docker Swarm rolling updates

Docker Swarm rolling updates are an easy way to update services without downtime.

7 July 2024 Updated 8 July 2024
Image credit: https://unsplash.com/@coinstash_au

Some time ago I wrote that it would be best to move to a Kubernetes variant, and yet this post is about Docker Swarm. Yes, I still use Docker Swarm because I have a project that uses it. I recently moved development from Docker to Docker Swarm, mainly because with Docker Swarm you learn the basics of container orchestration, so why not learn this during development?

In this post, we'll look at two kinds of rolling update: changing an environment variable and changing the image of a service. I assume you already have some hands-on experience with Docker Swarm. As always, I'm doing this on Ubuntu 22.04.

Our Docker-Compose project

We first create a Docker-Compose project. The project tree:

.
├── docker-compose.yml
├── .env
├── app
│   └── run.sh

The files:

# file: .env
COMPOSE_PROJECT_NAME=my-project
LOGGER_LEVEL=DEBUG

We use the busybox image and start with two replicas. We also make some Docker Swarm specific variables available:

  • X_SERVICE_LABEL_STACK_IMAGE: information about the image.
  • X_TASK_SLOT: the number of the (task) instance.

# file: docker-compose.yml
version: "3.7"

x-service_defaults: &service_defaults
  env_file:
    - ./.env
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "5"

services:
  busybox:
    << : *service_defaults
    deploy:
      mode: replicated
      replicas: 2
      restart_policy:
        condition: on-failure
    image: busybox:1.35.0
    environment:
      # swarm info
      X_COMPOSE_PROJECT_NAME: "${COMPOSE_PROJECT_NAME}"
      X_SERVICE_LABEL_STACK_IMAGE:  '{{index  .Service.Labels  "com.docker.stack.image"}}'
      X_TASK_SLOT: "{{.Task.Slot}}"
    ports:
      - "127.0.0.1:8280:8280"
    volumes:
      - "./app:/app"
    command: /bin/sh /app/run.sh
    networks:
      - my-project-network

networks:
  my-project-network:
    external: true
    name: my-project-network

We use a script 'run.sh', called when the container starts, that does two things:

  • It starts an httpd server in the background.
  • It generates log lines in a loop, printed to stdout.

# file: app/run.sh
echo "Starting httpd server instance ${X_TASK_SLOT} ..."
echo "Hello from httpd server instance ${X_TASK_SLOT}" > /var/www/index.html
/bin/httpd -f -p 8280 -h /var/www/ &
echo "Starting output ..."
while true; do echo "IMAGE: ${X_SERVICE_LABEL_STACK_IMAGE}, LOGGER_LEVEL = ${LOGGER_LEVEL}"; sleep 1; done

With Docker Swarm, we typically do not create networks in 'docker-compose.yml' but use external networks, more specifically 'overlay' networks. When creating such a network, we can also pass the '--attachable' flag, which allows containers that are not managed by Docker Swarm to connect to it.

To create the network:

docker network create -d overlay --attachable my-project-network

To see this network:

docker network ls

Result:

NETWORK ID     NAME                                                DRIVER    SCOPE
...
qn7qwhpsooty   my-project-network                                  overlay   swarm
...

Some Docker Swarm commands

A note about the commands. We often use '--detach=false'. This means the command does not return immediately, but only on completion. In the meantime, useful progress information is shown in the terminal.

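Let's bring up our project. The ugly 'env $(...)' construction is needed because 'docker stack deploy', unlike 'docker-compose up', does not load the '.env' file (see the links at the end), so we pass the variables ourselves: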

env $(grep '^[A-Z]' .env | xargs) docker stack deploy --detach=false -c docker-compose.yml my-project

Result:

WARN[0000] ignoring IP-address (127.0.0.1:8280:8280/tcp) service will listen on '0.0.0.0' 
Creating service my-project_busybox
overall progress: 2 out of 2 tasks 
1/2: running   [==================================================>] 
2/2: running   [==================================================>] 
verify: Service vflo2g4fiybtx0p9b596uk445 converged 

Note the warning. Unlike plain Docker, Docker Swarm ignores the host IP address in the port mapping and publishes the port on all interfaces ('0.0.0.0'). In other words, we are opening port 8280 to the outside world, so be careful!
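The 'env $(...)' construction used for the deploy can also be wrapped in a small helper. The following is only a sketch: the function names 'load_env_file' and 'deploy_stack' are made up for this post, and it assumes the '.env' file contains nothing but simple KEY=value lines and '#' comments, because the file is sourced by the shell.

```shell
# Export every variable assigned in an env file into the current shell.
# Assumption: the file holds only simple KEY=value lines and '#' comments.
# Pass a path containing a slash (./.env), otherwise '.' searches $PATH.
load_env_file() {
    set -a      # auto-export every variable assigned from here on
    . "$1"      # source the file in the current shell
    set +a      # stop auto-exporting
}

# Hypothetical wrapper: load ./.env in a subshell, then deploy the stack,
# so the variables do not leak into the calling shell.
deploy_stack() {
    (
        load_env_file ./.env
        docker stack deploy --detach=false -c docker-compose.yml "$1"
    )
}
```

With these functions defined, 'deploy_stack my-project' should be equivalent to the one-liner above for a file like our '.env'.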

To remove our project, we can use:

docker stack rm --detach=false my-project

To show the stack services:

docker stack services my-project

Result:

ID             NAME                 MODE         REPLICAS   IMAGE            PORTS
2oz3yg39zuvx   my-project_busybox   replicated   2/2        busybox:1.35.0   *:8280->8280/tcp

To show the tasks of the service 'my-project_busybox':

docker service ps my-project_busybox

Result:

ID             NAME                   IMAGE            NODE      DESIRED STATE   CURRENT STATE                ERROR     PORTS
v1pc5p8yb2fx   my-project_busybox.1   busybox:1.35.0   myra      Running         Running about a minute ago             
6ozvx31c6isq   my-project_busybox.2   busybox:1.35.0   myra      Running         Running about a minute ago  

Check the logs. Every task should print a new log line every second:

docker service logs -t -f my-project_busybox

Result:

...
2024-07-07T15:42:20.805354434Z my-project_busybox.2.6ozvx31c6isq@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:21.808005147Z my-project_busybox.2.6ozvx31c6isq@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:22.807919531Z my-project_busybox.1.v1pc5p8yb2fx@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:22.809102067Z my-project_busybox.2.6ozvx31c6isq@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:23.808999822Z my-project_busybox.1.v1pc5p8yb2fx@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T15:42:23.809973729Z my-project_busybox.2.6ozvx31c6isq@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG

In the log we see that both tasks are running.

To check the httpd server, we run on our host:

curl 127.0.0.1:8280

Result:

Hello from httpd server instance 1

If we repeat this a few times:

cmd="curl 127.0.0.1:8280"; for i in $(seq 1000); do $cmd; sleep 0.5; done

Result:

...
Hello from httpd server instance 2
Hello from httpd server instance 1
Hello from httpd server instance 2
Hello from httpd server instance 1

Here we see that the Docker Swarm load balancer alternates requests between the two instances.
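Instead of reading the scrolling output, we can also count how often each instance answered. This is a small sketch; the function names 'fetch_many' and 'tally_responses' are made up for this post.

```shell
# Tally identical response lines read from stdin, most frequent first.
tally_responses() {
    sort | uniq -c | sort -rn
}

# Fetch the page N times and print each response, one per line.
fetch_many() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s 127.0.0.1:8280
        i=$((i + 1))
    done
}
```

Running 'fetch_many 20 | tally_responses' prints one line per instance, prefixed with the number of responses it served.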

Finally, let's inspect the service:

docker service inspect --pretty my-project_busybox

Result:

ID:		2oz3yg39zuvxyl2k4hc77qsic
Name:		my-project_busybox
Labels:
 com.docker.stack.image=busybox:1.35.0
 com.docker.stack.namespace=my-project
Service Mode:	Replicated
 Replicas:	2
Placement:
UpdateConfig:
 Parallelism:	1
 On failure:	pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Update order:      stop-first
RollbackConfig:
 Parallelism:	1
 On failure:	pause
 Monitoring Period: 5s
 Max failure ratio: 0
 Rollback order:    stop-first
ContainerSpec:
...

Note the 'UpdateConfig' 'Parallelism' parameter. The value of 1 means that the update process will update a single task first and once this update has completed it will update a next task. The same parameter is also present in the 'RollbackConfig'.
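The 'UpdateConfig' settings can be changed on an existing service with flags of 'docker service update'. Below is a sketch only: the function name 'configure_rolling_update' is made up, and the values (a 10 second delay between tasks, 'start-first' order) are illustrative assumptions, not recommendations.

```shell
# Adjust how future rolling updates of a service are carried out.
# The values are illustrative assumptions, not recommendations.
configure_rolling_update() {
    service=$1
    docker service update \
        --update-parallelism 1 \
        --update-delay 10s \
        --update-order start-first \
        "$service"
}
```

After 'configure_rolling_update my-project_busybox', the new values should show up under 'UpdateConfig' in the output of 'docker service inspect --pretty'.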

Scaling by adding replicas

So far our service has two replicas. If we perform a rolling update with only one task present, then our service will be temporarily unavailable. That is not what we want. With two tasks, Docker Swarm can update one task and once that task has been updated, it can update the second task. This means that our service remains available all the time.

To scale the service to, for example, three replicas:

docker service scale my-project_busybox=3

Checking rolling updates

To check if our service is updated we can check the service log. This will show which tasks are running and when a task is restarted.

docker service logs -t -f my-project_busybox

To check that our service is not interrupted during the update process, we can check the httpd server in a separate terminal, in an "endless" loop as mentioned earlier. We should not see any interruptions:

cmd="curl 127.0.0.1:8280"; for i in $(seq 1000); do $cmd; sleep 0.5; done
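With a loop like this it is easy to miss a single failed request in the scrolling output. The sketch below counts successes and failures instead. The function name 'probe_service' is made up for this post; it runs whatever check command is passed to it.

```shell
# Run a check command COUNT times, INTERVAL seconds apart, and report
# how many runs succeeded and how many failed.
probe_service() {
    count=$1
    interval=$2
    shift 2
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
            # Report each failure with a timestamp on stderr.
            echo "$(date '+%H:%M:%S') check failed" >&2
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "ok=$ok failed=$failed"
}
```

For our service this could be called as 'probe_service 1000 0.5 curl -fsS --max-time 2 127.0.0.1:8280'. The final line then tells us whether any request failed while the update was running.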

Rolling updates and rollbacks

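Why do we call it a 'rolling update'? Because Docker Swarm does not replace all tasks at once. We pass an update instruction with new data, and Docker Swarm rolls it out task by task, so that the tasks that are not being updated keep serving requests.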

Below, there are two service update scenarios:

  1. Update an environment variable of the service
  2. Update the image of the service

The update command is:

docker service update <parameters> my-project_busybox

The type of the update is specified by the parameters.

Because an update can fail, we want to be able to return to the previous version. The rollback command in both cases is:

docker service rollback my-project_busybox

1. Rolling update: Environment variable

Here we change the 'LOGGER_LEVEL' of our application. The 'LOGGER_LEVEL' is initially loaded from the '.env' file and has the value 'DEBUG'. We change it to 'WARNING' using the following update command:

docker service update --env-add LOGGER_LEVEL=WARNING my-project_busybox

Result:

my-project_busybox
overall progress: 2 out of 2 tasks 
1/2: running   [==================================================>] 
2/2: running   [==================================================>] 
verify: Service my-project_busybox converged 

The service log shows the following during the update:

...
2024-07-07T16:35:36.177525585Z my-project_busybox.1.5jwup67xwmbc@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:37.178678148Z my-project_busybox.1.5jwup67xwmbc@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:37.528564504Z my-project_busybox.2.yna9ftex6bau@myra    | Starting httpd server instance 2 ...
2024-07-07T16:35:37.528847322Z my-project_busybox.2.yna9ftex6bau@myra    | Starting output ...
2024-07-07T16:35:37.529281987Z my-project_busybox.2.yna9ftex6bau@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:38.180094076Z my-project_busybox.1.5jwup67xwmbc@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:49.542707071Z my-project_busybox.2.yna9ftex6bau@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:50.194103057Z my-project_busybox.1.5jwup67xwmbc@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = DEBUG
2024-07-07T16:35:52.132182215Z my-project_busybox.1.qrqsbiowltle@myra    | Starting httpd server instance 1 ...
2024-07-07T16:35:52.132401060Z my-project_busybox.1.qrqsbiowltle@myra    | Starting output ...
2024-07-07T16:35:52.132788443Z my-project_busybox.1.qrqsbiowltle@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING
2024-07-07T16:35:52.546112046Z my-project_busybox.2.yna9ftex6bau@myra    | IMAGE: busybox:1.35.0, LOGGER_LEVEL = WARNING

Now let's roll back:

docker service rollback my-project_busybox

Result:

my-project_busybox
rollback: manually requested rollback 
overall progress: rolling back update: 2 out of 2 tasks 
1/2: running   [==================================================>] 
2/2: running   [==================================================>] 
verify: Service my-project_busybox converged 

After the rollback operation, the 'LOGGER_LEVEL' is back at 'DEBUG'. Check the service log to confirm this.
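Besides the log, the value can also be read from the service spec. This is a sketch: the function name 'service_env_var' is made up, and the template path '.Spec.TaskTemplate.ContainerSpec.Env' is assumed to match the inspect output of your Docker version.

```shell
# Print NAME=value for one environment variable of a service, as
# recorded in the service spec (not in a running container).
service_env_var() {
    docker service inspect \
        --format '{{range .Spec.TaskTemplate.ContainerSpec.Env}}{{println .}}{{end}}' \
        "$1" | grep "^$2="
}
```

For example, 'service_env_var my-project_busybox LOGGER_LEVEL' prints the 'LOGGER_LEVEL' entry that new tasks of the service will receive.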

2. Rolling update: Image

In another scenario, we have a new image for our application. Here we move from busybox:1.35.0 to busybox:1.36.0. The update command is:

docker service update --image busybox:1.36.0 my-project_busybox

And, again, the rollback command is:

docker service rollback my-project_busybox

What if something goes wrong during the update?

Let's make a mistake and update with a non-existing image:

docker service update --image busybox:9.99.0 my-project_busybox

Result:

image busybox:9.99.0 could not be accessed on a registry to record
its digest. Each node will access busybox:9.99.0 independently,
possibly leading to different nodes running different
versions of the image.

my-project_busybox
overall progress: 0 out of 2 tasks 
1/2: preparing [=================================>                 ] 
2/2:   
service update paused: update paused due to failure or early termination of task n371geu35a4u5xe9oclefv5j9

When the update of a task fails, the update process is paused. The other task keeps running, which means that our service is still available. We can also see this by inspecting the service:

docker service inspect --pretty my-project_busybox

Result:

ID:		k4a0vy77wirk1fglso42qxx38
Name:		my-project_busybox
Labels:
 com.docker.stack.image=busybox:1.35.0
 com.docker.stack.namespace=my-project
Service Mode:	Replicated
 Replicas:	2
UpdateStatus:
 State:		paused
 Started:	2 minutes ago
 Message:	update paused due to failure or early termination of task n371geu35a4u5xe9oclefv5j9
Placement:
...

As mentioned before, we can go back to the state before the update by issuing a rollback:

docker service rollback my-project_busybox
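The 'UpdateStatus' shown by 'docker service inspect' can also be checked from a script, so that a rollback is only issued when an update is actually paused. A minimal sketch, with made-up function names 'update_state' and 'rollback_if_paused', assuming the template path '.UpdateStatus.State' matches your Docker version:

```shell
# Print the state of the last update ("paused", "completed", ...).
# Prints nothing if the service has no update status yet.
update_state() {
    docker service inspect \
        --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$1"
}

# Roll the service back only when its last update is paused.
rollback_if_paused() {
    if [ "$(update_state "$1")" = "paused" ]; then
        docker service rollback "$1"
    else
        echo "no paused update for $1"
    fi
}
```

Alternatively, passing '--update-failure-action rollback' to 'docker service update' asks Docker Swarm to roll back by itself when an update fails, instead of pausing.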

Service updates with 'docker stack deploy' and 'docker-compose.yml'

Now things get a bit ugly. So far we updated our services using 'docker service update', and we could revert to a previous version by issuing 'docker service rollback'.

But, we are using a 'docker-compose.yml' file here. It appears to be possible to change the 'docker-compose.yml' file and re-deploy again.

Let's see what happens. First, we edit the environment variable in the '.env' file and the image tag in the 'docker-compose.yml' file. Then we issue the deploy command again:

env $(grep '^[A-Z]' .env | xargs) docker stack deploy --detach=false -c docker-compose.yml my-project

Result:

Updating service my-project_busybox (id: mi27j7jjsz146y4wqqre439io)
overall progress: 2 out of 2 tasks 
1/2: running   [==================================================>] 
2/2: running   [==================================================>] 
verify: Service mi27j7jjsz146y4wqqre439io converged 

Note that the result now mentions 'Updating service', while the first time it mentioned 'Creating service'.

Anyway, this means Docker Swarm is performing an update (without downtime) for every service in the 'docker-compose.yml' file, in the same way we update individual services using 'docker service update'. This is great!

But things can go wrong and there is no 'docker stack rollback' command. This means that, if things go wrong, we have to revert ourselves: restore the previous 'docker-compose.yml' (and '.env') file and deploy again!
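One way to always have the previous files at hand is to keep a copy of the last version that deployed well. The sketch below does this with plain 'cp'. The function names and the '.last-good' directory are made up for this post, and keeping the files under version control would serve the same purpose.

```shell
# Copy the files that define the stack into a backup directory
# (hypothetical name ".last-good") after a deploy that went well.
snapshot_stack_files() {
    mkdir -p .last-good
    cp docker-compose.yml .env .last-good/
}

# Put the last known-good files back, ready for a re-deploy.
restore_stack_files() {
    cp .last-good/docker-compose.yml .last-good/.env .
}
```

The idea is to call 'snapshot_stack_files' after a successful deploy, and 'restore_stack_files' followed by the usual deploy command when a later deploy goes wrong.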

Summary

Running updates without causing downtime has become very important. Docker Swarm makes managing and running updates a breeze. Docker Swarm was and remains an extremely powerful container orchestration tool, even though development seems to have stalled.

Links / credits

Docker - Apply rolling updates to a service
https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update

docker stack deploy in 1.13 doesn't load .env file as docker-compose up does #29133
https://github.com/moby/moby/issues/29133
