
PostgreSQL backup with Docker SDK for Python

Avoid complex Bash scripts. Write your scripts in Python!

9 March 2023 Updated 9 March 2023

This is a short post about backing up a Dockerized PostgreSQL database. To access the database, we typically run a Bash script on the host, with commands like:

docker exec -t <container> bash -c '<command>'

In this post we replace our Bash script with a Python script. Why? Because we know Python, and programming in Bash can be time-consuming. While 'subprocess' would be enough to run all the system commands here, we use the Docker SDK for Python instead, which gives us a lot of extras. As always, I am running this on Ubuntu (22.04).
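For comparison, here is a minimal sketch of what the 'subprocess' route could look like for a single 'docker exec' call. The helper names build_docker_exec_cmd and docker_exec_subprocess are made up for this illustration; only the 'docker exec' command line and subprocess.run are taken as given.

```python
import subprocess


def build_docker_exec_cmd(container, cmd, user=None):
    """Return the argv list for running cmd (a list of strings) in a container."""
    argv = ['docker', 'exec']
    if user is not None:
        argv += ['--user', user]
    argv.append(container)
    argv += list(cmd)
    return argv


def docker_exec_subprocess(container, cmd, user=None):
    # Passing a list (not a string) to subprocess.run means no shell is
    # involved on the host, so the arguments need no extra quoting.
    return subprocess.run(
        build_docker_exec_cmd(container, cmd, user=user),
        capture_output=True,
        text=True,
    )
```

The returned object has a returncode attribute, which plays the same role as $? in the Bash version.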

Backup operation

We perform the backup in the following steps:

  • Backup database to a temporary file inside the container.
  • Copy the backup file outside the container.
  • Remove the temporary file.
  • Show the time of the backup operation.
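The steps above can be sketched as a small Python function. This is only an outline under stated assumptions: run_backup and its three callables (dump, copy, cleanup) are hypothetical stand-ins for the real container operations. Unlike the scripts in this post, the sketch removes the temporary file in a finally block, so it is also cleaned up when the dump or the copy fails.

```python
import time


def run_backup(dump, copy, cleanup):
    """Run the backup steps in order and return the elapsed seconds.

    dump, copy and cleanup are hypothetical zero-argument callables that
    stand in for the real container operations.
    """
    started = time.monotonic()
    try:
        dump()     # step 1: dump to a temporary file inside the container
        copy()     # step 2: copy the dump file to the host
    finally:
        cleanup()  # step 3: always remove the temporary file
    # step 4: report how long the operation took
    return time.monotonic() - started
```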

The Bash backup script

Below is the Bash backup script. It checks for errors during the backup operation and copy operation. The script starts with our container and database parameters.

#!/usr/bin/bash
# backup.sh
pg_container=8c49633bda68
pg_user=postgres
db_name=my_db
db_user=postgres

# start
SECONDS=0

# backupfile inside docker container
tmp_backupfile=/tmp/${db_name}.pg_dump
# backupfile on the host
backupfile=${db_name}.pg_dump.`date +%Y%m%d"_"%H%M%S`.sql

echo "tmp_backupfile = ${tmp_backupfile}"
echo "backupfile = ${backupfile}"

echo "removing previous tmp_backupfile ..."
docker exec -t ${pg_container} bash -c 'rm -f -- ${0}' ${tmp_backupfile}

echo "dumping database to tmp_backupfile ..."
docker exec -t --user ${pg_user} ${pg_container} bash -c 'pg_dump ${0} -Fc -U ${1} -f ${2}' ${db_name} ${db_user} ${tmp_backupfile}
exit_code=$?
if [ $exit_code -ne 0 ] ; then
    echo "dump error, exit_code = ${exit_code}"
    exit $exit_code
fi

echo "copying tmp_backupfile to backupfile ..."
docker cp "${pg_container}":${tmp_backupfile} ${backupfile}
exit_code=$?
if [ $exit_code -ne 0 ] ; then
    echo "copy error, exit_code = ${exit_code}"
    exit $exit_code
fi

echo "removing tmp_backupfile ..."
docker exec -t ${pg_container} bash -c 'rm -f -- ${0}' ${tmp_backupfile}

fsize=`stat --printf="%s" ${backupfile}`
elapsed_secs=$SECONDS

echo "ready, backupfile = ${backupfile}, size = ${fsize}, seconds = ${elapsed_secs}"

The Python backup script

For the Python script we first install the Docker SDK for Python (in a virtual environment):

pip install docker

Below is the Python backup script. It checks for errors during the backup operation and copy operation.

# backup.py
import datetime
import logging
import os
import subprocess
import sys

import docker

def get_logger():
    logger_format = '[%(asctime)s] [%(levelname)s] %(message)s'
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    # console
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.DEBUG)
    console_handler.setFormatter(logging.Formatter(logger_format))
    logger.addHandler(console_handler)
    return logger

logger = get_logger()

class DockerUtils:
    def __init__(
        self,
        container=None,
    ):
        self.client = docker.from_env()
        self.container = self.client.containers.get(container)

    def docker_exec(self, cmd, **kwargs):
        return self.container.exec_run(cmd, **kwargs)

    def remove_container_file(self, f):
        # 'rm -f' also succeeds when the file does not exist, so no 'ls' check is needed
        return self.docker_exec(['rm', '-f', '--', f])

    def docker_cp_from_container(self, src, dst):
        r = subprocess.run(['docker', 'cp', self.container.short_id + ':' + src, dst])
        return r

def main():
    pg_container = '8c49633bda68'
    pg_user = 'postgres'
    db_name = 'my_db'
    db_user = 'postgres'

    # start
    dt_start = datetime.datetime.now()

    # backupfile inside docker container
    tmp_backupfile = os.path.join('/tmp/', db_name + '.pg_dump')
    # backupfile on the host
    backupfile = './' + db_name + '.pg_dump.' + datetime.datetime.now().strftime('%Y%m%d_%H%M%S')

    logger.debug('tmp_backupfile = {}'.format(tmp_backupfile))
    logger.debug('backupfile = {}'.format(backupfile))

    # instantiate and set container
    du = DockerUtils(container=pg_container)

    logger.debug('removing previous tmp_backupfile ...')
    du.remove_container_file(tmp_backupfile)

    logger.debug('dumping database to tmp_backupfile ...')
    cmd = 'pg_dump {0} -Fc -U {1} -f {2}'.format(db_name, db_user, tmp_backupfile)
    kwargs = {'user': pg_user}
    r = du.docker_exec(cmd, **kwargs)
    if r.exit_code != 0:
        logger.error('dump error: {}'.format(str(r.output.decode('utf-8'))))
        # exit with the failing exit code, as the Bash script does
        sys.exit(r.exit_code)
    
    logger.debug('copying tmp_backupfile to backupfile ...')
    r = du.docker_cp_from_container(tmp_backupfile, backupfile)
    if r.returncode != 0:
        logger.error('copy error = {}'.format(r.returncode))
        # exit with the failing exit code, as the Bash script does
        sys.exit(r.returncode)

    logger.debug('removing tmp_backupfile ...')
    du.remove_container_file(tmp_backupfile)

    fsize = os.stat(backupfile).st_size
    elapsed_secs = int((datetime.datetime.now() - dt_start).total_seconds())

    logger.info('ready, backupfile = {}, size = {}, seconds = {}'.format(backupfile, fsize, elapsed_secs))

if __name__ == "__main__":
    main()
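The script above still shells out to 'docker cp' for the copy step. The SDK also has Container.get_archive(), which returns a tuple of a tar data stream and a stat dict. Below is a minimal sketch of how the copy could be done with it. The function names save_file_from_tar_stream and copy_from_container are made up for this example, and the sketch buffers the whole archive in memory, which is an assumption that only holds for small dumps.

```python
import io
import tarfile


def save_file_from_tar_stream(chunks, dst):
    """Write the first regular file found in a tar stream to dst.

    chunks is an iterable of bytes objects that together form a tar
    archive, such as the first item returned by container.get_archive().
    Returns the number of bytes written.
    """
    # Buffer the whole archive in memory. For a large database it would be
    # better to spool the stream to a temporary file instead.
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)

    with tarfile.open(fileobj=buf, mode='r') as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                with open(dst, 'wb') as out:
                    out.write(data)
                return len(data)
    raise FileNotFoundError('no regular file in tar stream')


def copy_from_container(container, src, dst):
    # container is a Container object from the Docker SDK for Python
    chunks, stat = container.get_archive(src)
    return save_file_from_tar_stream(chunks, dst)
```

With this in place, the docker_cp_from_container method could call copy_from_container(self.container, src, dst) and the 'docker' command-line client would no longer be needed for the copy.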

Running without a virtual environment

But wait, the Python script depends on the 'docker' package, which we installed in a virtual environment. To run the script without activating that environment, we can create an executable using PyInstaller. First install PyInstaller:

pip install pyinstaller

Then create the executable with the following command:

pyinstaller --onefile backup.py

This will create an executable in the './dist' directory:

dist/backup

We can run this executable directly, or put it in a directory on our PATH.

Summary

We did not really need the Docker SDK for Python here; 'subprocess' would have been enough. But by including it we can do a lot of other things as well.

There is not much difference between the Bash script and the Python script. Writing a few lines in Bash is easy. But as soon as we want a little more functionality and control, it can get very time-consuming. Unless you are a Bash guru, you have to look up commands, study how to write functions, and test things. When writing a Python script, we still need to know some Linux commands, but we can limit this to the few we need. For the rest, we can use our Python knowledge.

Links / credits

Docker - docker cp
https://docs.docker.com/engine/reference/commandline/cp

Docker SDK for Python
https://docker-py.readthedocs.io

Dockerize & Backup A Postgres Database
https://dev.to/mattcale/dockerize-backup-a-postgres-database-2b1l

PostgreSQL - pg_dump
https://www.postgresql.org/docs/current/app-pgdump.html

PyInstaller
https://pyinstaller.org

Why is pg_restore segfaulting in Docker?
https://stackoverflow.com/questions/63934856/why-is-pg-restore-segfaulting-in-docker
