# Docker backup container

## Features

This container can back up a PostgreSQL server.

- The PostgreSQL server can be running locally or in Docker
- The backup is compressed very quickly, using all CPU cores, with zstd
- Optionally, the backup can be copied off-site using SFTP (for example to a Hetzner Storage Box)
- Can be configured to back up all databases (including user accounts) or a single database
- The backup is a SQL dump made using pg_dump
- The container uses a Debian base image and runs pg_dump with the same major version as the server, by installing the matching postgresql-client package
- Directories with files can also be backed up as a tarball (not suitable for large files that rarely change, since the full tarball is recreated on every backup)

Mainly based on borgbackup-docker, but uses Ofelia for scheduling instead of cron and dockerize.

## PostgreSQL backup type

This container creates SQL dumps. This means the backup file is consistent, portable, and easy to restore.
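
As an illustration, restoring such a dump is a matter of decompressing it and feeding it to psql. A minimal sketch, assuming the default zstd compression and a hypothetical file name:

```sh
# Decompress the dump and pipe it into psql; the target database must already exist.
zstd -dc mydb.sql.zst | psql -h localhost -U postgres -d mydb
```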

The advantage of a SQL dump over backing up the files in /var/lib/postgresql is that the dump is consistent. If PostgreSQL is running while the files in /var/lib/postgresql are being backed up, the backup may be inconsistent, and PostgreSQL must replay the write-ahead log after a restore until the database is consistent again, possibly leading to data loss. SQL dumps are also more portable: restoring the files in /var/lib/postgresql requires the exact same PostgreSQL server version, while a dump does not.

This does mean that the backup is a snapshot of the moment it was made. It does not support point-in-time recovery (PITR), and with nightly backups you might lose changes made since the last backup. If this is not acceptable, choose another (more complicated) backup solution such as pg_basebackup.

When backing up a database, the compression step may be the bottleneck if it uses only a single core. This container uses zstd with multi-threading by default, so on a server with enough CPU cores the backup can usually be made very quickly.
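
For a sense of what this looks like, the kind of pipeline the container runs can be sketched as follows (illustrative only, not the container's exact command; the database name is hypothetical):

```sh
# Dump a database and compress it on the fly; -T0 uses all CPU cores, -3 is the default level.
pg_dump -h localhost -U postgres mydb | zstd -T0 -3 -o mydb.sql.zst
```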

If the backup is too large, or making the dump impacts performance too much to run during quiet hours, another backup solution is also the better choice.

## Running

This container can be run separately, or it can be added to or merged with an existing Docker Compose stack.

At build time, PostgreSQL client version 15 is installed. On startup and when creating a backup, the major version of the PostgreSQL server is checked; if it differs, the corresponding postgresql-client package is installed, so database dumps are always created with the same major client version as the server.
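
A sketch of how such a version check can work (illustrative only, not the container's actual script):

```sh
# Compare the server's major version with the installed client, and install the matching client if they differ.
SERVER_MAJOR=$(psql -h "$PGHOST" -U "$PGUSER" -XtAc 'SHOW server_version;' | cut -d. -f1)
CLIENT_MAJOR=$(pg_dump --version | grep -oE '[0-9]+' | head -n 1)
if [ "$SERVER_MAJOR" != "$CLIENT_MAJOR" ]; then
  apt-get update && apt-get install -y --no-install-recommends "postgresql-client-$SERVER_MAJOR"
fi
```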

### Connecting to the PostgreSQL server

The container must be able to connect to the database. If PostgreSQL is running on the host, run the container with the host network mode; otherwise specify the Docker network in which PostgreSQL is reachable. See the examples below.

### Backup all databases

An entire PostgreSQL cluster with user accounts and all databases can be backed up as follows:

PostgreSQL running on the host:

```sh
docker run -it --rm --network=host -v $(pwd)/backup:/backup \
  -e PGHOST=localhost -e PGUSER=postgres -e PGPASSWORD=[password] \
  -e ONESHOT=true ghcr.io/b3partners/backup
```

PostgreSQL running in Docker:

```sh
docker run -it --rm --network=[network-name] -v $(pwd)/backup:/backup \
  -e PGHOST=[postgresql-container-name] -e PGUSER=postgres -e PGPASSWORD=[password] \
  -e ONESHOT=true ghcr.io/b3partners/backup
```

### Backup a single database

In the examples above, add -e PGDATABASE=[database] to back up a single database.
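
For example, with PostgreSQL running on the host and a hypothetical database named mydb:

```sh
docker run -it --rm --network=host -v $(pwd)/backup:/backup \
  -e PGHOST=localhost -e PGUSER=postgres -e PGPASSWORD=[password] \
  -e PGDATABASE=mydb -e ONESHOT=true ghcr.io/b3partners/backup
```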

### PostgreSQL account and password

The account specified with PGUSER must be a superuser to back up the globals with all user accounts, but a normal account can also be used, as long as it has access to the databases to be backed up.

Make sure the password you specify does not leak: keep any script containing it from being world-readable, and keep it out of your shell history (hint: in most shells, a command prefixed with a space is not saved to the history).
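
One way to do this is Docker's --env-file option, keeping the credentials in a file that only you can read (a sketch; the file name is arbitrary):

```sh
# backup.env contains PGPASSWORD=... and any other variables; restrict it with chmod 600.
docker run -it --rm --network=host -v $(pwd)/backup:/backup \
  --env-file backup.env -e ONESHOT=true ghcr.io/b3partners/backup
```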

### Backup storage location

This container writes the backup to /backup as mounted in the container, which can be a volume or bind mount.

By specifying a bind mount, as in the examples above, you can back up to a directory on the host (the files will be owned by root). If you are using a backup client on the host, configure it to pick up these files for off-site backup or for keeping older backups. Or extend this image with support for borgbackup, restic, etc.

### Copying the backup to an SFTP server

The backup can be copied to a remote SFTP server. This is done after all backups are made. After uploading, backups are kept locally in /backup/uploaded, so you need enough disk space to keep the previous backup plus room to create a new one in /backup/temp.
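
Conceptually, the upload step works like the following sketch, using sshpass to pass the SSHPASS password to scp (illustrative only, not the container's exact commands):

```sh
# Upload everything in /backup/temp, then move the uploaded files to /backup/uploaded.
export SSHPASS=[password]
sshpass -e scp /backup/temp/* "$SFTP_USER@$SFTP_HOST:$SFTP_PATH/"
mv /backup/temp/* /backup/uploaded/
```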

If you back up to a Hetzner Storage Box, for example, you can enable scheduled ZFS snapshots to automatically make read-only copies of your backup files. This means you don't need to run a backup server to have read-only backups. Some backup tools, such as borgbackup or restic, require write access to their repository (at the time of writing), which does not allow for read-only backups.

### Creating backups on a schedule

If you do not set the ONESHOT=true environment variable, the Ofelia scheduler is started, configured to run a backup at midnight by default.
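
For example, to run a backup every night at 03:00 instead, assuming Ofelia's six-field cron format (seconds first):

```sh
docker run -d --network=host -v $(pwd)/backup:/backup \
  -e PGHOST=localhost -e PGUSER=postgres -e PGPASSWORD=[password] \
  -e SCHEDULE="0 0 3 * * *" ghcr.io/b3partners/backup
```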

### Logging and errors

Logs are written to stdout. If backing up a single database fails, the remaining databases are still backed up. Likewise, if uploading the backup over SFTP fails, the backups remain in /backup/temp as mounted in the container. The Ofelia logs are also written to /backup/ofelia, so they persist even when the container is re-created.

### Encryption

The backups can be encrypted with GPG. To enable this, set ENCRYPT=true and provide a public key in the PUBLIC_KEY variable, on a single line without header and footer to make it easier to specify.

Use the following command to export a GPG key in the required format:

```sh
gpg --armor --export [key-name] | tail -n +3 | head -n -1 | tr -d "[:space:]"
```

Important: if you first make a backup without encryption, have it copied to an SFTP server, and then enable encryption, make sure you delete the unencrypted files (those without the .gpg extension) of previous backups from your SFTP server.
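
To restore an encrypted backup, first decrypt it with the matching private key, for example (the file name is hypothetical):

```sh
# Requires the private key in your GPG keyring; writes the decrypted archive next to the input file.
gpg --output mydb.sql.zst --decrypt mydb.sql.zst.gpg
```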

## Configuration

This container is configured using the following environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| ONESHOT | false | Set to true to back up a single time and exit without starting the scheduler |
| SCHEDULE | @midnight | Schedule for the backup job; see the Ofelia documentation for the format |
| LOGGING | true | Whether Ofelia should write logs for each job in the container to /backup/ofelia |
| BACKUP_DIR | - | Directory to back up (optional) |
| BACKUP_PG | true | Set to false to only back up directories and no PostgreSQL databases |
| PGHOST | db | PostgreSQL database hostname. When using Docker Compose, specify the service name. |
| PGPORT | 5432 | PostgreSQL port |
| PGUSER | postgres | PostgreSQL username |
| PGPASSWORD | postgres | PostgreSQL password |
| PGDATABASE | all | Database(s) to back up, separated by `,`, or all to back up all databases in separate SQL dumps |
| STORAGE_BOX | - | Optional: Hetzner Storage Box account name (if set, there is no need to set SFTP_HOST and SFTP_USER) |
| SFTP_HOST | - | Optional SFTP server hostname |
| SFTP_USER | - | Optional SFTP username |
| SFTP_PATH | backup | Remote path on the SFTP server where backup files are put |
| SSHPASS | - | SFTP account password |
| PG_COMPRESS | zstd | Compression program for the PostgreSQL dump; available: zstd, pigz (parallel gzip), pbzip2 (parallel bzip2), xz |
| TAR_COMPRESS | zstd | Compression program for the tarred directory |
| ZSTD_CLEVEL | 3 | Zstd compression level (1-19) |
| ZSTD_NBTHREADS | 0 | Number of CPU cores for zstd compression; the default 0 means all cores |
| XZ_DEFAULTS | -T 0 | Options for xz compression: use all cores by default |
| ENCRYPT | false | Set to true to encrypt backups with GPG; if enabled, the PUBLIC_KEY variable must be provided |
| PUBLIC_KEY | - | GPG public key (single line, without header and footer). See above. |

The default zstd compression is the fastest and most efficient, and ensures the backup job is not bottlenecked by compression, as is the case with other compression tools (even the parallel versions).

## File hashes

File hashes are created to check the integrity of the backup files. They are written to a file with the .sha256 extension in the same directory as the backup files. By comparing the hashes of the backup files with the hashes in this file, you can verify that the backup files are intact. After downloading the backup files and the checksum file into the same directory, you can check the hashes with the following command:

```sh
sha256sum -c checksums.sha256
```

If the files are OK, you will see the following output:

```
backup_file1.gpg: OK
backup_file2.gpg: OK
```

Windows users can use the following command to check the hash of a file:

```powershell
Get-FileHash -Algorithm SHA256 -Path path_to_your_file
```

## Backing up directories or volumes

Mount directories and volumes under a single path in the container to back them up as a single large tarball. For example, using Docker Compose:

```yaml
services:
  backup:
    # ...
    environment:
      # ...
      - "BACKUP_DIR=/files"
    volumes:
      - volume-logs:/files/logs
      - volume-ssl-certificates:/files/ssl-certificates
      - my-files:/files/my-data
      - /some/host/path:/files/host-files
```
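
Restoring such a tarball is a matter of decompressing and extracting it, for example (the file name is hypothetical, assuming the default zstd compression):

```sh
# Extract the backed-up directory tree into /restore (the directory must exist).
zstd -dc files.tar.zst | tar -x -C /restore
```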

## Gotchas

- Backups of removed databases remain on the remote SFTP server (not locally). This cannot be fixed using scp alone.

## Todos

- Push backup metrics to Prometheus (which databases, success/fail, full and compressed size, duration, upload stats) for alerts and dashboards in Grafana or similar
- Don't start a new job while the old one is still running (only a problem with a short schedule or if a job hangs)

## Won't fix

- Run as a non-root user. The script uses apt to install the correct PostgreSQL client, and may need permissions to read the mounted directories to back up.