Batch processing part of CLIx dashboard backend. Implemented with Airflow.
This repository contains Dockerfile of apache-airflow for Docker's automated build published to the public Docker Hub Registry.
- Based on Python (3.6-slim) official Image python:3.6-slim and uses the official Postgres as backend and Redis as queue
- Install Docker
- Install Docker Compose
- Following the Airflow release from Python Package Index
Pull the image from the Docker repository.
docker pull puckel/docker-airflow
Optionally install Extra Airflow Packages and/or python dependencies at build time :
docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" -t puckel/docker-airflow .
docker build --rm --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
or combined
docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .
Don't forget to update the airflow images in the docker-compose files to puckel/docker-airflow:latest.
By default, docker-airflow runs Airflow with SequentialExecutor :
docker run -d -p 8080:8080 puckel/docker-airflow webserver
If you want to run another executor, use the other docker-compose.yml files provided in this repository.
For LocalExecutor :
docker-compose -f docker-compose-LocalExecutor.yml up -d
For CeleryExecutor :
docker-compose -f docker-compose-CeleryExecutor.yml up -d
NB : If you want to have DAGs example loaded (default=False), you've to set the following environment variable :
LOAD_EX=n
docker run -d -p 8080:8080 -e LOAD_EX=y puckel/docker-airflow
If you want to use Ad hoc query, make sure you've configured connections: Go to Admin -> Connections and Edit "postgres_default" set this values (equivalent to values in airflow.cfg/docker-compose*.yml) :
- Host : postgres
- Schema : airflow
- Login : airflow
- Password : airflow
For encrypted connection passwords (in Local or Celery Executor), you must have the same fernet_key. By default docker-airflow generates the fernet_key at startup, you have to set an environment variable in the docker-compose (ie: docker-compose-LocalExecutor.yml) file to set the same key accross containers. To generate a fernet_key :
docker run puckel/docker-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
It's possible to set any configuration value for Airflow from environment variables, which are used over values from the airflow.cfg.
The general rule is the environment variable should be named AIRFLOW__<section>__<key>
, for example AIRFLOW__CORE__SQL_ALCHEMY_CONN
sets the sql_alchemy_conn
config option in the [core]
section.
Check out the Airflow documentation for more details
You can also define connections via environment variables by prefixing them with AIRFLOW_CONN_
- for example AIRFLOW_CONN_POSTGRES_MASTER=postgres://user:password@localhost:5432/master
for a connection called "postgres_master". The value is parsed as a URI. This will work for hooks etc, but won't show up in the "Ad-hoc Query" section unless an (empty) connection is also created in the DB
Airflow allows for custom user-created plugins which are typically found in ${AIRFLOW_HOME}/plugins
folder. Documentation on plugins can be found here
In order to incorporate plugins into your docker container
- Create the plugins folders
plugins/
with your custom plugins. - Mount the folder as a volume by doing either of the following:
- Include the folder as a volume in command-line
-v $(pwd)/plugins/:/usr/local/airflow/plugins
- Use docker-compose-LocalExecutor.yml or docker-compose-CeleryExecutor.yml which contains support for adding the plugins folder as a volume
- Include the folder as a volume in command-line
- Create a file "requirements.txt" with the desired python modules
- Mount this file as a volume
-v $(pwd)/requirements.txt:/requirements.txt
(or add it as a volume in docker-compose file) - The entrypoint.sh script execute the pip install command (with --user option)
- Airflow: localhost:8080
- Flower: localhost:5555
Easy scaling using docker-compose:
docker-compose -f docker-compose-CeleryExecutor.yml scale worker=5
This can be used to scale to a multi node setup using docker swarm.
If you want to run other airflow sub-commands, such as list_dags
or clear
you can do so like this:
docker run --rm -ti puckel/docker-airflow airflow list_dags
or with your docker-compose set up like this:
docker-compose -f docker-compose-CeleryExecutor.yml run --rm webserver airflow list_dags
You can also use this to run a bash shell or any other command in the same environment that airflow would be run in:
docker run --rm -ti puckel/docker-airflow bash
docker run --rm -ti puckel/docker-airflow ipython
Fork, improve and PR. ;-)