Deployment

Jonathan Mang edited this page Jan 25, 2022 · 6 revisions

Deployment of the DQA Tool

Docker

You can test the package without installing anything except Docker. To try it out, follow these steps:

  1. Make sure Docker is installed.

  2. Clone this repository:

    git clone https://gitlab.miracum.org/miracum/dqa/dqastats.git dqastats
    cd dqastats

  3. Run the containerized setup:

    docker-compose -f ./docker/docker-compose.yml up

  4. Open ./docker/output/ to view the generated report.
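
Before starting, it can help to confirm that the tools used in the steps above are available. The following is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and the tool list simply mirrors the commands used above (`git`, `docker`, `docker-compose`).

```shell
# Print the names of all given commands that are not found on PATH,
# separated by spaces (prints an empty line if nothing is missing).
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
}

# Check the prerequisites of the steps above.
missing="$(missing_tools git docker docker-compose)"
if [ -n "$missing" ]; then
  echo "Missing tools: $missing" >&2
else
  echo "All prerequisites found."
fi
```

The check only verifies that the commands exist; it does not confirm that the Docker daemon is running.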

Advanced dockerized usage

To use your own docker-compose file and .env file(s), pass them to docker-compose via -f and --env-file:

docker-compose \
  -f docker-compose_miracum.yml \
  --env-file ../dqastats.env \
  up --build
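
As an illustration, an env file for the command above could be created as follows. This is only a sketch under assumptions: the variable names are taken from the example in the Debugging section, the demo data path is inferred from the `utils_path` used there, and whether your compose file actually references these variables depends on your own setup. The `ENV_FILE` variable is a hypothetical convenience for choosing the output location.

```shell
# Path of the env file to create; defaults to ./dqastats.env.
ENV_FILE="${ENV_FILE:-./dqastats.env}"

# Assumption: location of the demo data inside the container,
# inferred from the utils_path used in the Debugging section.
DEMO_DATA="/usr/local/lib/R/site-library/DQAstats/demo_data"

# Write one KEY=VALUE pair per line, as expected for --env-file.
cat > "$ENV_FILE" <<EOF
EXAMPLECSV_SOURCE_PATH=$DEMO_DATA
EXAMPLECSV_TARGET_PATH=$DEMO_DATA
EOF

echo "Wrote $ENV_FILE:"
cat "$ENV_FILE"
```

Note that `--env-file` supplies values for variable substitution in the compose file, so the values only reach the container if the compose file passes them on.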

Debugging

The following snippets may help with debugging if something goes wrong:

## Open a console inside the container:
docker run -it ghcr.io/miracum/dqastats:latest //bin//bash

## Installed R packages are stored in:
## "/usr/local/lib/R/site-library" and
## "/usr/local/lib/R/library"

## Run the example data (in an R session inside the container):
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file("demo_data", package = "DQAstats"))
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file("demo_data", package = "DQAstats"))
tmp <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = "/usr/local/lib/R/site-library/DQAstats/demo_data/utilities",
  mdr_filename = "mdr_example_data.csv",
  output_dir = "/data/output",
  logfile_dir = "/data/logs"
)

Kubernetes

Background

The manifest ./docker/dqastats-workflow.yaml uses Argo Workflows to schedule the dockerized version of DQAstats so that a data quality (DQ) analysis runs on a regular basis.

How to use

  1. Install KinD (Kubernetes in Docker).

  2. Create a local cluster for testing:

    kind create cluster

  3. Install Argo Workflows:

    ## Add the Bitnami Helm repository, which provides the Argo Workflows chart:
    helm repo add bitnami https://charts.bitnami.com/bitnami

    ## Install Argo Workflows with custom presets:
    helm install argo-wf bitnami/argo-workflows \
        --set server.serviceAccount.name=argo-wf-san

  4. Follow the instructions printed in the console to obtain the Bearer token. They may look similar to the following:

    ## Note: If you changed the name `argo-wf` of the deployment
    ## (and with it the service account name `argo-wf-san`)
    ## in the `helm install ...` command above,
    ## you also need to change it here:
    SECRET=$(kubectl get sa argo-wf-san -o=jsonpath='{.secrets[0].name}')
    ARGO_TOKEN="Bearer $(kubectl get secret $SECRET -o=jsonpath='{.data.token}' | base64 --decode)"
    echo "$ARGO_TOKEN"

  5. Adapt the manifest ./docker/dqastats-workflow.yaml to your needs, or keep the current one for demo purposes.

  6. Send the secret and the workflow to the cluster:

    kubectl apply -f ./docker/dqastats-secret.yaml
    kubectl apply -f ./docker/dqastats-workflow.yaml
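
To confirm that the steps above took effect, a few quick checks can be run. This is a minimal sketch, not part of the repository: the `check` helper is hypothetical, and the names `argo-wf` and `argo-wf-san` assume you kept the values from the `helm install` command above. The context name `kind-kind` is the one kind creates for its default cluster name.

```shell
# Run a command quietly and report whether it succeeded.
# Returns 1 on failure so callers can react to it.
check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# "|| true" keeps the script going so that all checks are reported.
check "kind cluster reachable" kubectl cluster-info --context kind-kind || true
check "Helm release argo-wf exists" helm status argo-wf || true
check "service account argo-wf-san exists" kubectl get serviceaccount argo-wf-san || true
```

A failed check only states that the command did not succeed; rerun the command by hand to see its error message.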

Thanks

🎉 Big thanks to @christian.gulden / @chgl for all the Kubernetes support! The draft of this "How to ..." section is borrowed from him; the original is available here: https://gitlab.miracum.org/miracum/charts/-/blob/master/README.md.