A single config file controls how the tool connects to the databases; follow the link to see an example file.
This file contains connection parameters for the DOT database and also for any of the project databases (e.g. Muso) for which you want to run the DOT tests. See the paragraph below for more details on how to adapt this and other config files to your needs.
In addition to the database connections handled in dot_config.yml, the different objects generated by the DOT can be stored in different schemas. Read below about the file dbt_project.yml to learn how to define these output schemas.
The DOT can be run per project, where configuration and output files for each project are found in the following directories:
- mandatory configuration
|____config
| |____dot_config.yml
- optional per project configuration
|____config
| |____<project_name>
| | |____dbt
| | | |____profiles.yml
| | | |____dbt_project.yml
| | |____ge
| | | |____great_expectations.yml
| | | |____config_variables.yml
| | | |____batch_config.json
- generated files per project
|____generated_files
| |____<project_name>
| | |____all_tests_summary.xlsx
| | |____ge_clean_results.csv
| | |____dbt_test_coverage_report.txt
| | |____all_tests_rows.xlsx
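As a rough illustration of the layout above, the following Python sketch builds the expected locations of the optional per-project config files and of the generated files for a given project. The helper names and the root argument are hypothetical and not part of the DOT; only the relative paths are taken from the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the directory tree above.
OPTIONAL_CONFIG_FILES = [
    "dbt/profiles.yml",
    "dbt/dbt_project.yml",
    "ge/great_expectations.yml",
    "ge/config_variables.yml",
    "ge/batch_config.json",
]
GENERATED_FILES = [
    "all_tests_summary.xlsx",
    "ge_clean_results.csv",
    "dbt_test_coverage_report.txt",
    "all_tests_rows.xlsx",
]


def project_config_paths(root: Path, project_name: str) -> List[Path]:
    """Hypothetical helper: optional config files under config/<project_name>."""
    return [root / "config" / project_name / rel for rel in OPTIONAL_CONFIG_FILES]


def generated_file_paths(root: Path, project_name: str) -> List[Path]:
    """Hypothetical helper: output files under generated_files/<project_name>."""
    return [root / "generated_files" / project_name / name for name in GENERATED_FILES]


# Example: list which optional config files exist for a project called "Muso".
for path in project_config_paths(Path("dot"), "Muso"):
    print(path, "(present)" if path.exists() else "(not present)")
```

Since the per-project files are optional, a missing file in this listing is not an error.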
All config files are grouped under the config dir. The DOT DB connection details are propagated through Jinja templates to other config files that belong to DBT and Great Expectations. Please follow the guidelines below if you need to customize other configurations.
A single file controls the connections to the DOT database and to any project for which you want to run DOT tests:
- copy the default dot_config into the top config folder (i.e. as dot/config/dot_config.yml); note that the copied file will be ignored by version control
- change the necessary parameters for the dot_db connection, e.g. host, dbname
- add connection parameters for each of the projects you would like to run, with the same structure as the Muso_db entry in the dot_config example, i.e.
<project_name>_db:
  type: connection type, e.g. postgres
  host: host
  user: username
  pass: password
  port: port number, e.g. 5432
  dbname: database name
  schema: schema name, e.g. public
  threads: number of threads for DBT, e.g. 4
- note that the DOT and the project connections should at least use different schemas; they can also be in different databases on the same host, or on different servers
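The checks implied by the steps above can be sketched in Python. This is only an illustration: the helper names are hypothetical, the connection values are made up, and the config is written as a plain dictionary (in practice it would be parsed from dot_config.yml, for example with PyYAML's yaml.safe_load). It also assumes that the dot_db entry has the same keys as the project entries.

```python
from typing import Dict, List

# Keys listed in the <project_name>_db structure above.
REQUIRED_KEYS = ["type", "host", "user", "pass", "port", "dbname", "schema", "threads"]


def missing_connection_keys(config: Dict[str, dict], project_name: str) -> List[str]:
    """Hypothetical helper: connection keys missing from <project_name>_db."""
    entry = config.get(f"{project_name}_db", {})
    return [key for key in REQUIRED_KEYS if key not in entry]


def shares_schema_with_dot(config: Dict[str, dict], project_name: str) -> bool:
    """True if the project entry has the same host, port, database and schema as dot_db."""
    dot, project = config["dot_db"], config[f"{project_name}_db"]
    return all(dot.get(k) == project.get(k) for k in ("host", "port", "dbname", "schema"))


# Made-up values for illustration only.
config = {
    "dot_db": {"type": "postgres", "host": "localhost", "user": "dot", "pass": "secret",
               "port": 5432, "dbname": "dot_db", "schema": "dot", "threads": 4},
    "Muso_db": {"type": "postgres", "host": "localhost", "user": "muso", "pass": "secret",
                "port": 5432, "dbname": "muso", "schema": "public"},
}

print(missing_connection_keys(config, "Muso"))  # ['threads']
print(shares_schema_with_dot(config, "Muso"))   # False
```

A check like the second one helps catch the case where the DOT and a project would end up writing into the same schema.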
If you need to edit configurations for DBT and Great Expectations, you will need to change the Jinja templates. In general these customizations are not needed; they only apply to scenarios with particular requirements, and they require a deeper knowledge of the DOT and of DBT and/or Great Expectations.
The dbt_project.yml file goes into the dbt main folder. If you don't need to customise it, the DOT uses this Jinja template, after a few project-dependent adjustments:
- model-paths is set to a subdirectory for the project, i.e. ["models_<project_name>"]
- test-paths is also set to a subdirectory for the project, i.e. ["tests_<project_name>"]
The modified version is then copied by the DOT into the destination dbt main folder.
The tool also copies the content of the models folder into the model path for the project, dot/dbt/models/<project_name>/core, and creates the custom SQL tests at dot/dbt/tests/<project_name>.
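The project-dependent adjustments described above can be pictured with a short Python sketch. It assumes the dbt project settings have already been parsed into a dictionary; the function name and the sample values are hypothetical and do not reflect the DOT's actual code.

```python
import copy
from typing import Any, Dict


def adjust_dbt_project(template: Dict[str, Any], project_name: str) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the settings with per-project paths."""
    adjusted = copy.deepcopy(template)
    adjusted["model-paths"] = [f"models_{project_name}"]
    adjusted["test-paths"] = [f"tests_{project_name}"]
    return adjusted


# Made-up template values for illustration only.
template = {"name": "dbt_model_1", "model-paths": ["models"], "test-paths": ["tests"]}
adjusted = adjust_dbt_project(template, "Muso")

print(adjusted["model-paths"])  # ['models_Muso']
print(adjusted["test-paths"])   # ['tests_Muso']
print(template["model-paths"])  # ['models'] (the template itself is unchanged)
```

Working on a copy keeps the shared template reusable for every project.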
A common customization is changing the schema in which the objects generated by the DOT are written; see the paragraph just below.
The DOT generates two kinds of database objects:
- entities of the models that are being tested, e.g. assessments, follow ups, patients
- results of the failing tests
If nothing is done, these objects are created in the same schema as the original data for the project (thus polluting the db).
The following lines added to dbt_project.yml will modify where those objects are stored:
models:
  dbt_model_1:
    core:
      +schema: <schema_suffix>
    test:
      +schema: <schema_suffix>
The configured value is added as a suffix: if the project data is stored in a certain schema, the output objects will go to <project_schema>_<schema_suffix> (e.g. to public_tests if the project schema is public and the suffix is set to tests in the lines above).
Note that this mechanism uses a DBT feature, and that the same applies to the GE tests.
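The suffix rule can be written out as a tiny Python function. This is only a sketch of the naming rule described above, with a hypothetical function name; the actual schema names are produced by DBT itself.

```python
from typing import Optional


def output_schema(project_schema: str, schema_suffix: Optional[str] = None) -> str:
    """Schema in which output objects are stored, following the suffix rule above."""
    if not schema_suffix:
        # No suffix configured: objects stay in the project schema.
        return project_schema
    return f"{project_schema}_{schema_suffix.strip()}"


print(output_schema("public", "tests"))  # public_tests
print(output_schema("public"))           # public
```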
Finally, although this is not really recommended, you can send the two types of outputs to two different schemas:
- core in the lines above corresponds to the models
- test corresponds to the failing test results
This DBT configuration file goes into ~/.dbt/profiles.yml. If you don't need to customise it, the tool uses the Jinja template to generate the final config file, using the connection parameters for the DOT db in the dot_config file.
At first sight, there is no good reason to customise this config file.
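The way connection parameters flow from dot_config into a generated config file can be pictured with the sketch below. It uses Python's standard-library string.Template as a stand-in for Jinja, and the template text, the helper name and the connection values are illustrative assumptions rather than the DOT's actual template or code.

```python
from string import Template

# Illustrative template text; the DOT's real Jinja template is not reproduced here.
PROFILE_TEMPLATE = Template(
    "host: $host\n"
    "user: $user\n"
    "port: $port\n"
    "dbname: $dbname\n"
    "schema: $schema\n"
    "threads: $threads\n"
)


def render_profile(dot_db: dict) -> str:
    """Hypothetical helper: fill the template with the dot_db connection parameters."""
    # substitute() raises KeyError if a placeholder has no matching key.
    return PROFILE_TEMPLATE.substitute(dot_db)


# Made-up values for illustration only.
dot_db = {"type": "postgres", "host": "localhost", "user": "dot", "pass": "secret",
          "port": 5432, "dbname": "dot_db", "schema": "dot", "threads": 4}

print(render_profile(dot_db))
```

Failing on a missing parameter, as substitute() does here, surfaces an incomplete dot_config entry before an unusable config file is written.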
This file goes into the Great Expectations main folder. Starting from this Jinja template, a config file is generated in the destination Great Expectations main folder.
This file goes into the Great Expectations main folder. The Jinja template generates a file that is copied into the Great Expectations main folder.
There are no obvious reasons why you may want to customize this file.
Starting from this Jinja template, the GE configuration file goes into dot/great_expectations/uncommitted/config_variables.yml.
At first sight, there is no good reason to customise this config file.