diff --git a/.github/workflows/pull_request.yml b/.github/workflows/pull_request.yml index 000dc9d..7fd6fb1 100644 --- a/.github/workflows/pull_request.yml +++ b/.github/workflows/pull_request.yml @@ -14,20 +14,6 @@ jobs: run: ruff check . secrets: inherit - black: - name: Lint with black - uses: ./.github/workflows/python-job.yml - with: - run: black --check . - secrets: inherit - - isort: - name: Lint with isort - uses: ./.github/workflows/python-job.yml - with: - run: isort --check . - secrets: inherit - test: name: Run tests uses: ./.github/workflows/python-job.yml diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2ff7524..7f0ee88 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -16,4 +16,4 @@ Join our [Slack community](https://data-observability-slack.datakitchen.io/join) ## Code Submission -We are currently working on our development and CI process. Stay tuned for our developer contribution guide for working with the repository code and submitting pull requests. +Follow our [development guide](docs/local_development.md) to work with the repository code locally and submit contributions. 
diff --git a/README.md b/README.md index 3114ea2..badf1a5 100644 --- a/README.md +++ b/README.md @@ -1,418 +1,144 @@ -# DataOps Data Quality TestGen +# DataOps Data Quality TestGen ![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join) -*

DataOps Data Quality TestGen delivers simple, fast data quality test generation and execution by data profiling,  new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. DataOps TestGen is part of DataKitchen's Open Source Data Observability.

* +*

DataOps Data Quality TestGen, or "TestGen" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.

* ## Features -What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and find data issues in new data. +What does DataKitchen's DataOps Data Quality TestGen do? It helps you understand and find data issues in new data.

-DatKitchen Open Source Data Quality TestGen Features - New Data +DataKitchen Open Source Data Quality TestGen Features - New Data

It constantly watches your data for data quality anomalies and lets you drill into problems.

-DataKitchen Open Source Data Quality TestGen Features - Data Ingestion and Quality Testing +DataKitchen Open Source Data Quality TestGen Features - Data Ingestion and Quality Testing

A single place to manage Data Quality across data sets, locations, and teams.

-DataKitchen Open Source Data Quality TestGen Features - Singel Placeg +DataKitchen Open Source Data Quality TestGen Features - Single Place

## Installation -The dk-installer program [installs DataOps Data Quality TestGen](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application). Install the required software for TestGen and download the installer program to a new directory on your computer. -### Using dk-installer (recommended) -Install with a single command using [`dk-installer`](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application). +The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) program installs DataOps Data Quality TestGen. -``` -python3 dk-installer.py tg install -``` - -### Using docker compose -You can also install using the provided [`docker-compose.yml`](deploy/docker-compose.yml). - -Make a local copy of the compose file. -```bash -curl -o docker-compose.yml 'https://raw.githubusercontent.com/DataKitchen/dataops-testgen/main/deploy/docker-compose.yml' -``` - -If you are interested in integrating TestGen with DataKitchen Observability platform, edit the compose file and set values for the environment variables `OBSERVABILITY_API_URL` and `OBSERVABILITY_API_KEY`. +### Install the prerequisite software -Before running docker compose, create a `.env` to hold the secrets needed to run Testgen. -```bash -touch testgen.env -``` +| Software | Tested Versions | Command to check version | +|-------------------------|-------------------------|-------------------------------| +| [Python](https://www.python.org/downloads/)
- Most Linux and macOS systems have Python pre-installed.
- On Windows machines, you will need to download and install it. | 3.9, 3.10, 3.11, 3.12 | `python3 --version` | +| [Docker](https://docs.docker.com/get-docker/)
[Docker Compose](https://docs.docker.com/compose/install/) | 25.0.3, 26.1.1,
2.24.6, 2.27.0, 2.28.1 | `docker -v`
`docker compose version` | -The following variables are required: -``` -TESTGEN_USERNAME= -TESTGEN_PASSWORD= -TG_DECRYPT_SALT= -TG_DECRYPT_PASSWORD= -``` +### Download the installer -You can learn about how each variable is used in [Configuration](#configuration) +On Unix-based operating systems, use the following command to download it to the current directory. We recommend creating a new, empty directory. -Then, run docker compose to start the services: -```bash -docker compose --env-file testgen.env up --detach +```shell +curl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py' ``` -This will spin up a postgres service, a startup service which runs once to setup the database and, make the Testgen UI available at http://localhost:8501. +* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repository. +* All commands listed below should be run from the folder containing this file. +* For usage help and command options, run `python3 dk-installer.py --help` or `python3 dk-installer.py <command> --help`. -After verifying that Testgen is running, follow [the steps for the quick start](#quick-start) to start getting familiar with the tool. +### Install the TestGen application -## Quick start -Testgen includes a basic data set for you to play around. +The installation downloads the latest Docker images for TestGen and deploys a new Docker Compose application. The process may take 5 to 10 minutes depending on your machine and network connection. 
-### Using dk-installer (recommended) -Once Testgen is running, you can use [`dk-installer`](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#run-the-testgen-demo-setup) to generate the demo data: -```bash -python3 dk-installer.py tg run-demo -``` - -And, if you are integrating Testgen with the DataKitchen Observability platform, you will need to pass the `--export` -flag: -```bash -python3 dk-installer.py tg run-demo --export -``` - -### Using docker compose -You can also generate the demo data if you installed using docker compose. Set it up by using the Testgen CLI to run the quick start command: -```bash -docker compose --env-file testgen.env exec engine testgen quick-start +```shell +python3 dk-installer.py tg install ``` -It also supports setting up the integration with DataKitchen Observability: -```bash -docker compose --env-file testgen.env exec engine testgen quick-start --observability-api-url --observability-api-key -``` -**NOTE:** You don't need to pass the Observability URL and key as arguments if you set them up as environment variables in your compose file. +The `--port` option may be used to set a custom localhost port for the application (default: 8501). +To enable SSL for HTTPS support, use the `--ssl-cert-file` and `--ssl-key-file` options to specify local file paths to your SSL certificate and key files. -After you have the demo data from the `quick-start` command, follow the following steps to complete the quick start: +Once the installation completes, verify that you can login to the UI with the URL and credentials provided in the output. -1. Run profiling against the target demo database -```bash -docker compose --env-file testgen.env exec engine testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d -``` +### Optional: Run the TestGen demo setup -2. 
Generate tests cases for all columns in the target demo database -```bash -docker compose --env-file testgen.env exec engine testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d -``` +The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams. -3. Run the generated tests -```bash -docker compose --env-file testgen.env exec engine testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1 -``` - -4. Export the test results to Observability -```bash -docker compose --env-file testgen.env exec engine testgen export-observability --project-key DEFAULT --test-suite-key default-suite-1 -``` - -5. Simulate changes to the demo data -```bash -docker compose --env-file testgen.env exec engine testgen quick-start --simulate-fast-forward -``` - -6. And, export the test results over the simulated changes to Observability -```bash -docker compose --env-file testgen.env exec engine testgen export-observability --project-key DEFAULT --test-suite-key default-suite-1 +```shell +python3 dk-installer.py tg run-demo ``` -## Configuration - -#### `TESTGEN_DEBUG` - -Invalidates the cache with the bootstrapped application causing the changes to the routing and plugins to take effect -on every render. - -Also, changes the logging level for the `testgen.ui` logger from `INFO` to `DEBUG`. - -default: `no` - -### `TESTGEN_LOG_TO_FILE` -Set it to `yes` to enable rotating file logs to be written under `/var/log/testgen/`. - -default: `no` - -#### `TG_DECRYPT_SALT` - -Salt used to encrypt and decrypt user secrets. Only allows ascii characters. - -A minimun length of 16 characters is recommended. - -#### `TG_DECRYPT_PASSWORD` - -Secret passcode used in combination with `TG_DECRYPT_SALT` to encrypt and decrypt user secrets. 
Only allows ascii characters. - -#### `TESTGEN_USERNAME` - -Username to log into the web application. - -#### `TESTGEN_PASSWORD` - -Password to log into the web application. - -#### `TG_METADATA_DB_USER` - -User to connect to the testgen application postgres database. - -default: `os.environ["TESTGEN_USERNAME"]` - -#### `TG_METADATA_DB_PASSWORD` - -Password to connect to the testgen application postgres database. - -default: `os.environ["TESTGEN_PASSWORD"]` - -#### `DATABASE_ADMIN_USER` - -User with admin privileges in the testgen application postgres database used to create roles, users, database and schema. Required if the user in `TG_METADATA_DB_USER` does not have the required privileges. - -default: `os.environ["TG_METADATA_DB_USER"]` | - -#### `DATABASE_ADMIN_PASSWORD` - -Password for the admin user to connect to the testgen application postgres database. - -default: `os.environ["TG_METADATA_DB_PASSWORD"]` - -#### `DATABASE_EXECUTE_USER` - -User to be created into the testgen application postgres database. - -Will be granted: -- read/write to tables `test_results`, `test_suites` and `test_definitions` -- read only to all other tables. - -default: `testgen_execute` - -#### `DATABASE_REPORT_USER` - -User to be created into the testgen application postgres database. Will be granted read_only access to all tables. - -default: `testgen_report` - -#### `TG_METADATA_DB_HOST` - -Hostname where the testgen application postgres database is running in. - -default: `localhost` - -#### `TG_METADATA_DB_PORT` - -Port at which the testgen application postgres database is exposed by the host. - -default: `5432` - -#### `TG_METADATA_DB_NAME` - -Name of the database in postgres on which to store testgen metadata. - -default: `datakitchen` - -#### `TG_METADATA_DB_SCHEMA` - -Name of the schema inside the postgres database on which to store testgen metadata. 
- -default: `testgen` - -#### `PROJECT_KEY` +In the TestGen UI, you will see that new data profiling and test results have been generated. -Code used to uniquely identify the auto generated project. +## Product Documentation -default: `DEFAULT` +[DataOps Data Quality TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help) -#### `DEFAULT_PROJECT_NAME` +## Useful Commands -Name to assign to the auto generated project. +The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation. -default: `Demo` +### Remove demo data -#### `PROJECT_SQL_FLAVOR` +After completing the quickstart, you can remove the demo data from the application with the following command. -SQL flavor of the database the auto generated project will run tests against. - -Supported flavors: -- `redshift` -- `snowflake` -- `mssql` -- `postgresql` - -default: `postgresql` - -#### `PROJECT_CONNECTION_NAME` - -Name assigned to identify the connection to the project database. - -default: `default` - -#### `PROJECT_CONNECTION_MAX_THREADS` - -Maximum number of concurrent queries executed when fetching data from the project database. - -default: `4` - -#### `PROJECT_CONNECTION_MAX_QUERY_CHAR` - -Determine how many tests are grouped together in a single query. Increase for better performance or decrease to better isolate test failures. Accepted values are 500 to 14 000. - -default: `5000` - -#### `PROJECT_QC_SCHEMA` - -Name of the schema to be created in the project database. - -default: `qc` - -#### `PROJECT_DATABASE_NAME` - -Name of the database the auto generated project will run test against. 
- -default: `demo_db` - -#### `PROJECT_DATABASE_SCHEMA` - -Name of the schema inside the project database the tests will be run against. - -default: `demo` - -#### `PROJECT_DATABASE_USER` - -User to be used by the auto generated project to connect to the database under testing. - -default: `os.environ["TG_METADATA_DB_USER"]` - -#### `PROJECT_DATABASE_USER` - -Password to be used by the auto generated project to connect to the database under testing. - -default: `os.environ["TG_METADATA_DB_PASSWORD"]` - -#### `PROJECT_DATABASE_HOST` - -Hostname where the database under testing is running in. - -default: `os.environ["TG_METADATA_DB_HOST"]` - -#### `PROJECT_DATABASE_PORT` - -Port at which the database under testing is exposed by the host. -default: `os.environ["TG_METADATA_DB_PORT"]` - -#### `TG_TARGET_DB_TRUST_SERVER_CERTIFICATE` - -For supported SQL flavors, set up the SQLAlchemy connection to trust the database server certificate. - -default: `no` - -#### `DEFAULT_TABLE_GROUPS_NAME` - -Name assigned to the auto generated table group. - -default: `default` - -#### `DEFAULT_TEST_SUITE_NAME` - -Key to be assgined to the auto generated test suite. - -default: `default-suite-1` - -#### `DEFAULT_TEST_SUITE_DESCRIPTION` - -Description for the auto generated test suite. - -default: `default_suite_desc` - -#### `DEFAULT_PROFILING_TABLE_SET` - -Comma separated list of specific table names to include when running profiling for the project database. - -#### `DEFAULT_PROFILING_INCLUDE_MASK` - -A SQL filter supported by the project database's `LIKE` operator for table names to include. - -default: `%%` - -#### `DEFAULT_PROFILING_EXCLUDE_MASK` - -A SQL filter supported by the project database's `LIKE` operator for table names to exclude. - -default: `tmp%%` - -#### `DEFAULT_PROFILING_ID_COLUMN_MASK` - -A SQL filter supported by the project database's `LIKE` operator representing ID columns. 
- -default: `%%id` - -#### `DEFAULT_PROFILING_SK_COLUMN_MASK` - -A SQL filter supported by the project database's `LIKE` operator representing surrogate key columns. - -default: `%%sk` - -#### `DEFAULT_PROFILING_USE_SAMPLING` - -Toggle on to base profiling on a sample of records instead of the full table. Accepts `Y` or `N`. - -default: `N` - -#### `OBSERVABILITY_API_URL` - -API URL of your instance of Observability where to send events to for the project. - -#### `OBSERVABILITY_API_KEY` - -Authentication key with permissions to send events created in your instance of Observability. +```shell +python3 dk-installer.py tg delete-demo +``` -#### `TG_EXPORT_TO_OBSERVABILITY_VERIFY_SSL` +### Upgrade to latest version -Exporting events to your instance of Observabilty verifies SSL certificate. +New releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-release-notes/a/h1_1691719522). Use the following command to upgrade to the latest released version. -default: `yes` + ```shell + python3 dk-installer.py tg upgrade + ``` -#### `TG_OBSERVABILITY_EXPORT_MAX_QTY` +### Uninstall the application -When exporting to your instance of Observabilty, the maximum number of events that will be sent to the events API on a single export. +The following command uninstalls the Docker Compose application and removes all data, containers, and images related to TestGen from your machine. -default: `5000` +```shell +python3 dk-installer.py tg delete +``` -#### `OBSERVABILITY_DEFAULT_COMPONENT_TYPE` +### Access the _testgen_ CLI -When exporting to your instance of Observabilty, the type of event that will be sent to the events API. 
+The [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container. -default: `dataset` +```shell +docker compose exec engine bash +``` -#### `OBSERVABILITY_DEFAULT_COMPONENT_KEY` +Use `exit` to return to the regular terminal. -When exporting to your instance of Observabilty, the key sent to the events API to identify the components. -default: `default` +### Stop the application -#### `TG_DOCKER_RELEASE_CHECK_ENABLED` +```shell +docker compose down +``` -Enables calling Docker Hub API to fetch the latest released image tag. The fetched tag is displayed in the UI menu. +### Restart the application -default: `yes` +```shell +docker compose up -d +``` -## Community +## What Next? -### Getting Started Guide +### Getting started guide We recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview). ### Support -For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) and ask post on #support channel. +For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel. + +### Connect to your database +Follow [these instructions](https://docs.datakitchen.io/articles/#!dataops-testgen-help/connect-your-database) to improve the quality of data in your database. -### Connect -Talk and Learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project. +### Community +Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project. 
Join our community here: +* 👋 [Join us on Slack](https://data-observability-slack.datakitchen.io/join), this is also how you get support (see above) + * 🌟 [Star us on GitHub](https://github.com/DataKitchen/data-observability-installer) * 🐦 [Follow us on Twitter](https://twitter.com/i/flow/login?redirect_after_login=%2Fdatakitchen_io) @@ -423,15 +149,13 @@ Join our community here: * 📚 [Read our blog posts](https://datakitchen.io/blog/) -* 👋 [Join us on Slack](https://data-observability-slack.datakitchen.io/join) - * 🗃 [Sign The DataOps Manifesto](https://DataOpsManifesto.org) * 🗃 [Sign The Data Journey Manifesto](https://DataJourneyManifesto.org) ### Contributing -For details on contributing or running the project for development, check out our contributing guide. +For details on contributing or running the project for development, check out our [contributing guide](CONTRIBUTING.md). ### License -DataKitchen DataOps TestGen is Apache 2.0 licensed. +DataKitchen's DataOps Data Quality TestGen is Apache 2.0 licensed. 
diff --git a/docker-compose.local.yml b/docker-compose.local.yml new file mode 100644 index 0000000..bb54c40 --- /dev/null +++ b/docker-compose.local.yml @@ -0,0 +1,21 @@ +name: local-testgen-db + +services: + local-postgres: + image: postgres:14.1-alpine + restart: always + environment: + - POSTGRES_USER=${TESTGEN_USERNAME} + - POSTGRES_PASSWORD=${TESTGEN_PASSWORD} + ports: + - 5432:5432 + volumes: + - local_postgres_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U ${TESTGEN_USERNAME}"] + interval: 8s + timeout: 5s + retries: 3 + +volumes: + local_postgres_data: \ No newline at end of file diff --git a/deploy/docker-compose.yml b/docker-compose.yml similarity index 67% rename from deploy/docker-compose.yml rename to docker-compose.yml index c010f49..929bb50 100644 --- a/deploy/docker-compose.yml +++ b/docker-compose.yml @@ -1,4 +1,4 @@ -version: "3.8" + name: testgen x-common-variables: &common-variables @@ -9,31 +9,24 @@ x-common-variables: &common-variables TG_METADATA_DB_HOST: postgres TG_TARGET_DB_TRUST_SERVER_CERTIFICATE: yes TG_EXPORT_TO_OBSERVABILITY_VERIFY_SSL: no - TG_DOCKER_RELEASE_CHECK_ENABLED: no + TG_DOCKER_RELEASE_CHECK_ENABLED: yes + services: engine: image: datakitchen/dataops-testgen:v2 container_name: testgen environment: *common-variables + volumes: + - testgen_data:/var/lib/testgen ports: - 8501:8501 extra_hosts: - host.docker.internal:host-gateway depends_on: - - startup - - startup: - image: datakitchen/dataops-testgen:v2 - restart: "no" - environment: *common-variables - entrypoint: ["/bin/sh","-c"] - command: - - | - testgen setup-system-db --yes - testgen upgrade-system-version - depends_on: - - postgres + - postgres + networks: + - datakitchen postgres: image: postgres:14.1-alpine @@ -43,6 +36,19 @@ services: - POSTGRES_PASSWORD=${TESTGEN_PASSWORD} volumes: - postgres_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U ${TESTGEN_USERNAME}"] + interval: 8s + timeout: 5s + 
retries: 3 + networks: + - datakitchen volumes: postgres_data: + testgen_data: + +networks: + datakitchen: + name: datakitchen-network + external: true diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000..2b844b1 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,296 @@ +## TestGen Configuration + +This document describes the environment variables supported by TestGen. + +#### `TESTGEN_DEBUG_LOG_LEVEL` + +Sets logs to the debug level. + +default: `no` + +#### `TESTGEN_DEBUG` + +Invalidates the cache with the bootstrapped application, causing the changes to the routing and plugins to take effect +on every render. + +Also changes the logging level for the `testgen.ui` logger from `INFO` to `DEBUG`. + +default: `no` + +#### `TESTGEN_LOG_TO_FILE` + +Enables generation of rotating file logs. + +default: `yes` + +#### `TESTGEN_LOG_FILE_PATH` + +File path under which to generate rotating file logs, when `TESTGEN_LOG_TO_FILE` is turned on. + +default: `/var/lib/testgen/log` + +#### `TESTGEN_LOG_FILE_MAX_QTY` + +Maximum log files to keep (one file per day), when `TESTGEN_LOG_TO_FILE` is turned on. + +default: `90` + +#### `TG_DECRYPT_SALT` + +Salt used to encrypt and decrypt user secrets. Only allows ASCII characters. + +A minimum length of 16 characters is recommended. + +#### `TG_DECRYPT_PASSWORD` + +Secret passcode used in combination with `TG_DECRYPT_SALT` to encrypt and decrypt user secrets. Only allows ASCII characters. + +#### `TESTGEN_USERNAME` + +Username to log into the web application. + +#### `TESTGEN_PASSWORD` + +Password to log into the web application. + +#### `TG_METADATA_DB_USER` + +User to connect to the testgen application postgres database. + +default: `os.environ["TESTGEN_USERNAME"]` + +#### `TG_METADATA_DB_PASSWORD` + +Password to connect to the testgen application postgres database. 
+ +default: `os.environ["TESTGEN_PASSWORD"]` + +#### `DATABASE_ADMIN_USER` + +User with admin privileges in the testgen application postgres database used to create roles, users, database and schema. Required if the user in `TG_METADATA_DB_USER` does not have the required privileges. + +default: `os.environ["TG_METADATA_DB_USER"]` + +#### `DATABASE_ADMIN_PASSWORD` + +Password for the admin user to connect to the testgen application postgres database. + +default: `os.environ["TG_METADATA_DB_PASSWORD"]` + +#### `DATABASE_EXECUTE_USER` + +User to be created in the testgen application postgres database. + +Will be granted: +- read/write access to tables `test_results`, `test_suites` and `test_definitions` +- read-only access to all other tables. + +default: `testgen_execute` + +#### `DATABASE_REPORT_USER` + +User to be created in the testgen application postgres database. Will be granted read-only access to all tables. + +default: `testgen_report` + +#### `TG_METADATA_DB_HOST` + +Hostname where the testgen application postgres database is running. + +default: `localhost` + +#### `TG_METADATA_DB_PORT` + +Port at which the testgen application postgres database is exposed by the host. + +default: `5432` + +#### `TG_METADATA_DB_NAME` + +Name of the database in postgres on which to store testgen metadata. + +default: `datakitchen` + +#### `TG_METADATA_DB_SCHEMA` + +Name of the schema inside the postgres database on which to store testgen metadata. + +default: `testgen` + +#### `PROJECT_KEY` + +Code used to uniquely identify the auto generated project. + +default: `DEFAULT` + +#### `DEFAULT_PROJECT_NAME` + +Name to assign to the auto generated project. + +default: `Demo` + +#### `PROJECT_SQL_FLAVOR` + +SQL flavor of the database the auto generated project will run tests against. 
+ +Supported flavors: +- `redshift` +- `snowflake` +- `mssql` +- `postgresql` + +default: `postgresql` + +#### `PROJECT_CONNECTION_NAME` + +Name assigned to identify the connection to the project database. + +default: `default` + +#### `PROJECT_CONNECTION_MAX_THREADS` + +Maximum number of concurrent queries executed when fetching data from the project database. + +default: `4` + +#### `PROJECT_CONNECTION_MAX_QUERY_CHAR` + +Determines how many tests are grouped together in a single query. Increase for better performance or decrease to better isolate test failures. Accepted values are 500 to 14000. + +default: `5000` + +#### `PROJECT_QC_SCHEMA` + +Name of the schema to be created in the project database. + +default: `qc` + +#### `PROJECT_DATABASE_NAME` + +Name of the database the auto generated project will run tests against. + +default: `demo_db` + +#### `PROJECT_DATABASE_SCHEMA` + +Name of the schema inside the project database the tests will be run against. + +default: `demo` + +#### `PROJECT_DATABASE_USER` + +User to be used by the auto generated project to connect to the database under testing. + +default: `os.environ["TG_METADATA_DB_USER"]` + +#### `PROJECT_DATABASE_PASSWORD` + +Password to be used by the auto generated project to connect to the database under testing. + +default: `os.environ["TG_METADATA_DB_PASSWORD"]` + +#### `PROJECT_DATABASE_HOST` + +Hostname where the database under testing is running. + +default: `os.environ["TG_METADATA_DB_HOST"]` + +#### `PROJECT_DATABASE_PORT` + +Port at which the database under testing is exposed by the host. +default: `os.environ["TG_METADATA_DB_PORT"]` + +#### `TG_TARGET_DB_TRUST_SERVER_CERTIFICATE` + +For supported SQL flavors, set up the SQLAlchemy connection to trust the database server certificate. + +default: `no` + +#### `DEFAULT_TABLE_GROUPS_NAME` + +Name assigned to the auto generated table group. + +default: `default` + +#### `DEFAULT_TEST_SUITE_NAME` + +Key to be assigned to the auto generated test suite. 
+ +default: `default-suite-1` + +#### `DEFAULT_TEST_SUITE_DESCRIPTION` + +Description for the auto generated test suite. + +default: `default_suite_desc` + +#### `DEFAULT_PROFILING_TABLE_SET` + +Comma-separated list of specific table names to include when running profiling for the project database. + +#### `DEFAULT_PROFILING_INCLUDE_MASK` + +A SQL filter supported by the project database's `LIKE` operator for table names to include. + +default: `%%` + +#### `DEFAULT_PROFILING_EXCLUDE_MASK` + +A SQL filter supported by the project database's `LIKE` operator for table names to exclude. + +default: `tmp%%` + +#### `DEFAULT_PROFILING_ID_COLUMN_MASK` + +A SQL filter supported by the project database's `LIKE` operator representing ID columns. + +default: `%%id` + +#### `DEFAULT_PROFILING_SK_COLUMN_MASK` + +A SQL filter supported by the project database's `LIKE` operator representing surrogate key columns. + +default: `%%sk` + +#### `DEFAULT_PROFILING_USE_SAMPLING` + +Toggle on to base profiling on a sample of records instead of the full table. Accepts `Y` or `N`. + +default: `N` + +#### `OBSERVABILITY_API_URL` + +API URL of the Observability instance to which events for the project are sent. + +#### `OBSERVABILITY_API_KEY` + +Authentication key with permissions to send events created in your instance of Observability. + +#### `TG_EXPORT_TO_OBSERVABILITY_VERIFY_SSL` + +Verifies the SSL certificate when exporting events to your instance of Observability. + +default: `yes` + +#### `TG_OBSERVABILITY_EXPORT_MAX_QTY` + +When exporting to your instance of Observability, the maximum number of events that will be sent to the events API on a single export. + +default: `5000` + +#### `OBSERVABILITY_DEFAULT_COMPONENT_TYPE` + +When exporting to your instance of Observability, the type of event that will be sent to the events API. 
+ +default: `dataset` + +#### `OBSERVABILITY_DEFAULT_COMPONENT_KEY` + +When exporting to your instance of Observability, the key sent to the events API to identify the components. +default: `default` + +#### `TG_DOCKER_RELEASE_CHECK_ENABLED` + +Enables calling the Docker Hub API to fetch the latest released image tag. The fetched tag is displayed in the UI menu. + +default: `yes` diff --git a/docs/local_development.md b/docs/local_development.md new file mode 100644 index 0000000..0687aa0 --- /dev/null +++ b/docs/local_development.md @@ -0,0 +1,99 @@ +# Local Environment Setup + +This document describes how to set up your local environment for TestGen development. + +### Prerequisites + +- [Git](https://github.com/git-guides/install-git) +- [Python 3](https://www.python.org/downloads/) +- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) + +### Clone repository + +Log in to your GitHub account. + +Fork the [dataops-testgen](https://github.com/DataKitchen/dataops-testgen) repository, following [GitHub's guide](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo). + +Clone your forked repository locally. +```shell +git clone https://github.com/YOUR-USERNAME/dataops-testgen +``` + +### Set up virtual environment + +From the root of your local repository, create a Python virtual environment. +```shell +python3.10 -m venv venv +``` + +Activate the environment. +```shell +source venv/bin/activate +``` + +### Install dependencies + +Install the Python dependencies in editable mode. +```shell +# On Linux +pip install -e .[dev] + +# On Mac +pip install -e .'[dev]' +``` + +On Mac, you can optionally install [watchdog](https://github.com/gorakhargosh/watchdog) for better performance of the [file watcher](https://docs.streamlit.io/develop/api-reference/configuration/config.toml) used for local development. 
+```shell +xcode-select --install +pip install watchdog +``` + +### Set environment variables + +Create a `local.env` file with the following environment variables, filling in appropriate values for the empty variables. Refer to the [TestGen Configuration](configuration.md) document for other supported values. +```shell +export TESTGEN_DEBUG=yes +export TESTGEN_LOG_TO_FILE=no +export TESTGEN_USERNAME= +export TESTGEN_PASSWORD= +export TG_DECRYPT_SALT= +export TG_DECRYPT_PASSWORD= +``` + +Source the file to apply the environment variables. +```shell +source local.env +``` + +### Set up Postgres instance + +Run a PostgreSQL instance as a Docker container. + +```shell +docker compose -f docker-compose.local.yml up -d +``` + +Initialize the application database for TestGen. +```shell +testgen setup-system-db --yes +``` + +Seed the demo data. +```shell +testgen quick-start --delete-target-db +testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d +testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d +testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1 +testgen quick-start --simulate-fast-forward +``` + +### Patch and run Streamlit +Patch the Streamlit package with our custom files. +```shell +testgen ui patch-streamlit -f +``` + +Run the local Streamlit-based TestGen application. It will open the browser at [http://localhost:8501](http://localhost:8501). +```shell +testgen ui run +``` diff --git a/invocations/dev.py b/invocations/dev.py index 731c19b..697b89f 100644 --- a/invocations/dev.py +++ b/invocations/dev.py @@ -30,8 +30,6 @@ def install(ctx: Context, quiet_pip: bool = False) -> None: @task def lint(ctx: Context) -> None: """Runs the standard suite of quality/linting tools.""" - ctx.run("isort .") - ctx.run("black .") ctx.run("ruff check . --fix --show-fixes") print("Lint complete!")
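Reviewer note: the constraints that `docs/configuration.md` places on the `local.env` variables (the four variables left blank in the local setup guide, ASCII-only secrets, a recommended salt length of 16, and the 500 to 14000 range for `PROJECT_CONNECTION_MAX_QUERY_CHAR`) could be checked before running `testgen`. The following is only a sketch of such a check; `check_testgen_env` and `REQUIRED` are hypothetical names and are not part of this PR or of the TestGen codebase.

```python
import os

# Variables the local development guide leaves blank for the developer to fill in.
REQUIRED = (
    "TESTGEN_USERNAME",
    "TESTGEN_PASSWORD",
    "TG_DECRYPT_SALT",
    "TG_DECRYPT_PASSWORD",
)


def check_testgen_env(env):
    """Return a list of problems found in `env`, a mapping of variable names to strings.

    Hypothetical helper: it mirrors the rules documented in docs/configuration.md
    and does not call any TestGen code.
    """
    problems = []

    # Each required variable must be present and non-empty.
    for name in REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The salt and passcode are documented as ASCII-only.
    for name in ("TG_DECRYPT_SALT", "TG_DECRYPT_PASSWORD"):
        value = env.get(name, "")
        if value and not value.isascii():
            problems.append(f"{name} must contain only ASCII characters")

    # A minimum salt length of 16 characters is recommended, not enforced.
    salt = env.get("TG_DECRYPT_SALT", "")
    if salt and len(salt) < 16:
        problems.append("TG_DECRYPT_SALT is shorter than the recommended 16 characters")

    # Optional variable; when set, the accepted values are 500 to 14000.
    max_chars = env.get("PROJECT_CONNECTION_MAX_QUERY_CHAR")
    if max_chars is not None:
        if not max_chars.isdigit() or not 500 <= int(max_chars) <= 14000:
            problems.append("PROJECT_CONNECTION_MAX_QUERY_CHAR must be between 500 and 14000")

    return problems


if __name__ == "__main__":
    # Check the current shell environment, e.g. after `source local.env`.
    for problem in check_testgen_env(os.environ):
        print(problem)
```

Running the script after `source local.env` prints one line per problem and nothing when every check passes. Taking the environment as a plain mapping keeps the function testable without touching `os.environ`.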