
Commit

Merge pull request #22 from DataKitchen/release/2.15.1
Release: 2.15.1
datakitchen-devops authored Oct 12, 2024
2 parents 72b3f39 + f2d3530 commit 6b13d54
Showing 147 changed files with 6,306 additions and 5,094 deletions.
3 changes: 1 addition & 2 deletions Dockerfile
@@ -28,10 +28,9 @@ RUN python3 -m pip install --no-deps /tmp/dk --prefix=/dk
ENV PYTHONPATH ${PYTHONPATH}:/dk/lib/python3.10/site-packages
ENV PATH="$PATH:/dk/bin:/opt/mssql-tools/bin/"

RUN TG_METADATA_DB_USER=- TG_METADATA_DB_PASSWORD=- TG_METADATA_DB_HOST=- TG_METADATA_DB_PORT=- testgen ui patch-streamlit

ARG TESTGEN_VERSION
ENV TESTGEN_VERSION=v$TESTGEN_VERSION
ENV TG_RELEASE_CHECK=docker

ENV STREAMLIT_SERVER_MAX_UPLOAD_SIZE=200

104 changes: 101 additions & 3 deletions README.md
@@ -28,9 +28,9 @@ A <b>single place to manage Data Quality</b> across data sets, locations, and te
<img alt="DataKitchen Open Source Data Quality TestGen Features - Single Place" src="https://datakitchen.io/wp-content/uploads/2024/07/Screenshot-dataops-testgen-centralize.png" width="70%">
</p>

## Installation
## Installation with dk-installer (recommended)

The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) program installs DataOps Data Quality TestGen.
The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) program installs DataOps Data Quality TestGen as a [Docker Compose](https://docs.docker.com/compose/) application. This is the recommended mode of installation as Docker encapsulates and isolates the application from other software on your machine and does not require you to manage Python dependencies.

### Install the prerequisite software

@@ -75,9 +75,107 @@ python3 dk-installer.py tg run-demo

In the TestGen UI, you will see that new data profiling and test results have been generated.

## Installation with pip

As an alternative to the Docker Compose [installation with dk-installer (recommended)](#installation-with-dk-installer-recommended), DataOps Data Quality TestGen can also be installed as a Python package via [pip](https://pip.pypa.io/en/stable/). This mode of installation uses the [dataops-testgen](https://pypi.org/project/dataops-testgen/) package published to PyPI, and it requires a PostgreSQL instance to be provisioned for the application database.

### Install the prerequisite software

| Software | Tested Versions | Command to check version |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------------------|
| [Python](https://www.python.org/downloads/) <br/>- Most Linux and macOS systems have Python pre-installed. <br/>- On Windows machines, you will need to download and install it. | 3.10, 3.11, 3.12 | `python3 --version` |
| [PostgreSQL](https://www.postgresql.org/download/) | 14.1, 15.8, 16.4 | `psql --version`|

### Install the TestGen package

We recommend using a Python virtual environment to avoid any dependency conflicts with other applications installed on your machine. The [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) module, which is part of the Python standard library, or other third-party tools, like [virtualenv](https://virtualenv.pypa.io/en/latest/) or [conda](https://docs.conda.io/en/latest/), can be used.

Create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.10`). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.

_On Linux/Mac_
```shell
python3 -m venv venv
source venv/bin/activate
```

_On Windows_
```powershell
py -3.10 -m venv venv
venv\Scripts\activate
```
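Before installing, it can help to confirm which interpreter the activated environment actually uses, since TestGen requires Python `>=3.10`. The snippet below is a minimal sketch and is not part of the TestGen package: the helper name `check_python_version` is hypothetical, and the "tested" set simply mirrors the versions listed in the prerequisites table (3.10, 3.11, 3.12).

```python
import sys

# Lower bound taken from requires-python = ">=3.10"
MIN_VERSION = (3, 10)
# Versions listed as tested in the prerequisites table
TESTED_VERSIONS = {(3, 10), (3, 11), (3, 12)}


def check_python_version(version_info=sys.version_info):
    """Classify a version as "unsupported", "tested", or "untested".

    Hypothetical helper for illustration; not part of the TestGen package.
    """
    major_minor = (version_info[0], version_info[1])
    if major_minor < MIN_VERSION:
        return "unsupported"
    if major_minor in TESTED_VERSIONS:
        return "tested"
    return "untested"
```

Calling `check_python_version()` with no arguments classifies the interpreter running it. "untested" only means newer than the versions listed above, not necessarily incompatible.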

Within the virtual environment, install the TestGen package with pip.
```shell
pip install dataops-testgen
```

Verify that the [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) works.
```shell
testgen --help
```

### Set up the application database in PostgreSQL

Create a `local.env` file with the following environment variables, replacing the `<value>` placeholders with appropriate values. Refer to the [TestGen Configuration](docs/configuration.md) document for more details, defaults, and other supported configuration.
```shell
# Connection parameters for the PostgreSQL server
export TG_METADATA_DB_HOST=<postgres_hostname>
export TG_METADATA_DB_PORT=<postgres_port>

# Connection credentials for the PostgreSQL server
# This role must have privileges to create roles, users, databases, and schemas so that the application database can be initialized
export TG_METADATA_DB_USER=<postgres_username>
export TG_METADATA_DB_PASSWORD=<postgres_password>

# Set a password and an arbitrary string (the "salt") to be used for encrypting secrets in the application database
export TG_DECRYPT_PASSWORD=<encryption_password>
export TG_DECRYPT_SALT=<encryption_salt>

# Set credentials for the default admin user to be created for TestGen
export TESTGEN_USERNAME=<username>
export TESTGEN_PASSWORD=<password>

# Set an accessible path for storing application logs
export TESTGEN_LOG_FILE_PATH=<path_for_logs>
```

Source the file to apply the environment variables. For the Windows equivalent, refer to [this guide](https://bennett4.medium.com/windows-alternative-to-source-env-for-setting-environment-variables-606be2a6d3e1).
```shell
source local.env
```
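Where `source` is not available (for example, in Windows `cmd` or PowerShell), one alternative is to read `local.env` from Python and pass its variables to `testgen` as a child-process environment. This is a minimal sketch under stated assumptions: the file contains only `export KEY=value` lines and `#` comments, as in the template above, with no quoting, variable expansion, or multi-line values. The names `parse_env_lines`, `missing_vars`, and `run_testgen` are hypothetical.

```python
import os
import subprocess

# Variables from the local.env template above
REQUIRED_VARS = (
    "TG_METADATA_DB_HOST",
    "TG_METADATA_DB_PORT",
    "TG_METADATA_DB_USER",
    "TG_METADATA_DB_PASSWORD",
    "TG_DECRYPT_PASSWORD",
    "TG_DECRYPT_SALT",
    "TESTGEN_USERNAME",
    "TESTGEN_PASSWORD",
    "TESTGEN_LOG_FILE_PATH",
)


def parse_env_lines(lines):
    """Parse `export KEY=value` lines into a dict, skipping blanks and comments."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        # Split on the first "=" only, so values may themselves contain "="
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def missing_vars(env):
    """Return required names that are absent, empty, or still a <placeholder>."""
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env[name].startswith("<")
    ]


def run_testgen(args, env_file="local.env"):
    """Run `testgen <args>` with the variables from env_file added to the environment."""
    with open(env_file, encoding="utf-8") as f:
        file_env = parse_env_lines(f)
    missing = missing_vars(file_env)
    if missing:
        raise ValueError(f"Missing values in {env_file}: {', '.join(missing)}")
    # Values from the file override variables already set in the current environment
    return subprocess.run(["testgen", *args], env={**os.environ, **file_env}, check=True)
```

For example, `run_testgen(["setup-system-db", "--yes"])` would be the counterpart of sourcing the file and then running the database setup command.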

Make sure the PostgreSQL database server is up and running. Initialize the application database for TestGen.
```shell
testgen setup-system-db --yes
```
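`testgen setup-system-db` needs to reach the PostgreSQL server configured in `local.env`. If it cannot connect, a plain TCP check against `TG_METADATA_DB_HOST` and `TG_METADATA_DB_PORT` helps separate network problems from credential problems. This is a minimal sketch: it only shows that something accepts connections on that address and does not authenticate (a client such as `psql` does that). The helpers `port_is_open` and `metadata_db_reachable` are hypothetical, and the `localhost`/`5432` fallbacks are assumptions (5432 is PostgreSQL's default port).

```python
import os
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names
        return False


def metadata_db_reachable():
    """Check the configured address (assumed fallbacks: localhost:5432)."""
    host = os.environ.get("TG_METADATA_DB_HOST", "localhost")
    port = os.environ.get("TG_METADATA_DB_PORT", "5432")
    return port_is_open(host, port)
```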

### Run the TestGen UI

Run the following command to start the TestGen UI. It will open the browser at [http://localhost:8501](http://localhost:8501).

```shell
testgen ui run
```
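When `testgen ui run` is started in the background (for example, from a script or a container), a small poller can wait until the UI answers on port 8501 before opening it. This sketch uses only the standard library. `wait_for_http` is a hypothetical helper, and treating any response other than a 5xx error as "up" is an assumption.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0):
    """Poll url until the server responds, or return False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                # urlopen only returns for successful responses
                return True
        except urllib.error.HTTPError as err:
            # The server answered with an error status; treat 4xx as "up"
            if err.code < 500:
                return True
        except OSError:
            # URLError (an OSError subclass) covers refused connections and DNS
            # failures; socket timeouts are OSError subclasses as well
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_for_http("http://localhost:8501")` returns `True` once the UI responds, or `False` after the 60-second default timeout, which is an arbitrary choice.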

Verify that you can log in to the UI with the `TESTGEN_USERNAME` and `TESTGEN_PASSWORD` values that you configured in the environment variables.

### Optional: Run the TestGen demo setup

The [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.

```shell
testgen quick-start --delete-target-db
testgen run-profile --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
testgen run-test-generation --table-group-id 0ea85e17-acbe-47fe-8394-9970725ad37d
testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
testgen quick-start --simulate-fast-forward
```
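The demo commands above are meant to run in order, so it is worth stopping at the first failure rather than continuing with later steps. A shell `&&` chain does that. The Python sketch below does the same and is easier to extend. `run_demo` is a hypothetical wrapper around the exact commands listed above, and the runner is injectable so the ordering logic can be checked without TestGen installed.

```python
import subprocess

# Demo steps from the README, in order; each is an argument list for subprocess
DEMO_STEPS = [
    ["testgen", "quick-start", "--delete-target-db"],
    ["testgen", "run-profile", "--table-group-id", "0ea85e17-acbe-47fe-8394-9970725ad37d"],
    ["testgen", "run-test-generation", "--table-group-id", "0ea85e17-acbe-47fe-8394-9970725ad37d"],
    ["testgen", "run-tests", "--project-key", "DEFAULT", "--test-suite-key", "default-suite-1"],
    ["testgen", "quick-start", "--simulate-fast-forward"],
]


def run_demo(steps=DEMO_STEPS, run=subprocess.run):
    """Run each step in order, stopping at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        result = run(step)
        if result.returncode != 0:
            print(f"Step failed ({result.returncode}): {' '.join(step)}")
            break
        completed.append(step)
    return completed
```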

In the TestGen UI, you will see that new data profiling and test results have been generated.

## Useful Commands

The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation.
The [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the TestGen application installed using dk-installer. All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation.

### Remove demo data

32 changes: 19 additions & 13 deletions docs/local_development.md
@@ -21,25 +21,34 @@ git clone https://github.com/YOUR-USERNAME/dataops-testgen

### Set up virtual environment

From the root of your local repository, create a Python virtual environment.
We recommend using a Python virtual environment to avoid any dependency conflicts with other applications installed on your machine. The [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) module, which is part of the Python standard library, or other third-party tools, like [virtualenv](https://virtualenv.pypa.io/en/latest/) or [conda](https://docs.conda.io/en/latest/), can be used.

From the root of your local repository, create and activate a virtual environment with a TestGen-compatible version of Python (`>=3.10`). The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.

_On Linux/Mac_
```shell
python3.10 -m venv venv
source venv/bin/activate
```

Activate the environment.
```shell
source venv/bin/activate
_On Windows_
```powershell
py -3.10 -m venv venv
venv\Scripts\activate
```

### Install dependencies

Install the Python dependencies in editable mode.

_On Linux_
```shell
# On Linux
pip install -e .[dev]
```

# On Mac
pip install -e .'[dev]'
_On Mac/Windows_
```shell
pip install -e ".[dev]"
```

On Mac, you can optionally install [watchdog](https://github.com/gorakhargosh/watchdog) for better performance of the [file watcher](https://docs.streamlit.io/develop/api-reference/configuration/config.toml) used for local development.
@@ -65,6 +74,8 @@ Source the file to apply the environment variables.
source local.env
```

For the Windows equivalent, refer to [this guide](https://bennett4.medium.com/windows-alternative-to-source-env-for-setting-environment-variables-606be2a6d3e1).

### Set up Postgres instance

Run a PostgreSQL instance as a Docker container.
@@ -87,12 +98,7 @@ testgen run-tests --project-key DEFAULT --test-suite-key default-suite-1
testgen quick-start --simulate-fast-forward
```

### Patch and run Streamlit
Patch the Streamlit package with our custom files.
```shell
testgen ui patch-streamlit -f
```

### Run Streamlit
Run the local Streamlit-based TestGen application. It will open the browser at [http://localhost:8501](http://localhost:8501).
```shell
testgen ui run
52 changes: 30 additions & 22 deletions pyproject.toml
@@ -7,45 +7,43 @@ requires = [
build-backend = "setuptools.build_meta"

[project]
name = "data-ops-testgen"
version = "2.2.0"
description = "DataKitchen Inc. Data Quality Engine"
urls = { "homepage" = "https://datakitchen.io" }
name = "dataops-testgen"
version = "2.8.1"
description = "DataKitchen's Data Quality DataOps TestGen"
authors = [
{ "name" = "Charles Bloche", "email" = "[email protected]" },
{ "name" = "Tyler Stubenvoll", "email" = "[email protected]" },
{ "name" = "Alejandro Fernandez", "email" = "[email protected]" },
{ "name" = "Anuja Waikar", "email" = "[email protected]" },
{ "name" = "Shruthy Vakkil", "email" = "[email protected]" },
{ "name" = "Arnob Bordoloi", "email" = "[email protected]" },
{ "name" = "Saurabh Vashist", "email" = "[email protected]" },
{ "name" = "Saurabh Vaidya", "email" = "[email protected]" }
{ "name" = "DataKitchen, Inc.", "email" = "[email protected]" },
]
maintainers = [
{ "name" = "DataKitchen, Inc.", "email" = "[email protected]" },
]
license = { "text" = "CLOSED" }
readme = "README.md"
classifiers = [
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Development Status :: 5 - Production/Stable",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: System :: Monitoring",
]
keywords = [ "dataops", "data", "quality", "testing", "database", "profiling" ]
requires-python = ">=3.10"

dependencies = [
"PyYAML==6.0.1",
"click==8.1.3",
"sqlalchemy==1.4.46",
"snowflake-sqlalchemy==1.4.7",
"pyodbc==4.0.39",
"psycopg2-binary==2.9.6",
"pyodbc==5.0.0",
"psycopg2-binary==2.9.9",
"pycryptodome==3.17",
"prettytable==3.7.0",
"requests_extensions==1.1.3",
"bz2file==0.98",
"trogon==0.4.0",
"numpy==1.25.2",
"pandas==2.1.0",
"streamlit==1.26.0",
"numpy==1.26.4",
"pandas==2.1.4",
"streamlit==1.38.0",
"streamlit-extras==0.3.0",
"streamlit-aggrid==0.3.4.post3",
"streamlit-antd-components==0.2.2",
@@ -54,14 +52,14 @@ dependencies = [
"streamlit-option-menu==0.3.6",
"streamlit-authenticator==0.2.3",
"streamlit-javascript==0.1.5",
"streamlit-modal==0.1.0",
"progress==1.6",
"beautifulsoup4==4.12.3",
"trino==0.327.0",
"xlsxwriter==3.2.0",
"psutil==5.9.8",
"concurrent_log_handler==0.9.25",
"cryptography==42.0.8",
"validators==0.33.0",
]

[project.optional-dependencies]
@@ -79,12 +77,22 @@ dev = [
]

release = [
"bumpver==2023.1129"
"build==1.2.1",
"bumpver==2023.1129",
"twine==5.1.1",
]

[project.entry-points.console_scripts]
testgen = "testgen.__main__:cli"

[project.urls]
"Source Code" = "https://github.com/DataKitchen/dataops-testgen"
"Bug Tracker" = "https://github.com/DataKitchen/dataops-testgen/issues"
"Documentation" = "https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help"
"Release Notes" = "https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-release-notes"
"Slack" = "https://data-observability-slack.datakitchen.io/join"
"Homepage" = "https://example.com"

[tool.setuptools]
include-package-data = true
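Because every runtime dependency is pinned with `==`, an environment that drifts from the pins (for instance after manually upgrading Streamlit) can be spotted with the standard library's `importlib.metadata`. This is a minimal sketch. It handles only exact `name==version` pins like those above, skips any other specifier, and passes names to `importlib.metadata.version` as written. `find_pin_mismatches` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def find_pin_mismatches(pins, get_version=version):
    """Compare `name==version` pins against installed distributions.

    Returns a dict mapping package name to (pinned, installed); installed is
    None when the package is not installed.
    """
    mismatches = {}
    for pin in pins:
        name, sep, pinned = pin.partition("==")
        if not sep:
            continue  # Only exact pins are handled in this sketch
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, `find_pin_mismatches(["streamlit==1.38.0", "pandas==2.1.4"])` returns an empty dict when both pins are satisfied in the active environment.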
