Skip to content

Commit

Permalink
[FEATURE] Notebooks for 1.0.0a4 (#18)
Browse files Browse the repository at this point in the history
* Namespace 1.0.0a2 demos

* Add gx-1.0.0a4
  • Loading branch information
tyler-hoffman authored Jun 10, 2024
1 parent 3934ece commit 8b27c89
Show file tree
Hide file tree
Showing 14 changed files with 615 additions and 14 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,4 @@

**/.ipynb_checkpoints/
**/__pycache__/
**/gx/
21 changes: 9 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,10 @@
# GX Community Demo 2024-04-16
# 1.0.0 Prerelease Demos

Demos of expectation authoring and validation workflows for great-expectations 1.0.
## About this repository

These demos use python 3.10 with [1.0.0a2](https://pypi.org/project/great-expectations/1.0.0a2/).

## Getting started
1. Create a virtual environment: `python -m venv .venv`
1. Source the virtual environment: `source .venv/bin/activate`
1. Install requirements: `pip install -r requirements.txt`
1. Start the postgres container: `./scripts/run_dockerized_pg.sh`
1. Run the notebooks in `demos/`
The included scripts and notebooks are intended to give a quick look at API changes in 1.0.0.
Scripts that use postgres all use the same postgres image described below.
Demos and scripts are found under the associated `gx-<VERSION>` directory, and a corresponding `requirements.txt` is included in each that pins the great_expectations version.

### Additional steps for Macs with Apple Silicon (e.g. M1)
You may need to run these additional steps
Expand All @@ -20,9 +15,11 @@ You may need to run these additional steps

## Project Structure

* `gx-*/`
* `demos/`: full, working versions of demos as notebooks
* `scripts/`: full, working python scripts
* `requirements.txt`: requirements files for the specific version
* `scripts/`: setup scripts
* `demos/`: full, working versions of demos as jupyter notebooks


## Running PostgreSQL

Expand Down
25 changes: 25 additions & 0 deletions gx-1.0.0a2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# GX Community Demo 2024-04-16

Demos of expectation authoring and validation workflows for great-expectations 1.0.

These demos use python 3.10 with [1.0.0a2](https://pypi.org/project/great-expectations/1.0.0a2/).

## Getting started
1. Create a virtual environment: `python -m venv .venv`
1. Source the virtual environment: `source .venv/bin/activate`
1. Install requirements: `pip install -r requirements.txt`
1. Start the postgres container: `../scripts/run_dockerized_pg.sh`
1. Run the notebooks in `demos/`

## Running PostgreSQL

We've prepared a dockerized PostgreSQL with sample data that you can run with the command:
```
../scripts/run_dockerized_pg.sh
```

If you want to connect to the database to manually execute SQL queries you can run:
```
./scripts/psql_dockerized_postgres.sh
```
This `psql` script has the connection string if you would like to use a different tool to connect to the database.
File renamed without changes.
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
Expand All @@ -52,7 +52,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
Expand Down
File renamed without changes.
File renamed without changes.
15 changes: 15 additions & 0 deletions gx-1.0.0a4/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# great_expectations 1.0.0a4 demos

Demos of expectation authoring and validation workflows for great-expectations 1.0.

These demos use python 3.10 with [1.0.0a4](https://pypi.org/project/great-expectations/1.0.0a4/).

## Notes about these scripts
The scripts in this directory will run against 1.0.0a4, and include TODOs on future changes planning for subsequent prereleases of 1.0.0.

## Getting started
1. Create a virtual environment: `python -m venv .venv`
1. Source the virtual environment: `source .venv/bin/activate`
1. Install requirements: `pip install -r requirements.txt`
1. Start the postgres container: `../scripts/run_dockerized_pg.sh`
1. Run the notebooks in `demos/`
249 changes: 249 additions & 0 deletions gx-1.0.0a4/demos/01-authoring_expectation_suites.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,249 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Authoring Expectation Suites\n",
"\n",
"This notebook will walk you through interactively creating an expectation suite. Contrast it to 0.18 [here](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data/). One of the biggest high-level changes is to move away from the ambiguous side effects of 0.18's Validator to a more explicit flow."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports\n",
"\n",
"We'll start by importing everything we'll use in the notebook. A couple things to highlight:\n",
"* We still import `great_expectations` as `gx`\n",
"* We now import `great_expectations.expectations` as `gxe`. In 1.0, Expectations are top-level classes namespaced to `gxe`.\n",
"* This repo contains a `constants.py` file used across the different notebooks. We'll import the relevant files from there."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import great_expectations as gx\n",
"import great_expectations.exceptions as exceptions\n",
"import great_expectations.expectations as gxe\n",
"from great_expectations.core.expectation_suite import ExpectationSuite\n",
"from great_expectations.datasource.fluent.interfaces import Datasource\n",
"\n",
"from constants import (\n",
" DB_CONNECTION_STRING,\n",
" TABLE_NAME,\n",
" DATASOURCE_NAME,\n",
" ASSET_NAME,\n",
" SUITE_NAME,\n",
" BATCH_DEFINITION_NAME_WHOLE_TABLE,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connecting to data\n",
"\n",
"Next let's connect to our data.\n",
"Datasources and DataAssets are largely unchanged in 1.0.\n",
"\n",
"A new concept (that we'll touch on in greater detail later with partitioning) is the \"Batch Definition\", which describes what data will be validated in each run.\n",
"In this example, we'll use our whole asset with the \"whole table\" Batch Definition.\n",
"\n",
"We'll use our Batch Definition to get a concrete Batch we can validate expectations against."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"context = gx.get_context(mode=\"file\")\n",
"\n",
"try:\n",
" datasource = context.data_sources.add_postgres(DATASOURCE_NAME, connection_string=DB_CONNECTION_STRING)\n",
" data_asset = datasource.add_table_asset(name=ASSET_NAME, table_name=TABLE_NAME)\n",
" \n",
" batch_definition = data_asset.add_batch_definition_whole_table(BATCH_DEFINITION_NAME_WHOLE_TABLE)\n",
"\n",
" print(\"Created entities\")\n",
"\n",
"except exceptions.DataContextError:\n",
" datasource = context.get_datasource(DATASOURCE_NAME)\n",
" assert isinstance(datasource, Datasource)\n",
" data_asset = datasource.get_asset(asset_name=ASSET_NAME)\n",
" batch_definition = next(bd for bd in data_asset.batch_definitions if bd.name == BATCH_DEFINITION_NAME_WHOLE_TABLE)\n",
"\n",
" print(\"Entities alread exist - loaded them\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Expectation creation\n",
"\n",
"Let's create our first gx 1.0 Expectation!\n",
"\n",
"Expectation classes are exposed directly in gx 1.0, and are statically typed using [Pydantic](https://docs.pydantic.dev/latest/).\n",
"Inellisense will show valid arguments and their types.\n",
"\n",
"We can test the expectation against our batch without the use of a Validator, [as was needed in gx 0.18](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"expectation = gxe.ExpectColumnMinToBeBetween(column=\"passenger_count\", min_value=4, max_value=5)\n",
"batch = batch_definition.get_batch()\n",
"batch.validate(expectation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Edit the expectation\n",
"\n",
"The expectation is currently failing, so let's update it and verify that it succeeds."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"expectation.min_value = 0\n",
"result = batch.validate(expectation)\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating an Expectation Suite\n",
"\n",
"Now that we have an expectation, let's create an Expectation Suite and add our expectation to it. We'll also add an expectation while we're at it.\n",
"\n",
"You'll notice the call to `context.suites.add`. GX 1.0 moves toward a consistent API for collections object. You'll see similar collections under `context.checkpoints` and `context.validation_definitions`.\n",
"\n",
"Note that after this cell, the suite and its expectations have been persisted. The structure of the persisted JSON is **nearly** identical to that of 0.18, with the promotion of `notes` as a top-level entity."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" suite = context.suites.add(ExpectationSuite(name=SUITE_NAME))\n",
"\n",
" suite.add_expectation(expectation)\n",
" expectation = suite.add_expectation(gxe.ExpectColumnValuesToBeBetween(column=\"passenger_count\", min_value=0, max_value=4))\n",
" print(\"Expectation Suite created\")\n",
"except exceptions.DataContextError:\n",
" suite = context.suites.get(SUITE_NAME)\n",
" print(\"We've already added the suite\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Validating an Expectation Suite\n",
"\n",
"We can validate Expectation Suites against Batches the same way we validated Expectations against Batches."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results = batch.validate(suite)\n",
"\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Editing an Expectation Suite\n",
"\n",
"To edit an expectation suite, we find the expectation we want and edit it directly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suite = context.suites.get(SUITE_NAME)\n",
"expectation = next(e for e in suite.expectations if isinstance(e, gxe.ExpectColumnValuesToBeBetween))\n",
"expectation.max_value = 10\n",
"\n",
"results = batch.validate(suite)\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save the Expectation Suite\n",
"\n",
"Now that the suite is updated, we can save it with `suite.save()`, or we can save just the individual expectation with `expectation.save()`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suite.save()\n",
"\n",
"# OR\n",
"\n",
"expectation.save()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading

0 comments on commit 8b27c89

Please sign in to comment.