[FEATURE] Notebooks for 1.0.0a4 (#18)
* Namespace 1.0.0a2 demos
* Add gx-1.0.0a4
1 parent 3934ece, commit 8b27c89
Showing 14 changed files with 615 additions and 14 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,3 +6,4 @@ | |
|
||
**/.ipynb_checkpoints/ | ||
**/__pycache__/ | ||
**/gx/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# GX Community Demo 2024-04-16 | ||
|
||
Demos of expectation authoring and validation workflows for great-expectations 1.0. | ||
|
||
These demos use Python 3.10 with great-expectations [1.0.0a2](https://pypi.org/project/great-expectations/1.0.0a2/). | ||
|
||
## Getting started | ||
1. Create a virtual environment: `python -m venv .venv` | ||
1. Source the virtual environment: `source .venv/bin/activate` | ||
1. Install requirements: `pip install -r requirements.txt` | ||
1. Start the postgres container: `../scripts/run_dockerized_pg.sh` | ||
1. Run the notebooks in `demos/` | ||
|
||
## Running PostgreSQL | ||
|
||
We've prepared a dockerized PostgreSQL with sample data that you can run with the command: | ||
``` | ||
../scripts/run_dockerized_pg.sh | ||
``` | ||
|
||
If you want to connect to the database to manually execute SQL queries you can run: | ||
``` | ||
./scripts/psql_dockerized_postgres.sh | ||
``` | ||
This `psql` script has the connection string if you would like to use a different tool to connect to the database. |
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# great_expectations 1.0.0a4 demos | ||
|
||
Demos of expectation authoring and validation workflows for great-expectations 1.0. | ||
|
||
These demos use Python 3.10 with great-expectations [1.0.0a4](https://pypi.org/project/great-expectations/1.0.0a4/). | ||
|
||
## Notes about these scripts | ||
The scripts in this directory run against 1.0.0a4 and include TODOs for changes planned in subsequent prereleases of 1.0.0. | ||
|
||
## Getting started | ||
1. Create a virtual environment: `python -m venv .venv` | ||
1. Source the virtual environment: `source .venv/bin/activate` | ||
1. Install requirements: `pip install -r requirements.txt` | ||
1. Start the postgres container: `../scripts/run_dockerized_pg.sh` | ||
1. Run the notebooks in `demos/` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,249 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Authoring Expectation Suites\n", | ||
"\n", | ||
"This notebook walks you through interactively creating an Expectation Suite. Contrast it with the 0.18 workflow [here](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data/). One of the biggest high-level changes is the move away from the ambiguous side effects of 0.18's Validator toward a more explicit flow." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Imports\n", | ||
"\n", | ||
"We'll start by importing everything we'll use in the notebook. A couple of things to highlight:\n", | ||
"* We still import `great_expectations` as `gx`\n", | ||
"* We now import `great_expectations.expectations` as `gxe`. In 1.0, Expectations are top-level classes namespaced to `gxe`.\n", | ||
"* This repo contains a `constants.py` file used across the different notebooks. We'll import the relevant names from there." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import great_expectations as gx\n", | ||
"import great_expectations.exceptions as exceptions\n", | ||
"import great_expectations.expectations as gxe\n", | ||
"from great_expectations.core.expectation_suite import ExpectationSuite\n", | ||
"from great_expectations.datasource.fluent.interfaces import Datasource\n", | ||
"\n", | ||
"from constants import (\n", | ||
" DB_CONNECTION_STRING,\n", | ||
" TABLE_NAME,\n", | ||
" DATASOURCE_NAME,\n", | ||
" ASSET_NAME,\n", | ||
" SUITE_NAME,\n", | ||
" BATCH_DEFINITION_NAME_WHOLE_TABLE,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Connecting to data\n", | ||
"\n", | ||
"Next let's connect to our data.\n", | ||
"Datasources and DataAssets are largely unchanged in 1.0.\n", | ||
"\n", | ||
"A new concept (that we'll touch on in greater detail later with partitioning) is the \"Batch Definition\", which describes what data will be validated in each run.\n", | ||
"In this example, we'll use our whole asset with the \"whole table\" Batch Definition.\n", | ||
"\n", | ||
"We'll use our Batch Definition to get a concrete Batch we can validate expectations against." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"context = gx.get_context(mode=\"file\")\n", | ||
"\n", | ||
"try:\n", | ||
" datasource = context.data_sources.add_postgres(DATASOURCE_NAME, connection_string=DB_CONNECTION_STRING)\n", | ||
" data_asset = datasource.add_table_asset(name=ASSET_NAME, table_name=TABLE_NAME)\n", | ||
" \n", | ||
" batch_definition = data_asset.add_batch_definition_whole_table(BATCH_DEFINITION_NAME_WHOLE_TABLE)\n", | ||
"\n", | ||
" print(\"Created entities\")\n", | ||
"\n", | ||
"except exceptions.DataContextError:\n", | ||
" datasource = context.get_datasource(DATASOURCE_NAME)\n", | ||
" assert isinstance(datasource, Datasource)\n", | ||
" data_asset = datasource.get_asset(asset_name=ASSET_NAME)\n", | ||
" batch_definition = next(bd for bd in data_asset.batch_definitions if bd.name == BATCH_DEFINITION_NAME_WHOLE_TABLE)\n", | ||
"\n", | ||
" print(\"Entities already exist - loaded them\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Expectation creation\n", | ||
"\n", | ||
"Let's create our first gx 1.0 Expectation!\n", | ||
"\n", | ||
"Expectation classes are exposed directly in gx 1.0, and are statically typed using [Pydantic](https://docs.pydantic.dev/latest/).\n", | ||
"IntelliSense will show valid arguments and their types.\n", | ||
"\n", | ||
"We can test the expectation against our batch without the use of a Validator, [as was needed in gx 0.18](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data).\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"expectation = gxe.ExpectColumnMinToBeBetween(column=\"passenger_count\", min_value=4, max_value=5)\n", | ||
"batch = batch_definition.get_batch()\n", | ||
"batch.validate(expectation)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Edit the expectation\n", | ||
"\n", | ||
"The expectation is currently failing, so let's update it and verify that it succeeds." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"expectation.min_value = 0\n", | ||
"result = batch.validate(expectation)\n", | ||
"\n", | ||
"print(result)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Creating an Expectation Suite\n", | ||
"\n", | ||
"Now that we have an expectation, let's create an Expectation Suite and add our expectation to it. We'll also add a second expectation while we're at it.\n", | ||
"\n", | ||
"You'll notice the call to `context.suites.add`. GX 1.0 moves toward a consistent API for collection objects. You'll see similar collections under `context.checkpoints` and `context.validation_definitions`.\n", | ||
"\n", | ||
"Note that after this cell, the suite and its expectations have been persisted. The structure of the persisted JSON is **nearly** identical to that of 0.18; the main difference is the promotion of `notes` to a top-level entity." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"try:\n", | ||
" suite = context.suites.add(ExpectationSuite(name=SUITE_NAME))\n", | ||
"\n", | ||
" suite.add_expectation(expectation)\n", | ||
" expectation = suite.add_expectation(gxe.ExpectColumnValuesToBeBetween(column=\"passenger_count\", min_value=0, max_value=4))\n", | ||
" print(\"Expectation Suite created\")\n", | ||
"except exceptions.DataContextError:\n", | ||
" suite = context.suites.get(SUITE_NAME)\n", | ||
" print(\"We've already added the suite\")\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Validating an Expectation Suite\n", | ||
"\n", | ||
"We can validate Expectation Suites against Batches the same way we validated Expectations against Batches." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"results = batch.validate(suite)\n", | ||
"\n", | ||
"print(results)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Editing an Expectation Suite\n", | ||
"\n", | ||
"To edit an Expectation Suite, we find the expectation we want and edit it directly." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"suite = context.suites.get(SUITE_NAME)\n", | ||
"expectation = next(e for e in suite.expectations if isinstance(e, gxe.ExpectColumnValuesToBeBetween))\n", | ||
"expectation.max_value = 10\n", | ||
"\n", | ||
"results = batch.validate(suite)\n", | ||
"print(results)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Save the Expectation Suite\n", | ||
"\n", | ||
"Now that the suite is updated, we can save it with `suite.save()`, or we can save just the individual expectation with `expectation.save()`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"suite.save()\n", | ||
"\n", | ||
"# OR\n", | ||
"\n", | ||
"expectation.save()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.14" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
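The notebook above edits two expectations until they pass, which is easier to follow once the difference between an aggregate check and a row-level check is clear. The following is a minimal plain-Python sketch of that difference. It does not use great_expectations at all: the classes `ColumnMinBetween` and `ColumnValuesBetween` are hypothetical stand-ins for `gxe.ExpectColumnMinToBeBetween` and `gxe.ExpectColumnValuesToBeBetween`, and the `passenger_count` values are invented for illustration, not taken from the demo database.

```python
from dataclasses import dataclass


@dataclass
class ColumnMinBetween:
    """Hypothetical stand-in for an aggregate expectation (not the GX class)."""

    min_value: float
    max_value: float

    def validate(self, values: list[float]) -> bool:
        # Aggregate check: only the column's minimum must fall in the range.
        return self.min_value <= min(values) <= self.max_value


@dataclass
class ColumnValuesBetween:
    """Hypothetical stand-in for a row-level expectation (not the GX class)."""

    min_value: float
    max_value: float

    def validate(self, values: list[float]) -> bool:
        # Row-level check: every single value must fall in the range.
        return all(self.min_value <= v <= self.max_value for v in values)


# Invented sample data; the real demo table may look different.
passenger_count = [0, 1, 1, 2, 3, 6]

# Mirrors the notebook flow: validate, edit the attribute, validate again.
expectation = ColumnMinBetween(min_value=4, max_value=5)
first_result = expectation.validate(passenger_count)
expectation.min_value = 0
second_result = expectation.validate(passenger_count)

values_expectation = ColumnValuesBetween(min_value=0, max_value=4)
values_before = values_expectation.validate(passenger_count)
values_expectation.max_value = 10
values_after = values_expectation.validate(passenger_count)

print(first_result, second_result, values_before, values_after)
```

With this sample, lowering `min_value` fixes the aggregate check because the column minimum is 0, while the row-level check only passes once `max_value` covers the largest value in the column. The mutate-then-revalidate pattern is the part that corresponds to the notebook; the real Expectation classes return result objects rather than booleans.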