[FEATURE] Notebooks for 1.0.0a4 (#18)

* Namespace 1.0.0a2 demos
* Add gx-1.0.0a4
great-expectations · Jun 10, 2024 · 8b27c89
1 parent 3934ece
commit 8b27c89
Showing 14 changed files with 615 additions and 14 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@
 
 **/.ipynb_checkpoints/
 **/__pycache__/
+**/gx/
diff --git a/README.md b/README.md
@@ -1,15 +1,10 @@
-# GX Community Demo 2024-04-16
+# 1.0.0 Prerelease Demos
 
-Demos of expectation authoring and validation workflows for great-expectations 1.0.
+## About this repository
 
-These demos use python 3.10 with [1.0.0a2](https://pypi.org/project/great-expectations/1.0.0a2/).
-
-## Getting started
-1. Create a virtual environment: `python -m venv .venv`
-1. Source the virtual environment: `source .venv/bin/activate`
-1. Install requirements: `pip install -r requirements.txt`
-1. Start the postgres container: `./scripts/run_dockerized_pg.sh`
-1. Run the notebooks in `demos/`
+The included scripts and notebooks are intended to give a quick look at API changes in 1.0.0.
+Scripts that use postgres all use the same postgres image described below.
+Demos and scripts are found under the associated `gx-<VERSION>` directory, and a corresponding `requirements.txt` is included in each that pins the great_expectations version.
 
 ### Additional steps for Macs with Apple Silicon (e.g. M1)
 You may need to run these additional steps
@@ -20,9 +15,11 @@ You may need to run these additional steps
 
 ## Project Structure
 
+* `gx-*/`
+  * `demos/`: full, working versions of demos as notebooks
+  * `scripts/`: full, working python scripts
+  * `requirements.txt`: requirements file for the specific version
 * `scripts/`: setup scripts
-* `demos/`: full, working versions of demos as jupyter notebooks
-
 
 ## Running PostgreSQL
 

diff --git a/gx-1.0.0a2/README.md b/gx-1.0.0a2/README.md
@@ -0,0 +1,25 @@
+# GX Community Demo 2024-04-16
+
+Demos of expectation authoring and validation workflows for great-expectations 1.0.
+
+These demos use python 3.10 with [1.0.0a2](https://pypi.org/project/great-expectations/1.0.0a2/).
+
+## Getting started
+1. Create a virtual environment: `python -m venv .venv`
+1. Source the virtual environment: `source .venv/bin/activate`
+1. Install requirements: `pip install -r requirements.txt`
+1. Start the postgres container: `../scripts/run_dockerized_pg.sh`
+1. Run the notebooks in `demos/`
+
+## Running PostgreSQL
+
+We've prepared a dockerized PostgreSQL with sample data that you can run with the command:
+```
+../scripts/run_dockerized_pg.sh
+```
+
+If you want to connect to the database to manually execute SQL queries you can run:
+```
+./scripts/psql_dockerized_postgres.sh
+```
+This `psql` script has the connection string if you would like to use a different tool to connect to the database.
diff --git a/demos/01-authoring_expectation_suites.ipynb → ...mos/01-authoring_expectation_suites.ipynb b/demos/01-authoring_expectation_suites.ipynb → ...mos/01-authoring_expectation_suites.ipynb
diff --git a/...idation-definitions-and-checkpoints.ipynb → ...idation-definitions-and-checkpoints.ipynb b/...idation-definitions-and-checkpoints.ipynb → ...idation-definitions-and-checkpoints.ipynb
diff --git a/demos/03-sql_month_and_year.ipynb → gx-1.0.0a2/demos/03-sql_month_and_year.ipynb b/demos/03-sql_month_and_year.ipynb → gx-1.0.0a2/demos/03-sql_month_and_year.ipynb
@@ -25,7 +25,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -52,7 +52,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [

diff --git a/demos/constants.py → gx-1.0.0a2/demos/constants.py b/demos/constants.py → gx-1.0.0a2/demos/constants.py
diff --git a/requirements.txt → gx-1.0.0a2/requirements.txt b/requirements.txt → gx-1.0.0a2/requirements.txt
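The only content change in the renamed `03-sql_month_and_year.ipynb` above is that two `execution_count` values are reset to `null`. A minimal sketch of how such a reset could be scripted with the standard library is shown here; the helper name `clear_execution_counts` and the sample notebook dict are hypothetical illustrations and are not part of this commit.

```python
import json


def clear_execution_counts(notebook: dict) -> dict:
    """Return a copy of a notebook dict with execution counts set to None."""
    # Deep copy via a JSON round-trip so the input is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        # Some outputs (e.g. execute_result) carry their own execution_count.
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
    return cleaned


# Hypothetical two-cell notebook, used only for illustration.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1"],
        },
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = clear_execution_counts(sample)
# json.dumps renders None as null, matching the diff above.
print(json.dumps(cleaned["cells"][1]["execution_count"]))
```

Resetting the counts before committing keeps notebook diffs limited to real content changes.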
diff --git a/gx-1.0.0a4/README.md b/gx-1.0.0a4/README.md
@@ -0,0 +1,15 @@
+# great_expectations 1.0.0a4 demos
+
+Demos of expectation authoring and validation workflows for great-expectations 1.0.
+
+These demos use python 3.10 with [1.0.0a4](https://pypi.org/project/great-expectations/1.0.0a4/).
+
+## Notes about these scripts
+The scripts in this directory will run against 1.0.0a4, and include TODOs on future changes planned for subsequent prereleases of 1.0.0.
+
+## Getting started
+1. Create a virtual environment: `python -m venv .venv`
+1. Source the virtual environment: `source .venv/bin/activate`
+1. Install requirements: `pip install -r requirements.txt`
+1. Start the postgres container: `../scripts/run_dockerized_pg.sh`
+1. Run the notebooks in `demos/`
diff --git a/gx-1.0.0a4/demos/01-authoring_expectation_suites.ipynb b/gx-1.0.0a4/demos/01-authoring_expectation_suites.ipynb
@@ -0,0 +1,249 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Authoring Expectation Suites\n",
+    "\n",
+    "This notebook walks you through interactively creating an expectation suite. Compare it with the 0.18 workflow [here](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data/). One of the biggest high-level changes is the move away from the ambiguous side effects of 0.18's Validator toward a more explicit flow."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Imports\n",
+    "\n",
+    "We'll start by importing everything we'll use in the notebook. A couple of things to highlight:\n",
+    "* We still import `great_expectations` as `gx`\n",
+    "* We now import `great_expectations.expectations` as `gxe`. In 1.0, Expectations are top-level classes namespaced to `gxe`.\n",
+    "* This repo contains a `constants.py` file used across the different notebooks. We'll import the relevant constants from there."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import great_expectations as gx\n",
+    "import great_expectations.exceptions as exceptions\n",
+    "import great_expectations.expectations as gxe\n",
+    "from great_expectations.core.expectation_suite import ExpectationSuite\n",
+    "from great_expectations.datasource.fluent.interfaces import Datasource\n",
+    "\n",
+    "from constants import (\n",
+    "    DB_CONNECTION_STRING,\n",
+    "    TABLE_NAME,\n",
+    "    DATASOURCE_NAME,\n",
+    "    ASSET_NAME,\n",
+    "    SUITE_NAME,\n",
+    "    BATCH_DEFINITION_NAME_WHOLE_TABLE,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Connecting to data\n",
+    "\n",
+    "Next let's connect to our data.\n",
+    "Datasources and DataAssets are largely unchanged in 1.0.\n",
+    "\n",
+    "A new concept (that we'll touch on in greater detail later with partitioning) is the \"Batch Definition\", which describes what data will be validated in each run.\n",
+    "In this example, we'll use our whole asset with the \"whole table\" Batch Definition.\n",
+    "\n",
+    "We'll use our Batch Definition to get a concrete Batch we can validate expectations against."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "context = gx.get_context(mode=\"file\")\n",
+    "\n",
+    "try:\n",
+    "    datasource = context.data_sources.add_postgres(DATASOURCE_NAME, connection_string=DB_CONNECTION_STRING)\n",
+    "    data_asset = datasource.add_table_asset(name=ASSET_NAME, table_name=TABLE_NAME)\n",
+    "    \n",
+    "    batch_definition = data_asset.add_batch_definition_whole_table(BATCH_DEFINITION_NAME_WHOLE_TABLE)\n",
+    "\n",
+    "    print(\"Created entities\")\n",
+    "\n",
+    "except exceptions.DataContextError:\n",
+    "    datasource = context.get_datasource(DATASOURCE_NAME)\n",
+    "    assert isinstance(datasource, Datasource)\n",
+    "    data_asset = datasource.get_asset(asset_name=ASSET_NAME)\n",
+    "    batch_definition = next(bd for bd in data_asset.batch_definitions if bd.name == BATCH_DEFINITION_NAME_WHOLE_TABLE)\n",
+    "\n",
+    "    print(\"Entities already exist - loaded them\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Expectation creation\n",
+    "\n",
+    "Let's create our first gx 1.0 Expectation!\n",
+    "\n",
+    "Expectation classes are exposed directly in gx 1.0, and are statically typed using [Pydantic](https://docs.pydantic.dev/latest/).\n",
+    "IntelliSense will show valid arguments and their types.\n",
+    "\n",
+    "We can test the expectation against our batch without the use of a Validator, [as was needed in gx 0.18](https://docs.greatexpectations.io/docs/oss/guides/expectations/how_to_create_and_edit_expectations_with_instant_feedback_from_a_sample_batch_of_data).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expectation = gxe.ExpectColumnMinToBeBetween(column=\"passenger_count\", min_value=4, max_value=5)\n",
+    "batch = batch_definition.get_batch()\n",
+    "batch.validate(expectation)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Edit the expectation\n",
+    "\n",
+    "The expectation is currently failing, so let's update it and verify that it succeeds."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expectation.min_value = 0\n",
+    "result = batch.validate(expectation)\n",
+    "\n",
+    "print(result)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Creating an Expectation Suite\n",
+    "\n",
+    "Now that we have an expectation, let's create an Expectation Suite and add our expectation to it. We'll also add a second expectation while we're at it.\n",
+    "\n",
+    "You'll notice the call to `context.suites.add`. GX 1.0 moves toward a consistent API for collection objects. You'll see similar collections under `context.checkpoints` and `context.validation_definitions`.\n",
+    "\n",
+    "Note that after this cell, the suite and its expectations have been persisted. The structure of the persisted JSON is **nearly** identical to that of 0.18, except that `notes` has been promoted to a top-level entity."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    suite = context.suites.add(ExpectationSuite(name=SUITE_NAME))\n",
+    "\n",
+    "    suite.add_expectation(expectation)\n",
+    "    expectation = suite.add_expectation(gxe.ExpectColumnValuesToBeBetween(column=\"passenger_count\", min_value=0, max_value=4))\n",
+    "    print(\"Expectation Suite created\")\n",
+    "except exceptions.DataContextError:\n",
+    "    suite = context.suites.get(SUITE_NAME)\n",
+    "    print(\"We've already added the suite\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Validating an Expectation Suite\n",
+    "\n",
+    "We can validate Expectation Suites against Batches the same way we validated Expectations against Batches."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "results = batch.validate(suite)\n",
+    "\n",
+    "print(results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Editing an Expectation Suite\n",
+    "\n",
+    "To edit an expectation suite, we find the expectation we want and edit it directly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "suite = context.suites.get(SUITE_NAME)\n",
+    "expectation = next(e for e in suite.expectations if isinstance(e, gxe.ExpectColumnValuesToBeBetween))\n",
+    "expectation.max_value = 10\n",
+    "\n",
+    "results = batch.validate(suite)\n",
+    "print(results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Save the Expectation Suite\n",
+    "\n",
+    "Now that the suite is updated, we can save it with `suite.save()`, or we can save just the individual expectation with `expectation.save()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "suite.save()\n",
+    "\n",
+    "# OR\n",
+    "\n",
+    "expectation.save()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
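The "Editing an Expectation Suite" cell in the notebook above finds an expectation with `next(...)` over a generator, which raises `StopIteration` if nothing matches. A minimal sketch of the same lookup with a default value follows. It runs without great_expectations installed: the `StandIn*` classes and the `find_first` helper are hypothetical stand-ins for illustration, not GX classes or APIs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class StandInExpectation:
    """Hypothetical stand-in for an expectation; not a GX class."""

    column: str


@dataclass
class StandInMinBetween(StandInExpectation):
    min_value: float = 0
    max_value: float = 0


@dataclass
class StandInValuesBetween(StandInExpectation):
    min_value: float = 0
    max_value: float = 0


def find_first(items: Iterable[object], kind: Type[T]) -> Optional[T]:
    """Return the first item that is an instance of `kind`, or None."""
    # The second argument to next() is returned instead of raising
    # StopIteration when the generator is exhausted.
    return next((item for item in items if isinstance(item, kind)), None)


# Stand-in for `suite.expectations` in the notebook.
expectations = [
    StandInMinBetween(column="passenger_count", min_value=0, max_value=5),
    StandInValuesBetween(column="passenger_count", min_value=0, max_value=4),
]

match = find_first(expectations, StandInValuesBetween)
if match is not None:
    match.max_value = 10  # edit the found object in place, as the notebook does

missing = find_first(expectations, dict)  # no match, so this is None
```

Checking for `None` makes the "expectation not in suite" case explicit, which can be easier to debug in a notebook than an unhandled `StopIteration`.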