Home
This library is inspired by the Great Expectations library and makes many of the expectations found in Great Expectations available through the built-in Python unittest assertion style. For example, if you wanted to use `expect_column_values_to_be_between`, you can access `assertExpectColumnValuesToBeBetween`.
The library also adds further expectations, which may be similar to GE's or entirely new.

A list of the available assertions can be found here. Assertions with a direct or similar mapping to GE are labelled as such, and those not found in the GE library are also noted.
The major difference between GE and this library is that this library is intended to be very lightweight and to work within a familiar testing framework. It is therefore integrated with unittest, and because unittest is core to Python, it should be easier to maintain.
The code snippet below shows the basic interaction with great-assertions. Instead of inheriting from `unittest.TestCase`, we exchange this for `GreatAssertions`. This means we still get access to everything in `unittest`; we just also gain access to the great-assertions expectations.
```python
from great_assertions import GreatAssertions
import pandas as pd

class GreatAssertionTests(GreatAssertions):
    def test_expect_table_row_count_to_equal(self):
        df = pd.DataFrame({"col_1": [100, 200, 300], "col_2": [10, 20, 30]})
        self.expect_table_row_count_to_equal(df, 3)
```
In the example above, if the row count fails, you would receive an error message such as `expected row count is 4 the actual was 3 :`. An additional `msg` can be tacked on the end.
```python
self.expect_table_row_count_to_equal(df, 3, "my bespoke message")
```
The full response would be `expected row count is 4 the actual was 3 : my bespoke message`.
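To illustrate how the optional message fits in, the failure message can be thought of as the base message with `msg` appended after the colon. The helper below is a hypothetical sketch for illustration, not the library's actual implementation:

```python
# Hypothetical sketch of how the failure message is composed; the real
# great-assertions implementation may differ.
def row_count_message(expected: int, actual: int, msg: str = "") -> str:
    return f"expected row count is {expected} the actual was {actual} : {msg}"

print(row_count_message(4, 3, "my bespoke message"))
# expected row count is 4 the actual was 3 : my bespoke message
```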
In practice we have found that using several expectations is a good way to provide coverage when verifying the quality of a data source. For example, if we had this data set:
| col_1 | col_2 | col_3 |
|---|---|---|
| 1 | Y | Hello |
| 2 | Y | Hello |
| 2 | N | World |
| 1 | N | World |
| 7 | | Bye |
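The table above can be built as a pandas DataFrame, and a quick look at `col_1` confirms the outlier (the missing `col_2` value in the last row is represented here as `None`):

```python
import pandas as pd

# The example dataset from the table above.
df = pd.DataFrame(
    {
        "col_1": [1, 2, 2, 1, 7],
        "col_2": ["Y", "Y", "N", "N", None],
        "col_3": ["Hello", "Hello", "World", "World", "Bye"],
    }
)

# 1 and 2 are the common values; 7 stands out as the outlier.
print(sorted(df["col_1"].unique()))  # [1, 2, 7]
```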
If we were looking at ranges for `col_1`, we could see that 1 and 2 are the most common values; however, there is an outlier of 7. Therefore, we might use the range expectation:

```python
expect_column_values_to_be_between(df, min_value=1, max_value=7)
```
Although this would assert correctly, we might want to add some additional confirmation. A second `expect_column_values_to_be_between` with a tighter range would provide a secondary check that the overall measure is closer to 1 or 2:

```python
expect_column_values_to_be_between(df, min_value=1, max_value=3)
```
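The effect of the two ranges can be sketched with plain pandas, assuming the expectation behaves like `Series.between` applied across the column:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2, 2, 1, 7]})

# The wide range passes: every value, including the outlier 7, is in 1..7.
print(df["col_1"].between(1, 7).all())  # True

# The tighter range flags the outlier: 7 falls outside 1..3.
print(df["col_1"].between(1, 3).all())  # False
```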
If we wanted to test the value counts (a pandas function) of a column, we can use the `expect_column_value_counts_percent_to_be_between` assertion.
```python
df = pd.DataFrame(
    {
        "col_1": ["Y", "Y", "N", "Y", "Y", "N", "N", "Y", "N", "Maybe"],
    }
)

value_counts = {
    "Y": {"min": 45, "max": 55},
    "N": {"min": 35, "max": 45},
    "Maybe": {"min": 5, "max": 15},
}

self.expect_column_value_counts_percent_to_be_between(df, "col_1", value_counts)
```
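The check above can be sketched in plain pandas, assuming it compares the normalized value counts, as percentages, against each min/max pair:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": ["Y", "Y", "N", "Y", "Y", "N", "N", "Y", "N", "Maybe"]}
)

value_counts = {
    "Y": {"min": 45, "max": 55},
    "N": {"min": 35, "max": 45},
    "Maybe": {"min": 5, "max": 15},
}

# Percentage of occurrences for each value in the column.
percents = df["col_1"].value_counts(normalize=True) * 100

for value, bounds in value_counts.items():
    assert bounds["min"] <= percents[value] <= bounds["max"]

print(percents.to_dict())  # {'Y': 50.0, 'N': 40.0, 'Maybe': 10.0}
```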
This allows a percentage range to be asserted for the occurrences of a particular entry. In this example, we know that the majority, though slim, is 'Y'. However, if we combined this with an `assertExpectTableColumnsToMatchSet` assertion, that would check that Y/N/Maybe are the only available values, while the value counts check the overall grouping percentages. Therefore, if only 1% of results were Maybe, we would easily be able to check both the set and the percentages with these two assertions.
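Combining the two checks can be sketched as follows; this mirrors what the set and percentage assertions verify, but the logic here is an assumption for illustration, not the library's implementation:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": ["Y", "Y", "N", "Y", "Y", "N", "N", "Y", "N", "Maybe"]}
)

# Set check: only Y/N/Maybe should ever appear in the column.
allowed = {"Y", "N", "Maybe"}
print(set(df["col_1"]) <= allowed)  # True

# Percentage check: 'Maybe' is rare but still a legitimate value.
maybe_pct = (df["col_1"] == "Maybe").mean() * 100
print(maybe_pct)  # 10.0
```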