Unquoted strings in unexpected_index_query values of gx.ValidationDefinition.run() result #10827

Open
vasilicommify opened this issue Jan 7, 2025 · 4 comments
Labels: feature-request

Comments

@vasilicommify

Describe the bug
With result_format="COMPLETE", each expectation result includes an unexpected_index_query.
Its value cannot be used to extract the offending rows. Not even the SQL expression inside expr( ) works on its own.
One reason is that string values are not quoted:

    • value_set contains strings, but those strings are not quoted in unexpected_index_query:
    {
      "success": true,
      "expectation_config": {
        "type": "expect_column_values_to_be_in_set",
        "kwargs": {
          "batch_id": "my_spark_datasource-my_spark_dataframe",
          "column": "in_set_str",
          "value_set": [
            "Val1",
            "Val2"
          ]
        },
        "meta": {
          "note": "column_values - without distinct"
        }
      },
      "result": {
        "element_count": 100,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "partial_unexpected_list": [],
        "missing_count": 11,
        "missing_percent": 11.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_counts": [],
        "unexpected_list": [],
        "unexpected_index_query": "df.filter(F.expr((in_set_str IS NOT NULL) AND (NOT (in_set_str IN (Val1, Val2)))))"
      },
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_traceback": null,
        "exception_message": null
      }
    },
    • The regex pattern is not quoted either:
    "unexpected_index_query": "df.filter(F.expr((email IS NOT NULL) AND (NOT RLIKE(email, ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$))))"

To Reproduce

import great_expectations as gx

# expectation_suite: any expectation suite with validations on string fields
# batch_definition, validation_definition_name, data_frame_to_check: defined elsewhere

validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=expectation_suite, name=validation_definition_name
)

validation_results = validation_definition.run(
    batch_parameters={"dataframe": data_frame_to_check},
    result_format="COMPLETE",
)
validation_results
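To compare all generated queries side by side, the serialized run result can be walked with a small helper. This is a minimal sketch: it assumes the result is serialized with `validation_results.to_json_dict()` and that the resulting dict has a top-level `"results"` list whose entries look like the JSON excerpt above. `collect_index_queries` is a hypothetical helper, not part of GX.

```python
from typing import Any


def collect_index_queries(results_json: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (expectation type, column, unexpected_index_query) for every
    result entry that carries a query.

    Assumes the JSON-style layout shown in the issue: each entry has an
    "expectation_config" (with "type" and "kwargs") and a "result" dict.
    """
    rows = []
    for item in results_json.get("results", []):
        query = item.get("result", {}).get("unexpected_index_query")
        if query is None:
            continue  # expectations without an index query are skipped
        config = item.get("expectation_config", {})
        column = config.get("kwargs", {}).get("column", "")
        rows.append((config.get("type", ""), column, query))
    return rows


# Hypothetical usage with the run above:
# for exp_type, column, query in collect_index_queries(validation_results.to_json_dict()):
#     print(exp_type, column, query)
```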

Expected behavior
String values and regex patterns should be quoted in unexpected_index_query, with either single or double quotes, as Spark SQL expects.
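For comparison, a correctly quoted query for the two cases above could be assembled as follows. This is a minimal sketch, not GX's implementation. The helper names are hypothetical. The sketch assumes Spark's default `spark.sql.parser.escapedStringLiterals=false`, under which a backslash inside a string literal starts an escape sequence, so backslashes and single quotes must themselves be escaped. It also assumes `F` is `pyspark.sql.functions`, as in the generated query. Doubling the backslashes matters for the email regex, because otherwise Spark's literal unescaping may consume the backslash in `\.`.

```python
def spark_sql_string_literal(value: str) -> str:
    """Render a Python string as a single-quoted Spark SQL string literal.

    Assumes spark.sql.parser.escapedStringLiterals=false (the default),
    so backslashes are doubled and single quotes are backslash-escaped.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def in_set_condition(column: str, value_set: list[str]) -> str:
    """SQL condition for non-null rows whose value is outside value_set."""
    values = ", ".join(spark_sql_string_literal(v) for v in value_set)
    return f"({column} IS NOT NULL) AND (NOT ({column} IN ({values})))"


def rlike_condition(column: str, pattern: str) -> str:
    """SQL condition for non-null rows that do not match the regex pattern."""
    literal = spark_sql_string_literal(pattern)
    return f"({column} IS NOT NULL) AND (NOT RLIKE({column}, {literal}))"


def filter_query(condition: str) -> str:
    """Embed the condition as a Python string literal (via repr) in df.filter(F.expr(...))."""
    return f"df.filter(F.expr({condition!r}))"


print(filter_query(in_set_condition("in_set_str", ["Val1", "Val2"])))
# prints: df.filter(F.expr("(in_set_str IS NOT NULL) AND (NOT (in_set_str IN ('Val1', 'Val2')))"))
```

Using `repr()` for the outer layer keeps the generated text a valid Python expression even when the SQL condition itself contains quotes or backslashes.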

Environment:

  • Great Expectations Version: 1.3.0
  • Data Source: Spark
  • Cloud environment: Databricks
@kujaska

kujaska commented Jan 7, 2025

+100500

@0lgaZv

0lgaZv commented Jan 7, 2025

Please

adeola-ak moved this from To Do to In progress in GX Core Issues Board Jan 21, 2025
adeola-ak added the feature-request label Jan 21, 2025
@adeola-ak
Contributor

Thank you for reaching out about this. I can see how this would be helpful, and I will share it with my team. Please check this issue for updates.

@umanggarg754

Yes, please correct the syntax of the unexpected index query for Spark.

Projects
Status: In progress
Development

No branches or pull requests

5 participants