Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONTRIB] Normalize schema and table names. #10763

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion great_expectations/datasource/fluent/sql_datasource.py
Original file line number Diff line number Diff line change
Expand Up @@ -995,7 +995,9 @@ def test_connection(self) -> None:
engine: sqlalchemy.Engine = datasource.get_engine()
inspector: sqlalchemy.Inspector = sa.inspect(engine)

if self.schema_name and self.schema_name not in inspector.get_schema_names():
if self.schema_name and self.schema_name not in map(
to_lower_if_not_quoted, inspector.get_schema_names()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also call to_lower_if_not_quoted on the self.schema_name used in the map lookup. Right now, this may fail for snowflake since I think their sqlalchemy connector changes names to uppercase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add, but this only failed because to_lower_if_not_quoted was already applied to self.schema_name. Applying it again should be redundant.

):
raise TestConnectionError( # noqa: TRY003
f'Attempt to connect to table: "{self.qualified_name}" failed because the schema '
f'"{self.schema_name}" does not exist.'
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,10 @@ def _spark(
return spark_column_metadata


def _get_normalized_table_name(table_name: str):
return table_name.strip("[]").strip('"')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems MSSQL specific. I would rename this method to _get_mssql_normalized_table_name or something similar and check the dialect before calling via execution_engine.dialect_name == GXSqlDialect.MSSQL.

I am not familiar with MSSQL. Can square brackets or double quotes appear in table names if preceded by an escape character? That is, are there table names that will be broken with these strip calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Square brackets are allowed in MSSQL but then need to be escaped when running queries against the table. A better solution would be to let GX add the square brackets to the db object names when needed. Currently, I have to add them when adding a data asset, e.g. asset = datasource.add_table_asset(name="ClientCodes", schema_name="CR", table_name="[247_CLIENT_CODES]").



def _get_sqlalchemy_column_metadata(
execution_engine: SqlAlchemyExecutionEngine, batch_data: SqlAlchemyBatchData
):
Expand All @@ -106,7 +110,7 @@ def _get_sqlalchemy_column_metadata(

return get_sqlalchemy_column_metadata(
execution_engine=execution_engine,
table_selectable=table_selectable, # type: ignore[arg-type]
table_selectable=_get_normalized_table_name(table_selectable), # type: ignore[arg-type]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above about only calling this normalization for MSSQL execution engines.

schema_name=schema_name,
)

Expand Down
Loading