-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CONTRIB] Normalize schema and table names. #10763
base: develop
Are you sure you want to change the base?
Changes from 3 commits
dd16421
de0ac29
ab9b4c7
26ac112
e31ee9c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,6 +87,10 @@ def _spark( | |
return spark_column_metadata | ||
|
||
|
||
def _get_normalized_table_name(table_name: str): | ||
return table_name.strip("[]").strip('"') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems MSSQL specific. I would rename this method to I am not familiar with MSSQL. Can square brackets or double quotes appear in table names if preceded by an escape character? That is, are there table names that will be broken with these There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair point. Square brackets are allowed in MSSQL but then need to be escaped when running queries against the table. A better solution would be to let GX add the square brackets to the db object names when needed. Currently, I have to add them when adding a data asset, e.g. asset = datasource.add_table_asset(name="ClientCodes", schema_name="CR", table_name="[247_CLIENT_CODES]"). |
||
|
||
|
||
def _get_sqlalchemy_column_metadata( | ||
execution_engine: SqlAlchemyExecutionEngine, batch_data: SqlAlchemyBatchData | ||
): | ||
|
@@ -106,7 +110,7 @@ def _get_sqlalchemy_column_metadata( | |
|
||
return get_sqlalchemy_column_metadata( | ||
execution_engine=execution_engine, | ||
table_selectable=table_selectable, # type: ignore[arg-type] | ||
table_selectable=_get_normalized_table_name(table_selectable), # type: ignore[arg-type] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See my comment above about only calling this normalization for MSSQL execution engines. |
||
schema_name=schema_name, | ||
) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also call
to_lower_if_not_quoted
on theself.schema_name
used in the map lookup. Right now, this may fail for snowflake since I think their sqlalchemy connector changes names to uppercase.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add, but this only failed because to_lower_if_not_quoted was already applied to self.schema_name. Applying it again should be redundant.