Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case sensitive/insensitive table validation #3580

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

FastLee
Copy link
Contributor

@FastLee FastLee commented Jan 28, 2025

closes #3568

Added case sensitive flag for metadata comparison. To consider/ignore column name case.

@FastLee FastLee requested a review from a team as a code owner January 28, 2025 17:09
Copy link

github-actions bot commented Jan 28, 2025

✅ 3/3 passed, 2m45s total

Running from acceptance #8144

@FastLee FastLee force-pushed the fix/case-sensitive-table-validation branch from 18ac91b to bdff18c Compare January 28, 2025 17:14
@FastLee FastLee enabled auto-merge January 28, 2025 17:29
@FastLee FastLee requested a review from JCZuurmond January 28, 2025 17:29
Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions to make it a bit cleaner.

How do users pass the flag to the metadata comparison?

src/databricks/labs/ucx/recon/base.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/recon/metadata_retriever.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/recon/metadata_retriever.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/recon/schema_comparator.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/recon/schema_comparator.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/recon/schema_comparator.py Outdated Show resolved Hide resolved
tests/unit/recon/test_schema_comparator.py Show resolved Hide resolved
@FastLee FastLee force-pushed the fix/case-sensitive-table-validation branch from c5a6d4f to 1ca97ab Compare January 30, 2025 20:12
@FastLee FastLee requested a review from JCZuurmond January 30, 2025 20:15
Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, final nits

self._metadata_retriever = metadata_retriever
self._case_sensitive = case_sensitive

def _column_name_transformer(self) -> Callable[[str], str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More accurate would be to call it a factory here to signal it produces a callable

Suggested change
def _column_name_transformer(self) -> Callable[[str], str]:
def _column_name_transformer_factory(self) -> Callable[[str], str]:

Comment on lines +19 to +21
if self._case_sensitive:
return lambda _: _
return str.lower
Copy link
Member

@JCZuurmond JCZuurmond Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self._case_sensitive:
return lambda _: _
return str.lower
return str if self._case_sensitive else str.lower

Comment on lines +37 to +42
source_metadata = self._metadata_retriever.get_metadata(
source, column_name_transformer=self._column_name_transformer()
)
target_metadata = self._metadata_retriever.get_metadata(
target, column_name_transformer=self._column_name_transformer()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not really matter, but my OCD likes to initialize the transformer once. Also, I prefer the full names, but would accept a shorter name within a small code block to increase readability with all statements being single lines

Suggested change
source_metadata = self._metadata_retriever.get_metadata(
source, column_name_transformer=self._column_name_transformer()
)
target_metadata = self._metadata_retriever.get_metadata(
target, column_name_transformer=self._column_name_transformer()
)
transformer = self._column_name_transformer()
source_metadata = self._metadata_retriever.get_metadata(source, column_name_transformer=transformer)
target_metadata = self._metadata_retriever.get_metadata(target, column_name_transformer=transformer)

@FastLee FastLee added this pull request to the merge queue Jan 31, 2025
@JCZuurmond JCZuurmond removed this pull request from the merge queue due to a manual request Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Column name comparison needs better handling for upper/lower case
2 participants