Skip to content

Refactor BasePolarisTableOperations & BasePolarisViewOperations #1426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

eric-maynard
Copy link
Contributor

BasePolarisTableOperations and BasePolarisViewOperations currently extend BaseMetastoreTableOperations and BaseViewOperations respectively. In PRs like #433 and #1378 it has been suggested that instead they could extend more fundamental types like TableOperations instead. This PR implements that suggestion.

There are no semantic changes made to the ...Operations types in this PR, only a refactor that would allow us to customize their logic in the future.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, Polaris does not need to reuse Iceberg Catalog classes at all when dealing with files in storage (we could use FileIO directly), but current approach is fine too. Should be good to merge with a CODE_COPIED_TO_POLARIS mark.

}

@Override
public TableMetadata refresh() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose a lot of code came from Iceberg, but this class probably deserves a CODE_COPIED_TO_POLARIS mark (in comments, example in PolarisRestCatalogIntegrationTest).

Copy link
Contributor Author

@eric-maynard eric-maynard Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should communicate that most of this was copied, good callout.

However, currently AFAIK CODE_COPIED_TO_POLARIS is reserved for blocks of code (files?) that are 100% copied with no modifications. There are some modifications here, and we are planning to make more over time. What do you think is the best way to represent the situation -- should we just put CODE_COPIED_TO_POLARIS and explain this in a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: to be clear there should be no functional changes, but there are minor changes (e.g. LOG -> LOGGER) needed to make things work within IcebergCatalog and I have done some light refactoring to improve readability and reduce repeated code. In the future though we may make functional changes. I've added a comment at least calling out that the code was originally more or less copied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea for CODE_COPIED_TO_POLARIS is to automatically check acknowledgement in LICENSE / NOTICE files. So technically any reuse of substantial code blocks should be mentioned even if the code was modified.

@jbonofre : WDYT?

Since the check is by file name, one comment per file is sufficient. For that matter, maybe it is best to move the reused code to a top-level class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on getting @jbonofre's POV here

As for making this a top-level class (file) I would love to do so. However, these classes are tightly coupled with IcebergCatalog right now and rely on its private members and methods. The refactor gets significantly more complicated when done that way. So, I decided to try this smaller, more incremental, change which is what's actually needed to unblock the behavior proposed in #1378.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using inner classes if fine too from my POV

/**
* An implementation of {@link TableOperations} that integrates with {@link IcebergCatalog}. Much
* of this code was originally copied from {@link
* org.apache.iceberg.BaseMetastoreTableOperations}. COPIED_COPIED_TO_POLARIS
Copy link
Contributor

@dimas-b dimas-b Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected CI to fail now that we have the COPIED_COPIED_TO_POLARIS mark, but this file in not mentioned in LICENSE 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants