Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): add capability to read unity catalog (uc://) uris #3113

Merged
merged 9 commits into from
Feb 18, 2025

Conversation

omkar-foss
Copy link
Contributor

Description

This adds capability to read directly from uc:// uris using the local catalog-unity crate. This also exposes the UC temporary credentials in storage_options of the DeltaTable instance so polars or similar readers can use it.

Related Issue(s)

Documentation

N/A

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Jan 10, 2025
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@omkar-foss omkar-foss force-pushed the feat-uc-python branch 2 times, most recently from 8fcd7f4 to 0824abe Compare January 10, 2025 07:03
@omkar-foss omkar-foss changed the title feat(python): Add capability to read Unity Catalog (uc://) uris feat(python): add capability to read unity catalog (uc://) uris Jan 10, 2025
/// Allow http url (e.g. http://localhost:8080/api/2.1/...)
/// Supported keys:
/// - `unity_allow_http_url`
AllowHttpUrl,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows users to work with a local (non-https) Unity Catalog REST API with delta-rs.

@omkar-foss
Copy link
Contributor Author

@ion-elgreco I've updated this PR to include the temp credentials functionality from PR #3078. Cheers.

@omkar-foss omkar-foss marked this pull request as ready for review February 3, 2025 14:30
Copy link

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 12.90323% with 108 lines in your changes missing coverage. Please review.

Project coverage is 72.16%. Comparing base (2f1a5ac) to head (0e6c77f).
Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
crates/catalog-unity/src/lib.rs 10.83% 106 Missing and 1 partial ⚠️
python/src/lib.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
- Coverage   72.31%   72.16%   -0.16%     
==========================================
  Files         138      138              
  Lines       45398    45518     +120     
  Branches    45398    45518     +120     
==========================================
+ Hits        32831    32846      +15     
- Misses      10489    10591     +102     
- Partials     2078     2081       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ion-elgreco
Copy link
Collaborator

ion-elgreco commented Feb 11, 2025

@omkar-foss looks good from my side!

@hntd187 can you please also take a look. Specifically on whether the credentials need to be refreshed as part of the objectstore? Does UC also support writing temp credentials?

@omkar-foss
Copy link
Contributor Author

omkar-foss commented Feb 11, 2025

I'll need to add a few python tests for this flow, will add tomorrow and then if all good, I suppose we can merge it.

Also please note, the current UC REST integration that we have is more fine-tuned for Databricks Unity Catalog. The Unity Catalog OSS has slightly different response payloads for some APIs. e.g. Generate Temp Credentials API returns r2_temp_credentials and url in the JSON response for Databricks UC but not for OSS UC. So this causes json decoding errors if you try out the current UC client implementation with an OSS UC instance.

I can look into supporting OSS UC smoothly in a separate PR after this one. Cheers.

@omkar-foss omkar-foss closed this Feb 11, 2025
@omkar-foss omkar-foss reopened this Feb 11, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 16, 2025
@omkar-foss
Copy link
Contributor Author

omkar-foss commented Feb 16, 2025

@omkar-foss can you add an integration page to the docs? :)

Yes sure, is it okay if I add it in the next PR?

@ion-elgreco On a second thought, I've added basic UC docs to this PR itself. It looks as follows:

Screenshot at 2025-02-16 16-30-56

In the next PR, I'll also add some code examples and all, along with the UC OSS payload support.

Looks good, would be slightly easier to find under integrations section but can be done in follow up

ion-elgreco
ion-elgreco previously approved these changes Feb 16, 2025
@ion-elgreco
Copy link
Collaborator

@omkar-foss the UC integration test is failing

@omkar-foss
Copy link
Contributor Author

@omkar-foss the UC integration test is failing

Yes I checked, it's passing on my local. Seems like it's an intermittent failure. Would it be possible for you to retrigger that particular failing job, let's see if it passes?

@ion-elgreco
Copy link
Collaborator

@omkar-foss can you try to make it not flaky?

@omkar-foss
Copy link
Contributor Author

omkar-foss commented Feb 16, 2025

@omkar-foss can you try to make it not flaky?

Seems like it gets stuck intermittently due to a deadlock, checking on it now.

hntd187
hntd187 previously approved these changes Feb 16, 2025
Copy link
Collaborator

@hntd187 hntd187 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I appreciate you doing all the doc work with this as well, everything looks great to me.

@omkar-foss omkar-foss dismissed stale reviews from hntd187 and ion-elgreco via 9ac034f February 17, 2025 12:06
@omkar-foss
Copy link
Contributor Author

@omkar-foss can you try to make it not flaky?

Seems like it gets stuck intermittently due to a deadlock, checking on it now.

I've added a dedicated function to execute futures pertaining to UC, based on the function we have for AWS S3 storage. This is helping to prevent the deadlocks by using an appropriate block_on(). A deadlock can still occur, but very low chances (less than 2% probability as per my sampling of 100 runs, each 5 seconds apart).

@ion-elgreco @hntd187 guys, please check it out and let me know if this makes sense. Thank you :)

@ion-elgreco ion-elgreco added this pull request to the merge queue Feb 18, 2025
Merged via the queue into delta-io:main with commit 0473d7f Feb 18, 2025
28 checks passed
@omkar-foss omkar-foss deleted the feat-uc-python branch February 20, 2025 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Python-side support for Unity Catalog
4 participants