
No efficient way to load a subset of files from a partitioned table #9

Open
edmondop opened this issue Jan 21, 2025 · 0 comments
Labels
enhancement New feature or request

@edmondop
Owner

Is your feature request related to a problem or challenge?
As far as I can tell, there is no good way to load a subset of files from a partitioned table. With ListingTable, or another TableProvider such as DeltaTableProvider from deltalake, I can call read_table, but that loads the entire table. I can also load a list of Parquet files with read_parquet, but this does not work for partitioned tables whose partition values are not materialized as columns in the raw Parquet files (for example, when they are encoded only in Hive-style key=value directory names). The only way I've found to load partitioned files is to iterate over the list of file paths, build a TableProvider and call read_table for each one individually, and then union the results together.
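
To illustrate why reading individual files with read_parquet loses the partition columns: in a Hive-style layout, the partition values live only in the directory names between the table root and the file. The following is a minimal, standard-library-only Rust sketch. parse_partition_values is a hypothetical helper (not part of DataFusion or deltalake), and it assumes a Hive-style key=value directory layout.

```rust
/// Hypothetical helper (not a DataFusion/deltalake API): extract Hive-style
/// `key=value` partition values from a file path, relative to the table root.
/// Returns None if the file is not under the root or a directory segment
/// is not of the form `key=value`.
fn parse_partition_values<'a>(table_root: &str, file_path: &'a str) -> Option<Vec<(&'a str, &'a str)>> {
    // Normalize the root and require a '/' boundary after it, so that
    // "s3://bucket/ev" does not match "s3://bucket/events/...".
    let root = table_root.trim_end_matches('/');
    let relative = file_path.strip_prefix(root)?.strip_prefix('/')?;

    let mut segments: Vec<&str> = relative.split('/').collect();
    // The last segment is the file name itself, not a partition directory.
    segments.pop()?;

    // Every remaining directory segment must be `key=value`.
    segments
        .into_iter()
        .map(|segment| segment.split_once('='))
        .collect()
}

fn main() {
    let root = "s3://bucket/events";
    let file = "s3://bucket/events/year=2024/month=01/part-0.parquet";
    // The partition values come from the path, not from the Parquet file.
    println!("{:?}", parse_partition_values(root, file));
}
```

Note that the helper needs the table root to know where the partition directories start; a bare file path handed to read_parquet carries no such context.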

Describe the solution you'd like
It would be nice to be able to create a TableProvider from a table path and then pass in some sort of file "whitelist", perhaps via something like read_table_files(TableProvider, impl IntoIterator<Item = String>).
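
The file-selection half of the proposed API could look roughly like the sketch below. This is a hypothetical illustration using plain strings, not an existing DataFusion or deltalake function: select_table_files stands in for part of the suggested read_table_files, filtering the table's own file listing against an allowlist passed as impl IntoIterator<Item = String>. A real implementation would then build a single scan over the selected files instead of one TableProvider per file.

```rust
use std::collections::HashSet;

/// Hypothetical sketch (not a real DataFusion API): keep only the files from
/// a table's listing that appear in the caller's allowlist, preserving the
/// listing order. Allowlist entries not in the table are ignored.
fn select_table_files<I>(table_files: &[String], allowlist: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    // Collect the allowlist once so each membership check is O(1).
    let allowed: HashSet<String> = allowlist.into_iter().collect();
    table_files
        .iter()
        .filter(|path| allowed.contains(*path))
        .cloned()
        .collect()
}

fn main() {
    // Stand-in for the files a TableProvider would list under its table path.
    let listing = vec![
        "year=2024/month=01/part-0.parquet".to_string(),
        "year=2024/month=02/part-0.parquet".to_string(),
        "year=2025/month=01/part-0.parquet".to_string(),
    ];
    let wanted = vec!["year=2024/month=02/part-0.parquet".to_string()];
    println!("{:?}", select_table_files(&listing, wanted));
}
```

Filtering against the table's listing, rather than passing raw paths to read_parquet, keeps each selected file associated with the table root, so its partition values could still be derived from its path.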

Describe alternatives you've considered
As stated above, I've tried reading the files one by one and unioning the results, but this is shockingly inefficient compared to reading all the files at once.

Additional context
No response

edmondop added the enhancement (New feature or request) label on Jan 21, 2025