Break datafusion-catalog code into its own crate #11182
Comments
I looked into this -- One of the major challenges is that datafusion/datafusion/core/src/execution/session_state.rs Lines 190 to 295 in 78055fe
One way to handle this would be to allow constructing SessionState with the minimal built-in features, and have a function in That way it would still be easy to use DataFusion with a minimal SessionState but also easily register all the built-in extensions 🤔
I made #11183 to start breaking apart the API and implementation -- there is still a ways to go
@alamb I took a stab at moving parquet functionality into a datafusion-parquet crate (#11188), and I ran into similar challenges to the ones you highlight here. I think to accomplish these goals
I agree with this entirely. The center of the knot is SessionState, I think -- figuring out how to get that out of the core is likely key to breaking things up reasonably
The more I think about this the more I think trying to make So like:
    let mut session_state = SessionState::new();
    // no table providers, etc
    // install standard built in table providers
    SessionContext::install_built_in(&mut session_state);
    // now session_state has them here
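To make the shape of this idea concrete, here is a minimal, self-contained sketch. It is not DataFusion code: `TableFactory`, `CsvFactory`, `ParquetFactory`, `register_table_factory` and `registered_formats` are hypothetical names invented for illustration, and `SessionState` / `SessionContext` here are toy stand-ins that only borrow the names used in the comment above. The point is the dependency direction: the state type knows nothing about any built-in, and a separate function installs them.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for "something that can create tables of one format".
/// This is NOT DataFusion's real trait, only an illustration.
pub trait TableFactory {
    fn format_name(&self) -> &'static str;
}

/// Hypothetical built-ins that would live outside the minimal state.
pub struct CsvFactory;
impl TableFactory for CsvFactory {
    fn format_name(&self) -> &'static str {
        "csv"
    }
}

pub struct ParquetFactory;
impl TableFactory for ParquetFactory {
    fn format_name(&self) -> &'static str {
        "parquet"
    }
}

/// Toy session state: starts empty and has no knowledge of any built-in.
#[derive(Default)]
pub struct SessionState {
    table_factories: BTreeMap<String, Box<dyn TableFactory>>,
}

impl SessionState {
    /// Construct a state with no table providers registered.
    pub fn new() -> Self {
        Self::default()
    }

    /// Register one factory under its format name (replaces any previous one).
    pub fn register_table_factory(&mut self, factory: Box<dyn TableFactory>) {
        self.table_factories
            .insert(factory.format_name().to_string(), factory);
    }

    /// Names of the registered formats, in sorted order (BTreeMap key order).
    pub fn registered_formats(&self) -> Vec<&str> {
        self.table_factories.keys().map(String::as_str).collect()
    }
}

/// Toy context: the only place that knows about the built-ins.
pub struct SessionContext;

impl SessionContext {
    /// Install the standard built-ins into an otherwise minimal state.
    pub fn install_built_in(state: &mut SessionState) {
        state.register_table_factory(Box::new(CsvFactory));
        state.register_table_factory(Box::new(ParquetFactory));
    }
}
```

In this sketch a caller who wants a small embedded setup never calls `install_built_in`, while the default path calls it once right after `SessionState::new()`.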
I filed #11320 to track this idea
Is your feature request related to a problem or challenge?
As we start thinking about adding new catalog support to DataFusion (for example, @goldmedal tried to do this for DynamicFileCatalog in #11035), the current code structure requires that any such implementation live in datafusion/core, which is already quite large and monolithic.
Also, I think long term it would be a more sustainable structure if we could move ListingTable out of the core as well.
However, this move is not likely possible until we move table provider and related catalog traits out of the core.
Also, if we could split up the DataFusion core more, it would be easier for people to use DataFusion as a smaller embedded library (i.e. without bringing in file format support if they don't need it).
Describe the solution you'd like
Ideally I would like the following traits / code moved into a datafusion-catalog crate:
TableProvider
CatalogProvider
SchemaProvider
MemoryCatalogProvider
InformationSchemaProvider
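As a rough illustration of the layering this would give, the sketch below uses modules in one file to stand in for separate crates. The trait signatures are heavily simplified assumptions (the real DataFusion traits are larger and differ in detail), and `MemorySchemaProvider`, `example_catalog`, `lookup_columns` and the `listing` module are names introduced only for this example. What it tries to show is that a `ListingTable`-like type can depend on the catalog traits without depending on the core.

```rust
use std::sync::Arc;

use catalog::{CatalogProvider, MemoryCatalogProvider, MemorySchemaProvider};

/// Stand-in for the proposed `datafusion-catalog` crate: traits plus in-memory impls.
pub mod catalog {
    use std::collections::HashMap;
    use std::sync::Arc;

    /// Simplified: the real trait exposes a schema, scanning, and more.
    pub trait TableProvider: Send + Sync {
        fn column_names(&self) -> Vec<String>;
    }

    pub trait SchemaProvider: Send + Sync {
        fn table_names(&self) -> Vec<String>;
        fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>>;
    }

    pub trait CatalogProvider: Send + Sync {
        fn schema_names(&self) -> Vec<String>;
        fn schema(&self, name: &str) -> Option<Arc<dyn SchemaProvider>>;
    }

    /// Illustrative in-memory schema (tables keyed by name).
    #[derive(Default)]
    pub struct MemorySchemaProvider {
        tables: HashMap<String, Arc<dyn TableProvider>>,
    }

    impl MemorySchemaProvider {
        pub fn register_table(&mut self, name: &str, table: Arc<dyn TableProvider>) {
            self.tables.insert(name.to_string(), table);
        }
    }

    impl SchemaProvider for MemorySchemaProvider {
        fn table_names(&self) -> Vec<String> {
            let mut names: Vec<String> = self.tables.keys().cloned().collect();
            names.sort();
            names
        }
        fn table(&self, name: &str) -> Option<Arc<dyn TableProvider>> {
            self.tables.get(name).cloned()
        }
    }

    /// Illustrative in-memory catalog (schemas keyed by name).
    #[derive(Default)]
    pub struct MemoryCatalogProvider {
        schemas: HashMap<String, Arc<dyn SchemaProvider>>,
    }

    impl MemoryCatalogProvider {
        pub fn register_schema(&mut self, name: &str, schema: Arc<dyn SchemaProvider>) {
            self.schemas.insert(name.to_string(), schema);
        }
    }

    impl CatalogProvider for MemoryCatalogProvider {
        fn schema_names(&self) -> Vec<String> {
            let mut names: Vec<String> = self.schemas.keys().cloned().collect();
            names.sort();
            names
        }
        fn schema(&self, name: &str) -> Option<Arc<dyn SchemaProvider>> {
            self.schemas.get(name).cloned()
        }
    }
}

/// Stand-in for a separate crate that depends only on `catalog`, not on the core.
pub mod listing {
    use super::catalog::TableProvider;

    /// Toy table type; only borrows the name discussed in the issue.
    pub struct ListingTable {
        pub columns: Vec<String>,
    }

    impl TableProvider for ListingTable {
        fn column_names(&self) -> Vec<String> {
            self.columns.clone()
        }
    }
}

/// Build a catalog with one schema ("public") holding one table ("events").
pub fn example_catalog() -> MemoryCatalogProvider {
    let mut schema = MemorySchemaProvider::default();
    let table = listing::ListingTable {
        columns: vec!["id".to_string(), "ts".to_string()],
    };
    schema.register_table("events", Arc::new(table));
    let mut cat = MemoryCatalogProvider::default();
    cat.register_schema("public", Arc::new(schema));
    cat
}

/// Resolve catalog -> schema -> table through the trait objects only.
pub fn lookup_columns(
    cat: &dyn CatalogProvider,
    schema_name: &str,
    table_name: &str,
) -> Option<Vec<String>> {
    cat.schema(schema_name)?
        .table(table_name)
        .map(|t| t.column_names())
}
```

Here `lookup_columns` only sees `dyn CatalogProvider`, so it would compile against the catalog module alone. That is the property that would let a file-format or listing crate sit beside the core rather than inside it.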
Describe alternatives you've considered
No response
Additional context
No response
Part of #10782