-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for SessionState in supports_filters_pushdown for a Custom Data Source #11193
Comments
I think the usecase of passing some sort of state to https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html One way is to add a However, it seems like if we ever are going to break apart the core crate more, we'll have to figure out how to split out SessionContext. We have a related discussion here: #11182 Also related slack coversation in ASF slack: https://the-asf.slack.com/archives/C04RJ0C85UZ/p1719691754005389 |
Thanks @alamb , makes very good sense. We can separate the 2 concerns. The only issue I see is that |
🤔 I believe the core reason for having |
@alamb That's right, and therein lies the conundrum... We would need to solve that somehow, that's why I suggested the restructuring. If that were possible I would be submitting a PR to do this 😄 |
When does the |
@alamb Good point, I could thought of that but was not sure. I found this: If you like that idea I can experiment in my local clone of the project. |
If it is deprecated, I think making a PR to remove the deprecated functions would certainly be a good step forward as it would keep things simpler |
@alamb That would be great. I do wonder though, the |
I am not sure either -- we would have to do some more research (potentially try it, for example) to find out |
@alamb I will see if I can try it out in the next few days. |
@alamb I had some time and tried a very simple workaround, it would be up to you and the team if it is useful. I added a single arg to: fn supports_filters_pushdown(
&self,
task_id: &Option<String>,
filters: &[&Expr],
) -> Result<Vec<TableProviderFilterPushDown>> {
Ok(vec![
TableProviderFilterPushDown::Unsupported;
filters.len()
])
} This only touched 9 files, and given the This avoids However, even this was easy to do the issue is that likely remains is who calls I believe we would need to add If you or someone on the team can help me solve this part of it then this is a very simple change. At least I understand the issue now. Let me know what you think. |
After trying to modify |
I think #11516 looks promising as a way to potentially pass |
Now that #11516 is merged I think we can make progress on this PR again |
Is your feature request related to a problem or challenge?
We need the ability to get the
TaskContext.task_id
any place where a Custom Data Source is invoked. As it stands currently, thestate: &SessionState
is available inTableProvider.scan
andtask_ctx: Arc<TaskContext>
is available inExecutionPlan.execute
, but not in thesupports_filters_pushdown
. This prohibits per-query customization or tracking of external state in this method. For example if there are 3filters
for a custom table, and 10 are possible, we need to be able to choose the best one at runtime.Further, the
task_id
should always be available by passing theTaskContext
or fromSessionState
to keep things consistent.In trying to implement this it proved infeasible because
supports_filters_pushdown
is in 2 interfaces in 2 separate crates:TableProvider
(incore
) andTableSource
(inexpr
). It is not possible to addstate: &SessionState
to theTableSource
implementation as it cannot access thecore
crate, a cyclic dependency occurs the way it is now. This was intentional to makeLogicalPlan
separable, which makes sense, but preventing this type of enhancement.Describe the solution you'd like
Add
&SessionState
or minimallyTaskContext
in every pertinent method for per-query specific processing in a custom data source.A possible way to solve this is to make a new
datafusion-traits
crate, and to moveSessionState
and other common items todatafusion-common
, such that these components are used bycore
andexpr
. It will make some components available inexpr
that are not strictly necessary, but I think that is a good trade-off. This work could be combined with other efforts to breakcore
into more sub-crates, that would make DataFusion much more flexible overall.Describe alternatives you've considered
No response
Additional context
Restructuring crates in a project of this size will be a lot of work, but I believe the benefit will be there. There are other issues that also would benefit. I would recommended a separate restructure ticket that can be reviewed before any implementation is attempted. In addition then this would need to be implemented by multiple contributors, it will inevitably cause a lot of temporary breakage and retesting will also be required.
The text was updated successfully, but these errors were encountered: