Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add physical plan properties to protobuf definition #13136

Closed

Conversation

timsaucer
Copy link
Contributor

Which issue does this PR close?

This is to support apache/datafusion-python#823 and to address this comment on #12920

Rationale for this change

This is a pure addition to the protobuf definition to allow for transferring PlanProperties in a serialized manner.

What changes are included in this PR?

  • Adds protobuf message definitions for the plan properties and its contained fields
  • Adds methods to convert between these protobuf messages and their datafusion internal counterparts

Are these changes tested?

Tested locally against my code for #12920

I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.

Are there any user-facing changes?

No changes. Additional message definitions are available for use downstream, and particularly for the upcoming FFI work.

@github-actions github-actions bot added the proto Related to proto crate label Oct 27, 2024
@timsaucer timsaucer mentioned this pull request Oct 27, 2024
3 tasks
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @timsaucer -- I have had a chance to ponder this PR a bit more.

As I understand it, the reason we are thinking this code could help is that we could change the FFI bindings to return a serialized PlanProperties rather than a FFI struct.

However, given our current lackadaisical approach to protobuf compatibility https://docs.rs/datafusion-proto/latest/datafusion_proto/#version-compatibility I think we may run into trouble with this approach

Thus I think we should proceed with the FFI / API based approach rather than try to use serialized protobuf for the stable interface

That being said, I think this PR has potential value in its own right as a way to potentially improve performance (avoid recomputing the properties) though I am not sure is deserializing is faster than recomputing them 🤔

I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.

I do think we should have some basic testing, otherwise there is a real danger this code will be broken accidentally in a future refactoring.

I think the existing tests are in https://github.com/apache/datafusion/blob/10af8a73662f4f6aac09a34157b7cf5fee034502/datafusion/proto/tests/cases/roundtrip_physical_plan.rs#L107-L106

However, the way most of the current serialization worked is that they re-compute the plan properties

For example
https://github.com/apache/datafusion/blob/10af8a73662f4f6aac09a34157b7cf5fee034502/datafusion/proto/src/physical_plan/mod.rs#L156-L155

Calls
https://github.com/apache/datafusion/blob/d3920f3060fc3745b8a50170dafb2beaa898adc2/datafusion/physical-plan/src/projection.rs#L95-L94

to recompute the properties

One thing we could potentially do is to improve the protobuf format to actually serialize the PlanProperties rather than re-compute them when re-creating the plan

That would also result in test coverage of the new code

@alamb
Copy link
Contributor

alamb commented Oct 31, 2024

So now that we merged #12920 do we still need this PR 🤔

@timsaucer
Copy link
Contributor Author

I don't mind closing it. I hadn't yet because I wanted to think more about the value you thought could be added by avoiding the call to recompute.

@alamb
Copy link
Contributor

alamb commented Nov 1, 2024

In general I would tend towards less code and more re-compute unless someone finds / reports the properties are very expensive to compute. Even if they do I would personally prefer to try and make it faster to re-compute the properties rather than serialize them as that would benefit planning for all queries (not just ones that were serialzed)

@timsaucer
Copy link
Contributor Author

Sounds good to me.

@timsaucer timsaucer closed this Nov 1, 2024
@timsaucer timsaucer deleted the feat/protobuf-planner-properties branch November 11, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants