feat(substrait): add set operations to consumer, update substrait to 0.45.0 #12863

Merged: 3 commits into apache:main from set-ops on Oct 17, 2024

Conversation

Contributor

@tokoko tokoko commented Oct 11, 2024

Rationale for this change

Adds support for set operations to the Substrait consumer, except for Multiset Minus (still unsure what that one does exactly...)

What changes are included in this PR?

  • Updates the substrait crate to the latest version and makes the changes needed to handle recent changes in AggregateRel.
  • Adds set operation rules to the consumer, with tests.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@tokoko tokoko changed the title from "feat(substait): add set operations to consumer" to "feat(substrait): add set operations to consumer" Oct 11, 2024
@alamb alamb changed the title from "feat(substrait): add set operations to consumer" to "feat(substrait): add set operations to consumer, update substrait to 0.44.0" Oct 15, 2024
@@ -764,6 +768,7 @@ pub fn operator_to_name(op: Operator) -> &'static str {
}
}

#[allow(deprecated)]
Contributor

Is this #[allow(deprecated)] required because of changes in the substrait dependency?

Contributor Author

I think so.. grouping_expressions field was deprecated in the proto definitions, but should still be used for some time not to break backwards-compatibility. Maybe there's some other way.. I just saw similar deprecated markers and followed suit.

Member

Yeah, it's not a rush, and might be better to stick with the old approach for now for producers until consumers have had a chance to catch up.

The new approach puts the grouping_expressions in the AggregateRel (so in to_substrait_agg_measure) and not in the Grouping. Instead, the Grouping has a list of indices into the AggregateRel's grouping_expressions.

E.g. instead of...

AggregateRel = {
  "groupings": [
    { "grouping_expressions": [expr_1, expr_2] }
  ]
}

You would have...

AggregateRel = {
  "grouping_expressions": [expr_1, expr_2],
  "groupings": [
    { "expression_references": [0, 1] }
  ]
}

This makes it easier to recognize something like a rollup:

AggregateRel = {
  "grouping_expressions": [expr_1, expr_2],
  "groupings": [
    { "expression_references": [0, 1] },
    { "expression_references": [0] },
    { "expression_references": [] }
  ]
}
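
To make the two layouts concrete, here is a rough Rust sketch of how a consumer might accept both of them and spot the rollup shape. The `AggregateRel` and `Grouping` structs below are simplified, hypothetical stand-ins (expressions are plain strings), not the real substrait crate types, and `resolve_groupings` and `is_rollup` are illustrative names only.

```rust
/// Hypothetical, simplified stand-ins for the messages discussed above.
/// These are NOT the real `substrait` crate types.
#[derive(Debug, Clone, PartialEq)]
struct Grouping {
    /// Deprecated layout: expressions stored inline in each grouping.
    grouping_expressions: Vec<String>,
    /// New layout: indices into `AggregateRel::grouping_expressions`.
    expression_references: Vec<u32>,
}

#[derive(Debug, Clone, PartialEq)]
struct AggregateRel {
    /// Shared expression list (new layout); empty for old-style plans.
    grouping_expressions: Vec<String>,
    groupings: Vec<Grouping>,
}

/// Resolve every grouping to its list of expressions, accepting either layout.
fn resolve_groupings(rel: &AggregateRel) -> Result<Vec<Vec<String>>, String> {
    rel.groupings
        .iter()
        .map(|g| -> Result<Vec<String>, String> {
            if !g.grouping_expressions.is_empty() {
                // Old layout: the expressions are already inline.
                return Ok(g.grouping_expressions.clone());
            }
            // New layout: look each index up in the shared list.
            g.expression_references
                .iter()
                .map(|&i| {
                    rel.grouping_expressions
                        .get(i as usize)
                        .cloned()
                        .ok_or_else(|| format!("expression reference {i} is out of bounds"))
                })
                .collect()
        })
        .collect()
}

/// With the new layout, a rollup over N expressions appears as N + 1
/// groupings whose references are the prefixes [0..N), [0..N-1), ..., [].
fn is_rollup(rel: &AggregateRel) -> bool {
    let n = rel.grouping_expressions.len();
    n > 0
        && rel.groupings.len() == n + 1
        && rel.groupings.iter().enumerate().all(|(k, g)| {
            g.grouping_expressions.is_empty()
                && g.expression_references.iter().copied().eq(0..(n - k) as u32)
        })
}
```

Under these assumptions, the rollup example above resolves to the expression lists [expr_1, expr_2], [expr_1] and [], while an old-style plan passes through unchanged because its expressions are already inline.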

Contributor

> I think so.. grouping_expressions field was deprecated in the proto definitions, but should still be used for some time not to break backwards-compatibility.

+1 for retaining both for backwards compatibility

I've created #12957 to track that we deferred implementing this.

Contributor

@alamb alamb left a comment

Looks good to me @tokoko -- thank you 🙏 Well commented and tested

cc @Blizzara @vbarua and @westonpace

Contributor

@vbarua vbarua left a comment

Overall, looks reasonable to me. Left some minor comments.

datafusion/substrait/src/logical_plan/consumer.rs (outdated, resolved)

@tokoko tokoko changed the title from "feat(substrait): add set operations to consumer, update substrait to 0.44.0" to "feat(substrait): add set operations to consumer, update substrait to 0.45.0" Oct 17, 2024
Contributor Author

tokoko commented Oct 17, 2024

@alamb upgraded to 0.45 here while we're at it and opened #12984 to track the changes included.

Contributor

alamb commented Oct 17, 2024

Awesome -- thank you @tokoko

@alamb alamb merged commit e63abe7 into apache:main Oct 17, 2024
27 checks passed
@tokoko tokoko deleted the set-ops branch October 17, 2024 20:00