Add sum statistics and PhysicalExpr::column_statistics #13736

Open: 7 commits to merge into main

gatesn commented Dec 11, 2024

Which issue does this PR close?

Fixes #992

Rationale for this change

Some statistics, such as min, max and sum, can be propagated through expressions.

In this particular case, I was looking at ClickBench Q29 and realized we had no way to report sum statistics to DataFusion (which would also help for avg).

Q29 looks like this btw:

SELECT SUM("ResolutionWidth"), SUM("ResolutionWidth" + 1), SUM("ResolutionWidth" + 2), SUM("ResolutionWidth" + 3), SUM("ResolutionWidth" + 4), ... SUM("ResolutionWidth" + 88), SUM("ResolutionWidth" + 89) FROM hits;
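To illustrate why a sum statistic helps with a query of this shape: SUM skips NULLs, and `x + k` is NULL exactly when `x` is, so SUM(x + k) equals SUM(x) + k * (number of non-null rows of x). Below is a minimal sketch of that arithmetic, assuming 64-bit integer sums. The function name `shifted_sum` is hypothetical and is not part of this PR.

```rust
/// Derives SUM(x + k) from SUM(x) and the number of non-null rows of x.
///
/// Returns None when the result would overflow i64, so a caller can fall
/// back to computing the sum from the data instead.
fn shifted_sum(sum_x: i64, non_null_rows: i64, k: i64) -> Option<i64> {
    // SUM(x + k) = SUM(x) + k * non_null_rows, with every step checked.
    k.checked_mul(non_null_rows)?.checked_add(sum_x)
}
```

For example, with SUM(x) = 10 over 4 non-null rows, `shifted_sum(10, 4, 3)` yields `Some(22)`.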

What changes are included in this PR?

PhysicalExpr gains a new trait function with a default implementation, column_statistics, which takes a Statistics and returns statistics for the column that the expression produces. This differs from the linked issue, which proposes returning a full Statistics object.
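The general shape of a defaulted method like this can be sketched as follows. This is a simplified stand-in, not DataFusion's actual types or signatures: `Expr`, `TableStats`, `ColStats`, `Column`, `AddLiteral`, and `Opaque` are hypothetical names, and only min/max over i64 are modeled. The default reports "unknown", so existing expressions keep compiling, while expressions that can propagate statistics override it.

```rust
/// Hypothetical per-column statistics; None means "unknown".
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct ColStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Hypothetical table-level statistics: one entry per input column.
struct TableStats {
    columns: Vec<ColStats>,
}

trait Expr {
    /// Defaulted: expressions that do not override this report nothing.
    fn column_statistics(&self, _input: &TableStats) -> ColStats {
        ColStats::default()
    }
}

/// Reference to an input column by index.
struct Column(usize);

impl Expr for Column {
    fn column_statistics(&self, input: &TableStats) -> ColStats {
        // Out-of-range indices fall back to "unknown".
        input.columns.get(self.0).copied().unwrap_or_default()
    }
}

/// `child + literal`: shifts both bounds by the literal.
struct AddLiteral<E: Expr> {
    child: E,
    literal: i64,
}

impl<E: Expr> Expr for AddLiteral<E> {
    fn column_statistics(&self, input: &TableStats) -> ColStats {
        let s = self.child.column_statistics(input);
        let k = self.literal;
        let shift = |v: i64| v.checked_add(k);
        // Keep the bounds only when both are known and neither shift
        // overflows; otherwise report "unknown" to stay conservative.
        match (s.min.and_then(shift), s.max.and_then(shift)) {
            (Some(min), Some(max)) => ColStats { min: Some(min), max: Some(max) },
            _ => ColStats::default(),
        }
    }
}

/// An expression with no override, e.g. an opaque function call.
struct Opaque;
impl Expr for Opaque {}
```

In this sketch, `AddLiteral { child: Column(0), literal: 89 }` applied to a column with bounds [0, 1920] reports [89, 2009], while `Opaque` falls through to the default and reports nothing.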

This PR also adds a sum statistic to demonstrate the value of propagation. The sum becomes Precision::Absent on overflow.
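The overflow behavior can be illustrated with a small stand-in for DataFusion's `Precision` enum. This is a sketch under assumptions, not the code in this PR: the local `Precision` mirrors only the three variants (`Exact`, `Inexact`, `Absent`) and is fixed to i64, and `combine_sums` is a hypothetical helper that merges the sum statistics of two partitions.

```rust
/// Local stand-in for a statistic that may be exact, estimated, or unknown.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(i64),
    Inexact(i64),
    Absent,
}

/// Merges two partial sums. The result is exact only if both inputs are
/// exact, and it degrades to Absent if the addition overflows i64.
fn combine_sums(a: Precision, b: Precision) -> Precision {
    use Precision::*;
    match (a, b) {
        (Exact(x), Exact(y)) => x.checked_add(y).map_or(Absent, Exact),
        (Exact(x), Inexact(y)) | (Inexact(x), Exact(y)) | (Inexact(x), Inexact(y)) => {
            x.checked_add(y).map_or(Absent, Inexact)
        }
        // If either side is unknown, the combined sum is unknown too.
        _ => Absent,
    }
}
```

Falling back to `Absent` rather than saturating or wrapping means a consumer never sees a sum that silently differs from the true value.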

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), core (Core DataFusion crate), common (Related to common crate), proto (Related to proto crate), and functions labels on Dec 11, 2024

gatesn (author) commented Dec 12, 2024

Is there any combined script to run all the linting checks at once? I don't want to burn all your CI credits!

gatesn (author) commented Dec 13, 2024

Could I please grab another CI approval for this? I think I've run everything locally now.

Successfully merging this pull request may close these issues.

Expressions should also evaluate on statistics