Add sum statistics and PhysicalExpr::column_statistics #13736
Which issue does this PR close?
Fixes #992
Rationale for this change
Some statistics can propagate through expressions, such as min, max and sum.
In this particular case, I was looking at ClickBench Q29 and realized we had no way to report sum statistics to DataFusion (which would also help for avg).
Q29 looks like this btw:
What changes are included in this PR?
PhysicalExpr has a new defaulted trait function, column_statistics, that takes a Statistics and returns statistics for the columnar result of the expression. (Unlike the linked issue, which proposes returning a full Statistics object.)
Further, this PR adds a sum statistic to demonstrate the value of propagation (which turns into Precision::Absent on overflow).
Are these changes tested?
Are there any user-facing changes?
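For illustration only, here is a minimal sketch of the overflow behavior described for the sum statistic. It is not the code in this PR: the `Precision` enum below is a simplified, hypothetical stand-in defined locally (fixed to `i64`), and `combine_sums` is an invented helper name. The idea is that two partial sums are added with a checked addition, and an overflow degrades the result to `Absent` rather than reporting a wrong value.

```rust
/// Hypothetical, simplified stand-in for a statistics precision type.
/// Defined here only so the sketch is self-contained; not DataFusion's definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(i64),
    Inexact(i64),
    Absent,
}

/// Combine two sum statistics (hypothetical helper).
/// `checked_add` returns `None` on overflow, which is mapped to `Absent`.
fn combine_sums(a: Precision, b: Precision) -> Precision {
    use Precision::*;
    match (a, b) {
        // Both sides exact: the combined sum is exact unless it overflows.
        (Exact(x), Exact(y)) => x.checked_add(y).map_or(Absent, Exact),
        // At least one side inexact: the result can only be inexact.
        (Exact(x) | Inexact(x), Exact(y) | Inexact(y)) => {
            x.checked_add(y).map_or(Absent, Inexact)
        }
        // Either side unknown: nothing can be said about the sum.
        _ => Absent,
    }
}
```

Under these assumptions, `combine_sums(Precision::Exact(2), Precision::Exact(3))` yields `Precision::Exact(5)`, while `combine_sums(Precision::Exact(i64::MAX), Precision::Exact(1))` yields `Precision::Absent`.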