Is your feature request related to a problem or challenge?
Currently DataFusion supports topk_aggregation (#7192), and I've noticed two potential areas for optimization:
1. topk_aggregation only supports ordering by aggregations (order by agg); it doesn't support ordering by columns.
2. topk_aggregation cannot be used when there is no ORDER BY clause.
Describe the solution you'd like
For the first one, I think we can directly reuse the current priority queue solution.
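To make the first point concrete, here is a minimal sketch of keeping only the top k groups when ordering by the group column itself. It is not DataFusion's implementation: the function name `topk_min_by_key` and the `(key, value)` row shape are hypothetical, and a `BTreeMap` stands in for the priority queue so that lookup and eviction stay short. It models `SELECT key, MIN(value) ... GROUP BY key ORDER BY key LIMIT k`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not a DataFusion API): returns the `k` smallest keys
/// together with MIN(value) per key, in ascending key order.
fn topk_min_by_key(rows: &[(u64, i64)], k: usize) -> Vec<(u64, i64)> {
    if k == 0 {
        return Vec::new();
    }
    // Ordered map holding at most `k` groups at any time.
    let mut groups: BTreeMap<u64, i64> = BTreeMap::new();
    for &(key, value) in rows {
        // Once `k` groups are held, a new key larger than the current
        // largest can never enter the result, so skip the row early.
        if groups.len() == k && !groups.contains_key(&key) {
            if let Some((&largest, _)) = groups.last_key_value() {
                if key > largest {
                    continue;
                }
            }
        }
        // Fold the row into its group's running MIN.
        groups
            .entry(key)
            .and_modify(|current| *current = (*current).min(value))
            .or_insert(value);
        // Evict the largest key if the bound was exceeded.
        if groups.len() > k {
            groups.pop_last();
        }
    }
    groups.into_iter().collect()
}
```

An evicted key can never come back, because the retained keys only ever get smaller, so dropping its partial aggregate is safe.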
For the second one:
The query below, if written without its ORDER BY clause, is non-deterministic: returning the aggregate results for any 10 keys is valid. Therefore, the simplest optimization is to make the aggregate operation work in an ordered manner.
SELECT "UserID", MIN("AdvEngineID") FROM hits GROUP BY "UserID" ORDER BY MIN("AdvEngineID") LIMIT 10;
With the ORDER BY clause, the query runs faster.
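The observation that any 10 keys are a valid answer can be sketched as follows. This is only one possible reading, not the proposal's actual design: the aggregation stops admitting new groups once it holds k of them, but keeps folding later rows into the groups it already admitted so their MIN values stay correct. The function name `limit_min_any_keys` and the `(key, value)` row shape are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a DataFusion API): models
/// `SELECT key, MIN(value) ... GROUP BY key LIMIT k` with no ORDER BY,
/// returning the first `k` distinct keys seen, each with its full MIN.
fn limit_min_any_keys(rows: &[(u64, i64)], k: usize) -> Vec<(u64, i64)> {
    // Maps each admitted key to its position in `groups`.
    let mut slot_of: HashMap<u64, usize> = HashMap::new();
    // Admitted groups in first-seen order; never grows beyond `k`.
    let mut groups: Vec<(u64, i64)> = Vec::with_capacity(k);
    for &(key, value) in rows {
        if let Some(&slot) = slot_of.get(&key) {
            // Known group: keep its running MIN up to date.
            groups[slot].1 = groups[slot].1.min(value);
        } else if groups.len() < k {
            // Still below the limit: admit the new group.
            slot_of.insert(key, groups.len());
            groups.push((key, value));
        }
        // Otherwise the key is new and the limit is reached: ignore the row.
    }
    groups
}
```

The whole input is still scanned, but the group state is bounded by k entries instead of one entry per distinct key.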
Describe alternatives you've considered
No response
Additional context
I just found that TopKAggregation only supports grouping by one column, which confuses me:
datafusion/datafusion/physical-optimizer/src/topk_aggregation.rs, line 57 in 0f5634e