Aggregation Function Dispatch #2869

Yuhta · 2022-10-18T15:31:30Z

Yuhta
Oct 18, 2022
Collaborator

Problem

Currently, in an intermediate aggregation node, the function receives only the input type and the output type, both of which are the intermediate type. Some functions need to merge accumulators differently depending on the raw input type, but that information is not available on the worker node. It is usually available in the planner, though.

Solutions

1. Overwrite Intermediate Type

There are several ways to solve or work around this problem. The first is to encode the raw input type inside the intermediate type. One example is approx_percentile (#2621), where we changed the intermediate type from varbinary to a row that carries the raw input type. This workaround works well, but it requires overriding the intermediate type in the planner, and this has to be done for each function individually. In the case of Presto, we need to change the coordinator code to do that (prestodb/presto#18386).
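To make the per-function cost concrete, here is a minimal Python sketch of what the planner-side override could look like. It is an illustration only, not Presto or Velox code: the function name `my_sketch_agg`, the `INTERMEDIATE_OVERRIDES` table, and the ROW layout are all hypothetical.

```python
from typing import Callable, Dict, Sequence


def row_of(*children: str) -> str:
    """Render a row type as a string, e.g. ROW(VARBINARY, DOUBLE)."""
    return "ROW(" + ", ".join(children) + ")"


# Hypothetical per-function override table kept by the planner.
# Every function that needs the raw input type must be listed here by hand.
INTERMEDIATE_OVERRIDES: Dict[str, Callable[[Sequence[str]], str]] = {
    "my_sketch_agg": lambda raw_types: row_of("VARBINARY", *raw_types),
}


def plan_intermediate_type(name: str, raw_types: Sequence[str], default: str) -> str:
    """Return the overridden intermediate type if one is registered, else the default."""
    override = INTERMEDIATE_OVERRIDES.get(name)
    return override(raw_types) if override else default
```

Functions without an entry keep their declared intermediate type, which is why each affected function needs its own planner change.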

2. Get Raw Input Types from Planner

The second way to solve this is to get the raw input types from the planner and pass them down to the function factory. Function authors receive an extra parameter holding the raw input types and can use it to choose the implementation when the node input is not raw. This is a more generic solution than changing the intermediate type, but it requires some extra wiring. For example, in Presto you might need to get the resolved raw input types from the coordinator, pass them to presto_cpp through the protocol, and then pass them to the Velox aggregation node. Not every engine needs to support this: if an engine supports only single aggregation (i.e. no intermediate type), the input and output of the aggregation node are exactly the raw input and the final result, so there is no need to pass the extra raw input types to the function factory.
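A minimal Python sketch of the factory side of this idea follows. It is not the Velox factory signature: `create_merge_impl`, the step names, and the returned implementation labels are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Steps whose node input is the raw input (names are illustrative).
RAW_INPUT_STEPS = {"partial", "single"}


def create_merge_impl(
    step: str,
    arg_types: Sequence[str],
    raw_input_types: Optional[Sequence[str]] = None,
) -> str:
    """Pick an implementation based on the raw input type.

    For raw-input steps the node input types are already raw, so the extra
    parameter is optional. For intermediate-input steps it must come from
    the planner, because arg_types only holds the intermediate type.
    """
    if step in RAW_INPUT_STEPS:
        raw = list(arg_types)
    elif raw_input_types is None:
        raise ValueError("raw input types are required when the input is intermediate")
    else:
        raw = list(raw_input_types)
    return "merge_floating" if raw[0] in ("REAL", "DOUBLE") else "merge_integral"
```

An engine that only runs single aggregation would always take the first branch and never has to supply `raw_input_types`.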

3. Dispatch Based on Full Signature

A third way is to get the full signature from the planner and change the function dispatcher to resolve based on the full signature instead of only the node input types. To achieve this, we need to change the registry into a mapping from name to several groups of signatures, each group having its own function implementation. During dispatch, we get a fully resolved function signature from the planner, go through each group, find the first group that the signature conforms to, and call the corresponding implementation. This is a large change: it requires a lot of wiring between planner and engine, a rewrite of the aggregation function dispatch code, and an implementation of type checking between function signatures in Velox. Functionally it is not much better than solution 2, except that we could perhaps also dispatch based on the result type.
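The registry shape and first-match dispatch described above can be sketched in Python as follows. This is a simplified model, not Velox code: `Signature`, `REGISTRY`, `my_agg`, and the `ANY` wildcard are hypothetical, and real conformance checking with type variables would be considerably more involved than the exact-or-wildcard match used here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ANY = "ANY"  # wildcard used only in this sketch


@dataclass(frozen=True)
class Signature:
    arg_types: Tuple[str, ...]
    intermediate_type: str
    return_type: str


def conforms(pattern: Signature, resolved: Signature) -> bool:
    """True if every type in `resolved` matches `pattern` exactly or via ANY."""
    if len(pattern.arg_types) != len(resolved.arg_types):
        return False
    pairs = list(zip(pattern.arg_types, resolved.arg_types))
    pairs.append((pattern.intermediate_type, resolved.intermediate_type))
    pairs.append((pattern.return_type, resolved.return_type))
    return all(p == ANY or p == c for p, c in pairs)


# name -> ordered groups; each group is (signatures, implementation label).
Group = Tuple[List[Signature], str]
REGISTRY: Dict[str, List[Group]] = {
    "my_agg": [
        ([Signature(("DOUBLE",), "VARBINARY", "DOUBLE")], "floating_impl"),
        ([Signature((ANY,), "VARBINARY", ANY)], "generic_impl"),
    ],
}


def dispatch(name: str, resolved: Signature) -> str:
    """Return the implementation of the first group that `resolved` conforms to."""
    for signatures, impl in REGISTRY.get(name, []):
        if any(conforms(s, resolved) for s in signatures):
            return impl
    raise LookupError(f"no implementation of {name} for {resolved}")
```

Because the first matching group wins, more specific groups have to be registered before generic ones.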

Our Choice

We are already doing solution 1 for some functions, and we can continue doing so where needed. Between solutions 2 and 3 we must make a choice. Based on the implementation complexity and the value each brings, we think solution 2 should be enough for all our use cases without introducing too much complexity.

Any new ideas or discussion are welcome.

With the new AggregateCompanionAdapter (#4566), we will go with option 1, since this is the only way to make the final result type resolvable from the intermediate type, which is a requirement for the companion functions. For backward compatibility, we can define the intermediate type as ROW(VARBINARY, T1, T2, ...).
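As a rough Python illustration of why the ROW(VARBINARY, T1, T2, ...) layout helps, the sketch below resolves a result type from the intermediate type alone. It assumes a hypothetical function whose result type equals T1; the helper names and the tuple encoding of row types are not taken from Velox.

```python
from typing import Sequence, Tuple, Union

# A scalar type is a string; a row type is a tuple of child type names.
Type = Union[str, Tuple[str, ...]]


def make_intermediate_type(raw_input_types: Sequence[str]) -> Tuple[str, ...]:
    """Model ROW(VARBINARY, T1, T2, ...): serialized state plus raw input types."""
    return ("VARBINARY", *raw_input_types)


def resolve_result_type(intermediate_type: Type) -> str:
    """Resolve the result type for a hypothetical function that returns T1."""
    if isinstance(intermediate_type, str) or len(intermediate_type) < 2:
        # A bare VARBINARY carries no information about the raw input types.
        raise ValueError(f"cannot resolve result type from {intermediate_type!r}")
    return intermediate_type[1]
```

With a plain VARBINARY intermediate type the lookup fails, which is the situation the companion functions cannot tolerate.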

Replies: 3 comments

Yuhta Oct 18, 2022 Collaborator Author

mbasmanova Mar 24, 2023 Collaborator

Yuhta Apr 13, 2023 Collaborator Author
