Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix table function execution without partitioning #21378

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

findepi
Copy link
Member

@findepi findepi commented Apr 3, 2024

Previously, when table function did not declare partitioning, it would be globally distributed, but on a worker node it would run single-threaded and first buffer all data in memory, like a one big WINDOW. After the change, the local execution processes input pages in a streaming fashion.

This commit also fixes property derivations for a case where table function is partitioned on empty list of symbols (global grouping).

Fixes #20398

@findepi findepi requested review from martint and kasiafi April 3, 2024 19:53
@cla-bot cla-bot bot added the cla-signed label Apr 3, 2024
@findepi
Copy link
Member Author

findepi commented Apr 3, 2024

cc @ebyhr @hovaesco

passThroughSpecifications,
processEmptyInput))
.flatMap(TableFunctionPartition::toOutputPages);
if (partitionChannels.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to to check if the List is empty as well? I tested locally on some custom table functions by cherry-picking this commit and ran into this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that empty list is legitimate, and requires global aggregation (one partition)

@findepi findepi force-pushed the findepi/exclude-columns-streaming branch from cb2452b to 2c28e2f Compare April 5, 2024 10:23
@findepi
Copy link
Member Author

findepi commented Apr 5, 2024

just rebased, no other changes

@findepi
Copy link
Member Author

findepi commented Apr 5, 2024

@martint @ebyhr PTAL

@findepi findepi requested a review from a team April 8, 2024 19:48
Previously, when table function did not declare partitioning, it would
be globally distributed, but on a worker node it would run
single-threaded and first buffer all data in memory, like a one big
WINDOW. After the change, the local execution processes input pages in a
streaming fashion.

This commit also fixes property derivations for a case where table
function is partitioned on empty list of symbols (global grouping).
@findepi findepi force-pushed the findepi/exclude-columns-streaming branch from 2c28e2f to 40ba52f Compare April 15, 2024 14:19
@findepi
Copy link
Member Author

findepi commented Apr 15, 2024

( rebased, no other changes )

@findepi
Copy link
Member Author

findepi commented Apr 15, 2024

draft -- this implementation has suboptimal lifecycle for TableFunctionDataProcessor
#21558 attempts to improve this

@tbaeg
Copy link
Member

tbaeg commented Dec 30, 2024

Any traction on this? We have been running a fork with these changes and would love for it to be in trino proper.

@mosabua
Copy link
Member

mosabua commented Jan 28, 2025

@kasiafi @findepi .. did you want to continue on this or close it?

@Praveen2112
Copy link
Member

@mosabua I think we can close - I would be opening a new PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Consuming large results with table functions causes "Query exceeded per-node memory limit" error
5 participants