
Dynamic pruning filters from TopK state (optimize ORDER BY LIMIT queries) #15037

Open · Tracked by #15512 · May be fixed by #15301
adriangb opened this issue Mar 5, 2025 · 12 comments

Labels: enhancement (New feature or request)

adriangb (Contributor) commented Mar 5, 2025

Is your feature request related to a problem or challenge?

From a discussion with @alamb yesterday, the idea came up of optimizing queries like `select * from data order by timestamp desc limit 10` for the case where the data is not perfectly sorted by timestamp but mostly follows a sorted pattern.

You can imagine this data gets created if multiple sources with clock skews, network delays, etc. are writing data and you don't do anything fancy to guarantee perfect sorting by timestamp (i.e. you naively write out the data to Parquet, maybe do some compaction, etc.). The point is that 99% of yesterday's files have a timestamp smaller than 99% of today's files but there may be a couple seconds of overlap between files. To be concrete, let's say this is our data:

| file | min | max |
|------|-----|-----|
| 1    | 1   | 10  |
| 2    | 9   | 19  |
| 3    | 20  | 31  |
| 4    | 30  | 35  |

Currently DataFusion will exhaustively open each file, read the timestamp column, and feed it into a TopK.
I think we can do a lot better if we:

  • Use file stats to decide which files to work on first. In this case it makes sense to start with files 4 and 3 (assuming we have a parallelism of 2).
  • Let's say that between those two we have 10 rows, so we've already filled up our TopK. The only way more rows get added to our TopK is if they are greater than the smallest item already seen (let's say that's 20, the smallest value in file 3).
  • Now we know, just from statistics, that we can skip files 2 and 1, because neither of them can have any timestamp > 20 (see the sketch just below this list).
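
A minimal sketch of that pruning rule in Rust (the `FileStats` struct and `prune_files` helper are hypothetical names for illustration, not DataFusion APIs):

```rust
// Hypothetical per-file min/max statistics for the timestamp column.
#[allow(dead_code)]
struct FileStats {
    file: usize,
    min: i64,
    max: i64,
}

/// Once the TopK heap is full, its smallest value becomes a threshold:
/// any file whose `max` statistic cannot exceed it is skipped entirely.
fn prune_files(files: &[FileStats], topk_threshold: Option<i64>) -> Vec<usize> {
    files
        .iter()
        .filter(|f| match topk_threshold {
            Some(t) => f.max > t,
            None => true, // heap not full yet: every file must still be read
        })
        .map(|f| f.file)
        .collect()
}

fn main() {
    let files = [
        FileStats { file: 1, min: 1, max: 10 },
        FileStats { file: 2, min: 9, max: 19 },
        FileStats { file: 3, min: 20, max: 31 },
        FileStats { file: 4, min: 30, max: 35 },
    ];
    // After reading files 4 and 3 the heap holds 10 rows and its smallest
    // value is 20, so files 1 and 2 (max <= 20) can be skipped.
    assert_eq!(prune_files(&files, Some(20)), vec![3, 4]);
}
```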

Extrapolating this to scenarios where you have years' worth / TBs of data and want a `limit 5`, I think this would yield orders-of-magnitude improvements.

@alamb mentioned this sounds similar to Dynamic Filters. I assume this must be a known technique (or my analysis may be completely wrong 😆), but I don't know what it would be called.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

adriangb added the enhancement (New feature or request) label Mar 5, 2025
alamb (Contributor) commented Mar 5, 2025

> @alamb mentioned this sounds similar to Dynamic Filters. I assume this must be a known technique (or my analysis may be completely wrong 😆), but I don't know what it would be called.

There was a talk at CIDR this year that mentioned this:

Sponsor Talk 3: The Fine Art of Work Skipping
Stefan Mandl, Snowflake

It seems they wrote a blog about it too here: https://www.snowflake.com/en/engineering-blog/optimizing-top-k-aggregation-snowflake/

adriangb (Contributor, Author) commented Mar 5, 2025

Nice to know I'm not totally off on the idea 😄

alamb (Contributor) commented Mar 5, 2025

> Nice to know I'm not totally off on the idea 😄

Not at all!

alamb (Contributor) commented Mar 12, 2025

BTW I am pretty sure DuckDB is using this technique, which is why they are so much faster on ClickBench Q23.

adriangb (Contributor, Author)

Does anyone have a handle on how we might implement this? I was thinking we'd need to add a method to exec operators called `apply_filter` that sends down the additional filter; by default it gets forwarded to children until it hits an exec that knows what to do with it (e.g. `DataSourceExec`). But I'm not very clear beyond that.
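
For illustration, a rough sketch of that default-forwarding shape (a hypothetical `ExecNode` trait, with string filters as a stand-in for physical expressions; DataFusion's actual `ExecutionPlan` API differs):

```rust
/// Hypothetical trait for physical operators; the filter is a string here
/// as a stand-in for a real physical expression.
trait ExecNode {
    fn children_mut(&mut self) -> Vec<&mut dyn ExecNode>;

    /// By default an operator just forwards the filter down to its children.
    fn apply_filter(&mut self, filter: &str) {
        for child in self.children_mut() {
            child.apply_filter(filter);
        }
    }
}

/// An operator with no special handling inherits the default and passes
/// the filter through untouched.
struct CoalesceBatchesExec {
    child: Box<dyn ExecNode>,
}

impl ExecNode for CoalesceBatchesExec {
    fn children_mut(&mut self) -> Vec<&mut dyn ExecNode> {
        vec![self.child.as_mut()]
    }
}

/// A scan knows what to do with the filter: remember it for pruning.
#[derive(Default)]
struct DataSourceExec {
    pushed_filters: Vec<String>,
}

impl ExecNode for DataSourceExec {
    fn children_mut(&mut self) -> Vec<&mut dyn ExecNode> {
        vec![] // leaf node
    }

    fn apply_filter(&mut self, filter: &str) {
        self.pushed_filters.push(filter.to_owned());
    }
}

fn main() {
    // Pushed at the top of the plan, the filter flows through to the scan.
    let mut plan = CoalesceBatchesExec {
        child: Box::new(DataSourceExec::default()),
    };
    plan.apply_filter("timestamp > 20");
}
```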

alamb (Contributor) commented Mar 18, 2025

> Does anyone have a handle on how we might implement this? I was thinking we'd need to add a method to exec operators called `apply_filter` that sends down the additional filter; by default it gets forwarded to children until it hits an exec that knows what to do with it (e.g. `DataSourceExec`). But I'm not very clear beyond that.

To begin with I would suggest:

  1. Make a new PhysicalExpr named something like `TopKRuntimeFilter`
  2. Add a physical optimizer pass that runs after all other passes (so the structure doesn't change) that finds TopK nodes and tries to find connected Scans (start with some basic rules, don't try to go past joins, etc.)
  3. Add the `TopKRuntimeFilter` to those scans

Then the trick will be to figure out how to share the TopKHeap created in the TopK operator with the `TopKRuntimeFilter`, and then orchestrate concurrent access to it somehow.
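
A minimal sketch of that sharing, assuming an `Arc`-shared handle that the TopK operator writes and the filter reads (all names here are hypothetical, not the PR's actual code):

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the TopK operator owns the heap and publishes
/// its current threshold (the smallest value in a full heap) here.
#[derive(Default)]
struct SharedTopKState {
    threshold: RwLock<Option<i64>>,
}

/// Hypothetical filter the scan evaluates; it only ever reads the threshold.
struct TopKRuntimeFilter {
    state: Arc<SharedTopKState>,
}

impl TopKRuntimeFilter {
    /// Keep a row only if it could still make it into the top k.
    fn keep(&self, value: i64) -> bool {
        match *self.state.threshold.read().unwrap() {
            Some(t) => value > t,
            None => true, // heap not full yet: nothing can be pruned
        }
    }
}

fn main() {
    let state = Arc::new(SharedTopKState::default());
    let filter = TopKRuntimeFilter { state: Arc::clone(&state) };

    assert!(filter.keep(5)); // no threshold published yet

    // The TopK operator side: once the heap fills, publish its smallest value.
    *state.threshold.write().unwrap() = Some(20);

    assert!(!filter.keep(15)); // cannot displace anything in the heap
    assert!(filter.keep(31)); // might enter the top k
}
```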

adriangb added a commit to pydantic/datafusion that referenced this issue Mar 19, 2025
adriangb linked a pull request Mar 19, 2025 that will close this issue
adriangb (Contributor, Author)

@alamb I implemented something like that in #15301

alamb (Contributor) commented Mar 20, 2025

Thanks @adriangb -- I will try and review it asap (hopefully tomorrow afternoon or tomorrow)

adriangb added a commit to pydantic/datafusion that referenced this issue Mar 20, 2025
adriangb (Contributor, Author)

We already have `Statistics` on `PartitionedFile`, so we could potentially use dynamic filters to prune based on those before even opening the file.

adriangb added a commit to pydantic/datafusion that referenced this issue Mar 26, 2025
adriangb added a commit to pydantic/datafusion that referenced this issue Mar 27, 2025
adriangb added a commit to pydantic/datafusion that referenced this issue Mar 27, 2025
alamb (Contributor) commented Mar 28, 2025

@adriangb and I had a discussion about #15301

Here are some notes:

Use cases:

  • TopK dynamic filter pushdown
    • Prune files with dynamic filter based on statistics
    • Prune row groups with dynamic filter based on statistics
    • Prune row pages with dynamic filter based on statistics
    • Apply during filtering when pushdown enabled
  • Join SIPs (sideways information passing)

Pros / Cons

The pros for merging this PR:

  • We already have benchmarks that show some performance improvement.

The cons:

  • It requires a special implementation for any operator (like FileOpener) to take advantage of such filters. This is not a blocker in my mind – but I do think implementing a PhysicalExpr is a cleaner design. As Adrian says, we can refactor it over time if/when PhysicalExpr gets more sophisticated.
  • We will get even more performance when filter_pushdown is enabled (again, maybe this is just follow-on work).

Nice to haves

  • For a plan with multiple partitions there is one TopK heap per partition plus a global one (e.g. for 16 input partitions we end up with 17 heaps), but this PR can only apply the per-partition top k value.
  • It would be nice to somehow be able to use all the top values (aka pick the smallest one) when filtering (see the sketch at the end of this comment).
  • This PR takes a snapshot of the contents of the TopK heap when a file is opened and never changes it.
    • This is good for pruning, as all the pruning (file, row group and page) happens on file opening.
    • It is not as good for filter_pushdown, where the values in the TopK heap can change over the course of the query, so using a snapshot means the dynamic filter doesn't improve over time.

I believe Adrian is going to look into these – but I also think they could easily be done as a follow-on PR.
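
For the second nice-to-have, a tiny sketch of the "pick the smallest one" rule, under the assumption that a single shared filter must stay correct for every partition's heap (a hypothetical helper, not the PR's code):

```rust
/// Hypothetical helper: each partition reports the smallest value in its
/// TopK heap once the heap is full (`Some`), or `None` while it is not.
/// A filter shared by all partitions must use the weakest bound, i.e. the
/// minimum of the per-partition thresholds, and no bound at all while any
/// heap still accepts every row.
fn global_threshold(per_partition: &[Option<i64>]) -> Option<i64> {
    per_partition
        .iter()
        .copied()
        .collect::<Option<Vec<i64>>>() // None if any partition has no bound yet
        .and_then(|ts| ts.into_iter().min())
}

fn main() {
    // One of three heaps is not full yet: no safe global threshold.
    assert_eq!(global_threshold(&[Some(20), Some(25), None]), None);
    // All heaps full: the safe shared threshold is the smallest one.
    assert_eq!(global_threshold(&[Some(20), Some(25), Some(30)]), Some(20));
}
```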

adriangb (Contributor, Author)

Wrt waiting for filter pushdown to be enabled by default: I think we're just making our lives harder by coupling them, especially since we can already test them together under a feature flag.

I also would like to leverage this work to justify a lot of other optimizations:

  • piping through file level stats and using those to prune with dynamic filters -> prune without even reading parquet metadata
  • roughly ordering files by the sort order based on stats -> this optimization becomes even more efficient
  • resolve merge conflicts and take next steps with per-file filters

This is unfortunately a blocker for all of that.

alamb changed the title from "Dynamic pruning filters from TopK state" to "Dynamic pruning filters from TopK state (optimize ORDER BY LIMIT queries)" Mar 31, 2025
alamb (Contributor) commented Mar 31, 2025

I plan to spend a non-trivial amount of time working on this with @adriangb this week.
