Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that predicate pushdown is working in the last cache #25578

Open
Tracked by #25091
hiltontj opened this issue Nov 22, 2024 · 0 comments
Open
Tracked by #25091

Validate that predicate pushdown is working in the last cache #25578

hiltontj opened this issue Nov 22, 2024 · 0 comments
Labels

Comments

@hiltontj
Copy link
Contributor

hiltontj commented Nov 22, 2024

Problem

The last cache TableProvider has code to handle predicate pushdown here, however, there are no tests to verify that it is actually working as intended.

So, in light of #25550 and #25550, we should have tests in place to verify that the predicate pushdown works.

Proposed solution

#25566 used a custom ExecutionPlan to verify the predicate pushdown is working, so we may consider following that example, since the code in the metadata cache is very similar to the last cache.

The idea is to capture the predicates evaluated so they can be displayed in the EXPLAIN output of a query that contains predicates destined for the cache.

This has the added benefit of validating that the pushdown is working, but also gives us the means to diagnose the same issue should we suspect pushdown is not working in the wild.

Alternatives

We could write lower level tests that validate the conversion function directly, but this would not have the added benefit mentioned above of also providing external visibility via EXPLAIN.

Additional context

Here is the custom implementor of the ExecutionPlan trait used by the metadata cache:

/// Custom implementor of the [`ExecutionPlan`] trait for use by the metadata cache
///
/// Wraps a [`MemoryExec`] from DataFusion, and mostly re-uses that. The special functionality
/// provided by this type is to track the predicates that are pushed down to the underlying cache
/// during query planning/execution.
///
/// # Example
///
/// For a query that does not provide any predicates, or one that does provide predicates, but they
/// do no get pushed down, the `EXPLAIN` for said query will contain a line for the `MetaCacheExec`
/// with no predicates, including what is emitted by the inner `MemoryExec`:
///
/// ```text
/// MetaCacheExec: inner=MemoryExec: partitions=1, partition_sizes=[1]
/// ```
///
/// For queries that do have predicates that get pushed down, the output will include them, e.g.:
///
/// ```text
/// MetaCacheExec: predicates=[[0 IN (us-east)], [1 IN (a,b)]] inner=MemoryExec: partitions=1, partition_sizes=[1]
/// ```
#[derive(Debug)]
struct MetaCacheExec {
inner: MemoryExec,
predicates: Option<IndexMap<ColumnId, Predicate>>,
}

Should the pushdown not be working, we may want to adopt a similar strategy to the metadata cache to converting filter Exprs using DataFusion's LiteralGuarantee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant