Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: metadata cache provider and datafusion trait impls #25566

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

hiltontj
Copy link
Contributor

@hiltontj hiltontj commented Nov 18, 2024

Closes #25543
Closes #25544

This adds the MetaCacheProvider for managing metadata caches in the influxdb3 instance. This includes APIs to create caches through the WAL as well as from a catalog on initialization, to write data into the managed caches, and to query data out of them.

The query side is fairly involved, relying on Datafusion's TableFunctionImpl and TableProvider traits to make querying the cache using a user-defined table function (UDTF) possible.

The predicate code was modified to only support two kinds of predicates: IN and NOT IN, which simplifies the code, and maps nicely with the DataFusion LiteralGuarantee which we leverage to derive the predicates from the incoming queries.

MetaCacheExec, a custom ExecutionPlan implementation was added specifically for the metadata cache that can report the predicates that are pushed down to the cache during query planning/execution.

A big set of tests was added to to check that queries are working, and that predicates are being pushed down properly.

Additional Notes

  • This PR moved the code in the meta_cache module of the influxdb3_cache crate around so that the primary mod.rs just contains tests, and then has the following additional modules:
    • cache.rs: (existing with some changes) core cache implementation - I left a couple comments on where the main changes were made to that code
    • provider.rs: (new) contains code for the new MetaCacheProvider
    • table_function.rs: (new) contains code for DataFusion trait implementations
  • The feature still needs to be accessible through the API, so that caches can be created, deleted, and viewed, but that will be done in follow-on issues as part of Epic: Metadata Cache #25539.

@hiltontj hiltontj force-pushed the hiltontj/meta-cache-write-path branch 4 times, most recently from 5d159c9 to 583f69e Compare November 21, 2024 20:02
Comment on lines +260 to +294
/// Compare the configuration of a given cache, producing a helpful error message if they differ
pub(crate) fn compare_config(&self, other: &Self) -> Result<(), anyhow::Error> {
if self.max_cardinality != other.max_cardinality {
bail!(
"incompatible `max_cardinality`, expected {}, got {}",
self.max_cardinality,
other.max_cardinality
);
}
if self.max_age != other.max_age {
bail!(
"incompatible `max_age`, expected {}, got {}",
self.max_age.as_secs(),
other.max_age.as_secs()
)
}
if self.column_ids != other.column_ids {
bail!(
"incompatible column id selection, expected {}, got {}",
self.column_ids
.iter()
.map(|id| id.to_string())
.collect::<Vec<_>>()
.join(","),
other
.column_ids
.iter()
.map(|id| id.to_string())
.collect::<Vec<_>>()
.join(","),
)
}

Ok(())
}
Copy link
Contributor Author

@hiltontj hiltontj Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is newly added to the core cache code, as it is used on cache creation in the MetaCacheProvider to give friendly errors if users attempt to overwrite an existing cache.

Comment on lines +468 to +478
/// A predicate that can be applied when gathering [`RecordBatch`]es from a [`MetaCache`]
///
/// This is intended to be derived from a set of filter expressions in Datafusion by analyzing
/// them with a `LiteralGuarantee`.
///
/// This uses a `BTreeSet` to store the values so that they are iterated over in sorted order.
#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) enum Predicate {
In(BTreeSet<Value>),
NotIn(BTreeSet<Value>),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type was changed from the previous enum that also contained Eq and NotEq variants. Those were no longer needed with how predicates are pushed down from DataFusion (see the table_function.rs module)

@hiltontj hiltontj force-pushed the hiltontj/meta-cache-write-path branch 2 times, most recently from 44d0ec2 to 9dc7cc8 Compare November 21, 2024 20:35
@hiltontj hiltontj requested review from pauldix, mgattozzi and praveen-influx and removed request for pauldix and mgattozzi November 21, 2024 20:35
@hiltontj hiltontj self-assigned this Nov 21, 2024
@hiltontj hiltontj added the v3 label Nov 21, 2024
@hiltontj hiltontj marked this pull request as ready for review November 21, 2024 20:35
This adds the MetaDataCacheProvider for managing metadata caches in the
influxdb3 instance. This includes APIs to create caches through the WAL
as well as from a catalog on initialization, to write data into the
managed caches, and to query data out of them.

The query side is fairly involved, relying on Datafusion's TableFunctionImpl
and TableProvider traits to make querying the cache using a user-defined
table function (UDTF) possible.

The predicate code was modified to only support two kinds of predicates:
IN and NOT IN, which simplifies the code, and maps nicely with the DataFusion
LiteralGuarantee which we leverage to derive the predicates from the
incoming queries.

A custom ExecutionPlan implementation was added specifically for the
metadata cache that can report the predicates that are pushed down to
the cache during query planning/execution.

A big set of tests was added to to check that queries are working, and
that predicates are being pushed down properly.
@hiltontj hiltontj force-pushed the hiltontj/meta-cache-write-path branch from 9dc7cc8 to 088df39 Compare November 21, 2024 20:44
"| us-west | d |",
"+---------+------+",
],
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these test cases were removed because the Eq and NotEq variants were removed from the Predicate type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
1 participant