-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: metadata cache provider and datafusion trait impls #25566
base: main
Are you sure you want to change the base?
Conversation
5d159c9
to
583f69e
Compare
/// Compare the configuration of a given cache, producing a helpful error message if they differ | ||
pub(crate) fn compare_config(&self, other: &Self) -> Result<(), anyhow::Error> { | ||
if self.max_cardinality != other.max_cardinality { | ||
bail!( | ||
"incompatible `max_cardinality`, expected {}, got {}", | ||
self.max_cardinality, | ||
other.max_cardinality | ||
); | ||
} | ||
if self.max_age != other.max_age { | ||
bail!( | ||
"incompatible `max_age`, expected {}, got {}", | ||
self.max_age.as_secs(), | ||
other.max_age.as_secs() | ||
) | ||
} | ||
if self.column_ids != other.column_ids { | ||
bail!( | ||
"incompatible column id selection, expected {}, got {}", | ||
self.column_ids | ||
.iter() | ||
.map(|id| id.to_string()) | ||
.collect::<Vec<_>>() | ||
.join(","), | ||
other | ||
.column_ids | ||
.iter() | ||
.map(|id| id.to_string()) | ||
.collect::<Vec<_>>() | ||
.join(","), | ||
) | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is newly added to the core cache code, as it is used on cache creation in the MetaCacheProvider
to give friendly errors if users attempt to overwrite an existing cache.
/// A predicate that can be applied when gathering [`RecordBatch`]es from a [`MetaCache`] | ||
/// | ||
/// This is intended to be derived from a set of filter expressions in Datafusion by analyzing | ||
/// them with a `LiteralGuarantee`. | ||
/// | ||
/// This uses a `BTreeSet` to store the values so that they are iterated over in sorted order. | ||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub(crate) enum Predicate { | ||
In(BTreeSet<Value>), | ||
NotIn(BTreeSet<Value>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type was changed from the previous enum
that also contained Eq
and NotEq
variants. Those were no longer needed with how predicates are pushed down from DataFusion (see the table_function.rs
module)
44d0ec2
to
9dc7cc8
Compare
This adds the MetaDataCacheProvider for managing metadata caches in the influxdb3 instance. This includes APIs to create caches through the WAL as well as from a catalog on initialization, to write data into the managed caches, and to query data out of them. The query side is fairly involved, relying on Datafusion's TableFunctionImpl and TableProvider traits to make querying the cache using a user-defined table function (UDTF) possible. The predicate code was modified to only support two kinds of predicates: IN and NOT IN, which simplifies the code, and maps nicely with the DataFusion LiteralGuarantee which we leverage to derive the predicates from the incoming queries. A custom ExecutionPlan implementation was added specifically for the metadata cache that can report the predicates that are pushed down to the cache during query planning/execution. A big set of tests was added to to check that queries are working, and that predicates are being pushed down properly.
9dc7cc8
to
088df39
Compare
"| us-west | d |", | ||
"+---------+------+", | ||
], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these test cases were removed because the Eq
and NotEq
variants were removed from the Predicate
type.
Closes #25543
Closes #25544
This adds the
MetaCacheProvider
for managing metadata caches in theinfluxdb3
instance. This includes APIs to create caches through the WAL as well as from a catalog on initialization, to write data into the managed caches, and to query data out of them.The query side is fairly involved, relying on Datafusion's
TableFunctionImpl
andTableProvider
traits to make querying the cache using a user-defined table function (UDTF) possible.The predicate code was modified to only support two kinds of predicates: IN and NOT IN, which simplifies the code, and maps nicely with the DataFusion
LiteralGuarantee
which we leverage to derive the predicates from the incoming queries.MetaCacheExec
, a customExecutionPlan
implementation was added specifically for the metadata cache that can report the predicates that are pushed down to the cache during query planning/execution.A big set of tests was added to to check that queries are working, and that predicates are being pushed down properly.
Additional Notes
meta_cache
module of theinfluxdb3_cache
crate around so that the primarymod.rs
just contains tests, and then has the following additional modules:cache.rs
: (existing with some changes) core cache implementation - I left a couple comments on where the main changes were made to that codeprovider.rs
: (new) contains code for the newMetaCacheProvider
table_function.rs
: (new) contains code for DataFusion trait implementations