Minor: Update min_statistics and max_statistics to be helpers, update docs #10866
@alamb
Thanks LGTM.
I'm currently thinking about the following, which might be a refactor to consider once we support Data Pages as well.
To reduce code duplication, we could probably introduce an enum and match on it in a single get_statistics function (or macro, like we already have). WDYT?
enum StatisticPrefix {
    Min,
    Max,
    // ... and the rest (e.g. NullCount, RowCount)
}

enum StatisticType<I> {
    RowGroup { iter: I, prefix: StatisticPrefix },
    DataPage { iter: I, prefix: StatisticPrefix },
    // ... do we have more to support?
}

// This would be a macro like we already have in place
fn get_statistics<I: Iterator>(data_type: &DataType, statistic_type: StatisticType<I>) -> Result<ArrayRef> {
    match data_type {
        DataType::Boolean => {
            match statistic_type {
                StatisticType::RowGroup { iter, prefix } => {
                    // create boolean array from iter
                    // using prefix to create e.g. `MinBooleanRowGroupStatsIterator::new(iter)`
                },
                // do the same for StatisticType::DataPage
                _ => unimplemented!()
            }
        }
        // support all the other data_types
        _ => unimplemented!()
    }
}
Then in the StatisticsConverter we could...
// impl StatisticsConverter
// ...
fn row_group_mins<I>(&self, metadatas: I) -> Result<ArrayRef>
where
    I: IntoIterator<Item = &'a RowGroupMetaData>,
{
    let data_type = self.arrow_field.data_type();
    let Some(parquet_index) = self.parquet_index else {
        return Ok(self.make_null_array(data_type, metadatas));
    };
    let iter = metadatas
        .into_iter()
        .map(|x| x.column(parquet_index).statistics());
    let statistic_type = StatisticType::RowGroup { iter, prefix: StatisticPrefix::Min };
    // No trailing semicolon: the Result is the function's return value
    get_statistics(data_type, statistic_type)
}
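For illustration, the idea above can be reduced to a small self-contained sketch that compiles without arrow or parquet. Everything in it is a simplifying assumption rather than the DataFusion API: `Int32Stats` is a hypothetical stand-in for Parquet statistics, the output is a plain `Vec<Option<i32>>` instead of an `ArrayRef`, the match on `DataType` is omitted, and row groups and data pages share the same payload so both variants take one code path.

```rust
/// Which statistic to extract (only two of the proposed variants are modeled).
#[derive(Clone, Copy, Debug, PartialEq)]
enum StatisticPrefix {
    Min,
    Max,
}

/// Hypothetical stand-in for the statistics of one row group or data page.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Int32Stats {
    min: Option<i32>,
    max: Option<i32>,
}

/// Where the statistics come from, plus which statistic is wanted.
enum StatisticType<I> {
    RowGroup { iter: I, prefix: StatisticPrefix },
    DataPage { iter: I, prefix: StatisticPrefix },
}

/// Single entry point replacing separate min/max extraction functions.
/// A missing statistics entry (`None`) becomes a null (`None`) in the output.
fn get_statistics<I>(statistic_type: StatisticType<I>) -> Vec<Option<i32>>
where
    I: Iterator<Item = Option<Int32Stats>>,
{
    // In this sketch both sources carry the same payload, so they are merged.
    let (iter, prefix) = match statistic_type {
        StatisticType::RowGroup { iter, prefix }
        | StatisticType::DataPage { iter, prefix } => (iter, prefix),
    };
    iter.map(|stats| match prefix {
        StatisticPrefix::Min => stats.and_then(|s| s.min),
        StatisticPrefix::Max => stats.and_then(|s| s.max),
    })
    .collect()
}

/// Thin wrapper, analogous to `row_group_mins` on the converter.
fn row_group_mins(stats: &[Option<Int32Stats>]) -> Vec<Option<i32>> {
    get_statistics(StatisticType::RowGroup {
        iter: stats.iter().copied(),
        prefix: StatisticPrefix::Min,
    })
}

/// Thin wrapper for the maximum; only the prefix differs.
fn row_group_maxes(stats: &[Option<Int32Stats>]) -> Vec<Option<i32>> {
    get_statistics(StatisticType::RowGroup {
        iter: stats.iter().copied(),
        prefix: StatisticPrefix::Max,
    })
}
```

With this shape, the per-statistic functions differ only in the `StatisticPrefix` they pass, which is the duplication the proposal aims to remove. Whether a function or a macro is the better fit for the real code, where each `DataType` builds a different array type, is left open here.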
Looks good to me!
lgtm thanks @alamb
Which issue does this PR close?
Part of #10453
Inspired by #10852
Rationale for this change
The new StatisticsConverter API is ready, so we can remove the old min_statistics and max_statistics from the "crate" level APIs.
What changes are included in this PR?
Make the min_statistics and max_statistics APIs module private.
Are these changes tested?
Are there any user-facing changes?