-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: store min, max, null count, and true count in column metadata #1164
Conversation
9f8df14
to
86af40b
Compare
fyi @lwwmanning I rebased your PR on develop. |
b54356c
to
6218d42
Compare
Alright, this does not break the current read/write tests (which might only be in Python doc tests?). We need to actually read and use the data to really test it though. |
@danking i will do a proper review in ~1h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly passing the conversion function seems like a hack. I think we would also like to specialize BoolMetadataAccumulator to hold true counts since they don't appear otherwise
@@ -258,6 +226,5 @@ mod tests { | |||
buffer[buffer_begin..buffer_end].len(), | |||
FOOTER_POSTSCRIPT_SIZE | |||
); | |||
assert_eq!(buffer[buffer_begin..buffer_end].len(), 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This asserts that we don't accidentally change FOOTER_POSTSCRIPT_SIZE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -83,74 +82,58 @@ impl<W: VortexWrite> LayoutWriter<W> { | |||
|
|||
async fn write_column_chunks<S>(&mut self, mut stream: S, column_idx: usize) -> VortexResult<()> | |||
where | |||
S: Stream<Item = VortexResult<Array>> + Unpin, | |||
S: Stream<Item = VortexResult<Array>> + Unpin + ArrayStream, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess ArrayStream implies Stream? Can get rid of the earlier bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
impl<T> ExtremaAccumulator<T> { | ||
fn new(size_hint: usize, to_array: fn(Vec<Option<T>>) -> Array) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird to pass this function here directly. All you want is impl Into<Array> for ExtremaAccumulator<...>
|
||
fn push_batch_byte_offsets(&mut self, batch_byte_offsets: Vec<u64>); | ||
|
||
fn into_layouts_and_metadata(self: Box<Self>) -> VortexResult<(VecDeque<Layout>, StructArray)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you split this method into separate trait you can avoid passing the conversion function and instead specialize as necessary.
pub trait IntoArrayMetadata {
fn into_array_metadata(self: Box<Self>) -> VortexResult<(VecDeque<Layout>, StructArray)>;
}
Erm. Ok. I might have gone rogue here so push back, if so. I implemented I also switched the writer.rs to use a new |
|
||
pub fn new_metadata_accumulator(dtype: &DType) -> Box<dyn MetadataAccumulator> { | ||
match dtype { | ||
DType::Null => Box::new(BasicAccumulator::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that we could probably also specialize all these to have/not-have null counts based on the nullability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can defer this one for later. I’m not sure the null ability matters for performance here
No description provided.