-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better iterators for VarBin/VarBinView that don't always copy #221
Conversation
let mut offsets: Vec<K> = Vec::with_capacity(vec.len() + 1); | ||
let mut values: Vec<u8> = Vec::new(); | ||
offsets.push(K::zero()); | ||
let mut builder = VarBinBuilder::<K>::with_capacity(vec.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you considered summing the lengths of the strings and pre-allocating the bytes array in VarBinBuilder? (in the Vec case, at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmmmmmmmmmmm, we actually did the summing to fgure out K
|
||
pub fn finish(&self, dtype: &DType) -> StatsSet { | ||
StatsSet::from(HashMap::from([ | ||
(Stat::Min, varbin_scalar(self.min.to_vec(), dtype)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if empty, then we'll push e.g., &[0xFF]
to min, but we probably just shouldn't set it? or should set something that more explicitly indicates that it's empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point, before we ignored nulls so I missed it when I added proper handling
self.views.push(BinaryView { | ||
_ref: Ref::new( | ||
vbytes.len() as u32, | ||
vbytes[0..4].try_into().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic number 4 should probably be a named constant in the BinaryView namespace, no?
self.nulls.append_null(); | ||
} | ||
|
||
pub fn finish(&mut self, dtype: DType) -> VarBinViewArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other builder, should this just take self
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a comment by @gatesn before that we need to have our own validdity builder since arrow methods require &mut
No description provided.