-
Notifications
You must be signed in to change notification settings - Fork 818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize interleave string ~2-3x faster #2944
Specialize interleave string ~2-3x faster #2944
Conversation
With #2947 this will be generalizable to also cover byte arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you
(*end - *start).to_usize().unwrap(), | ||
); | ||
let slice = | ||
std::slice::from_raw_parts(self.value_data.as_ptr().add(start), end - start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a drive by cleanup right?
_ => interleave_fallback(values, indices) | ||
} | ||
} | ||
|
||
/// Common functionality for interleaving arrays | ||
struct Interleave<'a, T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments might help here specifically what null_count
and nulls
represent and what the generic T
is used for
as_primitive_array::<T>(*x) | ||
}) | ||
.collect(); | ||
let interleaved = Interleave::<'_, PrimitiveArray<T>>::new(values, indices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
arrow-select/src/interleave.rs
Outdated
offsets.append(O::from_usize(0).unwrap()); | ||
for (a, b) in indices { | ||
let o = interleaved.arrays[*a].value_offsets(); | ||
let len = o[*b + 1].as_usize() - o[*b].as_usize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let len = o[*b + 1].as_usize() - o[*b].as_usize(); | |
// element length | |
let len = o[*b + 1].as_usize() - o[*b].as_usize(); |
let mut capacity = 0; | ||
let mut offsets = BufferBuilder::<O>::new(indices.len() + 1); | ||
offsets.append(O::from_usize(0).unwrap()); | ||
for (a, b) in indices { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is clever -- do the offsets in one pass and the strings in another
Benchmark runs are scheduled for baseline = d625f0a and contender = 66ea66b. 66ea66b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #2864
Rationale for this change
What changes are included in this PR?
Adds a specialized implementation for string arrays
Are there any user-facing changes?
No