-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add null_buffer
length check to StringArrayBuilder
/LargeStringArrayBuilder
#13758
base: main
Are you sure you want to change the base?
Conversation
Add a safety check to ensure that the alignment of buffers cannot be overflowed. This introduces a panic if they are not aligned through a runtime assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jdockerty
As the same issue applies to StringArrayBuilder
perhaps you can add the check to that structure as well?
datafusion/functions/src/strings.rs
Outdated
self.offsets_buffer.len() / size_of::<i64>() - 1, | ||
"Null buffer and offsets buffer must be the same length" | ||
); | ||
assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have lead you astray in my suggestion -- I don't think the value_buffer (that stores the string data) will be the same length and since it stores variable length data I don't think we will know what the lenght should be
Thus I think we should remove this second assert (as it will likely fail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I'll remove this 2nd assertion now 💯
BTW you could test this via cargo test -p datafusion-functions and cargo test --test sqllogictests If both those pass (and clippy and fmt) I think you'll be good to go |
I will file a ticket for this one - filed #13759 |
These buffers can be misaligned and it is not problematic, it is the `null_buffer` which we care about being of the same length.
This is in a similar vein to `LargeStringArray`, as the code is the same, except for `i32`'s instead of `i64`.
LargeStringArray
null_buffer
length check to StringArray
/LargeStringArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jdockerty -- this looks great to me 🙏 (thanks for the first contribution!)
I have a code improvement suggestion, but we could do it as a follow on PR (or never) so I don't think it is required for merge
null_buffer
length check to StringArray
/LargeStringArray
null_buffer
length check to StringArrayBuilder
/LargeStringArrayBuilder
pub fn finish(self, null_buffer: Option<NullBuffer>) -> StringArray { | ||
let row_count = self.offsets_buffer.len() / size_of::<i32>() - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👨🍳 👌
Add a safety check to ensure that the null buffer and offsets buffers are the same length. This introduces a panic if they are not aligned through a
runtime assertion.
Which issue does this PR close?
I couldn't find an issue for this, it was raised internally for InfluxData's use ofdatafusion
. I can create one if necessary 😄Fixes #13759
Rationale for this change
What changes are included in this PR?
See description.
Are these changes tested?
I believe the existing test suite should cover this, although I'm happy to add a forced
#[should_panic]
, I'm not quite sure how much value this provides though.Are there any user-facing changes?
Users of the
LargeStringArrayBuilder
must be aware that method can now panic when the invariants are not upheld.