Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add null_buffer length check to StringArrayBuilder/LargeStringArrayBuilder #13758

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jdockerty
Copy link

@jdockerty jdockerty commented Dec 13, 2024

Add a safety check to ensure that the null buffer and offsets buffers are the same length. This introduces a panic if they are not aligned through a
runtime assertion.

Which issue does this PR close?

I couldn't find an issue for this, it was raised internally for InfluxData's use of datafusion. I can create one if necessary 😄
Fixes #13759

Rationale for this change

What changes are included in this PR?

See description.

Are these changes tested?

I believe the existing test suite should cover this, although I'm happy to add a forced #[should_panic], I'm not quite sure how much value this provides though.

Are there any user-facing changes?

Users of the LargeStringArrayBuilder must be aware that method can now panic when the invariants are not upheld.

Add a safety check to ensure that the alignment of buffers cannot be
overflowed. This introduces a panic if they are not aligned through a
runtime assertion.
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdockerty

As the same issue applies to StringArrayBuilder perhaps you can add the check to that structure as well?

self.offsets_buffer.len() / size_of::<i64>() - 1,
"Null buffer and offsets buffer must be the same length"
);
assert_eq!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have lead you astray in my suggestion -- I don't think the value_buffer (that stores the string data) will be the same length and since it stores variable length data I don't think we will know what the lenght should be

Thus I think we should remove this second assert (as it will likely fail)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I'll remove this 2nd assertion now 💯

@alamb
Copy link
Contributor

alamb commented Dec 13, 2024

BTW you could test this via

cargo test -p datafusion-functions

and

cargo test --test sqllogictests

If both those pass (and clippy and fmt) I think you'll be good to go

@alamb
Copy link
Contributor

alamb commented Dec 13, 2024

I couldn't find an issue for this, it was raised internally for InfluxData's use of datafusion. I can create one if necessary 😄

I will file a ticket for this one - filed #13759

These buffers can be misaligned and it is not problematic, it is the
`null_buffer` which we care about being of the same length.
This is in a similar vein to `LargeStringArray`, as the code is the
same, except for `i32`'s instead of `i64`.
@jdockerty jdockerty changed the title fix: add buffer size sanity check to LargeStringArray fix: add null_buffer length check to StringArray/LargeStringArray Dec 13, 2024
@jdockerty jdockerty marked this pull request as ready for review December 13, 2024 15:11
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdockerty -- this looks great to me 🙏 (thanks for the first contribution!)

I have a code improvement suggestion, but we could do it as a follow on PR (or never) so I don't think it is required for merge

datafusion/functions/src/strings.rs Outdated Show resolved Hide resolved
@jdockerty jdockerty changed the title fix: add null_buffer length check to StringArray/LargeStringArray fix: add null_buffer length check to StringArrayBuilder/LargeStringArrayBuilder Dec 13, 2024
pub fn finish(self, null_buffer: Option<NullBuffer>) -> StringArray {
let row_count = self.offsets_buffer.len() / size_of::<i32>() - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👨‍🍳 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StringArrayBuilder and LargeStringArrayBuilder don't check null buffer length
2 participants