-
Notifications
You must be signed in to change notification settings - Fork 817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DictionaryArray::occupancy #4415
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like nice addition to me -- thank you @tustvold
let len = self.values.len(); | ||
let mut builder = BooleanBufferBuilder::new(len); | ||
builder.resize(len); | ||
let slice = builder.as_slice_mut(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: calling this target
or dest
or dst
I think would be a more descriptive name for what it is rather than slice
let mut builder = BooleanBufferBuilder::new(len); | ||
builder.resize(len); | ||
let slice = builder.as_slice_mut(); | ||
match self.keys.nulls().filter(|n| n.null_count() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fancy way to check for nulls -- TIL Option::filter
👍
Some(n) => { | ||
let v = self.keys.values(); | ||
n.valid_indices() | ||
.for_each(|idx| set_bit(slice, v[idx].as_usize())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be any value in calling valid_slices
here instead? I assume it would be a tradeoff for sparse dictionaries and mostly used dictionaries.
As an aside, I really like how the new ScalarBuffer / NullBuffer structures are so easy to work with ❤️ |
is this PR waiting on anything else? Or shall we merge it? |
Which issue does this PR close?
Part of #4414
Relates to #506
Rationale for this change
Split out from #3558 this adds a
DictionaryArray::occupancy
which provides a mask of the "used" values. This is effectively a selection vector for the values (#4095)What changes are included in this PR?
Are there any user-facing changes?