Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk based iteration in accumulate_indices #13451

Merged
merged 11 commits into from
Nov 20, 2024
Merged
Prev Previous commit
Next Next commit
fmt
Signed-off-by: Jay Zhan <jayzhan211@gmail.com>
  • Loading branch information
jayzhan211 committed Nov 16, 2024
commit d0c8e05c8cb97fd691b4137b6001061b62ba334a
Original file line number Diff line number Diff line change
Expand Up @@ -402,10 +402,8 @@ pub fn accumulate_indices<F>(

group_indices_chunks.zip(bit_chunks.iter()).for_each(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if just using set_indices here is possible to iterate on valid indices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite surprise that the fixed chunk size (64) is faster than iterating set_indices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it seems it is faster (generating faster code) when valid values are high enough.

|(group_index_chunk, mask)| {
// index_mask has value 1 << i in the loop
let mut index_mask = 1;
group_index_chunk.iter().for_each(|&group_index| {
// valid bit was set, real vale
let is_valid = (mask & index_mask) != 0;
if is_valid {
index_fn(group_index);
Expand Down Expand Up @@ -478,18 +476,14 @@ pub fn accumulate_indices<F>(
.zip(valid_bit_chunks.iter())
.zip(filter_bit_chunks.iter())
.for_each(|((group_index_chunk, valid_mask), filter_mask)| {
// index_mask has value 1 << i in the loop
let mut index_mask = 1;
group_index_chunk.iter().for_each(
|&group_index| {
// valid bit was set, real value
let is_valid = (filter_mask & valid_mask & index_mask) != 0;
if is_valid {
index_fn(group_index);
}
index_mask <<= 1;
},
)
group_index_chunk.iter().for_each(|&group_index| {
let is_valid = (filter_mask & valid_mask & index_mask) != 0;
if is_valid {
index_fn(group_index);
}
index_mask <<= 1;
})
});

// handle any remaining bits (after the initial 64)
Expand All @@ -499,7 +493,8 @@ pub fn accumulate_indices<F>(
.iter()
.enumerate()
.for_each(|(i, &group_index)| {
let is_valid = remainder_filter_bits & remainder_valids_bits & (1 << i) != 0;
let is_valid =
remainder_filter_bits & remainder_valids_bits & (1 << i) != 0;
if is_valid {
index_fn(group_index)
}
Expand Down
Loading