Remove unsafe Send impl from PriorityMap #12289

Conversation

Member

@findepi findepi commented Sep 2, 2024

It's not necessary to use an unsafe Send impl. It's enough to require the referenced trait objects to be Send.

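A minimal sketch of the idea follows, under simplifying assumptions. Only the trait names ArrowHashTable and ArrowHeap come from the compiler output later in this thread. Their len methods, the VecTable/VecHeap implementors, and the total/total_in_thread helpers are hypothetical. Adding + Send to each boxed trait object makes every field of the struct Send, so the compiler infers Send for PriorityMap without any unsafe impl.

```rust
use std::thread;

// Hypothetical stand-ins: only the trait names match the real code.
trait ArrowHashTable {
    fn len(&self) -> usize;
}

trait ArrowHeap {
    fn len(&self) -> usize;
}

// Hypothetical implementors that hold only Send data.
struct VecTable(Vec<u64>);

impl ArrowHashTable for VecTable {
    fn len(&self) -> usize {
        self.0.len()
    }
}

struct VecHeap(Vec<i64>);

impl ArrowHeap for VecHeap {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// No `unsafe impl Send`: every field is Send, so the compiler infers
// PriorityMap: Send. Removing `+ Send` from either Box would make it !Send.
struct PriorityMap {
    map: Box<dyn ArrowHashTable + Send>,
    heap: Box<dyn ArrowHeap + Send>,
    capacity: usize,
}

impl PriorityMap {
    fn total(&self) -> usize {
        self.map.len() + self.heap.len() + self.capacity
    }
}

// thread::spawn only accepts closures whose captures are Send,
// so this compiles only because PriorityMap is Send.
fn total_in_thread(pm: PriorityMap) -> usize {
    thread::spawn(move || pm.total()).join().unwrap()
}

fn main() {
    let pm = PriorityMap {
        map: Box::new(VecTable(vec![1, 2])),
        heap: Box::new(VecHeap(vec![7])),
        capacity: 10,
    };
    println!("{}", total_in_thread(pm));
}
```

The bound also moves the safety check to the implementors. An unsafe impl is only sound if every boxed implementor really is Send, whereas with + Send on the trait object, boxing an implementor that holds non-Send data (for example an Rc) is rejected at compile time. Declaring Send as a supertrait (trait ArrowHeap: Send) would be another way to express this. The PR description instead talks about requiring Send on the referenced trait objects.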
@github-actions github-actions bot added the physical-expr Physical Expressions label Sep 2, 2024
Contributor

@alamb alamb left a comment


Makes sense to me -- thank you @findepi

Hopefully @avantgardnerio can also have a look, as I think he did the initial implementation of this TopK heap.

capacity: usize,
mapper: Vec<(usize, usize)>,
}

// JUSTIFICATION
Contributor


@avantgardnerio -- can you please remind me how we tested this, and how we can double-check that this doesn't cause a performance regression?

Member Author


I didn't verify with benchmarks, but per static code analysis, PriorityMap is required to be Send: if you just remove the trait implementation (marker), the code won't compile.
With this PR, PriorityMap is still Send. The only difference is that this is now inferred by the compiler, which is safer.
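Because Send is now inferred rather than asserted, a small compile-time check can state the requirement explicitly. If a later change made PriorityMap non-Send, the build would fail at this check rather than at a distant use site. The sketch below is not from the PR. It assumes a hypothetical, simplified PriorityMap with a single boxed heap, a hypothetical VecHeap implementor, and a helper named assert_send.

```rust
// Hypothetical simplified trait; only the name matches the real code.
trait ArrowHeap {
    fn peek_max(&self) -> Option<i64>;
}

// Hypothetical implementor holding only Send data.
struct VecHeap(Vec<i64>);

impl ArrowHeap for VecHeap {
    fn peek_max(&self) -> Option<i64> {
        self.0.iter().copied().max()
    }
}

struct PriorityMap {
    heap: Box<dyn ArrowHeap + Send>,
}

// Type-checks only when T: Send; calling it does nothing at runtime.
fn assert_send<T: Send>() {}

// Evaluated at compile time: dropping `+ Send` from the boxed trait
// object above would turn this line into a compile error.
const _: fn() = assert_send::<PriorityMap>;

fn main() {
    let pm = PriorityMap {
        heap: Box::new(VecHeap(vec![3, 9, 4])),
    };
    println!("{:?}", pm.heap.peek_max());
}
```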

Contributor


I believe I ran the benchmark, and it was slower without the optimization. So I made this change, and it got faster by however much is stated in the comment.

It sounds from @findepi, though, like the question is now moot?

Member Author


Removing the unsafe impl Send for PriorityMap {} line alone gives a compilation error, because rustc does not infer PriorityMap to be Send:

$ cargo build
   Compiling datafusion-physical-plan v41.0.0 (/Users/findepi/repos/datafusion/datafusion/physical-plan)
error[E0277]: `(dyn ArrowHashTable + 'static)` cannot be sent between threads safely
   --> datafusion/physical-plan/src/aggregates/mod.rs:249:9
    |
249 | /         match stream {
250 | |             StreamType::AggregateStream(stream) => Box::pin(stream),
251 | |             StreamType::GroupedHash(stream) => Box::pin(stream),
252 | |             StreamType::GroupedPriorityQueue(stream) => Box::pin(stream),
253 | |         }
    | |_________^ `(dyn ArrowHashTable + 'static)` cannot be sent between threads safely
    |
    = help: the trait `std::marker::Send` is not implemented for `(dyn ArrowHashTable + 'static)`, which is required by `GroupedTopKAggregateStream: std::marker::Send`
    = note: required for `std::ptr::Unique<(dyn ArrowHashTable + 'static)>` to implement `std::marker::Send`
note: required because it appears within the type `Box<(dyn ArrowHashTable + 'static)>`
   --> /Users/findepi/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:237:12
    |
237 | pub struct Box<
    |            ^^^
note: required because it appears within the type `PriorityMap`
   --> datafusion/physical-plan/src/aggregates/topk/priority_map.rs:27:12
    |
27  | pub struct PriorityMap {
    |            ^^^^^^^^^^^
note: required because it appears within the type `GroupedTopKAggregateStream`
   --> datafusion/physical-plan/src/aggregates/topk_stream.rs:39:12
    |
39  | pub struct GroupedTopKAggregateStream {
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: required for the cast from `Pin<Box<GroupedTopKAggregateStream>>` to `Pin<Box<dyn RecordBatchStream + std::marker::Send>>`

error[E0277]: `(dyn ArrowHeap + 'static)` cannot be sent between threads safely
   --> datafusion/physical-plan/src/aggregates/mod.rs:249:9
    |
249 | /         match stream {
250 | |             StreamType::AggregateStream(stream) => Box::pin(stream),
251 | |             StreamType::GroupedHash(stream) => Box::pin(stream),
252 | |             StreamType::GroupedPriorityQueue(stream) => Box::pin(stream),
253 | |         }
    | |_________^ `(dyn ArrowHeap + 'static)` cannot be sent between threads safely
    |
    = help: the trait `std::marker::Send` is not implemented for `(dyn ArrowHeap + 'static)`, which is required by `GroupedTopKAggregateStream: std::marker::Send`
    = note: required for `std::ptr::Unique<(dyn ArrowHeap + 'static)>` to implement `std::marker::Send`
note: required because it appears within the type `Box<(dyn ArrowHeap + 'static)>`
   --> /Users/findepi/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:237:12
    |
237 | pub struct Box<
    |            ^^^
note: required because it appears within the type `PriorityMap`
   --> datafusion/physical-plan/src/aggregates/topk/priority_map.rs:27:12
    |
27  | pub struct PriorityMap {
    |            ^^^^^^^^^^^
note: required because it appears within the type `GroupedTopKAggregateStream`
   --> datafusion/physical-plan/src/aggregates/topk_stream.rs:39:12
    |
39  | pub struct GroupedTopKAggregateStream {
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: required for the cast from `Pin<Box<GroupedTopKAggregateStream>>` to `Pin<Box<dyn RecordBatchStream + std::marker::Send>>`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `datafusion-physical-plan` (lib) due to 2 previous errors

However, removing unsafe impl Send for PriorityMap {} together with the other changes in this PR keeps PriorityMap Send, so the code works exactly as it does on current main.
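The failing step in the error output above is the cast from Pin<Box<GroupedTopKAggregateStream>> to Pin<Box<dyn RecordBatchStream + Send>>. The sketch below reproduces that coercion with std only, assuming simplified hypothetical types. The real RecordBatchStream extends futures::Stream, which is omitted here. The buffered method, the VecHeap implementor, and the into_stream helper are also invented for illustration. Once PriorityMap is Send through its + Send trait objects, the stream containing it is Send too, and the coercion compiles.

```rust
use std::pin::Pin;

// Hypothetical stand-in for RecordBatchStream (the real trait is a Stream).
trait RecordBatchStream {
    fn buffered(&self) -> usize;
}

// Hypothetical simplified heap trait and implementor.
trait ArrowHeap {
    fn len(&self) -> usize;
}

struct VecHeap(Vec<i64>);

impl ArrowHeap for VecHeap {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Send is inferred from the `+ Send` bound on the boxed trait object.
struct PriorityMap {
    heap: Box<dyn ArrowHeap + Send>,
}

// Send because its only field, PriorityMap, is Send.
struct GroupedTopKAggregateStream {
    priority_map: PriorityMap,
}

impl RecordBatchStream for GroupedTopKAggregateStream {
    fn buffered(&self) -> usize {
        self.priority_map.heap.len()
    }
}

// The unsized coercion here requires GroupedTopKAggregateStream: Send,
// which is the requirement reported by rustc in the error above.
fn into_stream(s: GroupedTopKAggregateStream) -> Pin<Box<dyn RecordBatchStream + Send>> {
    Box::pin(s)
}

fn main() {
    let s = GroupedTopKAggregateStream {
        priority_map: PriorityMap {
            heap: Box::new(VecHeap(vec![1, 2, 3])),
        },
    };
    let stream = into_stream(s);
    println!("{}", stream.buffered());
}
```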

Contributor

@avantgardnerio avantgardnerio left a comment


LGTM!

Contributor

@jayzhan211 jayzhan211 left a comment


So the change here just replaces the unsafe Send impl with a safe, compiler-inferred Send, without any performance degradation? Looks good to me.

Member Author

findepi commented Sep 3, 2024

yes

@jayzhan211 jayzhan211 merged commit 4a227c5 into apache:main Sep 3, 2024
24 checks passed
@findepi findepi deleted the findepi/remove-unsafe-send-impl-from-prioritymap-34b751 branch September 3, 2024 09:47
4 participants