-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize acquire-release operations in CFP #7264
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Both passes use StructUtils::StructScanner to analyze struct operations. Add support for RMW operations to this utility and update its users to provide the new `noteRMW` hook. Test that GTO and TypeRefining optimizations work as expected in the presence of RMW operations, but leave a proper implementation in ConstantFieldPropagation to a later PR. To allow TypeRefining to refine field types based only on the "replacement" operand and not the "expected" operand of cmpxchg operations, update validation to allow the "expected" field to be a supertype of the accessed field type as long as it is still equality comparable. WebAssembly/shared-everything-threads#92 clarifies this intended typing in the upstream proposal.
RMW operations generally modify the field they access, so they inhibit constant field propagation. The exceptions are xchg and cmpxchg operations, which are like sets (conditional sets in the case of cmpxchg) that happen to do gets as well. Just like for sets, we can still optimize when the value written by xchg or cmpxchg happens to be either the same constant value we know is already in the field or is a copy from some instance of the same field. There is potential for further optimization when we know an xchg or cmpxchg sets a value that was read from some other field that we know has the same constant value, but that's a missing optimization for plain sets as well. There is also further optimization potential when we can reason that a cmpxchg that would write a different value will never perform the write because the comparison would never succeed. Add tests with TODOs so we remember these potential optimizations in the future.
We previously avoided optimizing acquire-release operations in ConstantFieldPropagation to avoid having to replace them with acquire-release fences, which are potentially more expensive than normal acquire-release operations. However, in #7168 we reasoned that optimized acquire gets of constant fields do not need to be replaced with fences because they never necessarily synchronize with a set on another thread; a get can be considered to read the same constant value from before the set. Apply that reasoning to CFP and start optimizing acquire-release operations without replacing them with fences.
kripken
reviewed
Feb 3, 2025
test/lit/passes/cfp-rmw.wast
Outdated
;; CHECK-NEXT: (i32.const 0) | ||
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: (i32.const 0) | ||
;; CHECK-NEXT: ) | ||
(func $rmw-cmpxchg-acqrel (param (ref $A) i32) (result i32) | ||
;; Making the accesses acqrel instead of seqcst means that the replacement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs updating
test/lit/passes/cfp-rmw.wast
Outdated
;; CHECK-NEXT: (i32.const 0) | ||
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: (i32.const 0) | ||
;; CHECK-NEXT: ) | ||
(func $rmw-cmpxchg-acqrel (param (ref $A) i32) (result i32) | ||
;; Making the accesses acqrel instead of seqcst means that the replacement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too
kripken
approved these changes
Feb 4, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We previously avoided optimizing acquire-release operations in
ConstantFieldPropagation to avoid having to replace them with
acquire-release fences, which are potentially more expensive than normal
acquire-release operations. However, in #7168 we reasoned that optimized
acquire gets of constant fields do not need to be replaced with fences
because they never necessarily synchronize with a set on another thread;
a get can be considered to read the same constant value from before the
set. Apply that reasoning to CFP and start optimizing acquire-release
operations without replacing them with fences.