-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ShuffleUnsafe
methods and optimise Shuffle
for non-constant indices
#99596
base: main
Are you sure you want to change the base?
Conversation
Note regarding the
|
Benchmark results of my AVX2 code ( Yes, this is a very micro benchmark, but results are pretty reproducible on my machine (within ~%10 usually), and are probably pretty close to reality since it should be pretty quick (but obviously this doesn't measure the overhead with surrounding code due to more pipeline usage, etc.). (edit: there was likely an issue with this benchmark turning into a no-op) |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this again.
We're already using the so-far-internal Vector128.ShuffleUnsafe
in a bunch of places. Should we be using Vector256.ShuffleUnsafe
somewhere?
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
It seems to me that all the current uses of |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs
Outdated
Show resolved
Hide resolved
Can someone please check I won't accidentally regress mono :) |
Mono changes look good to me. Thanks for your contribution. |
Re: 9868e73 |
I don't think there's any issue with the runtime relying on specific behaviour. For external libraries, I think one of the following approaches makes sense:
I think the approach needs to be consistent for all of them, so I removed the Another option, which I briefly mentioned in a comment somewhere, is to expose a variant like |
I'm fine with only documenting "anything above 15 is UB". |
Yes, I've been careful to not use the AVX-512 one for this method for this reason. I will add a comment at some point to explain this in the method (assuming I don't forget). |
This comment was marked as outdated.
This comment was marked as outdated.
Benchmark results on Intel
|
- Optimise comparison in `gtNewSimdShuffleNodeVariable` for xarch - Optimise for constant vector in Vector256.Shuffle{Unsafe} when have AVX2 only
Shuffle
with variable indices on coreclr (for all types)Shuffle
onVector256
(with signed/unsigned bytes and shorts)Vector256
shuffle withAvx2.Shuffle
(for signed/unsigned bytes and shorts)Todo tasks:
VectorXXX.ShuffleUnsafe
for vectors of other element typesShuffle
inShuffleUnsafe
fallbackCodegen:
Shuffle With AVX2
ShuffleUnsafe With AVX2
Shuffle With Sse4.2
ShuffleUnsafe With Sse4.2