Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Improve containment for widening intrinsics #109474

Merged
merged 9 commits into from
Nov 21, 2024

Conversation

saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Nov 2, 2024

This adds containment support for the AVX-512 integer widening loads and fixes a few problems with the existing logic. In summary, it:

  • Replaces a faulty calculation that prevented containment of full vector loads for some of the added AVX-512 instructions.
  • Supports SIMD scalar load containment in more situations.
  • Prevents containment in places it was previously allowed and shouldn't have been.

Examples:

Enabled containment of scalar loads for all qualifying widening instructions

Scalar load containment was previously disabled for all widening instructions except movddup.

static unsafe Vector128<short> ShouldContainScalarLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadScalarVector128((ulong*)ptr).AsByte());
}

Before

vmovq    xmm0, qword ptr [rdx]
vpmovzxbw xmm0, xmm0

After

vpmovzxbw xmm0, qword ptr [rdx]

Disabled containment of scalar loads smaller than the instruction requirement

Scalar load containment was enabled unconditionally for movddup. This example should not be contained because it reads only 4 bytes, while the contained load will read 8.

static unsafe Vector128<double> ShouldNotContainDdup(byte* ptr)
{
    return Sse3.MoveAndDuplicate(Sse.LoadScalarVector128((float*)ptr).AsDouble());
}

Before

vmovddup xmm0, qword ptr [rdx]

After

vmovss   xmm0, dword ptr [rdx]
vmovddup xmm0, xmm0

Disabled containment of aligned loads smaller than 16 bytes in MinOpts

In MinOpts, we typically allow containment of aligned loads on non-VEX hardware because the instructions will fault on unaligned addresses, however this is not true for instructions that load smaller values. This example was previously contained with MinOpts, EnableAVX=0.

static unsafe Vector128<short> ShouldNotContainAlignedLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadAlignedVector128(ptr));
}

Before

pmovzxbw xmm0, qword ptr [rax]

After

movdqa   xmm0, xmmword ptr [rax]
pmovzxbw xmm0, xmm0

Remaining diffs are all improvements.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 2, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Nov 2, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/jit/lowerxarch.cpp Show resolved Hide resolved
src/coreclr/jit/lowerxarch.cpp Show resolved Hide resolved
src/coreclr/jit/lowerxarch.cpp Show resolved Hide resolved
Comment on lines -9467 to -9468
supportsAlignedSIMDLoads = !comp->canUseVexEncoding() || !comp->opts.MinOpts();
supportsUnalignedSIMDLoads = comp->canUseVexEncoding();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instructions are AVX+ so they imply VEX encoding.

@tannergooding
Copy link
Member

CC. @dotnet/jit-contrib for secondary review.

@BruceForstall BruceForstall merged commit 975f4ea into dotnet:main Nov 21, 2024
107 of 108 checks passed
@saucecontrol saucecontrol deleted the widening-containment branch November 21, 2024 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants