JIT: Improve containment for widening intrinsics #109474

saucecontrol · 2024-11-02T06:51:55Z

This adds containment support for the AVX-512 integer widening loads and fixes a few problems with the existing logic. In summary, it:

Replaces a faulty calculation that prevented containment of full vector loads for some of the added AVX-512 instructions.
Supports SIMD scalar load containment in more situations.
Prevents containment in cases where it was previously allowed but should not have been.

Examples:

Enabled containment of scalar loads for all qualifying widening instructions

Scalar load containment was previously disabled for all widening instructions except movddup.

static unsafe Vector128<short> ShouldContainScalarLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadScalarVector128((ulong*)ptr).AsByte());
}

Before

vmovq    xmm0, qword ptr [rdx]
vpmovzxbw xmm0, xmm0

After

vpmovzxbw xmm0, qword ptr [rdx]

Disabled containment of scalar loads smaller than the instruction requirement

Scalar load containment was enabled unconditionally for movddup. This example should not be contained because the original load reads only 4 bytes, while the contained load would read 8.

static unsafe Vector128<double> ShouldNotContainDdup(byte* ptr)
{
    return Sse3.MoveAndDuplicate(Sse.LoadScalarVector128((float*)ptr).AsDouble());
}

Before

vmovddup xmm0, qword ptr [rdx]

After

vmovss   xmm0, dword ptr [rdx]
vmovddup xmm0, xmm0
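The size rule behind the two scalar-load examples above can be sketched as follows. This is only an illustration, not the code in lowerxarch.cpp: `MemOperandBytes` and `CanContainScalarLoad` are hypothetical names, and the sketch assumes that the memory form of a widening instruction reads the destination vector size divided by the widening factor.

```cpp
// Hypothetical helpers for illustration only; these are not functions from the JIT.

// Bytes read by the memory form of a widening instruction, assuming it reads
// the destination vector size divided by the widening factor.
constexpr unsigned MemOperandBytes(unsigned vectorBytes, unsigned srcElemBytes, unsigned dstElemBytes)
{
    return vectorBytes / (dstElemBytes / srcElemBytes);
}

// A scalar load may be folded into the instruction only if the instruction
// would not read more bytes than the original load did.
constexpr bool CanContainScalarLoad(unsigned scalarLoadBytes, unsigned memOperandBytes)
{
    return scalarLoadBytes >= memOperandBytes;
}

// vpmovzxbw xmm (byte -> short) reads 8 bytes, so the 8-byte vmovq can be folded.
static_assert(CanContainScalarLoad(8, MemOperandBytes(16, 1, 2)), "vmovq + vpmovzxbw folds");

// vmovddup xmm reads 8 bytes, so the 4-byte vmovss must stay a separate load.
static_assert(!CanContainScalarLoad(4, 8), "vmovss + vmovddup does not fold");
```

Under these assumptions the first example qualifies because both sizes are 8 bytes, while the movddup example is rejected because folding would widen a 4-byte read into an 8-byte read.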

Disabled containment of aligned loads smaller than 16 bytes in MinOpts

In MinOpts, we typically allow containment of aligned loads on non-VEX hardware because the instructions will fault on unaligned addresses. However, this is not true for instructions that load fewer than 16 bytes. This example was previously contained with MinOpts and EnableAVX=0.

static unsafe Vector128<short> ShouldNotContainAlignedLoad(byte* ptr)
{
    return Sse41.ConvertToVector128Int16(Sse2.LoadAlignedVector128(ptr));
}

Before

pmovzxbw xmm0, qword ptr [rax]

After

movdqa   xmm0, xmmword ptr [rax]
pmovzxbw xmm0, xmm0
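The aligned-load decision described above can be sketched as a small predicate. This is a simplified illustration under stated assumptions, not the JIT's implementation: `CanContainAlignedLoad` is a hypothetical name, and the non-MinOpts branch is assumed from the shape of the `supportsAlignedSIMDLoads` expression quoted in the review comment on this PR.

```cpp
// Hypothetical predicate for illustration only; not a function from the JIT.
//
// canUseVex:       whether VEX encoding is available (VEX memory operands do not
//                  fault on unaligned addresses).
// minOpts:         whether the method is compiled without optimizations.
// memOperandBytes: bytes the instruction reads when its operand is in memory.
constexpr bool CanContainAlignedLoad(bool canUseVex, bool minOpts, unsigned memOperandBytes)
{
    if (!minOpts)
    {
        // Assumption: optimized code may fold the load and drop the alignment fault.
        return true;
    }

    // In MinOpts the alignment fault should be preserved. Only a non-VEX
    // instruction with a full 16-byte memory operand still faults when folded.
    return !canUseVex && (memOperandBytes == 16);
}

// pmovzxbw reads only 8 bytes, so folding movdqa into it would lose the fault.
static_assert(!CanContainAlignedLoad(false, true, 8), "not contained in MinOpts");

// A non-VEX instruction with a 16-byte operand keeps the fault and can fold.
static_assert(CanContainAlignedLoad(false, true, 16), "contained in MinOpts");
```

With MinOpts and EnableAVX=0, the pmovzxbw example corresponds to the first assertion, which is why the aligned load is now emitted as a separate movdqa.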

Remaining diffs are all improvements.

dotnet-policy-service · 2024-11-02T06:52:33Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

saucecontrol · 2024-11-02T17:10:46Z

src/coreclr/jit/lowerxarch.cpp

-                    supportsAlignedSIMDLoads   = !comp->canUseVexEncoding() || !comp->opts.MinOpts();
-                    supportsUnalignedSIMDLoads = comp->canUseVexEncoding();


These instructions are AVX+ so they imply VEX encoding.

tannergooding · 2024-11-20T20:54:21Z

CC. @dotnet/jit-contrib for secondary review.

improve containment for widening intrinsics

5eae536

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Nov 2, 2024

dotnet-policy-service bot added the community-contribution label Nov 2, 2024

saucecontrol added 2 commits November 2, 2024 00:03

apply formatting patch

aaa70c2

tidying

972cd2a

build-analysis bot mentioned this pull request Nov 2, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

saucecontrol marked this pull request as ready for review November 2, 2024 16:51

tannergooding reviewed Nov 3, 2024

saucecontrol added 4 commits November 2, 2024 21:55

use tuple type for load size factor

600f0d7

apply formatting patch

490c2f9

fix build

408430b

Merge branch 'main' into widening-containment

c80058b

tannergooding reviewed Nov 20, 2024

tannergooding approved these changes Nov 20, 2024

saucecontrol added 2 commits November 20, 2024 15:44

whitespace

7facafb

revert emitter changes

a384933

BruceForstall approved these changes Nov 21, 2024

BruceForstall merged commit 975f4ea into dotnet:main Nov 21, 2024
107 of 108 checks passed

saucecontrol deleted the widening-containment branch November 21, 2024 23:25
