[JIT] [APX] Enable additional General Purpose Registers. #108799

Draft
wants to merge 69 commits into
base: main

DeepakRajendrakumaran
Contributor

This PR is built on top of #108796

What this PR does

  1. Add eGPRs to the set of available registers on x64 in the JIT, with related changes to turn them on or off based on APX availability
    Link to related commit
  2. Add an LSRA_LIMIT_EXT_GPR_SET register stress mode that forces eGPR usage whenever possible
    Link to related commit
  3. Some minor changes to turn on REX2 encoding when an eGPR is used
    Link to related commit
  4. Temporary changes to mask away eGPRs for currently unsupported instructions, primarily ones requiring eEVEX, plus imul and movszx (this commit will be removed once these are supported, but it was essential for testing)
    Link to related commit
  5. Minor flag changes to get altjit working (need to check whether this conflicts with Ruihan's changes)
    Link to related commit
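
To make items 1, 2 and 4 concrete, here is a minimal Python sketch of how a candidate register set could be derived. It is only an illustration: `available_gprs`, `candidate_regs`, `NO_EGPR_SUPPORT` and the `stress_ext_only` flag are hypothetical names, not the JIT's identifiers, and `imul` is the only unsupported instruction modeled here.

```python
# Hypothetical model of the register sets described above.
# rsp is left out because an allocator would not hand it out.
LOW_GPRS = frozenset(
    ["rax", "rcx", "rdx", "rbx", "rbp", "rsi", "rdi"]
    + [f"r{i}" for i in range(8, 16)]
)
EXT_GPRS = frozenset(f"r{i}" for i in range(16, 32))  # eGPRs r16..r31

# Assumption: instructions that cannot yet take an eGPR operand (item 4).
NO_EGPR_SUPPORT = frozenset({"imul"})


def available_gprs(apx_available: bool) -> frozenset:
    """Item 1: eGPRs are only available when APX is."""
    return LOW_GPRS | EXT_GPRS if apx_available else LOW_GPRS


def candidate_regs(instr: str, apx_available: bool,
                   stress_ext_only: bool = False) -> frozenset:
    """Registers the allocator may pick for an operand of `instr`."""
    regs = available_gprs(apx_available)
    if instr in NO_EGPR_SUPPORT:
        # Item 4: mask away eGPRs for instructions that cannot encode them.
        regs = regs - EXT_GPRS
    elif stress_ext_only and apx_available:
        # Item 2: the stress mode prefers eGPRs whenever that is possible.
        regs = regs & EXT_GPRS
    return regs


print(sorted(candidate_regs("imul", apx_available=True)))
```

With APX available, `candidate_regs("add", True, stress_ext_only=True)` yields only r16..r31, while `imul` stays restricted to the low registers in every mode.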

Testing

  • Ran the tests under SDE (specifically src/tests/JIT) using Ruihan's script
  • Ran SuperPMI over src/tests/JIT using the altjit feature

Analysis of superpmi results

Summary from JitAnalyze


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 823288813
Total bytes of diff: 823634058
Total bytes of delta: 345245 (0.04 % of base)
Total relative delta: NaN
    diff is a regression.
    relative diff is a regression.
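
As a quick sanity check on the summary above, the delta and its percentage can be recomputed from the two totals. This is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
# Totals copied from the JitAnalyze summary above.
base_bytes = 823288813
diff_bytes = 823634058

delta = diff_bytes - base_bytes
pct = 100.0 * delta / base_bytes

print(f"delta = {delta} bytes ({pct:.2f}% of base)")
# prints: delta = 345245 bytes (0.04% of base)
```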
Detail diffs


Top file regressions (bytes):
       98472 : JIT\Methodical\Methodical_do\Methodical_do.dasm (3.18% of base)
       98472 : JIT\Methodical\Methodical_ro\Methodical_ro.dasm (3.18% of base)
       87180 : JIT\Methodical\Methodical_d1\Methodical_d1.dasm (3.77% of base)
       87180 : JIT\Methodical\Methodical_r1\Methodical_r1.dasm (3.75% of base)
       11382 : JIT\Methodical\Methodical_r2\Methodical_r2.dasm (0.87% of base)
       11382 : JIT\Methodical\Methodical_d2\Methodical_d2.dasm (0.89% of base)
        3422 : JIT\HardwareIntrinsics\Arm\Sve\Sve_ro\Sve_ro.dasm (0.05% of base)
        1599 : JIT\HardwareIntrinsics\Arm\AdvSimd\AdvSimd_ro\AdvSimd_ro.dasm (0.02% of base)
        1271 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx512_ro\X86_Avx512F_ro.dasm (0.04% of base)
        1271 : JIT\HardwareIntrinsics\X86_Avx512\Avx512F\Avx512F_ro\X86_Avx512F_ro.dasm (0.04% of base)
        1205 : JIT\HardwareIntrinsics\X86\Sse2\Sse2_ro\X86_Sse2_ro.dasm (0.06% of base)
        1205 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_ro.dasm (0.06% of base)
        1166 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx10v1_ro\X86_Avx10v1_Vector128_ro.dasm (0.09% of base)
        1166 : JIT\HardwareIntrinsics\X86_Avx10v1\Avx10v1_Vector128\Avx10v1_Vector128_ro\X86_Avx10v1_Vector128_ro.dasm (0.09% of base)
        1091 : JIT\HardwareIntrinsics\X86_Avx\Avx2\Avx2_ro\X86_Avx2_ro.dasm (0.05% of base)
        1091 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx_ro\X86_Avx2_ro.dasm (0.05% of base)
        1091 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx_r\X86_Avx2_ro.dasm (0.05% of base)
        1016 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx10v1_ro\X86_Avx10v1_Vector256_ro.dasm (0.08% of base)
        1016 : JIT\HardwareIntrinsics\X86_Avx10v1\Avx10v1_Vector256\Avx10v1_Vector256_ro\X86_Avx10v1_Vector256_ro.dasm (0.08% of base)
         879 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_Avx_r\X86_Avx1_ro.dasm (0.07% of base)

Top file improvements (bytes):
       -9932 : JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\Vector512_1_ro.dasm (-0.40% of base)
       -9932 : JIT\HardwareIntrinsics\General\Vector512_1\Vector512_1_ro\Vector512_1_ro.dasm (-0.40% of base)
       -9435 : JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\Vector512_ro.dasm (-0.23% of base)
       -9435 : JIT\HardwareIntrinsics\General\Vector512\Vector512_ro\Vector512_ro.dasm (-0.23% of base)
       -7973 : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_handwritten_ro.dasm (-1.92% of base)
       -7973 : JIT\HardwareIntrinsics\X86\Sse2\Sse2_handwritten_ro\X86_Sse2_handwritten_ro.dasm (-1.92% of base)
       -4763 : JIT\Regression\JitBlue\GitHub_17777\GitHub_17777\GitHub_17777.dasm (-1.32% of base)
       -2935 : JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\Vector128_ro.dasm (-0.09% of base)
       -2935 : JIT\HardwareIntrinsics\General\Vector128\Vector128_ro\Vector128_ro.dasm (-0.09% of base)
       -2496 : JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\Vector64_ro.dasm (-0.07% of base)
       -2496 : JIT\HardwareIntrinsics\General\Vector64\Vector64_ro\Vector64_ro.dasm (-0.07% of base)
       -2311 : JIT\HardwareIntrinsics\General\Vector256\Vector256_ro\Vector256_ro.dasm (-0.07% of base)
       -2311 : JIT\HardwareIntrinsics\HardwareIntrinsics_General_ro\Vector256_ro.dasm (-0.07% of base)
       -1088 : JIT\Methodical\Arrays\huge\huge_b_r\huge_b_r.dasm (-19.19% of base)
       -1088 : JIT\Methodical\Arrays\huge\huge_i4_r\huge_i4_r.dasm (-18.74% of base)
       -1088 : JIT\Methodical\Arrays\huge\huge_r4_r\huge_r4_r.dasm (-18.14% of base)
       -1088 : JIT\Methodical\Arrays\huge\huge_r8_r\huge_r8_r.dasm (-18.32% of base)
       -1088 : JIT\Methodical\Methodical_r1\huge_i4_r.dasm (-18.74% of base)
       -1088 : JIT\Methodical\Methodical_r1\huge_r4_r.dasm (-18.14% of base)
       -1088 : JIT\Methodical\Methodical_r1\huge_r8_r.dasm (-18.32% of base)

852 total files with Code Size differences (180 improved, 672 regressed), 4485 unchanged.

Top method regressions (bytes):
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - i4rem:TestEntryPoint():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - i4rem:TestEntryPoint():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - i4rem:TestEntryPoint():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_r1\Methodical_r1.dasm - i4rem:TestEntryPoint():int (FullOpts)
        8982 ( 8.50% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - i8rem:TestEntryPoint():int (FullOpts)
        8982 ( 8.50% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - i8rem:TestEntryPoint():int (FullOpts)
        8982 ( 8.50% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - i8rem:TestEntryPoint():int (FullOpts)
        8982 ( 8.50% of base) : JIT\Methodical\Methodical_r1\Methodical_r1.dasm - i8rem:TestEntryPoint():int (FullOpts)
        8133 ( 7.81% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - u4div:TestEntryPoint():int (FullOpts)
        8133 ( 7.81% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - u4div:TestEntryPoint():int (FullOpts)
        8133 ( 7.81% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - u4div:TestEntryPoint():int (FullOpts)
        8133 ( 7.81% of base) : JIT\Methodical\Methodical_r1\Methodical_r1.dasm - u4div:TestEntryPoint():int (FullOpts)
        8034 ( 7.99% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - i4div:TestEntryPoint():int (FullOpts)
        8034 ( 7.99% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - i4div:TestEntryPoint():int (FullOpts)
        8034 ( 7.99% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - i4div:TestEntryPoint():int (FullOpts)
        8034 ( 7.99% of base) : JIT\Methodical\Methodical_r1\Methodical_r1.dasm - i4div:TestEntryPoint():int (FullOpts)
        8010 ( 8.28% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - r8div:TestEntryPoint():int (FullOpts)
        8010 ( 8.28% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - r8div:TestEntryPoint():int (FullOpts)
        8010 ( 8.28% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - r8div:TestEntryPoint():int (FullOpts)
        8010 ( 8.28% of base) : JIT\Methodical\Methodical_r1\Methodical_r1.dasm - r8div:TestEntryPoint():int (FullOpts)

Top method improvements (bytes):
       -4763 (-1.33% of base) : JIT\Regression\JitBlue\GitHub_17777\GitHub_17777\GitHub_17777.dasm - Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (FullOpts)
       -1088 (-19.19% of base) : JIT\Methodical\Arrays\huge\huge_b_r\huge_b_r.dasm - JitTest_huge_b_huge_il.Test:Main():int (FullOpts)
       -1088 (-19.19% of base) : JIT\Methodical\Methodical_r1\huge_b_r.dasm - JitTest_huge_b_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.74% of base) : JIT\Methodical\Arrays\huge\huge_i4_r\huge_i4_r.dasm - JitTest_huge_i4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.74% of base) : JIT\Methodical\Methodical_r1\huge_i4_r.dasm - JitTest_huge_i4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.14% of base) : JIT\Methodical\Arrays\huge\huge_r4_r\huge_r4_r.dasm - JitTest_huge_r4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.14% of base) : JIT\Methodical\Methodical_r1\huge_r4_r.dasm - JitTest_huge_r4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.32% of base) : JIT\Methodical\Arrays\huge\huge_r8_r\huge_r8_r.dasm - JitTest_huge_r8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.32% of base) : JIT\Methodical\Methodical_r1\huge_r8_r.dasm - JitTest_huge_r8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.59% of base) : JIT\Methodical\Methodical_r1\huge_u8_r.dasm - JitTest_huge_u8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.59% of base) : JIT\Methodical\Arrays\huge\huge_u8_r\huge_u8_r.dasm - JitTest_huge_u8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.29% of base) : JIT\Methodical\Methodical_r2\hugedim_r.dasm - JitTest_hugedim_arrays_il.Test:Main():int (FullOpts)
       -1088 (-18.29% of base) : JIT\Methodical\int64\arrays\hugedim_r\hugedim_r.dasm - JitTest_hugedim_arrays_il.Test:Main():int (FullOpts)
        -781 (-11.87% of base) : JIT\Methodical\VT\port\huge_gcref_r\huge_gcref_r.dasm - JitTest_huge_gcref_port_il.Test:Main():int (FullOpts)
        -781 (-11.87% of base) : JIT\Methodical\Methodical_r2\huge_gcref_r.dasm - JitTest_huge_gcref_port_il.Test:Main():int (FullOpts)
        -781 (-11.87% of base) : JIT\Methodical\Methodical_r1\huge_struct_r.dasm - JitTest_huge_struct_huge_il.Test:Main():int (FullOpts)
        -781 (-11.87% of base) : JIT\Methodical\Arrays\huge\huge_struct_r\huge_struct_r.dasm - JitTest_huge_struct_huge_il.Test:Main():int (FullOpts)
        -749 (-11.58% of base) : JIT\Methodical\Arrays\huge\huge_objref_r\huge_objref_r.dasm - JitTest_huge_objref_huge_il.Test:Main():int (FullOpts)
        -749 (-11.58% of base) : JIT\Methodical\Methodical_r1\huge_objref_r.dasm - JitTest_huge_objref_huge_il.Test:Main():int (FullOpts)
        -361 (-11.14% of base) : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[long,long]:CheckUnpack(IntelHardwareIntrinsicTest.SSE2.CheckMethodSixteenOfAll`2[long,long]):ubyte:this (FullOpts)

Top method regressions (percentages):
         362 (27.38% of base) : JIT\HardwareIntrinsics\Arm\AdvSimd\AdvSimd_ro\AdvSimd_ro.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.VectorLookupExtension_4Test__VectorTableLookupExtensionByte:.ctor():this (FullOpts)
         362 (27.30% of base) : JIT\HardwareIntrinsics\Arm\AdvSimd.Arm64\AdvSimd.Arm64_ro\AdvSimd.Arm64_ro.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.VectorLookupExtension_4Test__VectorTableLookupExtensionByte:.ctor():this (FullOpts)
         362 (27.12% of base) : JIT\HardwareIntrinsics\Arm\AdvSimd\AdvSimd_ro\AdvSimd_ro.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.VectorLookupExtension_4Test__VectorTableLookupExtensionSByte:.ctor():this (FullOpts)
         362 (27.04% of base) : JIT\HardwareIntrinsics\Arm\AdvSimd.Arm64\AdvSimd.Arm64_ro\AdvSimd.Arm64_ro.dasm - JIT.HardwareIntrinsics.Arm._AdvSimd.Arm64.VectorLookupExtension_4Test__VectorTableLookupExtensionSByte:.ctor():this (FullOpts)
         357 (14.89% of base) : JIT\Methodical\Methodical_others\Methodical_others.dasm - Test_baduwinfo1:bar(System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String,System.String):int (FullOpts)
          54 (12.68% of base) : JIT\Methodical\Methodical_others\Methodical_others.dasm - structinreg.Program2:test23(structinreg.Test23):int (FullOpts)
         137 (12.24% of base) : JIT\superpmi\superpmicollect\Bytemark\Bytemark.dasm - IDEAEncryption:cipher_idea(ubyte[],ubyte[],int,ushort[]) (FullOpts)
         137 (12.24% of base) : JIT\Performance\CodeQuality\Bytemark\Bytemark\Bytemark.dasm - IDEAEncryption:cipher_idea(ubyte[],ubyte[],int,ushort[]) (FullOpts)
          23 (11.17% of base) : JIT\Performance\JIT.performance\fannkuch-redux-9.dasm - BenchmarksGame.FannkuchRedux_9:FirstPermutation(ulong,ulong,ulong,int,int) (FullOpts)
          23 (11.17% of base) : JIT\Performance\CodeQuality\BenchmarksGame\fannkuch-redux\fannkuch-redux-9\fannkuch-redux-9.dasm - BenchmarksGame.FannkuchRedux_9:FirstPermutation(ulong,ulong,ulong,int,int) (FullOpts)
          20 (10.31% of base) : JIT\Directed\tailcall\more_tailcalls\more_tailcalls.dasm - Program:IL_STUB_InstantiatingStub(System.Object,System.Object,System.Object,System.Object,System.Object,System.Object,System.Object,System.Object,int,int,System.Span`1[int],int):int (FullOpts)
        1595 ( 9.63% of base) : JIT\Methodical\Methodical_r2\Methodical_r2.dasm - r4NaNsub:TestEntryPoint():int (FullOpts)
        1595 ( 9.63% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - r4NaNsub:TestEntryPoint():int (FullOpts)
        1595 ( 9.63% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - r4NaNsub:TestEntryPoint():int (FullOpts)
        1595 ( 9.63% of base) : JIT\Methodical\Methodical_d2\Methodical_d2.dasm - r4NaNsub:TestEntryPoint():int (FullOpts)
         106 ( 9.52% of base) : JIT\Directed\array-il\_Arrayscomplex3\_Arrayscomplex3.dasm - Complex2_Array_Test:Main():int (FullOpts)
         106 ( 9.52% of base) : JIT\Directed\Directed_3\_Arrayscomplex3.dasm - Complex2_Array_Test:Main():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_do\Methodical_do.dasm - i4rem:TestEntryPoint():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_d1\Methodical_d1.dasm - i4rem:TestEntryPoint():int (FullOpts)
        9850 ( 9.50% of base) : JIT\Methodical\Methodical_ro\Methodical_ro.dasm - i4rem:TestEntryPoint():int (FullOpts)

Top method improvements (percentages):
       -1088 (-19.19% of base) : JIT\Methodical\Arrays\huge\huge_b_r\huge_b_r.dasm - JitTest_huge_b_huge_il.Test:Main():int (FullOpts)
       -1088 (-19.19% of base) : JIT\Methodical\Methodical_r1\huge_b_r.dasm - JitTest_huge_b_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.74% of base) : JIT\Methodical\Arrays\huge\huge_i4_r\huge_i4_r.dasm - JitTest_huge_i4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.74% of base) : JIT\Methodical\Methodical_r1\huge_i4_r.dasm - JitTest_huge_i4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.59% of base) : JIT\Methodical\Methodical_r1\huge_u8_r.dasm - JitTest_huge_u8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.59% of base) : JIT\Methodical\Arrays\huge\huge_u8_r\huge_u8_r.dasm - JitTest_huge_u8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.32% of base) : JIT\Methodical\Arrays\huge\huge_r8_r\huge_r8_r.dasm - JitTest_huge_r8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.32% of base) : JIT\Methodical\Methodical_r1\huge_r8_r.dasm - JitTest_huge_r8_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.29% of base) : JIT\Methodical\Methodical_r2\hugedim_r.dasm - JitTest_hugedim_arrays_il.Test:Main():int (FullOpts)
       -1088 (-18.29% of base) : JIT\Methodical\int64\arrays\hugedim_r\hugedim_r.dasm - JitTest_hugedim_arrays_il.Test:Main():int (FullOpts)
       -1088 (-18.14% of base) : JIT\Methodical\Arrays\huge\huge_r4_r\huge_r4_r.dasm - JitTest_huge_r4_huge_il.Test:Main():int (FullOpts)
       -1088 (-18.14% of base) : JIT\Methodical\Methodical_r1\huge_r4_r.dasm - JitTest_huge_r4_huge_il.Test:Main():int (FullOpts)
        -112 (-16.26% of base) : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[ubyte,long]:GetEightOneDataPoint(int):System.ValueTuple`4[System.ValueTuple`8[ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,System.ValueTuple`1[ubyte]],System.ValueTuple`8[ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,System.ValueTuple`1[ubyte]],long,long]:this (FullOpts)
        -112 (-16.26% of base) : JIT\HardwareIntrinsics\X86\Sse2\Sse2_handwritten_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[ubyte,long]:GetEightOneDataPoint(int):System.ValueTuple`4[System.ValueTuple`8[ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,System.ValueTuple`1[ubyte]],System.ValueTuple`8[ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,ubyte,System.ValueTuple`1[ubyte]],long,long]:this (FullOpts)
        -112 (-15.80% of base) : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[short,long]:GetEightOneDataPoint(int):System.ValueTuple`4[System.ValueTuple`8[short,short,short,short,short,short,short,System.ValueTuple`1[short]],System.ValueTuple`8[short,short,short,short,short,short,short,System.ValueTuple`1[short]],long,long]:this (FullOpts)
        -112 (-15.80% of base) : JIT\HardwareIntrinsics\X86\Sse2\Sse2_handwritten_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[short,long]:GetEightOneDataPoint(int):System.ValueTuple`4[System.ValueTuple`8[short,short,short,short,short,short,short,System.ValueTuple`1[short]],System.ValueTuple`8[short,short,short,short,short,short,short,System.ValueTuple`1[short]],long,long]:this (FullOpts)
        -297 (-15.66% of base) : JIT\Performance\JIT.performance\MDMulMatrix.dasm - Benchstone.MDBenchI.MDMulMatrix:Inner(int[,],int[,],int[,]) (FullOpts)
        -297 (-15.66% of base) : JIT\Performance\CodeQuality\Benchstones\MDBenchI\MDMulMatrix\MDMulMatrix\MDMulMatrix.dasm - Benchstone.MDBenchI.MDMulMatrix:Inner(int[,],int[,],int[,]) (FullOpts)
        -349 (-14.64% of base) : JIT\HardwareIntrinsics\HardwareIntrinsics_X86_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[ubyte,long]:CheckPackSaturate(IntelHardwareIntrinsicTest.SSE2.CheckMethodSixteen`2[ubyte,long]):ubyte:this (FullOpts)
        -349 (-14.64% of base) : JIT\HardwareIntrinsics\X86\Sse2\Sse2_handwritten_ro\X86_Sse2_handwritten_ro.dasm - IntelHardwareIntrinsicTest.SSE2.TestTableSse2`2[ubyte,long]:CheckPackSaturate(IntelHardwareIntrinsicTest.SSE2.CheckMethodSixteen`2[ubyte,long]):ubyte:this (FullOpts)

15016 total methods with Code Size differences (5105 improved, 9911 regressed), 1031138 unchanged.
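
For context, the method counts above can be turned into proportions. The sketch below only uses the three numbers reported by JitAnalyze; the variable names are illustrative.

```python
# Method counts copied from the JitAnalyze line above.
improved, regressed, unchanged = 5105, 9911, 1031138

changed = improved + regressed
total = changed + unchanged

changed_pct = 100 * changed / total
regressed_share = 100 * regressed / changed

print(f"changed: {changed} of {total} methods ({changed_pct:.2f}%)")
print(f"regressed share of changed methods: {regressed_share:.1f}%")
```

Roughly 1.4% of all methods change at all, and about two thirds of those changes are size regressions.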


Why do Arm tests such as \HardwareIntrinsics\Arm\AdvSimd\AdvSimd_ro\AdvSimd_ro.dasm show up here with x86 code generated for them? My theory is that, since we are compiling for x86 using altjit, the JIT takes the software fallback path for the Arm intrinsics and generates x86 code. See how IsSupported() evaluates to false below.

image

I'm ignoring these for now

Some interesting code samples highlighting changes introduced due to enabling additional GPRs

Case 1

A very simple case with r16 being used

See V03 loc0.
In this case a spill is eliminated and the instruction count drops. The cost of using the eGPR is a slightly larger encoding because of the REX2 prefix. We do not currently factor this cost in during register allocation.
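
The encoding trade-off can be sketched with a deliberately simplified model: a legacy instruction needs no prefix for the low eight registers, a one-byte REX prefix for r8..r15 (or a 64-bit operand size), and a two-byte REX2 prefix (0xD5 plus a payload byte) once any operand is r16..r31. The function `prefix_bytes` and the register constants are hypothetical helpers, and the model ignores byte-register special cases and other encodings.

```python
def prefix_bytes(reg_numbers, wide=False):
    """Estimate REX/REX2 prefix bytes from the register numbers (0..31) used."""
    if any(n >= 16 for n in reg_numbers):
        return 2  # REX2: 0xD5 + payload byte
    if wide or any(n >= 8 for n in reg_numbers):
        return 1  # classic REX
    return 0      # no prefix needed


EAX, R9, R16, R18 = 0, 9, 16, 18

before = prefix_bytes([R9, EAX])   # base:  mov r9d, eax
after = prefix_bytes([R18, R16])   # diff:  mov r18, r16
print(before, after)  # prints: 1 2

# Net effect reported for this method: 61 -> 58 bytes, 20 -> 16 instructions.
size_before, size_after = 61, 58
size_change_pct = 100 * (size_after - size_before) / size_before
print(f"{size_change_pct:.2f}%")  # prints: -4.92%
```

Each instruction that touches an eGPR grows by a byte or two, but in this method the removed spill, reload and frame adjustments outweigh that growth.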

<details>
<summary><span style="color:green">-3</span> (<span style="color:green">-4.92%</span>) : 9473.dasm - System.Threading.Tasks.Task:AtomicStateUpdate(int,int):ubyte:this (Tier1)</summary>
<div style="margin-left:1em">

```diff
@@ -1,3 +1,5 @@
+
+ Deepak methName = AtomicStateUpdate 
 ; Assembly listing for method System.Threading.Tasks.Task:AtomicStateUpdate(int,int):ubyte:this (Tier1)
 ; Emitting BLENDED_CODE for X64 with AVX512 - Windows
 ; Tier1 code
@@ -11,49 +13,45 @@
 ;  V00 this         [V00,T00] (  5,  4   )     ref  ->  rcx         this class-hnd single-def <System.Threading.Tasks.Task>
 ;  V01 arg1         [V01,T02] (  4,  3   )     int  ->  rdx         single-def
 ;  V02 arg2         [V02,T03] (  4,  3   )     int  ->   r8         single-def
-;  V03 loc0         [V03,T01] (  5,  5   )     int  ->  [rsp+0x04]  spill-single-def
+;  V03 loc0         [V03,T01] (  5,  5   )     int  ->  r16        
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 0
 
 G_M2073_IG01:        ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
-       push     rax
-						;; size=1 bbWeight=1 PerfScore 1.00
+						;; size=0 bbWeight=1 PerfScore 0.00
 G_M2073_IG02:        ; bbWeight=1, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, isz
        ; gcrRegs +[rcx]
-       mov      eax, dword ptr [rcx+0x34]
-       mov      dword ptr [rsp+0x04], eax
-       test     eax, r8d
+       mov      r16, dword ptr [rcx+0x34]
+       test     r16, r8d
        jne      SHORT G_M2073_IG05
-       lea      r10, bword ptr [rcx+0x34]
-       ; byrRegs +[r10]
-       mov      r9d, eax
-       or       r9d, edx
+       lea      r17, bword ptr [rcx+0x34]
+       ; byrRegs +[r17]
+       mov      r18, r16
+       or       r18, edx
+       mov      eax, r16
        lock     
-       cmpxchg  dword ptr [r10], r9d
-       cmp      eax, dword ptr [rsp+0x04]
+       cmpxchg  dword ptr [r17], r18
+       cmp      eax, r16
        jne      SHORT G_M2073_IG04
        mov      eax, 1
-						;; size=38 bbWeight=1 PerfScore 26.50
+						;; size=46 bbWeight=1 PerfScore 24.00
 G_M2073_IG03:        ; bbWeight=1, epilog, nogc, extend
-       add      rsp, 8
        ret      
-						;; size=5 bbWeight=1 PerfScore 1.25
+						;; size=1 bbWeight=1 PerfScore 1.00
 G_M2073_IG04:        ; bbWeight=0, gcrefRegs=0002 {rcx}, byrefRegs=0000 {}, byref, epilog, nogc
-       ; byrRegs -[r10]
-       add      rsp, 8
+       ; byrRegs -[r17]
        tail.jmp [System.Threading.Tasks.Task:AtomicStateUpdateSlow(int,int):ubyte:this]
-						;; size=10 bbWeight=0 PerfScore 0.00
+						;; size=6 bbWeight=0 PerfScore 0.00
 G_M2073_IG05:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        ; gcrRegs -[rcx]
        xor      eax, eax
-						;; size=2 bbWeight=0 PerfScore 0.00
+						;; size=4 bbWeight=0 PerfScore 0.00
 G_M2073_IG06:        ; bbWeight=0, epilog, nogc, extend
-       add      rsp, 8
        ret      
-						;; size=5 bbWeight=0 PerfScore 0.00
+						;; size=1 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 61, prolog size 1, PerfScore 28.75, instruction count 20, allocated bytes for code 61 (MethodHash=e528f7e6) for method System.Threading.Tasks.Task:AtomicStateUpdate(int,int):ubyte:this (Tier1)
+; Total bytes of code 58, prolog size 0, PerfScore 25.00, instruction count 16, allocated bytes for code 58 (MethodHash=e528f7e6) for method System.Threading.Tasks.Task:AtomicStateUpdate(int,int):ubyte:this (Tier1)
 ; ============================================================
```

</div>
</details>

Case 2

An example of a regression caused by missing eEVEX support and by instructions that do not yet accept eGPRs

In this example we use imul, for which eGPR usage is not yet enabled. If an input to imul lives in an eGPR, we therefore insert a mov to copy it into a lower GPR first, which adds further to register usage.
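
The fix-up described here can be sketched as a small rewriting step. This is a hypothetical model, not the JIT's code: `legalize` and `NO_EGPR_SUPPORT` are made-up names, instructions are plain `(op, dst, src)` tuples, the destination is assumed to already be constrained to a low register, and the scratch register defaults to `eax` only because that is what the listing below happens to use.

```python
EXT_GPRS = {f"r{i}" for i in range(16, 32)}  # eGPRs r16..r31

# Assumption: instructions that cannot encode an eGPR operand yet.
NO_EGPR_SUPPORT = {"imul"}


def legalize(instr, scratch="eax"):
    """Copy an eGPR source into a low scratch register when the op needs it."""
    op, dst, src = instr
    if op in NO_EGPR_SUPPORT and src in EXT_GPRS:
        # The extra mov is the cost discussed above: one more instruction
        # and one more low register tied up.
        return [("mov", scratch, src), (op, dst, scratch)]
    return [instr]


for line in legalize(("imul", "ebp", "r22")):
    print(line)
```

An instruction such as `("add", "ebp", "r24")` passes through unchanged, since `add` is not in the unsupported set of this sketch.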

 ; Assembly listing for method AssignRect:first_assignments(int[,],short[,]):int (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX512 - Windows
 ; FullOpts code
@@ -9,577 +11,486 @@
 ;
 ;  V00 arg0         [V00,T06] ( 13, 362   )     ref  ->  rcx         class-hnd single-def <int[,]>
 ;  V01 arg1         [V01,T10] ( 17, 213   )     ref  ->  rdx         class-hnd single-def <short[,]>
-;  V02 loc0         [V02,T04] ( 28, 528.50)   short  ->  registers  
-;  V03 loc1         [V03,T03] ( 27, 677   )   short  ->  registers  
+;  V02 loc0         [V02,T04] ( 28, 528.50)   short  ->  r17        
+;  V03 loc1         [V03,T03] ( 27, 677   )   short  ->  r18        
 ;  V04 loc2         [V04,T02] ( 28, 810   )   short  ->  registers  
-;  V05 loc3         [V05,T25] (  6,  25   )   short  ->  [rsp+0x60] 
-;  V06 loc4         [V06,T27] (  9,  21   )   short  ->  [rsp+0x5C] 
-;  V07 loc5         [V07,T12] (  8, 168   )   short  ->  r14        
-;  V08 loc6         [V08,T11] ( 13, 168.25)     int  ->  registers  
+;  V05 loc3         [V05,T25] (  6,  25   )   short  ->  r25        
+;  V06 loc4         [V06,T27] (  9,  21   )   short  ->  r20        
+;  V07 loc5         [V07,T12] (  8, 168   )   short  ->  r26        
+;  V08 loc6         [V08,T11] ( 13, 168.25)     int  ->  r16        
 ;  V09 OutArgs      [V09    ] (  1,   1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;  V10 tmp1         [V10,T01] ( 45,2118   )     int  ->  registers   "MD array shared temp"
 ;  V11 tmp2         [V11,T00] ( 48,2166   )     int  ->  registers   "MD array shared temp"
-;  V12 cse0         [V12,T18] (  3,  40   )     int  ->  [rsp+0x58]  spill-single-def "CSE #14: aggressive"
-;  V13 cse1         [V13,T19] (  3,  40   )     int  ->  [rsp+0x54]  spill-single-def "CSE #18: aggressive"
-;  V14 cse2         [V14,T29] (  3,  10   )     int  ->  [rsp+0x50]  spill-single-def "CSE #24: aggressive"
-;  V15 cse3         [V15,T23] (  5,  26   )     int  ->  [rsp+0x4C]  multi-def "CSE #21: aggressive"
-;  V16 cse4         [V16,T20] (  3,  40   )     int  ->  [rsp+0x48]  spill-single-def "CSE #22: aggressive"
-;  V17 cse5         [V17,T17] (  2,  68   )     int  ->  [rsp+0x44]  spill-single-def hoist "CSE #06: aggressive"
-;  V18 cse6         [V18,T21] (  2,   8   )     int  ->  [rsp+0x40]  spill-single-def "CSE #13: aggressive"
-;  V19 cse7         [V19,T22] (  2,   8   )     int  ->  [rsp+0x3C]  spill-single-def "CSE #17: aggressive"
-;  V20 cse8         [V20,T26] (  2,  17   )     int  ->  [rsp+0x38]  spill-single-def hoist "CSE #19: aggressive"
-;  V21 cse9         [V21,T28] (  2,  17   )     int  ->  r11         hoist "CSE #02: aggressive"
-;  V22 cse10        [V22,T30] (  2,   2   )     int  ->  [rsp+0x34]  spill-single-def "CSE #23: aggressive"
-;  V23 cse11        [V23,T24] (  4,  18   )     int  ->  [rsp+0x30]  multi-def "CSE #20: aggressive"
-;  V24 cse12        [V24,T05] ( 15, 512   )     int  ->  registers   "CSE #08: aggressive"
-;  V25 cse13        [V25,T07] ( 21, 330   )     int  ->  rdi         "CSE #04: aggressive"
-;  V26 cse14        [V26,T14] (  7, 145   )     int  ->  r15         "CSE #05: aggressive"
-;  V27 cse15        [V27,T09] (  7, 220   )     int  ->  r12         hoist "CSE #07: aggressive"
-;  V28 cse16        [V28,T16] ( 10, 123   )     int  ->  [rsp+0x2C]  "CSE #01: aggressive"
-;  V29 cse17        [V29,T15] ( 10, 138   )     int  ->  rbx         hoist "CSE #03: aggressive"
-;  V30 cse18        [V30,T13] ( 10, 153   )     int  ->  rbp         "CSE #09: aggressive"
-;  V31 cse19        [V31,T08] (  8, 288   )     int  ->  r13         "CSE #12: aggressive"
-;  TEMP_01                                      int  ->  [rsp+0x64]
+;  V12 cse0         [V12,T18] (  3,  40   )     int  ->  r26         "CSE #14: aggressive"
+;  V13 cse1         [V13,T19] (  3,  40   )     int  ->  rbp         "CSE #18: aggressive"
+;  V14 cse2         [V14,T29] (  3,  10   )     int  ->  [rsp+0x2C]  spill-single-def "CSE #24: aggressive"
+;  V15 cse3         [V15,T23] (  5,  26   )     int  ->  registers   multi-def "CSE #21: aggressive"
+;  V16 cse4         [V16,T20] (  3,  40   )     int  ->  rbp         "CSE #22: aggressive"
+;  V17 cse5         [V17,T17] (  2,  68   )     int  ->  r28         hoist "CSE #06: aggressive"
+;  V18 cse6         [V18,T21] (  2,   8   )     int  ->  r18         "CSE #13: aggressive"
+;  V19 cse7         [V19,T22] (  2,   8   )     int  ->  r17         "CSE #17: aggressive"
+;  V20 cse8         [V20,T26] (  2,  17   )     int  ->  r25         hoist "CSE #19: aggressive"
+;  V21 cse9         [V21,T28] (  2,  17   )     int  ->  r20         hoist "CSE #02: aggressive"
+;  V22 cse10        [V22,T30] (  2,   2   )     int  ->  [rsp+0x28]  spill-single-def "CSE #23: aggressive"
+;  V23 cse11        [V23,T24] (  4,  18   )     int  ->  r24         multi-def "CSE #20: aggressive"
+;  V24 cse12        [V24,T05] ( 15, 512   )     int  ->  r30         "CSE #08: aggressive"
+;  V25 cse13        [V25,T07] ( 21, 330   )     int  ->  r22         "CSE #04: aggressive"
+;  V26 cse14        [V26,T14] (  7, 145   )     int  ->  r27         "CSE #05: aggressive"
+;  V27 cse15        [V27,T09] (  7, 220   )     int  ->  r29         hoist "CSE #07: aggressive"
+;  V28 cse16        [V28,T16] ( 10, 123   )     int  ->  r19         "CSE #01: aggressive"
+;  V29 cse17        [V29,T15] ( 10, 138   )     int  ->  r21         hoist "CSE #03: aggressive"
+;  V30 cse18        [V30,T13] ( 10, 153   )     int  ->  r23         "CSE #09: aggressive"
+;  V31 cse19        [V31,T08] (  8, 288   )     int  ->  r31         "CSE #12: aggressive"
 ;
-; Lcl frame size = 104
+; Lcl frame size = 48
 
 G_M36001_IG01:        ; bbWeight=0.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-       push     r15
-       push     r14
-       push     r13
-       push     r12
-       push     rdi
-       push     rsi
        push     rbp
-       push     rbx
-       sub      rsp, 104
-						;; size=16 bbWeight=0.25 PerfScore 2.06
+       sub      rsp, 48
+						;; size=8 bbWeight=0.25 PerfScore 0.31
 G_M36001_IG02:        ; bbWeight=0.25, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
        ; gcrRegs +[rcx rdx]
-       xor      eax, eax
-       xor      r8d, r8d
-						;; size=5 bbWeight=0.25 PerfScore 0.12
+       xor      r16, r16
+       xor      r17, r17
+						;; size=8 bbWeight=0.25 PerfScore 0.12
 G_M36001_IG03:        ; bbWeight=1, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
-       xor      r10d, r10d
-       mov      r9d, dword ptr [rdx+0x18]
-       mov      r11d, r8d
-       sub      r11d, r9d
-       mov      ebx, dword ptr [rdx+0x10]
-						;; size=16 bbWeight=1 PerfScore 4.75
+       xor      r18, r18
+       mov      r19, dword ptr [rdx+0x18]
+       mov      r20, r17
+       sub      r20, r19
+       mov      r21, dword ptr [rdx+0x10]
+						;; size=22 bbWeight=1 PerfScore 4.75
 G_M36001_IG04:        ; bbWeight=16, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref, isz
-       mov      esi, r11d
-       cmp      esi, ebx
-       jae      G_M36001_IG59
-       mov      edi, dword ptr [rdx+0x14]
-       imul     esi, edi
-       mov      ebp, dword ptr [rdx+0x1C]
-       mov      r14d, r10d
-       sub      r14d, ebp
-       cmp      r14d, edi
-       jae      G_M36001_IG59
-       add      r14d, esi
-       mov      esi, r14d
-       mov      word  ptr [rdx+2*rsi+0x20], 0
-       inc      r10d
-       movsx    r10, r10w
-       cmp      r10d, 101
+       mov      ebp, r20
+       cmp      ebp, r21
+       jae      G_M36001_IG49
+       mov      r22, dword ptr [rdx+0x14]
+       mov      eax, r22
+       imul     ebp, eax
+       mov      r23, dword ptr [rdx+0x1C]
+       mov      r24, r18
+       sub      r24, r23
+       cmp      r24, r22
+       jae      G_M36001_IG49
+       add      ebp, r24
+       mov      eax, ebp
+       mov      word  ptr [rdx+2*rax+0x20], 0
+       inc      r18
+       movsx    r18, r18w
+       cmp      r18, 101
        jl       SHORT G_M36001_IG04
-						;; size=61 bbWeight=16 PerfScore 200.00
+						;; size=81 bbWeight=16 PerfScore 204.00
 G_M36001_IG05:        ; bbWeight=4, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref, isz
-       inc      r8d
-       movsx    r8, r8w
-       cmp      r8d, 101
-       mov      dword ptr [rsp+0x2C], r9d
+       inc      r17
+       movsx    r17, r17w
+       cmp      r17, 101
        jl       SHORT G_M36001_IG03
-						;; size=18 bbWeight=4 PerfScore 11.00
+						;; size=15 bbWeight=4 PerfScore 7.00
 G_M36001_IG06:        ; bbWeight=1, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
-       xor      r11d, r11d
-       mov      dword ptr [rsp+0x5C], r11d
-						;; size=8 bbWeight=1 PerfScore 1.25
+       xor      r20, r20
+						;; size=4 bbWeight=1 PerfScore 0.25
 G_M36001_IG07:        ; bbWeight=1, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
-       xor      esi, esi
-       mov      dword ptr [rsp+0x60], esi
-       xor      r8d, r8d
-						;; size=9 bbWeight=1 PerfScore 1.50
+       xor      r25, r25
+       xor      r17, r17
+						;; size=8 bbWeight=1 PerfScore 0.50
 G_M36001_IG08:        ; bbWeight=4, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
-       xor      r14d, r14d
-       xor      r10d, r10d
-       mov      r15d, dword ptr [rcx+0x18]
-       mov      r13d, r8d
-       sub      r13d, r15d
-       mov      dword ptr [rsp+0x44], r13d
-       mov      r12d, dword ptr [rcx+0x10]
-						;; size=25 bbWeight=4 PerfScore 24.00
+       xor      r26, r26
+       xor      r18, r18
+       mov      r27, dword ptr [rcx+0x18]
+       mov      r28, r17
+       sub      r28, r27
+       mov      r29, dword ptr [rcx+0x10]
+						;; size=26 bbWeight=4 PerfScore 20.00
 G_M36001_IG09:        ; bbWeight=64, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref, isz
-       mov      r11d, r13d
-       cmp      r11d, r12d
-       jae      G_M36001_IG59
-       mov      esi, dword ptr [rcx+0x14]
-       imul     r11d, esi
-       mov      r13d, dword ptr [rcx+0x1C]
-       mov      r9d, r10d
-       sub      r9d, r13d
-       cmp      r9d, esi
-       jae      G_M36001_IG59
-       add      r11d, r9d
-       mov      r9d, r11d
-       cmp      dword ptr [rcx+4*r9+0x20], 0
+       mov      ebp, r28
+       cmp      ebp, r29
+       jae      G_M36001_IG49
+       mov      r30, dword ptr [rcx+0x14]
+       mov      eax, r30
+       imul     eax, ebp
+       mov      r31, dword ptr [rcx+0x1C]
+       mov      r24, r18
+       sub      r24, r31
+       cmp      r24, r30
+       jae      G_M36001_IG49
+       add      eax, r24
+       cmp      dword ptr [rcx+4*rax+0x20], 0
        jne      SHORT G_M36001_IG11
-						;; size=52 bbWeight=64 PerfScore 880.00
+						;; size=62 bbWeight=64 PerfScore 880.00
 G_M36001_IG10:        ; bbWeight=32, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref, isz
-       mov      r11d, r8d
-       sub      r11d, dword ptr [rsp+0x2C]
-       cmp      r11d, ebx
-       jae      G_M36001_IG59
-       imul     r11d, edi
-       mov      r9d, r10d
-       sub      r9d, ebp
-       cmp      r9d, edi
-       jae      G_M36001_IG59
-       add      r11d, r9d
-       mov      r9d, r11d
-       cmp      word  ptr [rdx+2*r9+0x20], 0
+       mov      ebp, r17
+       sub      ebp, r19
+       cmp      ebp, r21
+       jae      G_M36001_IG49
+       mov      eax, r22
+       imul     eax, ebp
+       mov      r24, r18
+       sub      r24, r23
+       cmp      r24, r22
+       jae      G_M36001_IG49
+       add      eax, r24
+       cmp      word  ptr [rdx+2*rax+0x20], 0
        jne      SHORT G_M36001_IG11
-       inc      r14d
-       movsx    r14, r14w
-       mov      eax, r10d
-						;; size=61 bbWeight=32 PerfScore 400.00
+       inc      r26
+       movsx    r26, r26w
+       mov      r16, r18
+						;; size=69 bbWeight=32 PerfScore 344.00
 G_M36001_IG11:        ; bbWeight=64, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref
-       inc      r10d
-       movsx    r10, r10w
...

Update comments.

Merge the REX2 changes into the original legacy emit path

bug fix: Set REX2.W with correct mask code.

Register encoding and prefix emitting logic.

Add REX2 prefix emit logic

bug fixes

Add Stress mode for REX2 encoding and some bug fixes

resolve comments:
1. add assertion check for UD opcodes.
2. add checks for EGPRs.

Add REX2 to emitOutputAM, and make LEA REX2 compatible.

Add REX2.X encoding for SIB byte

Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.

Enable REX2 encoding for `movups`

fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT

legacy map index-er

bug fixes

some clean-up

Adding initial APX unit testing path.

Adding a coredistools dll that has LLVM APX disasm capability.

It must be copied into a CORE_ROOT manually.

clean up work for REX2

narrow the REX2 scope to `sub` only

some clean up based on the comments.

bug fix

resolve comment
 - SV path is mostly for debugging purposes

Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
Refactor REX2 encoding stress logic.
(This has the side effect that the estimated code size will go up and no longer match the actual code size.)
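
For readers following the REX2 commits above, here is a minimal sketch of how the two-byte prefix could be assembled. It is not the JIT's emitter code: the helper names `rex2_prefix` and `encode_xor_rr32` are hypothetical, the bit layout (a 0xD5 escape byte followed by payload bits M0 R4 X4 B4 W R3 X3 B3) reflects my reading of the APX specification, and using opcode 0x33 for `xor` is an assumption.

```python
REX2_ESCAPE = 0xD5  # first byte of the two-byte REX2 prefix (assumed from the APX spec)


def rex2_prefix(reg=0, index=0, base=0, w=False, map1=False):
    """Build the REX2 prefix for register numbers in the range 0..31.

    Hypothetical helper for illustration only; not the JIT emitter's API.
    """
    for r in (reg, index, base):
        if not 0 <= r <= 31:
            raise ValueError("register number must be in 0..31")
    payload = (
        (int(map1) << 7)               # M0: 0 = legacy map 0, 1 = legacy map 1
        | (((reg >> 4) & 1) << 6)      # R4: bit 4 of ModRM.reg
        | (((index >> 4) & 1) << 5)    # X4: bit 4 of SIB.index
        | (((base >> 4) & 1) << 4)     # B4: bit 4 of ModRM.rm / SIB.base
        | (int(w) << 3)                # W: 64-bit operand size
        | (((reg >> 3) & 1) << 2)      # R3: bit 3 of ModRM.reg
        | (((index >> 3) & 1) << 1)    # X3: bit 3 of SIB.index
        | ((base >> 3) & 1)            # B3: bit 3 of ModRM.rm / SIB.base
    )
    return bytes([REX2_ESCAPE, payload])


def encode_xor_rr32(dst, src):
    """Encode a 32-bit register-to-register xor, always with a REX2 prefix.

    Assumes opcode 0x33 (XOR r32, r/m32), with dst in ModRM.reg and src in ModRM.rm.
    """
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)  # mod=11 selects register-direct
    return rex2_prefix(reg=dst, base=src) + bytes([0x33, modrm])


if __name__ == "__main__":
    for regnum in (16, 17):
        print(f"xor r{regnum}d, r{regnum}d ->", encode_xor_rr32(regnum, regnum).hex(" "))
```

Under these assumptions, an `xor` of an eGPR with itself takes 4 bytes (two prefix bytes, opcode, ModRM), which is consistent with the `size=8` reported for the two-instruction IG02 in the diff above.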
@DeepakRajendrakumaran
Contributor Author

DeepakRajendrakumaran commented Nov 20, 2024

@kunalspathak

Now that the CPUID changes have merged, I ran superpmi TP (throughput) and I have a problem:

image

I ran the scripts Kunal shared a while back to debug why this is happening.

The following is for libraries:

Base: 798636572986, Diff: 837269651550, +4.8374%

?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                                                            : 7483341082 : +105.48%  : 15.71% : +0.9370%
?allocateRegistersMinimal@LinearScan@@QEAAXXZ                                                                                                              : 5166096591 : +51.73%   : 10.84% : +0.6469%
?allocateRegisters@LinearScan@@QEAAXXZ                                                                                                                     : 3501980510 : +32.45%   : 7.35%  : +0.4385%
?processKills@LinearScan@@AEAAXPEAVRefPosition@@@Z                                                                                                         : 2761837171 : +53.97%   : 5.80%  : +0.3458%
?genConsumeReg@CodeGen@@IEAA?AW4_regNumber_enum@@PEAUGenTree@@@Z                                                                                           : 2114364155 : +56.59%   : 4.44%  : +0.2647%
?TakesRex2Prefix@emitter@@QEBA_NPEBUinstrDesc@1@@Z                                                                                                         : 1652787168 : NA        : 3.47%  : +0.2070%
?freeRegisters@LinearScan@@AEAAXUregMaskTP@@@Z                                                                                                             : 1645251557 : +62.83%   : 3.45%  : +0.2060%
?mergeRegisterPreferences@Interval@@QEAAX_K@Z                                                                                                              : 1424229795 : +2637.42% : 2.99%  : +0.1783%
?AddX86PrefixIfNeeded@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                                      : 1332532027 : NA        : 2.80%  : +0.1669%
?AddX86PrefixIfNeededAndNotPresent@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                         : 1247317388 : NA        : 2.62%  : +0.1562%
?gcMarkRegPtrVal@GCInfo@@QEAAXW4_regNumber_enum@@W4var_types@@@Z                                                                                           : 1236233831 : +174.95%  : 2.59%  : +0.1548%
??$select@$0A@@RegisterSelection@LinearScan@@QEAA_KPEAVInterval@@PEAVRefPosition@@@Z                                                                       : 1044477092 : +10.11%   : 2.19%  : +0.1308%
?assignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z                                                                                            : 749700826  : +42.11%   : 1.57%  : +0.0939%
?genCodeForBBlist@CodeGen@@IEAAXXZ                                                                                                                         : 707125092  : +11.03%   : 1.48%  : +0.0885%
?allocateRegMinimal@LinearScan@@AEAA?AW4_regNumber_enum@@PEAVInterval@@PEAVRefPosition@@@Z                                                                 : 704654429  : +15.88%   : 1.48%  : +0.0882%
?buildKillPositionsForNode@LinearScan@@AEAA_NPEAUGenTree@@IUregMaskTP@@@Z                                                                                  : 658845785  : +64.48%   : 1.38%  : +0.0825%
?emitOutputInstr@emitter@@IEAA_KPEAUinsGroup@@PEAUinstrDesc@1@PEAPEAE@Z                                                                                    : 658192653  : +9.65%    : 1.38%  : +0.0824%
?emitGCregDeadUpd@emitter@@QEAAXW4_regNumber_enum@@PEAE@Z                                                                                                  : 629879757  : +107.83%  : 1.32%  : +0.0789%
?updateAssignedInterval@LinearScan@@AEAAXPEAVRegRecord@@PEAVInterval@@@Z                                                                                   : 546122060  : +24.24%   : 1.15%  : +0.0684%
?emitStackPopLargeStk@emitter@@QEAAXPEAE_NEI@Z                                                                                                             : 525848563  : +104.66%  : 1.10%  : +0.0658%
?emitGetAdjustedSize@emitter@@QEBAIPEAUinstrDesc@1@_K@Z                                                                                                    : 487696755  : +31.37%   : 1.02%  : +0.0611%
?emitGCregLiveUpd@emitter@@QEAAXW4GCtype@@W4_regNumber_enum@@PEAE@Z                                                                                        : 451135285  : +59.41%   : 0.95%  : +0.0565%
?buildPhysRegRecords@LinearScan@@AEAAXXZ                                                                                                                   : 417375644  : +52.32%   : 0.88%  : +0.0523%
?AddRexWPrefix@emitter@@QEAA_KPEBUinstrDesc@1@_K@Z                                                                                                         : 337122934  : +62.86%   : 0.71%  : +0.0422%
?TakesEvexPrefix@emitter@@QEBA_NPEBUinstrDesc@1@@Z                                                                                                         : 326871135  : +13.69%   : 0.69%  : +0.0409%
?newRefPosition@LinearScan@@AEAAPEAVRefPosition@@PEAVInterval@@IW4RefType@@PEAUGenTree@@_KI@Z                                                              : 289859613  : +3.27%    : 0.61%  : +0.0363%
??0LinearScan@@QEAA@PEAVCompiler@@@Z                                                                                                                       : 287558884  : +56.87%   : 0.60%  : +0.0360%
?emitOutputRexOrSimdPrefixIfNeeded@emitter@@QEAAIW4instruction@@PEAEAEA_K@Z                                                                                : 276256843  : +10.64%   : 0.58%  : +0.0346%
?emitIns_Call@emitter@@QEAAXW4EmitCallType@1@PEAUCORINFO_METHOD_STRUCT_@@PEAX_JW4emitAttr@@AEBQEA_KUregMaskTP@@6AEBVDebugInfo@@W4_regNumber_enum@@8I3_N9@Z : 251568991  : +17.79%   : 0.53%  : +0.0315%
?resetAllRegistersState@LinearScan@@AEAAXXZ                                                                                                                : 250671960  : +48.42%   : 0.53%  : +0.0314%
?emitUpdateLiveGCregs@emitter@@QEAAXW4GCtype@@UregMaskTP@@PEAE@Z                                                                                           : 236180536  : +61.03%   : 0.50%  : +0.0296%
?BuildNode@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                                : 211945171  : +3.63%    : 0.44%  : +0.0265%
?genUpdateRegLife@CodeGenInterface@@QEAAXPEBVLclVarDsc@@_N1@Z                                                                                              : 208334297  : +146.29%  : 0.44%  : +0.0261%
?unassignPhysReg@LinearScan@@AEAAXPEAVRegRecord@@PEAVRefPosition@@@Z                                                                                       : 204285611  : +8.70%    : 0.43%  : +0.0256%
?BuildCall@LinearScan@@AEAAHPEAUGenTreeCall@@@Z                                                                                                            : 201972715  : +19.34%   : 0.42%  : +0.0253%
?genProduceReg@CodeGen@@IEAAXPEAUGenTree@@@Z                                                                                                               : 156386903  : +5.43%    : 0.33%  : +0.0196%
?emitGetGCRegsSavedOrModified@emitter@@QEAA?AUregMaskTP@@PEAUCORINFO_METHOD_STRUCT_@@@Z                                                                    : 155613852  : NA        : 0.33%  : +0.0195%
??$resolveRegisters@$00@LinearScan@@QEAAXXZ                                                                                                                : 154302371  : +4.84%    : 0.32%  : +0.0193%
??$compChangeLife@$00@Compiler@@QEAAXAEBQEA_K@Z                                                                                                            : 150051997  : +21.15%   : 0.31%  : +0.0188%
?genPushCalleeSavedRegisters@CodeGen@@IEAAXXZ                                                                                                              : 136488370  : +268.86%  : 0.29%  : +0.0171%
?BuildRMWUses@LinearScan@@AEAAHPEAUGenTree@@00_K1@Z                                                                                                        : 119460162  : NA        : 0.25%  : +0.0150%
?emitInsSize@emitter@@QEAAIPEAUinstrDesc@1@_K_N@Z                                                                                                          : 113904280  : +11.97%   : 0.24%  : +0.0143%
??$resolveRegisters@$0A@@LinearScan@@QEAAXXZ                                                                                                               : 99510485   : +3.12%    : 0.21%  : +0.0125%
?BuildIndir@LinearScan@@AEAAHPEAUGenTreeIndir@@@Z                                                                                                          : 96091030   : +48.43%   : 0.20%  : +0.0120%
?compInitOptions@Compiler@@IEAAXPEAVJitFlags@@@Z                                                                                                           : 89279923   : +9.31%    : 0.19%  : +0.0112%
?instGen_Set_Reg_To_Imm@CodeGen@@QEAAXW4emitAttr@@W4_regNumber_enum@@_JW4insFlags@@@Z                                                                      : 78050532   : +26.93%   : 0.16%  : +0.0098%
?resolveLocalRef@LinearScan@@AEAAXPEAUBasicBlock@@PEAUGenTreeLclVar@@PEAVRefPosition@@@Z                                                                   : 74859133   : +3.74%    : 0.16%  : +0.0094%
??$allocateReg@$0A@@LinearScan@@AEAA?AW4_regNumber_enum@@PEAVInterval@@PEAVRefPosition@@@Z                                                                 : 74540254   : +6.75%    : 0.16%  : +0.0093%
memset                                                                                                                                                     : 73679442   : +1.13%    : 0.15%  : +0.0092%
?emitOutputRI@emitter@@QEAAPEAEPEAEPEAUinstrDesc@1@@Z                                                                                                      : 67994420   : +6.26%    : 0.14%  : +0.0085%
?insEncodeReg012@emitter@@QEAAIPEBUinstrDesc@1@W4_regNumber_enum@@W4emitAttr@@PEA_K@Z                                                                      : 65961952   : +6.54%    : 0.14%  : +0.0083%
?genSetRegToConst@CodeGen@@IEAAXW4_regNumber_enum@@W4var_types@@PEAUGenTree@@@Z                                                                            : 63572129   : +16.91%   : 0.13%  : +0.0080%
?emitInsSizeSV@emitter@@QEAAIPEAUinstrDesc@1@_KHH@Z                                                                                                        : 58355992   : +5.91%    : 0.12%  : +0.0073%
?BuildDefWithKills@LinearScan@@AEAAXPEAUGenTree@@H_KUregMaskTP@@@Z                                                                                         : 56553707   : +40.78%   : 0.12%  : +0.0071%
?BuildCast@LinearScan@@AEAAHPEAUGenTreeCast@@@Z                                                                                                            : 56461244   : NA        : 0.12%  : +0.0071%
?BuildStoreLocDef@LinearScan@@AEAAXPEAUGenTreeLclVarCommon@@PEAVLclVarDsc@@PEAVRefPosition@@H@Z                                                            : 53688042   : +14.79%   : 0.11%  : +0.0067%
?emitOutputRR@emitter@@QEAAPEAEPEAEPEAUinstrDesc@1@@Z                                                                                                      : 53279956   : +3.55%    : 0.11%  : +0.0067%
?genCallInstruction@CodeGen@@IEAAXPEAUGenTreeCall@@@Z                                                                                                      : 50084108   : +5.82%    : 0.11%  : +0.0063%
?emitHandleMemOp@emitter@@AEAAXPEAUGenTreeIndir@@PEAUinstrDesc@1@W4insFormat@1@W4instruction@@@Z                                                           : -58626864  : -10.34%   : 0.12%  : -0.0073%
?getMatchingConstants@LinearScan@@AEAA_K_KPEAVInterval@@PEAVRefPosition@@@Z                                                                                : -79107557  : -100.00%  : 0.17%  : -0.0099%
?emitSizeOfInsDsc_CNS@emitter@@AEBA_KPEAUinstrDesc@1@@Z                                                                                                    : -90499395  : -98.48%   : 0.19%  : -0.0113%
?BuildRMWUses@LinearScan@@AEAAHPEAUGenTree@@00_K@Z                                                                                                         : -120949346 : -100.00%  : 0.25%  : -0.0151%
?BuildGCWriteBarrier@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                      : -146449406 : -100.00%  : 0.31%  : -0.0183%
?associateRefPosWithInterval@LinearScan@@AEAAXPEAVRefPosition@@@Z                                                                                          : -188074386 : -3.81%    : 0.39%  : -0.0235%
?addKillForRegs@LinearScan@@AEAAXUregMaskTP@@I@Z                                                                                                           : -213435792 : -100.00%  : 0.45%  : -0.0267%
?BuildSimple@LinearScan@@AEAAHPEAUGenTree@@@Z                                                                                                              : -345016623 : -99.92%   : 0.72%  : -0.0432%
?genCodeForTreeNode@CodeGen@@IEAAXPEAUGenTree@@@Z                                                                                                          : -414160388 : -6.66%    : 0.87%  : -0.0519%
?updateRegisterPreferences@Interval@@QEAAX_K@Z                                                                                                             : -580317174 : -100.00%  : 1.22%  : -0.0727%
?AddSimdPrefixIfNeededAndNotPresent@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                        : -885312893 : -100.00%  : 1.86%  : -0.1109%
?AddSimdPrefixIfNeeded@emitter@@QEAA_KPEBUinstrDesc@1@_KW4emitAttr@@@Z                                                                                     : -984986225 : -100.00%  : 2.07%  : -0.1233%
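
As a sanity check on the numbers above, the sketch below recomputes the overall percentage from the Base and Diff instruction counts and parses one row of the listing. The column meanings (function, instruction delta, change within the function, share of the total diff, delta as a share of Base) are my interpretation of the script output, and `parse_row` is a hypothetical helper, not part of superpmi.

```python
BASE = 798636572986  # "Base" instruction count from the summary line
DIFF = 837269651550  # "Diff" instruction count from the summary line


def relative_delta_percent(base, diff):
    """Relative change of diff against base, in percent."""
    return (diff - base) / base * 100.0


def parse_row(line):
    """Split one listing row into typed fields.

    Assumed column order: name : delta : function change : share of diff : share of base.
    The mangled names contain no ':' characters, so splitting from the right is safe here.
    """
    name, delta, func_pct, diff_share, base_share = (
        field.strip() for field in line.rsplit(":", 4)
    )
    return {
        "name": name,
        "delta": int(delta),
        # "NA" marks functions with no baseline count to compare against
        "func_pct": None if func_pct == "NA" else float(func_pct.rstrip("%")),
        "diff_share": float(diff_share.rstrip("%")),
        "base_share": float(base_share.rstrip("%")),
    }


if __name__ == "__main__":
    row = parse_row(
        "?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z"
        " : 7483341082 : +105.48%  : 15.71% : +0.9370%"
    )
    print(f"overall: {relative_delta_percent(BASE, DIFF):+.4f}%")
    print(f"{row['name']}: {row['delta'] / BASE * 100.0:+.4f}% of base")
```

If that reading of the columns is right, the last column of each row is simply the function's instruction delta divided by Base, so the rows can be cross-checked against the summary line.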

Labels
apx: Related to the Intel Advanced Performance Extensions (APX)
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member