[JIT] [APX] Enable additional General Purpose Registers. #108799

Draft: wants to merge 69 commits into base: main
5acc239
script-gen logics.
Ruihan-Yin Jul 2, 2024
96cd7c7
CPUID check logics for APX.
Ruihan-Yin Jul 2, 2024
ff60d96
XSTATE updates.
Ruihan-Yin Jul 2, 2024
d15096f
revert debug codes.
Ruihan-Yin Jul 8, 2024
5797645
Ruihan: POC with REX2
Ruihan-Yin Mar 25, 2024
880e980
resolve merge error.
Ruihan-Yin Jul 2, 2024
4ee360f
resolve comments
Ruihan-Yin May 17, 2024
ea8de4f
refactor register encoding for REX2
Ruihan-Yin May 20, 2024
e35549e
merge REX2 path to legacy path
Ruihan-Yin May 21, 2024
a3e1233
Enable REX2 in more instructions.
Ruihan-Yin May 30, 2024
b6d4704
Avoid repeatedly estimate the size of REX2 prefix
Ruihan-Yin Jun 3, 2024
3ec231b
Enable REX2 encoding on RI and SV path
Ruihan-Yin Jun 5, 2024
75a2ed6
Add rex2 support to rotate and shift.
Ruihan-Yin Jun 6, 2024
71f57ff
CR session.
Ruihan-Yin Jun 7, 2024
58cee8c
Testing infra updates: assert REX2 is enabled.
Ruihan-Yin Jun 11, 2024
fdcd651
revert rcl_N and rcr_N, tp and latency data for these instructions is…
Ruihan-Yin Jun 11, 2024
4749cac
partially enable REX2 on emitOutputAM, case covered: R_AR and AR_R.
Ruihan-Yin Jun 12, 2024
8b61ac2
Adding unit tests.
Ruihan-Yin Jun 13, 2024
4593ca9
push, pop, inc, dec, neg, not, xadd, shld, shrd, cmpxchg, setcc, bswap.
Ruihan-Yin Jun 26, 2024
78c5a3b
bug fix for bswap
Ruihan-Yin Jun 27, 2024
a3688c1
bt
Ruihan-Yin Jun 28, 2024
8238544
xchg, idiv
Ruihan-Yin Jul 1, 2024
e7a0beb
Make sure add REX2 prefix if register encoding for EGPRs are being ca…
Ruihan-Yin Jul 2, 2024
60de08a
Ensure code size is correctly computed in R_R_I path.
Ruihan-Yin Jul 8, 2024
c910bf8
clean up
Ruihan-Yin Jul 9, 2024
c6856d3
Change all AddSimdPrefix to AddX86Prefix
Ruihan-Yin Jul 15, 2024
bb70d8b
div, mulEAX
Ruihan-Yin Jul 16, 2024
957048d
filter out test from REX2 encoding when using ACC form.
Ruihan-Yin Jul 19, 2024
3389a46
Make sure REX prefix will not be added when emitting with REX2.
Ruihan-Yin Jul 24, 2024
59353b7
resolve comments.
Ruihan-Yin Aug 5, 2024
9f8f67c
Add APX doc.
Ruihan-Yin Aug 28, 2024
a68865c
script-gen changes.
Ruihan-Yin Aug 28, 2024
6dfb92b
XSTATE changes
Ruihan-Yin Aug 28, 2024
b998cdf
hand-written CPUID check part
Ruihan-Yin Aug 28, 2024
3446e28
fix
Ruihan-Yin Aug 28, 2024
9bc008a
Fix merge error.
Ruihan-Yin Aug 28, 2024
44163d7
bug fixes
Ruihan-Yin Aug 28, 2024
600abd0
Bug fix
Ruihan-Yin Aug 29, 2024
81b47c6
bug fix
Ruihan-Yin Aug 29, 2024
d516feb
resolve commnets.
Ruihan-Yin Sep 3, 2024
42f1729
Merge remote-tracking branch 'origin/main' into apx-cpuid-sept
Ruihan-Yin Oct 23, 2024
e8f2b2d
Merge branch 'apx-cpuid-sept' into apx-rex2-oct
Ruihan-Yin Oct 23, 2024
640c474
make sure the APX debug knob is only available under debug build.
Ruihan-Yin Oct 24, 2024
d90bf49
Merge remote-tracking branch 'origin/main' into apx-cpuid-sept
Ruihan-Yin Nov 4, 2024
79b23c8
re-generate the ISA changes to propagate the changes in ThunkGenerator.
Ruihan-Yin Nov 4, 2024
8e1ba4a
resolve comments
Ruihan-Yin Nov 8, 2024
ea8949f
use byte code for EGPR XSAVE logics.
Ruihan-Yin Nov 8, 2024
597e797
resolve comments.
Ruihan-Yin Nov 8, 2024
8feeac9
Merge branch 'apx-cpuid-sept' into apx-rex2-oct
Ruihan-Yin Nov 12, 2024
af9c002
clean up some out-dated code.
Ruihan-Yin Nov 12, 2024
b174e27
enable movsxd
Ruihan-Yin Nov 12, 2024
117c401
Merge branch 'main' into apx-CPUID-XSAVE
tannergooding Nov 13, 2024
6b3e8c0
Enable "Call"
Ruihan-Yin Nov 13, 2024
8cc7e8e
Merge remote-tracking branch 'origin/main' into apx-CPUID-XSAVE
Ruihan-Yin Nov 14, 2024
b4868a0
Enable "JMP"
Ruihan-Yin Nov 15, 2024
c77bdd9
Merge remote-tracking branch 'Ruihan/apx-CPUID-XSAVE' into apx-rex2-oct
Ruihan-Yin Nov 18, 2024
51e95b3
resolve merge errors
Ruihan-Yin Nov 18, 2024
03ea122
Merge remote-tracking branch 'origin/main' into apx-rex2-oct
Ruihan-Yin Nov 18, 2024
641681a
formatting
Ruihan-Yin Nov 18, 2024
f8e720a
remote coredistools.dll for internal tests only
Ruihan-Yin Nov 18, 2024
866253d
bug fix
Ruihan-Yin Nov 19, 2024
ae5e3ab
Enabling Additional Registers in JIT.
DeepakRajendrakumaran Oct 2, 2024
b739ebe
Adding StressMode for testing eGPRs.
DeepakRajendrakumaran Aug 7, 2024
bd1784d
Add REX2 Encoding support for eGPR.
DeepakRajendrakumaran Nov 19, 2024
703e9ac
Temporarily masking away eGPR for instructions requiring extendedEvex.
DeepakRajendrakumaran Aug 7, 2024
662de71
Possible flags to add.
DeepakRajendrakumaran Nov 19, 2024
acd912c
Adding temporary helper
DeepakRajendrakumaran Nov 19, 2024
d50aa94
Fix formatting
DeepakRajendrakumaran Nov 19, 2024
6bbccb4
Fixing type.
DeepakRajendrakumaran Nov 19, 2024
1 change: 1 addition & 0 deletions src/coreclr/inc/clrconfigvalues.h
@@ -787,6 +787,7 @@ RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSE41, W("EnableSSE41")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSE42, W("EnableSSE42"), 1, "Allows SSE4.2+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableSSSE3, W("EnableSSSE3"), 1, "Allows SSSE3+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableX86Serialize, W("EnableX86Serialize"), 1, "Allows X86Serialize+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAPX, W("EnableAPX"), 1, "Allows APX+ features to be disabled")
#elif defined(TARGET_ARM64)
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableArm64AdvSimd, W("EnableArm64AdvSimd"), 1, "Allows Arm64 AdvSimd+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableArm64Aes, W("EnableArm64Aes"), 1, "Allows Arm64 Aes+ hardware intrinsics to be disabled")
1 change: 1 addition & 0 deletions src/coreclr/jit/codegen.h
@@ -647,6 +647,7 @@ class CodeGen final : public CodeGenInterface

#if defined(TARGET_AMD64)
void genAmd64EmitterUnitTestsSse2();
void genAmd64EmitterUnitTestsApx();
#endif

#endif // defined(DEBUG)
6 changes: 6 additions & 0 deletions src/coreclr/jit/codegencommon.cpp
@@ -186,6 +186,8 @@ void CodeGenInterface::CopyRegisterInfo()
#if defined(TARGET_AMD64)
rbmAllFloat = compiler->rbmAllFloat;
rbmFltCalleeTrash = compiler->rbmFltCalleeTrash;
rbmAllInt = compiler->rbmAllInt;
rbmIntCalleeTrash = compiler->rbmIntCalleeTrash;
#endif // TARGET_AMD64

rbmAllMask = compiler->rbmAllMask;
@@ -5348,6 +5350,10 @@ void CodeGen::genFnProlog()
// will be skipped.
bool initRegZeroed = false;
regMaskTP excludeMask = intRegState.rsCalleeRegArgMaskLiveIn;
#if defined(TARGET_AMD64)
// TODO-Xarch-apx : Revert. Excluding eGPR so that it's not used for non REX2 supported movs.
excludeMask = excludeMask | RBM_HIGHINT;
#endif // TARGET_AMD64

#ifdef TARGET_ARM
// If we have a variable sized frame (compLocallocUsed is true)
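
Reviewer sketch for the `genFnProlog` hunk above: a minimal, standalone model of what OR-ing `RBM_HIGHINT` into `excludeMask` does. The mask type, the bit layout (bits 0-15 for legacy GPRs, bits 16-31 for eGPRs), and the names `RegMask`, `kLowInt`, `kHighInt`, `availableRegs`, and `prologExcludeMask` are all assumptions made for illustration; they are not the JIT's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register mask: one bit per integer register.
using RegMask = std::uint64_t;

constexpr RegMask kLowInt  = 0x0000FFFFull; // assumption: legacy GPRs in bits 0-15
constexpr RegMask kHighInt = 0xFFFF0000ull; // assumption: eGPRs in bits 16-31

// Registers that remain usable once an exclude mask is applied.
constexpr RegMask availableRegs(RegMask all, RegMask exclude)
{
    return all & ~exclude;
}

// Mirrors the hunk: start from the live-in argument registers and
// additionally exclude every eGPR.
constexpr RegMask prologExcludeMask(RegMask liveInArgs)
{
    return liveInArgs | kHighInt;
}
```

With live-in arguments in bits 1 and 2, `availableRegs(kLowInt | kHighInt, prologExcludeMask(0x6))` leaves only legacy registers, which is the effect the TODO describes: the prolog cannot pick an eGPR for a `mov` that lacks REX2 support.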
10 changes: 10 additions & 0 deletions src/coreclr/jit/codegeninterface.h
@@ -76,16 +76,26 @@ class CodeGenInterface

#if defined(TARGET_AMD64)
regMaskTP rbmAllFloat;
regMaskTP rbmAllInt;
regMaskTP rbmFltCalleeTrash;
regMaskTP rbmIntCalleeTrash;

FORCEINLINE regMaskTP get_RBM_ALLFLOAT() const
{
return this->rbmAllFloat;
}
FORCEINLINE regMaskTP get_RBM_ALLINT() const
{
return this->rbmAllInt;
}
FORCEINLINE regMaskTP get_RBM_FLT_CALLEE_TRASH() const
{
return this->rbmFltCalleeTrash;
}
FORCEINLINE regMaskTP get_RBM_INT_CALLEE_TRASH() const
{
return this->rbmIntCalleeTrash;
}
#endif // TARGET_AMD64

#if defined(TARGET_XARCH)
4 changes: 4 additions & 0 deletions src/coreclr/jit/codegenlinear.cpp
@@ -2702,6 +2702,10 @@ void CodeGen::genEmitterUnitTests()
{
genAmd64EmitterUnitTestsSse2();
}
if (unitTestSectionAll || (strstr(unitTestSection, "apx") != nullptr))
{
genAmd64EmitterUnitTestsApx();
}

#elif defined(TARGET_ARM64)
if (unitTestSectionAll || (strstr(unitTestSection, "general") != nullptr))
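
Reviewer sketch for the section check above: the new `"apx"` branch uses the same `strstr` substring test as the existing sections. The helper below is a standalone illustration of that pattern; `sectionSelected` and its parameters are hypothetical names, not part of the JIT.

```cpp
#include <cassert>
#include <cstring>

// Returns true when a unit-test section should run: either every section
// was requested, or the section name appears as a substring of the request.
inline bool sectionSelected(bool runAll, const char* requested, const char* name)
{
    return runAll || (std::strstr(requested, name) != nullptr);
}
```

Because this is a substring match, a request such as `"sse2,apx"` selects the APX tests, and any future section whose name merely contains `apx` would be matched by the same request string.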
200 changes: 200 additions & 0 deletions src/coreclr/jit/codegenxarch.cpp
@@ -1805,7 +1805,14 @@ void CodeGen::genCodeForReturnTrap(GenTreeOp* tree)
inst_JMP(EJ_je, skipLabel);

// emit the call to the EE-helper that stops for GC (or other reasons)
#if defined(TARGET_AMD64)
// TODO-Xarch-apx : Revert. Excluding eGPR so that it's not used for non REX2 supported movs. Revisit this one.
// Might not be necessary.
regNumber tmpReg = internalRegisters.GetSingle(tree, RBM_ALLINT_INIT);
#else
regNumber tmpReg = internalRegisters.GetSingle(tree, RBM_ALLINT);
#endif

assert(genIsValidIntReg(tmpReg));

genEmitHelperCall(CORINFO_HELP_STOP_FOR_GC, 0, EA_UNKNOWN, tmpReg);
@@ -9053,6 +9060,199 @@ void CodeGen::genAmd64EmitterUnitTestsSse2()
GetEmitter()->emitIns_R_R_R(INS_cvtsd2ss, EA_8BYTE, REG_XMM0, REG_XMM1, REG_XMM2);
}

/*****************************************************************************
* Unit tests for the APX instructions.
*/

void CodeGen::genAmd64EmitterUnitTestsApx()
{
emitter* theEmitter = GetEmitter();

genDefineTempLabel(genCreateTempLabel());

// This test suite needs REX2 enabled.
assert(theEmitter->emitComp->DoJitStressRex2Encoding());

theEmitter->emitIns_R_R(INS_add, EA_1BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_2BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_8BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_or, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_adc, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_and, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_sub, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_xor, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_test, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_bsf, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_bsr, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_R_R(INS_cmovo, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_Mov(INS_mov, EA_4BYTE, REG_EAX, REG_ECX, false);
theEmitter->emitIns_Mov(INS_movsx, EA_2BYTE, REG_EAX, REG_ECX, false);
theEmitter->emitIns_Mov(INS_movzx, EA_2BYTE, REG_EAX, REG_ECX, false);

theEmitter->emitIns_R_R(INS_popcnt, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_lzcnt, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_tzcnt, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_R_I(INS_add, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_add, EA_2BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_or, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_adc, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_sbb, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_and, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_sub, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_xor, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_cmp, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_test, EA_4BYTE, REG_EAX, 0x05);

theEmitter->emitIns_R_I(INS_mov, EA_4BYTE, REG_EAX, 0xE0);

// The JIT tends to compress an imm64 to an imm32 if the upper half is all zero; this value makes sure the test
// exercises the imm64 path.
theEmitter->emitIns_R_I(INS_mov, EA_8BYTE, REG_RAX, 0xFFFF000000000000);

// shf reg, cl
theEmitter->emitIns_R(INS_rol, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_ror, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcl, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcr, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shl, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shr, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_sar, EA_4BYTE, REG_EAX);

// shf reg, 1
theEmitter->emitIns_R(INS_rol_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_ror_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcl_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcr_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shl_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shr_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_sar_1, EA_4BYTE, REG_EAX);

// shf reg, imm8
theEmitter->emitIns_R_I(INS_shl_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_shr_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_sar_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_rol_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_ror_N, EA_4BYTE, REG_ECX, 0x05);
// TODO-xarch-apx: these two are not enabled for now.
// theEmitter->emitIns_R_I(INS_rcl_N, EA_4BYTE, REG_ECX, 0x05);
// theEmitter->emitIns_R_I(INS_rcr_N, EA_4BYTE, REG_ECX, 0x05);

theEmitter->emitIns_R_AR(INS_lea, EA_4BYTE, REG_ECX, REG_EAX, 4);

theEmitter->emitIns_R_AR(INS_mov, EA_1BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_2BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_4BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_8BYTE, REG_ECX, REG_EAX, 4);

theEmitter->emitIns_R_AR(INS_add, EA_1BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_8BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_or, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_adc, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_and, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_sub, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_xor, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_test, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_bsf, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_bsr, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_popcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_lzcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_tzcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_AR_R(INS_add, EA_1BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_8BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_or, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_adc, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_and, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_sub, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_xor, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_test, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_R_AR(INS_movsx, EA_2BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_movzx, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_cmovo, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_R_S(INS_add, EA_1BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_2BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_8BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_or, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_adc, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_sbb, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_and, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_sub, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_xor, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_cmp, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_test, EA_4BYTE, REG_EAX, 0, 0);

theEmitter->emitIns_S_I(INS_shl_N, EA_4BYTE, 0, 0, 4);
theEmitter->emitIns_S(INS_shl_1, EA_4BYTE, 0, 4);

// theEmitter->emitIns_R_S(INS_movsx, EA_2BYTE, REG_ECX, 1, 2);
// theEmitter->emitIns_R_S(INS_movzx, EA_2BYTE, REG_EAX, 1, 2);
theEmitter->emitIns_R_S(INS_cmovo, EA_4BYTE, REG_EAX, 1, 2);

theEmitter->emitIns_R(INS_pop, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_push, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_pop_hide, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_push_hide, EA_PTRSIZE, REG_EAX);

theEmitter->emitIns_S(INS_pop, EA_PTRSIZE, 1, 2);
theEmitter->emitIns_I(INS_push, EA_PTRSIZE, 50);
// TODO-XArch-apx: figure out a way to test emitIns_A, which will require a GenTreeIndir* input.

theEmitter->emitIns_R(INS_inc, EA_4BYTE, REG_EAX);
theEmitter->emitIns_AR(INS_inc, EA_2BYTE, REG_EAX, 2);
theEmitter->emitIns_S(INS_inc, EA_2BYTE, 1, 2);
theEmitter->emitIns_R(INS_dec, EA_4BYTE, REG_EAX);
theEmitter->emitIns_AR(INS_dec, EA_2BYTE, REG_EAX, 2);
theEmitter->emitIns_S(INS_dec, EA_2BYTE, 1, 2);

theEmitter->emitIns_R(INS_neg, EA_2BYTE, REG_EAX);
theEmitter->emitIns_S(INS_neg, EA_2BYTE, 1, 2);
theEmitter->emitIns_R(INS_not, EA_2BYTE, REG_EAX);
theEmitter->emitIns_S(INS_not, EA_2BYTE, 1, 2);

// TODO-XArch-apx: xadd does not have an RM opcode, so it cannot be encoded with emitIns_R_R.
theEmitter->emitIns_AR_R(INS_xadd, EA_4BYTE, REG_EAX, REG_EDX, 2);
theEmitter->emitIns_S_R(INS_xadd, EA_2BYTE, REG_EAX, 1, 2);

theEmitter->emitIns_R_R_I(INS_shld, EA_4BYTE, REG_EAX, REG_ECX, 5);
theEmitter->emitIns_R_R_I(INS_shrd, EA_2BYTE, REG_EAX, REG_ECX, 5);
// TODO-XArch-apx: the S_R_I path only accepts SSE or VEX instructions,
// so I assume shld/shrd will not take the first argument from the stack.
// theEmitter->emitIns_S_R_I(INS_shld, EA_2BYTE, 1, 2, REG_EAX, 5);
// theEmitter->emitIns_S_R_I(INS_shrd, EA_2BYTE, 1, 2, REG_EAX, 5);

theEmitter->emitIns_AR_R(INS_cmpxchg, EA_2BYTE, REG_EAX, REG_EDX, 2);

theEmitter->emitIns_R(INS_seto, EA_1BYTE, REG_EDX);

theEmitter->emitIns_R(INS_bswap, EA_8BYTE, REG_EDX);

// INS_bt only has reg-to-reg form.
theEmitter->emitIns_R_R(INS_bt, EA_2BYTE, REG_EAX, REG_EDX);

theEmitter->emitIns_R(INS_idiv, EA_8BYTE, REG_EDX);

theEmitter->emitIns_R_R(INS_xchg, EA_8BYTE, REG_EAX, REG_EDX);

theEmitter->emitIns_R(INS_div, EA_8BYTE, REG_EDX);
theEmitter->emitIns_R(INS_mulEAX, EA_8BYTE, REG_EDX);
}

#endif // defined(DEBUG) && defined(TARGET_AMD64)

#ifdef PROFILING_SUPPORTED
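
Reviewer sketch for the APX unit tests above: a minimal model of the bytes one would expect for a register-to-register `add` once REX2 is forced on. It assumes the REX2 layout as I understand it from the APX architecture specification (prefix byte `0xD5`, then a payload byte `[M0 R4 X4 B4 W R3 X3 B3]` from bit 7 down to bit 0); please check it against the spec before relying on it. `rex2Payload` and `encodeAdd32` are hypothetical helpers and do not reflect how the JIT emitter is structured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the REX2 payload byte. Register numbers are 0-31; bit 4 of each
// number feeds R4/X4/B4 and bit 3 feeds R3/X3/B3 (assumed layout, see above).
constexpr std::uint8_t rex2Payload(bool map1, bool w, unsigned reg, unsigned index, unsigned base)
{
    return static_cast<std::uint8_t>(
        ((map1 ? 1u : 0u) << 7) |        // M0: opcode map select
        (((reg >> 4) & 1u) << 6) |       // R4
        (((index >> 4) & 1u) << 5) |     // X4
        (((base >> 4) & 1u) << 4) |      // B4
        ((w ? 1u : 0u) << 3) |           // W: 64-bit operand size
        (((reg >> 3) & 1u) << 2) |       // R3
        (((index >> 3) & 1u) << 1) |     // X3
        ((base >> 3) & 1u));             // B3
}

// Encodes `add r/m32, r32` (opcode 0x01 /r) in register-direct form with a
// REX2 prefix. `dst` goes in ModRM.rm and `src` in ModRM.reg.
inline std::array<std::uint8_t, 4> encodeAdd32(unsigned dst, unsigned src)
{
    const std::uint8_t modrm =
        static_cast<std::uint8_t>(0xC0u | ((src & 7u) << 3) | (dst & 7u));
    return {0xD5, rex2Payload(false, false, src, 0, dst), 0x01, modrm};
}
```

Under these assumptions, `encodeAdd32(0, 1)` (the `eax, ecx` pair used throughout the tests) carries an all-zero payload, while `encodeAdd32(16, 17)` sets R4 and B4. That is why the stress mode can exercise the REX2 path even though the tests only name legacy registers.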
23 changes: 23 additions & 0 deletions src/coreclr/jit/compiler.cpp
@@ -2297,6 +2297,13 @@ void Compiler::compSetProcessor()
codeGen->GetEmitter()->SetUseEvexEncoding(true);
// TODO-XArch-AVX512 : Revisit other flags to be set once avx512 instructions are added.
}
if (canUseRex2Encoding() || DoJitStressRex2Encoding())
{
// TODO-Xarch-apx:
// At this stage no machine will pass the CPUID check for APX, so we need a special stress mode that
// enables REX2 on incompatible platforms. `DoJitStressRex2Encoding` is expected to be removed eventually.
codeGen->GetEmitter()->SetUseRex2Encoding(true);
}
}
#endif // TARGET_XARCH
}
@@ -3387,12 +3394,23 @@ void Compiler::compInitOptions(JitFlags* jitFlags)
rbmFltCalleeTrash = RBM_FLT_CALLEE_TRASH_INIT;
cntCalleeTrashFloat = CNT_CALLEE_TRASH_FLOAT_INIT;

rbmAllInt = RBM_ALLINT_INIT;
rbmIntCalleeTrash = RBM_INT_CALLEE_TRASH_INIT;
cntCalleeTrashInt = CNT_CALLEE_TRASH_INT_INIT;

if (canUseEvexEncoding())
{
rbmAllFloat |= RBM_HIGHFLOAT;
rbmFltCalleeTrash |= RBM_HIGHFLOAT;
cntCalleeTrashFloat += CNT_CALLEE_TRASH_HIGHFLOAT;
}

if (canUseApxEncodings())
{
rbmAllInt |= RBM_HIGHINT;
rbmIntCalleeTrash |= RBM_HIGHINT;
cntCalleeTrashInt += CNT_CALLEE_TRASH_HIGHINT;
}
#endif // TARGET_AMD64

#if defined(TARGET_XARCH)
@@ -6235,6 +6253,11 @@ int Compiler::compCompile(CORINFO_MODULE_HANDLE classPtr,
instructionSetFlags.AddInstructionSet(InstructionSet_AVX10v1_V512);
instructionSetFlags.AddInstructionSet(InstructionSet_EVEX);
}

if (JitConfig.EnableAPX() != 0)
{
instructionSetFlags.AddInstructionSet(InstructionSet_APX);
}
#endif

// These calls are important and explicitly ordered to ensure that the flags are correct in
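
Reviewer sketch for the `compInitOptions` hunk above: the integer masks now follow the same "start from the `_INIT` value, widen when the encoding is available" pattern already used for the float masks. The standalone model below illustrates that pattern only. The struct, the constant values (including the callee-trash set and the count of 16 high registers), and the function name are assumptions, not the JIT's real definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the integer register configuration.
struct IntRegConfig
{
    std::uint64_t allInt;
    std::uint64_t calleeTrash;
    unsigned      calleeTrashCount;
};

constexpr std::uint64_t kAllIntInit       = 0x0000FFFFull; // assumption: 16 legacy GPRs
constexpr std::uint64_t kCalleeTrashInit  = 0x00000F07ull; // assumption: 7 volatile GPRs
constexpr unsigned      kCalleeTrashCount = 7;             // matches the mask above
constexpr std::uint64_t kHighInt          = 0xFFFF0000ull; // assumption: 16 eGPRs
constexpr unsigned      kHighIntCount     = 16;

// Starts from the baseline masks and widens them only when APX is usable,
// treating every eGPR as callee-trashed, as the hunk does.
inline IntRegConfig makeIntRegConfig(bool canUseApx)
{
    IntRegConfig cfg{kAllIntInit, kCalleeTrashInit, kCalleeTrashCount};
    if (canUseApx)
    {
        cfg.allInt |= kHighInt;
        cfg.calleeTrash |= kHighInt;
        cfg.calleeTrashCount += kHighIntCount;
    }
    return cfg;
}
```

Keeping the count in step with the mask is the part that is easy to get wrong; in this sketch `makeIntRegConfig(true).calleeTrashCount` is 23, which equals the number of set bits in the widened callee-trash mask.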