JIT: Added SVE APIs - Test*, ExtractVector #103739
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
…t coverage for TestAnyTrue, TestFirstTrue, TestLastTrue.
@dotnet/arm64-contrib @kunalspathak this is ready. During stress testing, the test failures come from the predicate register callee-save handling.
This was on the main test wrapper, and it inlined all the methods. It occurs when TieredCompilation=0 and JitStress=2. Basically, value numbering is not handling TYP_MASK for ARM64.
I don't see changes from #99743 in files
@@ -1266,6 +1266,27 @@ GenTree* Lowering::LowerHWIntrinsic(GenTreeHWIntrinsic* node)
            return LowerHWIntrinsicCmpOp(node, GT_NE);
        }

        case NI_Sve_TestAnyTrue:
What is happening here? Is it ensuring the bool return is set?
Because TestAnyTrue is tied to the instruction ptest, the instruction itself doesn't have a destination register; it only sets the condition flags.
This lowering transformation consumes the condition flags and produces the 'bool' value we expect. Changing TestAnyTrue's gtType to TYP_VOID ensures we won't allocate a destination register for that particular node.
> Changing TestAnyTrue's gtType to TYP_VOID ensures we won't allocate a destination register for that particular node.
I think I get this part. What I am trying to understand is how we make sure that the underlying operation is doing what it is supposed to do:
- TestAnyTrue: Return true if at least one element is active and if at least one active element of op is true.
- TestFirstTrue: Return true if at least one element is active and if the first active element of op is true.
- TestLastTrue: Return true if at least one element is active and if the last active element of op is true.
Can you share the disassembly of each of those?
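For reference, the three descriptions above can be modelled in a few lines of C++. This is only a sketch: it represents each predicate as one bool per element, and the names `Pred`, `mask` and `op` are illustrative. The function names simply mirror the API names and are not taken from the JIT or the test helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference model, not runtime code: one bool per predicate element.
// `mask` selects the active elements; `op` is the predicate being tested.
using Pred = std::vector<bool>;

// True if at least one element is both active in `mask` and set in `op`.
bool TestAnyTrue(const Pred& mask, const Pred& op)
{
    assert(mask.size() == op.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        if (mask[i] && op[i])
        {
            return true;
        }
    }
    return false;
}

// True if an active element exists and the first (lowest-index) one is set in `op`.
bool TestFirstTrue(const Pred& mask, const Pred& op)
{
    assert(mask.size() == op.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        if (mask[i])
        {
            return op[i];
        }
    }
    return false; // no active element
}

// True if an active element exists and the last (highest-index) one is set in `op`.
bool TestLastTrue(const Pred& mask, const Pred& op)
{
    assert(mask.size() == op.size());
    for (std::size_t i = mask.size(); i-- > 0;)
    {
        if (mask[i])
        {
            return op[i];
        }
    }
    return false; // no active element
}
```

With mask = {1, 1, 0, 0} and op = {0, 1, 0, 0}, this model gives TestAnyTrue = true, TestFirstTrue = false and TestLastTrue = true, so a disassembly check would need to show each API reading a different condition after ptest.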
I think we need some more understanding on the operation of the API. Added comments around that.
("SveTestTest.template", new Dictionary<string, string> { ["TestName"] = "SveTestFirstTrue_short_custom1", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "TestFirstTrue", ["MaskBaseType"] = "Int16", ["Op1Value"] = "Helpers.CreateAndFillMaskFromLastElement<Int16>(1)", ["Op2Value"] = "Helpers.CreateAndFillMaskFromSecondToLastElement<Int16>(1)", ["ValidateEntry"] = "result != true"}),
("SveTestTest.template", new Dictionary<string, string> { ["TestName"] = "SveTestLastTrue_short_custom1", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "TestLastTrue", ["MaskBaseType"] = "Int16", ["Op1Value"] = "Helpers.CreateAndFillMaskFromFirstElement<Int16>(1)", ["Op2Value"] = "Helpers.CreateAndFillMaskFromSecondElement<Int16>(1)", ["ValidateEntry"] = "result != true"}),

("SveExtractVectorTest.template", new Dictionary<string, string> { ["TestName"] = "SveExtractVector_Byte_1", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "ExtractVector", ["RetVectorType"] = "Vector", ["RetBaseType"] = "Byte", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "Byte", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "Byte", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetByte()", ["NextValueOp2"] = "TestLibrary.Generator.GetByte()", ["ElementIndex"] = "1", ["ValidateIterResult"] = "Helpers.ExtractVector(firstOp, secondOp, ElementIndex, i) != result[i]"}),
You should reuse the template SveVecImmTernOpFirstArgTest that we have for dotproduct after you fix the Op2BaseType.
This template was already based on ExtractVectorTest, which is a really similar API.
src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs (outdated, resolved)
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs (resolved)
{
    assert(targetReg != op2Reg);

    GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, /* canSkip */ true);
This has to be MOVPRFX because we are generating the destructive form. Please check https://docsmirror.github.io/A64/2023-06/ext_z_zi.html
Interesting, I didn't know about this; I'll fix it.
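To make the destructive form concrete, here is a rough C++ model of `ext zdn.b, zdn.b, zm.b, #imm`. It is a sketch under assumptions: the vector length is fixed at 128 bits, and `Reg`, `ExtDestructive`, `ExtractVector` and `Iota` are hypothetical names. The copy inside `ExtractVector` stands in for the prefix move that places op1 in the destination before the destructive instruction runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBytes = 16; // assumption: 128-bit vector length

using Reg = std::array<std::uint8_t, kVectorBytes>;

// Models the destructive form: `zdn` is both the first source and the destination.
// Bytes [imm, kVectorBytes) of zdn land at the bottom of the result; the remaining
// bytes are filled from the low bytes of zm.
void ExtDestructive(Reg& zdn, const Reg& zm, std::size_t imm)
{
    assert(imm < kVectorBytes);
    Reg out{};
    for (std::size_t i = 0; i < kVectorBytes; ++i)
    {
        const std::size_t pos = imm + i;
        out[i] = (pos < kVectorBytes) ? zdn[pos] : zm[pos - kVectorBytes];
    }
    zdn = out;
}

// Non-destructive view: copy op1 into the target first, then apply the destructive form.
Reg ExtractVector(const Reg& op1, const Reg& op2, std::size_t imm)
{
    Reg target = op1; // stands in for the prefix move into the destination
    ExtDestructive(target, op2, imm);
    return target;
}

// Helper for experiments: fills a register with start, start + 1, ...
Reg Iota(std::uint8_t start)
{
    Reg r{};
    for (std::size_t i = 0; i < kVectorBytes; ++i)
    {
        r[i] = static_cast<std::uint8_t>(start + i);
    }
    return r;
}
```

The model also shows why the target must not alias op2: copying op1 into a register that still holds op2 would overwrite the second source before the extract reads it.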
src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs (resolved)
Regarding the PredTest:

Library pseudocode for aarch64/functions/sve/PredTest

// PredTest()
// ==========
bits(4) PredTest(bits(N) mask, bits(N) result, integer esize)
    bit n = [FirstActive](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.FirstActive.3)(mask, result, esize);
    bit z = [NoneActive](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.NoneActive.3)(mask, result, esize);
    bit c = NOT [LastActive](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.LastActive.3)(mask, result, esize);
    bit v = '0';
    return n:z:c:v;

FirstActive:

// FirstActive()
// =============
bit FirstActive(bits(N) mask, bits(N) x, integer esize)
    integer elements = N DIV (esize DIV 8);
    for e = 0 to elements-1
        if [ActivePredicateElement](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.ActivePredicateElement.3)(mask, e, esize) then
            return [PredicateElement](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.PredicateElement.3)(x, e, esize);
    return '0';

NoneActive:

// NoneActive()
// ============
bit NoneActive(bits(N) mask, bits(N) x, integer esize)
    integer elements = N DIV (esize DIV 8);
    for e = 0 to elements-1
        if [ActivePredicateElement](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.ActivePredicateElement.3)(mask, e, esize) && [ActivePredicateElement](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.ActivePredicateElement.3)(x, e, esize) then
            return '0';
    return '1';

LastActive:

// LastActiveElement()
// ===================
integer LastActiveElement(bits(N) mask, integer esize)
    integer elements = N DIV (esize DIV 8);
    for e = elements-1 downto 0
        if [ActivePredicateElement](https://developer.arm.com/documentation/ddi0602/2024-03/Shared-Pseudocode/aarch64-functions-sve?lang=en#impl-aarch64.ActivePredicateElement.3)(mask, e, esize) then return e;
    return -1;
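To connect these flags back to the three APIs, here is a small C++ model of the flag computation. It is a sketch under assumptions: predicates are one bool per element, and LastActive is taken to mirror FirstActive while scanning from the highest element (the body quoted above is the one for LastActiveElement). The struct and function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model of the NZCV value produced by PredTest; not JIT code.
struct Nzcv
{
    bool n; // first active element of `result` is set
    bool z; // no active element of `result` is set
    bool c; // NOT (last active element of `result` is set)
    bool v; // always 0
};

Nzcv PredTest(const std::vector<bool>& mask, const std::vector<bool>& result)
{
    assert(mask.size() == result.size());
    bool first = false;
    bool last = false;
    bool any = false;
    bool seenActive = false;
    for (std::size_t i = 0; i < mask.size(); ++i)
    {
        if (!mask[i])
        {
            continue; // inactive elements never contribute
        }
        if (!seenActive)
        {
            first = result[i]; // value at the first active element
            seenActive = true;
        }
        last = result[i]; // ends up holding the value at the last active element
        any = any || result[i];
    }
    return Nzcv{first, !any, !last, false};
}

// Each API then reduces to reading a single flag.
bool AnyTrue(const Nzcv& f) { return !f.z; }
bool FirstTrue(const Nzcv& f) { return f.n; }
bool LastTrue(const Nzcv& f) { return !f.c; }
```

Read this way, TestAnyTrue corresponds to Z == 0, TestFirstTrue to N == 1, and TestLastTrue to C == 0 in this model. With no active elements, the flags come out as N = 0, Z = 1, C = 1, so all three report false.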
Also, we should add |
@kunalspathak

; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1:Wrapper[long]():System.Numerics.Vector`1[long]:this (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 6 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T04] ( 3, 3 ) ref -> x0 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1>
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V02 tmp1 [V02,T08] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V03 tmp2 [V03,T00] ( 4, 8 ) byref -> x20 single-def "Inlining Arg"
;* V04 tmp3 [V04 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V05 tmp4 [V05,T05] ( 2, 4 ) long -> x0 "Inlining Arg"
; V06 tmp5 [V06,T02] ( 3, 6 ) long -> x1 "Inlining Arg"
; V07 tmp6 [V07,T01] ( 3, 6 ) byref -> x19 single-def "Inlining Arg"
;* V08 tmp7 [V08 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V09 tmp8 [V09,T06] ( 2, 4 ) long -> x0 "Inlining Arg"
; V10 tmp9 [V10,T03] ( 3, 6 ) long -> x1 "Inlining Arg"
; V11 cse0 [V11,T07] ( 3, 3 ) byref -> x19 "CSE #01: aggressive"
;
; Lcl frame size = 0
G_M10163_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x30]!
stp d8, d9, [sp, #0x10]
stp x19, x20, [sp, #0x20]
mov fp, sp
;; size=16 bbWeight=1 PerfScore 3.50
G_M10163_IG02: ;; offset=0x0010
add x19, x0, #48
mov x20, x19
ldrsb wzr, [x20]
add x0, x20, #32
movz x1, #0x72D0 // code for System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
movk x1, #584 LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
blr x1
ldr x1, [x20, #0x18]
add x0, x0, x1
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
ptrue p0.d
ld1d { z8.d }, p0/z, [x0]
add x0, x19, #40
movz x1, #0x72D0 // code for System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
movk x1, #584 LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
ldr x1, [x19, #0x18]
add x0, x0, x1
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
ptrue p0.d
ld1d { z0.d }, p0/z, [x0]
mov v8.d[1], v9.d[0]
ext z8.b, z8.b, z0.b, #8
mov v0.16b, v8.16b
;; size=132 bbWeight=1 PerfScore 50.50
G_M10163_IG03: ;; offset=0x0094
ldp x19, x20, [sp, #0x20]
ldp d8, d9, [sp, #0x10]
ldp fp, lr, [sp], #0x30
ret lr
;; size=16 bbWeight=1 PerfScore 4.00
; Total bytes of code 164, prolog size 16, PerfScore 58.00, instruction count 41, allocated bytes for code 164 (MethodHash=f4b8d84c) for method JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1:Wrapper[long]():System.Numerics.Vector`1[long]:this (FullOpts)
; ============================================================

ExtractVector using a non-constant index

; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1:WrapperWithIndex[long](ubyte):System.Numerics.Vector`1[long]:this (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 6 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T04] ( 3, 3 ) ref -> x0 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1>
; V01 arg1 [V01,T05] ( 3, 3 ) ubyte -> x19 single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V03 tmp1 [V03,T09] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V04 tmp2 [V04,T00] ( 4, 8 ) byref -> x21 single-def "Inlining Arg"
;* V05 tmp3 [V05 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V06 tmp4 [V06,T06] ( 2, 4 ) long -> x0 "Inlining Arg"
; V07 tmp5 [V07,T02] ( 3, 6 ) long -> x1 "Inlining Arg"
; V08 tmp6 [V08,T01] ( 3, 6 ) byref -> x20 single-def "Inlining Arg"
;* V09 tmp7 [V09 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V10 tmp8 [V10,T07] ( 2, 4 ) long -> x0 "Inlining Arg"
; V11 tmp9 [V11,T03] ( 3, 6 ) long -> x1 "Inlining Arg"
; V12 cse0 [V12,T08] ( 3, 3 ) byref -> x20 "CSE #01: aggressive"
;
; Lcl frame size = 8
G_M49680_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x40]!
stp d8, d9, [sp, #0x18]
stp x19, x20, [sp, #0x28]
str x21, [sp, #0x38]
mov fp, sp
mov w19, w1
;; size=24 bbWeight=1 PerfScore 5.00
G_M49680_IG02: ;; offset=0x0018
add x20, x0, #48
mov x21, x20
ldrsb wzr, [x21]
add x0, x21, #32
movz x1, #0x72D0 // code for System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
movk x1, #582 LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
blr x1
ldr x1, [x21, #0x18]
add x0, x0, x1
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
ptrue p0.d
ld1d { z8.d }, p0/z, [x0]
add x0, x20, #40
movz x1, #0x72D0 // code for System.Runtime.InteropServices.GCHandle:AddrOfPinnedObject():long:this
movk x1, #582 LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
ldr x1, [x20, #0x18]
add x0, x0, x1
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
ptrue p0.d
ld1d { z1.d }, p0/z, [x0]
uxtb w0, w19
mov v8.d[1], v9.d[0]
mov v0.16b, v8.16b
movz x1, #0xEB08 // code for System.Runtime.Intrinsics.Arm.Sve:ExtractVector(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long],ubyte):System.Numerics.Vector`1[long]
movk x1, #0x479 LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
blr x1
;; size=152 bbWeight=1 PerfScore 54.50
G_M49680_IG03: ;; offset=0x00B0
ldr x21, [sp, #0x38]
ldp x19, x20, [sp, #0x28]
ldp d8, d9, [sp, #0x18]
ldp fp, lr, [sp], #0x40
ret lr
;; size=20 bbWeight=1 PerfScore 6.00
; Total bytes of code 196, prolog size 20, PerfScore 65.50, instruction count 49, allocated bytes for code 196 (MethodHash=b8cb3def) for method JIT.HardwareIntrinsics.Arm._Sve.ExtractVectorTest__SveExtractVector_Int64_1:WrapperWithIndex[long](ubyte):System.Numerics.Vector`1[long]:this (FullOpts)
; ============================================================

Without a constant index, rationalization turns it into a call, whose implementation is this:

; Assembly listing for method System.Runtime.Intrinsics.Arm.Sve:ExtractVector(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long],ubyte):System.Numerics.Vector`1[long] (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T02] ( 3, 3 ) simd16 -> d0 HFA(simd16) single-def <System.Numerics.Vector`1[long]>
; V01 arg1 [V01,T03] ( 3, 3 ) simd16 -> d1 HFA(simd16) single-def <System.Numerics.Vector`1[long]>
; V02 arg2 [V02,T00] ( 3, 3 ) ubyte -> x0 single-def
;# V03 OutArgs [V03 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V04 cse0 [V04,T01] ( 3, 3 ) int -> x0 "CSE #01: aggressive"
;
; Lcl frame size = 0
G_M4290_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M4290_IG02: ;; offset=0x0008
uxtb w0, w0
cmp w0, #2
bhs G_M4290_IG06
cbnz w0, G_M4290_IG04
;; size=16 bbWeight=1 PerfScore 3.00
G_M4290_IG03: ;; offset=0x0018
ext z0.b, z0.b, z1.b, #0
b G_M4290_IG05
;; size=8 bbWeight=1 PerfScore 3.00
G_M4290_IG04: ;; offset=0x0020
ext z0.b, z0.b, z1.b, #8
;; size=4 bbWeight=1 PerfScore 2.00
G_M4290_IG05: ;; offset=0x0024
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
G_M4290_IG06: ;; offset=0x002C
bl CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
brk_windows #0
;; size=8 bbWeight=0 PerfScore 0.00
; Total bytes of code 52, prolog size 8, PerfScore 11.50, instruction count 13, allocated bytes for code 52 (MethodHash=cc42ef3d) for method System.Runtime.Intrinsics.Arm.Sve:ExtractVector(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long],ubyte):System.Numerics.Vector`1[long] (FullOpts)
; ============================================================
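The fallback listing can be read as a switch over every legal index, with each arm issuing `ext` with a fixed immediate and an out-of-range index throwing. A hedged C++ sketch of that shape follows, assuming a 128-bit vector of two 64-bit lanes. `Vec`, `ExtractVectorImm` and `ExtractVector` are illustrative names, and the template parameter plays the role of the instruction immediate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

using Vec = std::array<std::int64_t, 2>; // assumption: 128-bit vector of long

// Stand-in for `ext` with the immediate fixed at compile time.
// Index is in elements; the byte immediate would be Index * sizeof(std::int64_t).
template <std::size_t Index>
Vec ExtractVectorImm(const Vec& a, const Vec& b)
{
    static_assert(Index < 2, "index must address a lane of the vector");
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        const std::size_t pos = Index + i;
        r[i] = (pos < a.size()) ? a[pos] : b[pos - a.size()];
    }
    return r;
}

// Runtime index: pick the arm whose immediate matches, or throw when out of range.
Vec ExtractVector(const Vec& a, const Vec& b, std::uint8_t index)
{
    switch (index)
    {
        case 0:
            return ExtractVectorImm<0>(a, b); // ext ..., #0
        case 1:
            return ExtractVectorImm<1>(a, b); // ext ..., #8
        default:
            throw std::out_of_range("index");
    }
}
```

For a = {1, 2} and b = {3, 4}, index 1 yields {2, 3} in this model, matching the `ext ..., #8` arm for 64-bit elements.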
The CodeGen::HWIntrinsicImmOpHelper::HWIntrinsicImmOpHelper(CodeGen* codeGen, GenTree* immOp, GenTreeHWIntrinsic* intrin)
: codeGen(codeGen)
, endLabel(nullptr)
, nonZeroLabel(nullptr)
, branchTargetReg(REG_NA)
{
assert(codeGen != nullptr);
assert(varTypeIsIntegral(immOp));
if (immOp->isContainedIntOrIImmed())
{
nonConstImmReg = REG_NA;
immValue = (int)immOp->AsIntCon()->IconValue();
immLowerBound = immValue;
immUpperBound = immValue;
}
else
@dotnet/arm64-contrib @kunalspathak this is ready again. Ran SveTest APIs:
@@ -69,6 +69,7 @@ HARDWARE_INTRINSIC(Sve, CreateWhileLessThanOrEqualMask8Bit,
HARDWARE_INTRINSIC(Sve, Divide, -1, 2, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_sdiv, INS_sve_udiv, INS_sve_sdiv, INS_sve_udiv, INS_sve_fdiv, INS_sve_fdiv}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, DotProduct, -1, 3, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_sdot, INS_sve_udot, INS_sve_sdot, INS_sve_udot, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve, DotProductBySelectedScalar, -1, 4, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_sdot, INS_sve_udot, INS_sve_sdot, INS_sve_udot, INS_invalid, INS_invalid}, HW_Category_SIMDByIndexedElement, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics|HW_Flag_LowVectorOperation)
HARDWARE_INTRINSIC(Sve, ExtractVector, -1, 3, true, {INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext, INS_sve_ext}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics|HW_Flag_SpecialCodeGen)
I think this should be of category HW_Category_SIMDByIndexedElement.
The other ExtractVector APIs from AdvSimd are not marked with HW_Category_SIMDByIndexedElement.
@@ -198,6 +199,9 @@ HARDWARE_INTRINSIC(Sve, StoreNarrowing,
HARDWARE_INTRINSIC(Sve, StoreNonTemporal, -1, 3, true, {INS_sve_stnt1b, INS_sve_stnt1b, INS_sve_stnt1h, INS_sve_stnt1h, INS_sve_stnt1w, INS_sve_stnt1w, INS_sve_stnt1d, INS_sve_stnt1d, INS_sve_stnt1w, INS_sve_stnt1d}, HW_Category_MemoryStore, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_ExplicitMaskedOperation|HW_Flag_SpecialCodeGen|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, Subtract, -1, 2, true, {INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_sub, INS_sve_fsub, INS_sve_fsub}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, SubtractSaturate, -1, 2, true, {INS_sve_sqsub, INS_sve_uqsub, INS_sve_sqsub, INS_sve_uqsub, INS_sve_sqsub, INS_sve_uqsub, INS_sve_sqsub, INS_sve_uqsub, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, TestAnyTrue, -1, 2, true, {INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_sve_ptest, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
Why does Test* need SpecialCodeGen?
Two things:
- Need to assert that the dst register is REG_NA
- I need to pass INS_OPTS_SCALABLE_B to emitIns
That makes sense then.
private static readonly int Op1ElementCount = Unsafe.SizeOf<{Op1VectorType}<{Op1BaseType}>>() / sizeof({Op1BaseType});
private static readonly int Op2ElementCount = Unsafe.SizeOf<{Op2VectorType}<{Op2BaseType}>>() / sizeof({Op2BaseType});
private static readonly int RetElementCount = Unsafe.SizeOf<{RetVectorType}<{RetBaseType}>>() / sizeof({RetBaseType});
Any reason why this cannot be shared with existing templates?
    test.RunStructFldScenario(this);
}

public void RunUnsupportedScenario()
Same here. Can you please reuse the existing template? If not, can you confirm which template this was created from and which differences prevented us from reusing it?
The template was ExtractVectorTest.template. The only differences are alignment and that, when using the Load APIs, I needed to create a mask.
}

/// Find any occurrence where both left and right are set
static bool TestAnyTrue(Vector<{MaskBaseType}> left, Vector<{MaskBaseType}> right)
These helpers should be part of Helpers.cs.
I would have to duplicate them 8 times and they would only be used for this specific test.
Looks like you still need to fix the formatting.
Seems the format jobs are having trouble:
I'm seeing it happen on other PRs.
Contributes to #99957
Adds:
ExtractVector
TestAnyTrue
TestFirstTrue
TestLastTrue