[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

sushgokh · 2024-11-29T06:37:11Z

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Tested the patch on Neoverse-v2 with SPEC17(C/C++). No regressions observed.

…vision In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases. For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

llvmbot · 2024-11-29T06:37:46Z

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-powerpc

Author: Sushant Gokhale (sushgokh)

Changes

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Patch is 93.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118055.diff

29 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+9-5)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+7-5)
(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+2-1)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-5)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34-22)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+3-2)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/BPF/BPFTargetTransformInfo.h (+3-1)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+3-3)
(modified) llvm/test/Analysis/CostModel/AArch64/div.ll (+141-141)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/div.ll (+7-29)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 985ca1532e0149..50f1210e82f1e2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1297,13 +1297,16 @@ class TargetTransformInfo {
   /// provide even more information.
   /// \p TLibInfo is used to search for platform specific vector library
   /// functions for instructions that might be converted to calls (e.g. frem).
+  /// \p Scalars refers to individual scalars/instructions being used for
+  /// vectorization.
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
       ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
-      const TargetLibraryInfo *TLibInfo = nullptr) const;
+      const TargetLibraryInfo *TLibInfo = nullptr,
+      ArrayRef<Value *> Scalars = {}) const;
 
   /// Returns the cost estimation for alternating opcode pattern that can be
   /// lowered to a single instruction on the target. In X86 this is for the
@@ -2099,7 +2102,8 @@ class TargetTransformInfo::Concept {
   virtual InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr) = 0;
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) = 0;
   virtual InstructionCost getAltInstrCost(
       VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
       const SmallBitVector &OpcodeMask,
@@ -2780,10 +2784,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) override {
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) override {
     return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
-                                       Args, CxtI);
+                                       Args, CxtI, Scalars);
   }
   InstructionCost getAltInstrCost(VectorType *VecTy, unsigned Opcode0,
                                   unsigned Opcode1,
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 38aba183f6a173..4aa255a909edbf 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -581,11 +581,13 @@ class TargetTransformInfoImplBase {
 
   unsigned getMaxInterleaveFactor(ElementCount VF) const { return 1; }
 
-  InstructionCost getArithmeticInstrCost(
-      unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
-      TTI::OperandValueInfo Opd1Info, TTI::OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) const {
+  InstructionCost getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+                                         TTI::TargetCostKind CostKind,
+                                         TTI::OperandValueInfo Opd1Info,
+                                         TTI::OperandValueInfo Opd2Info,
+                                         ArrayRef<const Value *> Args,
+                                         const Instruction *CxtI = nullptr,
+                                         ArrayRef<Value *> Scalars = {}) const {
     // Widenable conditions will eventually lower into constants, so some
     // operations with them will be trivially optimized away.
     auto IsWidenableCondition = [](const Value *V) {
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 98cbb4886642bf..4a27470763db83 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -924,7 +924,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     // Check if any of the operands are vector operands.
     const TargetLoweringBase *TLI = getTLI();
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1fb2b9836de0cc..3f8745b1e06ecc 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -927,7 +927,7 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     OperandValueInfo Op1Info, OperandValueInfo Op2Info,
     ArrayRef<const Value *> Args, const Instruction *CxtI,
-    const TargetLibraryInfo *TLibInfo) const {
+    const TargetLibraryInfo *TLibInfo, ArrayRef<Value *> Scalars) const {
 
   // Use call cost for frem intructions that have platform specific vector math
   // functions, as those will be replaced with calls later by SelectionDAG or
@@ -942,10 +942,8 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
       return getCallInstrCost(nullptr, VecTy, {VecTy, VecTy}, CostKind);
   }
 
-  InstructionCost Cost =
-      TTIImpl->getArithmeticInstrCost(Opcode, Ty, CostKind,
-                                      Op1Info, Op2Info,
-                                      Args, CxtI);
+  InstructionCost Cost = TTIImpl->getArithmeticInstrCost(
+      Opcode, Ty, CostKind, Op1Info, Op2Info, Args, CxtI, Scalars);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d1536a276a9040..f9434b1691d746 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3376,8 +3376,8 @@ InstructionCost AArch64TTIImpl::getScalarizationOverhead(
 InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // The code-generator is currently not able to handle scalable vectors
   // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
@@ -3442,8 +3442,8 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!VT.isVector() && VT.getSizeInBits() > 64)
       return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
 
-    InstructionCost Cost = BaseT::getArithmeticInstrCost(
-        Opcode, Ty, CostKind, Op1Info, Op2Info);
+    InstructionCost Cost =
+        BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);
     if (Ty->isVectorTy()) {
       if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
         // SDIV/UDIV operations are lowered using SVE, then we can have less
@@ -3472,29 +3472,41 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
-        // If one of the operands is a uniform constant then the cost for each
-        // element is Cost for insertion, extraction and division.
-        // Insertion cost = 2, Extraction Cost = 2, Division = cost for the
-        // operation with scalar type
-        if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
-            (Op2Info.isConstant() && Op2Info.isUniform())) {
-          if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+          if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
+              (Op2Info.isConstant() && Op2Info.isUniform())) {
             InstructionCost DivCost = BaseT::getArithmeticInstrCost(
                 Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info);
-            return (4 + DivCost) * VTy->getNumElements();
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost + DivCost) * VTy->getNumElements();
+          } else if (!Scalars.empty()) {
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+            for (auto *V : Scalars) {
+              auto *I = cast<Instruction>(V);
+              Cost +=
+                  getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind,
+                                         TTI::getOperandInfo(I->getOperand(0)),
+                                         TTI::getOperandInfo(I->getOperand(1)));
+            }
+          } else {
+            // FIXME: The initial cost calculated should have considered extract
+            // cost twice. For now, we just add additional cost to avoid
+            // underestimating the total cost.
+            Cost += Cost;
           }
+        } else {
+          // We can't predict the cost of div/extract/insert without knowing the
+          // vector width.
+          Cost.setInvalid();
         }
-        // On AArch64, without SVE, vector divisions are expanded
-        // into scalar divisions of each pair of elements.
-        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
-                                       CostKind, Op1Info, Op2Info);
-        Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
-                                       Op1Info, Op2Info);
       }
-
-      // TODO: if one of the arguments is scalar, then it's not necessary to
-      // double the cost of handling the vector elements.
-      Cost += Cost;
     }
     return Cost;
   }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 201bc831b816b3..8845efd241f8f0 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -218,7 +218,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                             const SCEV *Ptr);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 5160851f8c4424..e1d718aa8a3509 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -546,8 +546,8 @@ bool GCNTTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst,
 InstructionCost GCNTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 10956861650ab3..414e4b8f7284a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -157,7 +157,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
                                  const Instruction *I = nullptr);
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 0e29648a7a284f..9bf37d8457b02f 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -10,6 +10,7 @@
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/ADT/APInt.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/CodeGen/CostTable.h"
@@ -1349,8 +1350,8 @@ InstructionCost ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 InstructionCost ARMTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
   if (ST->isThumb() && CostKind == TTI::TCK_CodeSize && Ty->isIntegerTy(1)) {
     // Make operations on i1 relatively expensive as this often involves
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 3a4f940088b2e3..bef278e211a418 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -258,7 +258,8 @@ class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost
   getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
diff --git a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
index bf0bef3a2b2f98..ca52bd375021c2 100644
--- a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
+++ b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
@@ -16,6 +16,7 @@
 #define LLVM_LIB_TARGET_BPF_BPFTARGETTRANSFORMINFO_H
 
 #include "BPFTargetMachine.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
@@ -61,7 +62,8 @@ class BPFTTIImpl : public BasicTTIImplBase<BPFTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
     if (ISD == ISD::ADD && CostKind == TTI::TCK_RecipThroughput)
       return SCEVCheapExpansionBudget.getValue() + 1;
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
index bbb9d065b62435..53572321ecec62 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
@@ -273,8 +273,8 @@ InstructionCost HexagonTTIImpl::getCmpSelInstrCost(
 InstructionCost HexagonTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // TODO: Handle more cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
index 826644d08d1ac0..abaa25136574ae 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
@@ -142,7 +142,8 @@ class HexagonTTIImpl : public BasicTTIImplBase<HexagonTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
   InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                    TTI::CastContextHint CCH,
                                    TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
index 5fe63e4a2e0312..5000a93a818b0a 100644
--- a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
+++ b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
@@ -94,7 +94,8 @@ class LanaiTTIImpl : public BasicTTIImplBase<LanaiTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
 
     switch (ISD) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
index 4ec2ec100ab08d..2c194d9fff1ad4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
@@ -485,8 +485,8 @@ NVPTXTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
 InstructionCost NVPTXTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index 0f4fb280b2d996..3d34f8d97ddc47 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -98,7 +98,8 @@ class NVPTXTTIImpl : public BasicTTIImplBase<NVPTXTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKi...
[truncated]

llvmbot · 2024-11-29T06:37:47Z

@llvm/pr-subscribers-backend-aarch64

Author: Sushant Gokhale (sushgokh)

Changes

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Patch is 93.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118055.diff

29 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+9-5)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+7-5)
(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+2-1)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-5)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34-22)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+3-2)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/BPF/BPFTargetTransformInfo.h (+3-1)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+3-3)
(modified) llvm/test/Analysis/CostModel/AArch64/div.ll (+141-141)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/div.ll (+7-29)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 985ca1532e0149..50f1210e82f1e2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1297,13 +1297,16 @@ class TargetTransformInfo {
   /// provide even more information.
   /// \p TLibInfo is used to search for platform specific vector library
   /// functions for instructions that might be converted to calls (e.g. frem).
+  /// \p Scalars refers to individual scalars/instructions being used for
+  /// vectorization.
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
       ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
-      const TargetLibraryInfo *TLibInfo = nullptr) const;
+      const TargetLibraryInfo *TLibInfo = nullptr,
+      ArrayRef<Value *> Scalars = {}) const;
 
   /// Returns the cost estimation for alternating opcode pattern that can be
   /// lowered to a single instruction on the target. In X86 this is for the
@@ -2099,7 +2102,8 @@ class TargetTransformInfo::Concept {
   virtual InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr) = 0;
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) = 0;
   virtual InstructionCost getAltInstrCost(
       VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
       const SmallBitVector &OpcodeMask,
@@ -2780,10 +2784,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) override {
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) override {
     return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
-                                       Args, CxtI);
+                                       Args, CxtI, Scalars);
   }
   InstructionCost getAltInstrCost(VectorType *VecTy, unsigned Opcode0,
                                   unsigned Opcode1,
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 38aba183f6a173..4aa255a909edbf 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -581,11 +581,13 @@ class TargetTransformInfoImplBase {
 
   unsigned getMaxInterleaveFactor(ElementCount VF) const { return 1; }
 
-  InstructionCost getArithmeticInstrCost(
-      unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
-      TTI::OperandValueInfo Opd1Info, TTI::OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) const {
+  InstructionCost getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+                                         TTI::TargetCostKind CostKind,
+                                         TTI::OperandValueInfo Opd1Info,
+                                         TTI::OperandValueInfo Opd2Info,
+                                         ArrayRef<const Value *> Args,
+                                         const Instruction *CxtI = nullptr,
+                                         ArrayRef<Value *> Scalars = {}) const {
     // Widenable conditions will eventually lower into constants, so some
     // operations with them will be trivially optimized away.
     auto IsWidenableCondition = [](const Value *V) {
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 98cbb4886642bf..4a27470763db83 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -924,7 +924,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     // Check if any of the operands are vector operands.
     const TargetLoweringBase *TLI = getTLI();
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1fb2b9836de0cc..3f8745b1e06ecc 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -927,7 +927,7 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     OperandValueInfo Op1Info, OperandValueInfo Op2Info,
     ArrayRef<const Value *> Args, const Instruction *CxtI,
-    const TargetLibraryInfo *TLibInfo) const {
+    const TargetLibraryInfo *TLibInfo, ArrayRef<Value *> Scalars) const {
 
   // Use call cost for frem intructions that have platform specific vector math
   // functions, as those will be replaced with calls later by SelectionDAG or
@@ -942,10 +942,8 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
       return getCallInstrCost(nullptr, VecTy, {VecTy, VecTy}, CostKind);
   }
 
-  InstructionCost Cost =
-      TTIImpl->getArithmeticInstrCost(Opcode, Ty, CostKind,
-                                      Op1Info, Op2Info,
-                                      Args, CxtI);
+  InstructionCost Cost = TTIImpl->getArithmeticInstrCost(
+      Opcode, Ty, CostKind, Op1Info, Op2Info, Args, CxtI, Scalars);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d1536a276a9040..f9434b1691d746 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3376,8 +3376,8 @@ InstructionCost AArch64TTIImpl::getScalarizationOverhead(
 InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // The code-generator is currently not able to handle scalable vectors
   // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
@@ -3442,8 +3442,8 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!VT.isVector() && VT.getSizeInBits() > 64)
       return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
 
-    InstructionCost Cost = BaseT::getArithmeticInstrCost(
-        Opcode, Ty, CostKind, Op1Info, Op2Info);
+    InstructionCost Cost =
+        BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);
     if (Ty->isVectorTy()) {
       if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
         // SDIV/UDIV operations are lowered using SVE, then we can have less
@@ -3472,29 +3472,41 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
-        // If one of the operands is a uniform constant then the cost for each
-        // element is Cost for insertion, extraction and division.
-        // Insertion cost = 2, Extraction Cost = 2, Division = cost for the
-        // operation with scalar type
-        if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
-            (Op2Info.isConstant() && Op2Info.isUniform())) {
-          if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+          if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
+              (Op2Info.isConstant() && Op2Info.isUniform())) {
             InstructionCost DivCost = BaseT::getArithmeticInstrCost(
                 Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info);
-            return (4 + DivCost) * VTy->getNumElements();
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost + DivCost) * VTy->getNumElements();
+          } else if (!Scalars.empty()) {
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+            for (auto *V : Scalars) {
+              auto *I = cast<Instruction>(V);
+              Cost +=
+                  getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind,
+                                         TTI::getOperandInfo(I->getOperand(0)),
+                                         TTI::getOperandInfo(I->getOperand(1)));
+            }
+          } else {
+            // FIXME: The initial cost calculated should have considered extract
+            // cost twice. For now, we just add additional cost to avoid
+            // underestimating the total cost.
+            Cost += Cost;
           }
+        } else {
+          // We can't predict the cost of div/extract/insert without knowing the
+          // vector width.
+          Cost.setInvalid();
         }
-        // On AArch64, without SVE, vector divisions are expanded
-        // into scalar divisions of each pair of elements.
-        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
-                                       CostKind, Op1Info, Op2Info);
-        Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
-                                       Op1Info, Op2Info);
       }
-
-      // TODO: if one of the arguments is scalar, then it's not necessary to
-      // double the cost of handling the vector elements.
-      Cost += Cost;
     }
     return Cost;
   }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 201bc831b816b3..8845efd241f8f0 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -218,7 +218,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                             const SCEV *Ptr);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 5160851f8c4424..e1d718aa8a3509 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -546,8 +546,8 @@ bool GCNTTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst,
 InstructionCost GCNTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 10956861650ab3..414e4b8f7284a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -157,7 +157,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
                                  const Instruction *I = nullptr);
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 0e29648a7a284f..9bf37d8457b02f 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -10,6 +10,7 @@
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/ADT/APInt.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/CodeGen/CostTable.h"
@@ -1349,8 +1350,8 @@ InstructionCost ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 InstructionCost ARMTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
   if (ST->isThumb() && CostKind == TTI::TCK_CodeSize && Ty->isIntegerTy(1)) {
     // Make operations on i1 relatively expensive as this often involves
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 3a4f940088b2e3..bef278e211a418 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -258,7 +258,8 @@ class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost
   getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
diff --git a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
index bf0bef3a2b2f98..ca52bd375021c2 100644
--- a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
+++ b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
@@ -16,6 +16,7 @@
 #define LLVM_LIB_TARGET_BPF_BPFTARGETTRANSFORMINFO_H
 
 #include "BPFTargetMachine.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
@@ -61,7 +62,8 @@ class BPFTTIImpl : public BasicTTIImplBase<BPFTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
     if (ISD == ISD::ADD && CostKind == TTI::TCK_RecipThroughput)
       return SCEVCheapExpansionBudget.getValue() + 1;
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
index bbb9d065b62435..53572321ecec62 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
@@ -273,8 +273,8 @@ InstructionCost HexagonTTIImpl::getCmpSelInstrCost(
 InstructionCost HexagonTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // TODO: Handle more cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
index 826644d08d1ac0..abaa25136574ae 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
@@ -142,7 +142,8 @@ class HexagonTTIImpl : public BasicTTIImplBase<HexagonTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
   InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                    TTI::CastContextHint CCH,
                                    TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
index 5fe63e4a2e0312..5000a93a818b0a 100644
--- a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
+++ b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
@@ -94,7 +94,8 @@ class LanaiTTIImpl : public BasicTTIImplBase<LanaiTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
 
     switch (ISD) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
index 4ec2ec100ab08d..2c194d9fff1ad4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
@@ -485,8 +485,8 @@ NVPTXTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
 InstructionCost NVPTXTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index 0f4fb280b2d996..3d34f8d97ddc47 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -98,7 +98,8 @@ class NVPTXTTIImpl : public BasicTTIImplBase<NVPTXTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKi...
[truncated]

sushgokh requested review from madhur13490, davemgreen, sjoerdmeijer and david-arm November 29, 2024 06:37

llvmbot added backend:ARM backend:AArch64 backend:AMDGPU backend:Hexagon backend:RISC-V backend:PowerPC backend:SystemZ backend:WebAssembly backend:X86 vectorizers backend:NVPTX llvm:analysis llvm:transforms labels Nov 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

sushgokh commented Nov 29, 2024 •

edited

Loading

llvmbot commented Nov 29, 2024 •

edited

Loading

llvmbot commented Nov 29, 2024

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

Are you sure you want to change the base?

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

Conversation

sushgokh commented Nov 29, 2024 • edited Loading

llvmbot commented Nov 29, 2024 • edited Loading

llvmbot commented Nov 29, 2024

sushgokh commented Nov 29, 2024 •

edited

Loading

llvmbot commented Nov 29, 2024 •

edited

Loading