Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Fold align assume into load's !align metadata if possible. #108958

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 17, 2024

If an alignment assumption is valid in the context of a corresponding load of the pointer the assumption applies to, the assumption can be replaced !align metadata on the load.

The benefits of folding it into !align are that existing code makes better use of !align and it allows removing the now-redundant call instructions.

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

If an alignment assumption is valid in the context of a corresponding load of the pointer the assumption applies to, the assumption can be replaced !align metadata on the load.

The benefits of folding it into !align are that existing code makes better use of !align and it allows removing the now-redundant call instructions.


Full diff: https://github.com/llvm/llvm-project/pull/108958.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (+21-4)
  • (modified) llvm/test/Transforms/InstCombine/assume-align.ll (+4-3)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 61011d55227e7b..596de10e20b6de 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -3076,12 +3076,13 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
       // TODO: apply range metadata for range check patterns?
     }
 
-    // Separate storage assumptions apply to the underlying allocations, not any
-    // particular pointer within them. When evaluating the hints for AA purposes
-    // we getUnderlyingObject them; by precomputing the answers here we can
-    // avoid having to do so repeatedly there.
     for (unsigned Idx = 0; Idx < II->getNumOperandBundles(); Idx++) {
       OperandBundleUse OBU = II->getOperandBundleAt(Idx);
+
+      // Separate storage assumptions apply to the underlying allocations, not any
+      // particular pointer within them. When evaluating the hints for AA purposes
+      // we getUnderlyingObject them; by precomputing the answers here we can
+      // avoid having to do so repeatedly there.
       if (OBU.getTagName() == "separate_storage") {
         assert(OBU.Inputs.size() == 2);
         auto MaybeSimplifyHint = [&](const Use &U) {
@@ -3095,6 +3096,22 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
         MaybeSimplifyHint(OBU.Inputs[0]);
         MaybeSimplifyHint(OBU.Inputs[1]);
       }
+
+      // Try to fold alignment assumption into a load's !align metadata, if the assumption is valid in the load's context.
+      if (OBU.getTagName() == "align" && OBU.Inputs.size() == 2) {
+        auto *LI = dyn_cast<LoadInst>(OBU.Inputs[0]);
+        if (!LI || !isValidAssumeForContext(II, LI, &DT, /*AllowEphemerals=*/true))
+          continue;
+        auto *Align = cast<ConstantInt>(OBU.Inputs[1]);
+        if (!isPowerOf2_64(Align->getZExtValue()))
+          continue;
+        LI->setMetadata(LLVMContext::MD_align,
+                        MDNode::get(II->getContext(),
+                                    ValueAsMetadata::getConstant(
+                                        Align)));
+        auto *New = CallBase::removeOperandBundle(II, OBU.getTagID());
+        return New;
+      }
     }
 
     // Convert nonnull assume like:
diff --git a/llvm/test/Transforms/InstCombine/assume-align.ll b/llvm/test/Transforms/InstCombine/assume-align.ll
index 2b8ca5d25fd1a8..65256377696a59 100644
--- a/llvm/test/Transforms/InstCombine/assume-align.ll
+++ b/llvm/test/Transforms/InstCombine/assume-align.ll
@@ -123,11 +123,9 @@ define i8 @assume_align_non_pow2(ptr %p) {
   ret i8 %v
 }
 
-; TODO: Can fold alignment assumption into !align metadata on load.
 define ptr @fold_assume_align_pow2_of_loaded_pointer_into_align_metadata(ptr %p) {
 ; CHECK-LABEL: @fold_assume_align_pow2_of_loaded_pointer_into_align_metadata(
-; CHECK-NEXT:    [[P2:%.*]] = load ptr, ptr [[P:%.*]], align 8
-; CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(ptr [[P2]], i64 8) ]
+; CHECK-NEXT:    [[P2:%.*]] = load ptr, ptr [[P:%.*]], align 8, !align [[META0:![0-9]+]]
 ; CHECK-NEXT:    ret ptr [[P2]]
 ;
   %p2 = load ptr, ptr %p
@@ -171,3 +169,6 @@ define ptr @dont_fold_assume_align_zero_of_loaded_pointer_into_align_metadata(pt
   call void @llvm.assume(i1 true) [ "align"(ptr %p2, i64 0) ]
   ret ptr %p2
 }
+;.
+; CHECK: [[META0]] = !{i64 8}
+;.

Copy link

github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 17, 2024
Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

See llvm#101372 for a discussion
of missed range check optimizations in hardened mode.

Once llvm#108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.
Comment on lines 3107 to 3108
auto *Align = cast<ConstantInt>(OBU.Inputs[1]);
if (!isPowerOf2_64(Align->getZExtValue()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *Align = cast<ConstantInt>(OBU.Inputs[1]);
if (!isPowerOf2_64(Align->getZExtValue()))
auto *Align = dyn_cast<ConstantInt>(OBU.Inputs[1]);
if (!Align || !isPowerOf2_64(Align->getZExtValue()))

While attributes expect constant arguments, assume operand bundles may be provided a dynamic value, for example:

call void @llvm.assume(i1 true) ["align"(ptr %val, i32 %align)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use getKnowledgeFromBundle here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thanks! Didn't check that they also allow dynamic values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also added a check of the type size + test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use getKnowledgeFromBundle here?

Missed this earlier, updated now, thanks!

@nikic
Copy link
Contributor

nikic commented Sep 17, 2024

Based on https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1320/files it looks like we end up losing the alignment information in ~most cases? Presumably because SROA later comes along and removes the load.

Note that for the existing non-null handling, SROA will actually rematerialize the nonnull assumption. But I'm reasonably confident that doing that for !align would have terrible effects, at least for frontends that use !align a lot.

continue;
LI->setMetadata(
LLVMContext::MD_align,
MDNode::get(II->getContext(), ValueAsMetadata::getConstant(Align)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also add noundef to preserve the fact this is IUB.

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 17, 2024

Based on https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1320/files it looks like we end up losing the alignment information in ~most cases? Presumably because SROA later comes along and removes the load.

Note that for the existing non-null handling, SROA will actually rematerialize the nonnull assumption. But I'm reasonably confident that doing that for !align would have terrible effects, at least for frontends that use !align a lot.

PhaseOrdering reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
declare void @llvm.assume(i1 noundef) #0

define i32 @_ZN4llvm7support6endian8readNextIjLNS_10endiannessE1ELm0EhEET_RPKT2_(ptr %0) {
  %2 = call i32 @_ZN4llvm7support6endian8readNextIjLm0EhEET_RPKT1_NS_10endiannessE(ptr %0)
  ret i32 0
}

define i32 @_ZN4llvm7support6endian8readNextIjLm0EhEET_RPKT1_NS_10endiannessE(ptr %0) {
  %2 = load ptr, ptr %0, align 8
  %3 = call i32 @_ZN4llvm7support6endian4readIjLm0EEET_PKvNS_10endiannessE(ptr %2)
  %4 = load ptr, ptr %0, align 8
  %5 = getelementptr i8, ptr %4, i64 4
  store ptr %5, ptr %0, align 8
  ret i32 0
}

define i32 @_ZN4llvm7support6endian4readIjLm0EEET_PKvNS_10endiannessE(ptr %0) {
  call void @llvm.assume(i1 true) [ "align"(ptr %0, i64 4) ]
  %.0.copyload = load i32, ptr %0, align 1
  %2 = call i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32 %.0.copyload)
  ret i32 0
}

declare i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32)

define i64 @_ZN4llvm22OnDiskChainedHashTableIN12_GLOBAL__N_126IdentifierIndexReaderTraitEE24readNumBucketsAndEntriesERPKh(ptr %0) {
  %2 = call i32 @_ZN4llvm7support6endian8readNextIjLNS_10endiannessE1ELm0EhEET_RPKT2_(ptr %0)
  %3 = call i32 @_ZN4llvm7support6endian8readNextIjLNS_10endiannessE1ELm0EhEET_RPKT2_(ptr %0)
  ret i64 0
}

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }

Before:

define noundef i64 @_ZN4llvm22OnDiskChainedHashTableIN12_GLOBAL__N_126IdentifierIndexReaderTraitEE24readNumBucketsAndEntriesERPKh(ptr nocapture %0) local_unnamed_addr {
  %2 = load ptr, ptr %0, align 8
  call void @llvm.assume(i1 true) [ "align"(ptr %2, i64 4) ]
  %.0.copyload.i.i.i = load i32, ptr %2, align 4
  %3 = tail call i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32 %.0.copyload.i.i.i)
  %4 = load ptr, ptr %0, align 8
  %5 = getelementptr i8, ptr %4, i64 4
  store ptr %5, ptr %0, align 8
  call void @llvm.assume(i1 true) [ "align"(ptr %5, i64 4) ]
  %.0.copyload.i.i.i1 = load i32, ptr %5, align 4
  %6 = tail call i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32 %.0.copyload.i.i.i1)
  %7 = load ptr, ptr %0, align 8
  %8 = getelementptr i8, ptr %7, i64 4
  store ptr %8, ptr %0, align 8
  ret i64 0
}

After:

define noundef i64 @_ZN4llvm22OnDiskChainedHashTableIN12_GLOBAL__N_126IdentifierIndexReaderTraitEE24readNumBucketsAndEntriesERPKh(ptr nocapture %0) local_unnamed_addr {
  %2 = load ptr, ptr %0, align 8, !align !0
  %.0.copyload.i.i.i = load i32, ptr %2, align 4
  %3 = tail call i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32 %.0.copyload.i.i.i)
  %4 = load ptr, ptr %0, align 8
  %5 = getelementptr i8, ptr %4, i64 4
  store ptr %5, ptr %0, align 8
  %.0.copyload.i.i.i1 = load i32, ptr %5, align 1
  %6 = tail call i32 @_ZN4llvm7support6endian9byte_swapIjEET_S3_NS_10endiannessE(i32 %.0.copyload.i.i.i1)
  %7 = load ptr, ptr %0, align 8
  %8 = getelementptr i8, ptr %7, i64 4
  store ptr %8, ptr %0, align 8
  ret i64 0
}

@fhahn
Copy link
Contributor Author

fhahn commented Sep 17, 2024

Based on https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1320/files it looks like we end up losing the alignment information in ~most cases? Presumably because SROA later comes along and removes the load.

Note that for the existing non-null handling, SROA will actually rematerialize the nonnull assumption. But I'm reasonably confident that doing that for !align would have terrible effects, at least for frontends that use !align a lot.

Ah that is an unfortunate! For the particular case @dtcxzyw shared, looks like the issue is early-cse: https://llvm.godbolt.org/z/KfacPbzf7

Maybe for that case it would be sufficient to have early-cse re-materialize the assumption when CSE'ing an instruction with !align metadata that cannot be preserved.

@fhahn
Copy link
Contributor Author

fhahn commented Sep 17, 2024

(other alternative would be to just keep the assumptions, but I was hoping this patch would help to avoid introducing too many new instructions when doing something like #108961)

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 18, 2024

(other alternative would be to just keep the assumptions, but I was hoping this patch would help to avoid introducing too many new instructions when doing something like #108961)

Agree. Removing assumptions reduces compilation time :)

llvm/Interp.cpp.ll 36858214354 33892745069 -8.05%
llvm/Disasm.cpp.ll 3399768227 3263050730 -4.02%
llvm/APINotesReader.cpp.ll 7143171077 7021418904 -1.70%
llvm/COFFObjectFile.cpp.ll 3399105265 3356267000 -1.26%
darktable/histogram.c.ll 1007474874 995703620 -1.17%

@fhahn fhahn force-pushed the ic-fold-alignment-assumption-into-load branch from 2de8174 to a7ad32d Compare September 18, 2024 10:38
@fhahn
Copy link
Contributor Author

fhahn commented Sep 18, 2024

@dtcxzyw updated this PR to include a change to EarlyCSE to materialize alignment assumptions for !align. Could you re-run your analysis for the latest version of the PR?

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 18, 2024

@dtcxzyw updated this PR to include a change to EarlyCSE to materialize alignment assumptions for !align. Could you re-run your analysis for the latest version of the PR?

See dtcxzyw/llvm-opt-benchmark#1312. You can request a new run yourself :)
I just added this feature yesterday. Hopefully it works well.

@nikic
Copy link
Contributor

nikic commented Sep 18, 2024

New results show about what I'd expect. Improvements for C++ which doesn't use !align, big regressions for Rust where ~all ptr loads are !align.

@fhahn
Copy link
Contributor Author

fhahn commented Sep 18, 2024

New results show about what I'd expect. Improvements for C++ which doesn't use !align, big regressions for Rust where ~all ptr loads are !align.

Yep, updating the patch to avoid adding assume() if the new pointer already has !align

If an alignment assumption is valid in the context of a corresponding
load of the pointer the assumption applies to, the assumption can be
replaced !align metadata on the load.

The benefits of folding it into !align are that existing code makes
better use of !align and it allows removing the now-redundant call
instructions.
@fhahn fhahn force-pushed the ic-fold-alignment-assumption-into-load branch from a7ad32d to ca75fc2 Compare September 18, 2024 13:38

// Try to fold alignment assumption into a load's !align metadata, if the
// assumption is valid in the load's context.
if (OBU.getTagName() == "align" && OBU.Inputs.size() == 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to only use the alignment assumption and not just add support for this in computeKnownBits and use that?

ldionne pushed a commit to fhahn/llvm-project that referenced this pull request Jan 13, 2025
Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

See llvm#101372 for a discussion
of missed range check optimizations in hardened mode.

Once llvm#108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 15, 2025
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 15, 2025
ldionne added a commit that referenced this pull request Jan 16, 2025
#108961)

Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

This patch adds alignment assumptions at the point where the begin and
end pointers are loaded. If the pointers would not have the same
alignment, end might never get hit when incrementing begin.

See #101372 for a discussion
of missed range check optimizations in hardened mode.

Once #108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.

Co-authored-by: Louis Dionne <[email protected]>
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 16, 2025
…s of vector. (#108961)

Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

This patch adds alignment assumptions at the point where the begin and
end pointers are loaded. If the pointers would not have the same
alignment, end might never get hit when incrementing begin.

See llvm/llvm-project#101372 for a discussion
of missed range check optimizations in hardened mode.

Once llvm/llvm-project#108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.

Co-authored-by: Louis Dionne <[email protected]>
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 16, 2025
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 16, 2025
DKLoehr pushed a commit to DKLoehr/llvm-project that referenced this pull request Jan 17, 2025
llvm#108961)

Missing information about begin and end pointers of std::vector can lead
to missed optimizations in LLVM.

This patch adds alignment assumptions at the point where the begin and
end pointers are loaded. If the pointers would not have the same
alignment, end might never get hit when incrementing begin.

See llvm#101372 for a discussion
of missed range check optimizations in hardened mode.

Once llvm#108958 lands, the created
`llvm.assume` calls for the alignment should be folded into the `load`
instructions, resulting in no extra instructions after InstCombine.

Co-authored-by: Louis Dionne <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants