Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Loop Metadata to Asserts to help Loop rotation #227

Open
wants to merge 3 commits into
base: aie-public
Choose a base branch
from

Conversation

F-Stuckmann
Copy link
Collaborator

@F-Stuckmann F-Stuckmann commented Oct 28, 2024

note the first commit will be removed once the ( #225 ) is merged

Please note: I am still going through QoR to make sure I am not introducing any functional errors.

return std::nullopt;
}

return std::optional(BackedgeCount);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the loop is already rotated, is there any benefit for us? I mean, we can influence changes in the CFG as you mentioned before (fusing BBs). In this way, I am curious about practical effects (QoR).

I am just pointing this because loop rotation is an premise in several parts of this PR. Could we have a command line option to control if we would like to touch rotated loops and evaluate practical results?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not have any do-while loops that have loop iteration count pragmas attached. Therefore, we do not have any regressions.

Copy link
Collaborator

@andcarminati andcarminati Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just not sure if we need this, because if our goal is "to help loop rotation" we are inserting unnecessary information. Even if we have such type of loops, assume will add a known information.

// expression can be simplified in later passes
SCEVCouldNotCompute::NoWrapFlags NWF = BunldedSCEV.AddRecExpr->getNoWrapFlags(
static_cast<SCEVCouldNotCompute::NoWrapFlags>(SCEV::FlagNUW |
SCEV::FlagNSW));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: SCEV also mentions "self-wrap", do you know what it means?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a flag that determines that no overflow will occur, i.e. if either NUW or NSW are set.

In our pass we need to determine, if we have a signed or unsigned no-overflow flag to duplicate it for the Bound arithmetic. Therefore the NW flag does not provide any useful information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// AddRec expressions may have a no-self-wraparound property if, in
/// the integer domain, abs(step) * max-iteration(loop) <=
/// unsigned-max(bitwidth). This means that the recurrence will never reach
/// its start value if the step is non-zero.

/// Note that NUW and NSW are also valid properties of a recurrence, and
/// either implies NW. For convenience, NW will be set for a recurrence
/// whenever either NUW or NSW are set.


// Expansion of MinIterSCEV will result in \
// InitialValue + StepSize * Minimum Iteration Count
if (!Expander.isSafeToExpand(MinIterSCEV)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: in which condition does it happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens in decrementing loops:

for (unsigned i=Bound; i > 0; i ++)

or loop with initial values, that are not constants.
for (unsigned i = InitialValue; i < Bound; i++)
In the case of constants, MinIterSCEV evaluates to a constant that is just inserted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding unit tests:

incrementByOne
decrementByOne
decrementGEZero

The counter case, where MinIterSCEV evaluates to a constant is
incrementInclusiveBound

SCEV::FlagNSW));
auto *AE = dyn_cast<SCEVAddExpr>(MinIterSCEV);
if (AE && NWF) {
MinIterSCEV = SE.getAddExpr(AE->getOperand(0), AE->getOperand(1), NWF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially this is taking the same expression and adding flags? Isn't there a helper to add them without re-creating the SCEV? Do you also know why those flags aren't automatically propagated from BunldedSCEV.AddRecExpr? After all, MinIterSCEV is that same expression, but evaluated at a different point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can add the overflow flags without recreating the expression, I generalized it to add the flags for any 'SCEVCommutativeExpr'

Unfortunately SCEVAddRecExpr::evaluateAtIteration does not have a way to automatically reuse the no overflow flags.

SCEVCouldNotCompute::NoWrapFlags NWF = BunldedSCEV.AddRecExpr->getNoWrapFlags(
static_cast<SCEVCouldNotCompute::NoWrapFlags>(SCEV::FlagNUW |
SCEV::FlagNSW));
auto *AE = dyn_cast<SCEVAddExpr>(MinIterSCEV);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be something else than SCEVAddExpr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I generalized it can be any SCEVCommutativeExpr , which can be an AddExpr, MulExpr or SCEVMinMaxExpr

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for those it does not make sense to add nowrap flags?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the base expression is a AddExpr, how can it become e.g. MulExpr or SCEVMinMaxExpr?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified offline and "confirmed" in https://www.npopov.com/2023/10/03/LLVM-Scalar-evolution.html: We should just get a new AddRec with a different initial value.

@F-Stuckmann F-Stuckmann force-pushed the stuckmann.loop.metadata.asserts branch from bc48bdb to 89dd3b2 Compare November 26, 2024 14:02
dyn_cast<SCEVCommutativeExpr>(MinIterSCEV);
if (ConstCE && NWF) {
SCEVCommutativeExpr *CE = const_cast<SCEVCommutativeExpr *>(ConstCE);
CE->setNoWrapFlags(NWF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that might be dangerous to const_cast an expression and change its properties? (It might be cached somewhere and re-used) Is there a way to get a non-const SCEV from evaluateAtIteration()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I unfortunately get a const SCEV expression from evaluateAtIteration()

@F-Stuckmann F-Stuckmann force-pushed the stuckmann.loop.metadata.asserts branch from fa41660 to deff215 Compare November 28, 2024 14:39
@F-Stuckmann
Copy link
Collaborator Author

Changed the Logic to evaluate the Compare Instruction at Iteration 0 and set it to true, which solves all of our issues.

Moved the Assumption into the preheader (with respective clones) to guarantee, that the assumption is loop invariant and does not change even if both Operands of the Compare are loop variant.

Added more unittests to test more corner cases.

@@ -107,6 +107,11 @@ Value *recursivlyCloneInBB(Value *Op, BasicBlock &BB, ValueToValueMapTy &VMap) {
<< *Op << "\n");
return nullptr;
}
if (!I->isSafeToRemove()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this checking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general checker, if we are able to clone the instruction into a different BB, without introducing behavior that could cause issues (should be in conjunction with the volatile memory check). The checks include:

  • if it is a call instruction
  • if we may write to memory
  • it an exception is thrown or if it is an exception handler

basically, combined with volatile this is a very conservative hoisting check

if (!AddRec) {
LLVM_DEBUG(dbgs() << "Could not extract AddRecExpr, will reuse " << *Op
<< "\n");
return Op;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is dangerous: we might return a LoadInst that isn't loop-invariant. What we return here must be loop-invariant because we will later clone it.

E.g. we could have

loop.header:
  store ...
  %iv = load ...

The store location could alias with the load location.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we might want to run LICM beforehand, or use analysis like MemorySSA or LoopAccessAnalysis to help us say what is loop invariant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note LoopUtils has a bunch of things to help you. In particular:

/// Returns true if is legal to hoist or sink this instruction disregarding the
/// possible introduction of faults.  Reasoning about potential faulting
/// instructions is the responsibility of the caller since it is challenging to
/// do efficiently from within this routine.
/// \p TargetExecutesOncePerLoop is true only when it is guaranteed that the
/// target executes at most once per execution of the loop body.  This is used
/// to assess the legality of duplicating atomic loads.  Generally, this is
/// true when moving out of loop and not true when moving into loops.
/// If \p ORE is set use it to emit optimization remarks.
bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                        Loop *CurLoop, MemorySSAUpdater &MSSAU,
                        bool TargetExecutesOncePerLoop,
                        SinkAndHoistLICMFlags &LICMFlags,
                        OptimizationRemarkEmitter *ORE = nullptr);

Copy link
Collaborator Author

@F-Stuckmann F-Stuckmann Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to canSinkOrHoistInst check

if (I->isVolatile()) {
LLVM_DEBUG(dbgs() << "Volatile Instruction: Aborting! Could not clone "
<< *Op << "\n");
return nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said before, I think we need to check more than just volatile, we cannot just clone any instruction without checking it is loop-invariant. Or at least we need to prove that the instructions that precede it will have no effects on its value.

LLVM_DEBUG(dbgs() << "Loop already in rotated form. Will not add Loop "
"Iteration Count assumptions.\n");
return nullptr;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking: that could be an already-rotated loop with a guard? I assume we could potentially remove the guard if we add a builtin_assume? Nothing to do for now, I think it's good to focus on standard for loops :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
The case to check against is a do-while loop with iteration count metadata.
Since this is already rotated, we don't need to add any assumptions, because our pass is redundant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are so early in the optimization pipeline, we should never see an already rotated loop

@F-Stuckmann F-Stuckmann force-pushed the stuckmann.loop.metadata.asserts branch from 9863dd9 to 3909dd4 Compare December 3, 2024 10:48
@F-Stuckmann
Copy link
Collaborator Author

Only add assumption if the assumption at the first iteration can be guaranteed to be loop invariant.

@F-Stuckmann F-Stuckmann force-pushed the stuckmann.loop.metadata.asserts branch from 3909dd4 to 33c04f6 Compare December 3, 2024 10:59
@F-Stuckmann
Copy link
Collaborator Author

rebased to aie-public

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants