-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Loop Metadata to Asserts to help Loop rotation #227
base: aie-public
Are you sure you want to change the base?
Conversation
return std::nullopt; | ||
} | ||
|
||
return std::optional(BackedgeCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the loop is already rotated, is there any benefit for us? I mean, we can influence changes in the CFG as you mentioned before (fusing BBs). In this way, I am curious about practical effects (QoR).
I am just pointing this because loop rotation is an premise in several parts of this PR. Could we have a command line option to control if we would like to touch rotated loops and evaluate practical results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not have any do-while
loops that have loop iteration count pragmas
attached. Therefore, we do not have any regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just not sure if we need this, because if our goal is "to help loop rotation" we are inserting unnecessary information. Even if we have such type of loops, assume
will add a known information.
// expression can be simplified in later passes | ||
SCEVCouldNotCompute::NoWrapFlags NWF = BunldedSCEV.AddRecExpr->getNoWrapFlags( | ||
static_cast<SCEVCouldNotCompute::NoWrapFlags>(SCEV::FlagNUW | | ||
SCEV::FlagNSW)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: SCEV
also mentions "self-wrap", do you know what it means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a flag that determines that no overflow will occur, i.e. if either NUW or NSW are set.
In our pass we need to determine, if we have a signed or unsigned no-overflow flag to duplicate it for the Bound arithmetic. Therefore the NW flag does not provide any useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// AddRec expressions may have a no-self-wraparound property if, in
/// the integer domain, abs(step) * max-iteration(loop) <=
/// unsigned-max(bitwidth). This means that the recurrence will never reach
/// its start value if the step is non-zero.
/// Note that NUW and NSW are also valid properties of a recurrence, and
/// either implies NW. For convenience, NW will be set for a recurrence
/// whenever either NUW or NSW are set.
|
||
// Expansion of MinIterSCEV will result in \ | ||
// InitialValue + StepSize * Minimum Iteration Count | ||
if (!Expander.isSafeToExpand(MinIterSCEV)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: in which condition does it happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens in decrementing loops:
for (unsigned i=Bound; i > 0; i ++)
or loop with initial values, that are not constants.
for (unsigned i = InitialValue; i < Bound; i++)
In the case of constants, MinIterSCEV
evaluates to a constant that is just inserted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The corresponding unit tests:
incrementByOne
decrementByOne
decrementGEZero
The counter case, where MinIterSCEV
evaluates to a constant is
incrementInclusiveBound
SCEV::FlagNSW)); | ||
auto *AE = dyn_cast<SCEVAddExpr>(MinIterSCEV); | ||
if (AE && NWF) { | ||
MinIterSCEV = SE.getAddExpr(AE->getOperand(0), AE->getOperand(1), NWF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this is taking the same expression and adding flags? Isn't there a helper to add them without re-creating the SCEV? Do you also know why those flags aren't automatically propagated from BunldedSCEV.AddRecExpr
? After all, MinIterSCEV
is that same expression, but evaluated at a different point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can add the overflow flags without recreating the expression, I generalized it to add the flags for any 'SCEVCommutativeExpr'
Unfortunately SCEVAddRecExpr::evaluateAtIteration
does not have a way to automatically reuse the no overflow flags.
SCEVCouldNotCompute::NoWrapFlags NWF = BunldedSCEV.AddRecExpr->getNoWrapFlags( | ||
static_cast<SCEVCouldNotCompute::NoWrapFlags>(SCEV::FlagNUW | | ||
SCEV::FlagNSW)); | ||
auto *AE = dyn_cast<SCEVAddExpr>(MinIterSCEV); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be something else than SCEVAddExpr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I generalized it can be any SCEVCommutativeExpr
, which can be an AddExpr
, MulExpr
or SCEVMinMaxExpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for those it does not make sense to add nowrap flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the base expression is a AddExpr
, how can it become e.g. MulExpr
or SCEVMinMaxExpr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified offline and "confirmed" in https://www.npopov.com/2023/10/03/LLVM-Scalar-evolution.html: We should just get a new AddRec
with a different initial value.
bc48bdb
to
89dd3b2
Compare
dyn_cast<SCEVCommutativeExpr>(MinIterSCEV); | ||
if (ConstCE && NWF) { | ||
SCEVCommutativeExpr *CE = const_cast<SCEVCommutativeExpr *>(ConstCE); | ||
CE->setNoWrapFlags(NWF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that might be dangerous to const_cast an expression and change its properties? (It might be cached somewhere and re-used) Is there a way to get a non-const SCEV
from evaluateAtIteration()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I unfortunately get a const SCEV expression from evaluateAtIteration()
fa41660
to
deff215
Compare
Changed the Logic to evaluate the Compare Instruction at Iteration 0 and set it to true, which solves all of our issues. Moved the Assumption into the preheader (with respective clones) to guarantee, that the assumption is loop invariant and does not change even if both Operands of the Compare are loop variant. Added more unittests to test more corner cases. |
@@ -107,6 +107,11 @@ Value *recursivlyCloneInBB(Value *Op, BasicBlock &BB, ValueToValueMapTy &VMap) { | |||
<< *Op << "\n"); | |||
return nullptr; | |||
} | |||
if (!I->isSafeToRemove()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general checker, if we are able to clone the instruction into a different BB, without introducing behavior that could cause issues (should be in conjunction with the volatile memory check). The checks include:
- if it is a call instruction
- if we may write to memory
- it an exception is thrown or if it is an exception handler
basically, combined with volatile this is a very conservative hoisting check
if (!AddRec) { | ||
LLVM_DEBUG(dbgs() << "Could not extract AddRecExpr, will reuse " << *Op | ||
<< "\n"); | ||
return Op; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is dangerous: we might return a LoadInst
that isn't loop-invariant. What we return here must be loop-invariant because we will later clone it.
E.g. we could have
loop.header:
store ...
%iv = load ...
The store location could alias with the load location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we might want to run LICM beforehand, or use analysis like MemorySSA
or LoopAccessAnalysis
to help us say what is loop invariant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note LoopUtils
has a bunch of things to help you. In particular:
/// Returns true if is legal to hoist or sink this instruction disregarding the
/// possible introduction of faults. Reasoning about potential faulting
/// instructions is the responsibility of the caller since it is challenging to
/// do efficiently from within this routine.
/// \p TargetExecutesOncePerLoop is true only when it is guaranteed that the
/// target executes at most once per execution of the loop body. This is used
/// to assess the legality of duplicating atomic loads. Generally, this is
/// true when moving out of loop and not true when moving into loops.
/// If \p ORE is set use it to emit optimization remarks.
bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
Loop *CurLoop, MemorySSAUpdater &MSSAU,
bool TargetExecutesOncePerLoop,
SinkAndHoistLICMFlags &LICMFlags,
OptimizationRemarkEmitter *ORE = nullptr);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to canSinkOrHoistInst
check
if (I->isVolatile()) { | ||
LLVM_DEBUG(dbgs() << "Volatile Instruction: Aborting! Could not clone " | ||
<< *Op << "\n"); | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said before, I think we need to check more than just volatile
, we cannot just clone any instruction without checking it is loop-invariant. Or at least we need to prove that the instructions that precede it will have no effects on its value.
LLVM_DEBUG(dbgs() << "Loop already in rotated form. Will not add Loop " | ||
"Iteration Count assumptions.\n"); | ||
return nullptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking: that could be an already-rotated loop with a guard? I assume we could potentially remove the guard if we add a builtin_assume
? Nothing to do for now, I think it's good to focus on standard for
loops :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
The case to check against is a do-while loop
with iteration count metadata
.
Since this is already rotated, we don't need to add any assumptions, because our pass is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are so early in the optimization pipeline, we should never see an already rotated loop
9863dd9
to
3909dd4
Compare
Only add assumption if the assumption at the first iteration can be guaranteed to be loop invariant. |
3909dd4
to
33c04f6
Compare
rebased to aie-public |
note the first commit will be removed once the ( #225 ) is merged
Please note: I am still going through QoR to make sure I am not introducing any functional errors.