-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2] Enable up-combining for VLD + UNPACK #241
Conversation
QoR:
|
bool isUseOf(const MachineInstr &MI, const MachineInstr &Use) { | ||
for (auto &Defs : Use.defs()) { | ||
bool isUseOf(const MachineInstr &MI, const MachineInstr &Def) { | ||
for (auto &Defs : Def.defs()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that iterate on implicit defs as well? I'm never sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just explicit ones! To cover all definitions, we need all_defs()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should be careful about implicit defs as well. The instruction might write to a control register that is used by a later instruction. I don't think that case can happen in AIE2, but who knows.
However, I'm now thinking about a case like this:
%0:vec256 = VLD
$crSat = ...
%1:vec512 = VUNPACK %0, implicit $crSat
There we cannot advance the VUNPACK, because it would then use a different value for $crSat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this could be a case if we extend for VUPS case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best, in this case, is to iterate over all uses and defs, like:
bool isUseOf(const MachineInstr &MI, const MachineInstr &Def) {
for (auto &Defs : Def.all_defs()) {
for (auto &MIUse : MI.all_uses()) {
if (MIUse.isReg() && Defs.getReg() == MIUse.getReg())
return true;
}
}
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh alright, it's VPACK that uses crSat
, not VUNPACK. Still, the latter could use crUnpackSign
, so we can end up with the same problem. I think we need to use all_uses
and all_defs
like you proposed.
llvm/test/CodeGen/AIE/aie2/GlobalISel/inst-select-no-combine-vldb_unpack.mir
Outdated
Show resolved
Hide resolved
fcdfcd4
to
20b677f
Compare
@@ -1628,6 +1633,9 @@ bool AIE2InstructionSelector::selectG_AIE_LOAD_UNPACK( | |||
Register DstReg = UNPACKI.getOperand(0).getReg(); | |||
Register SignReg = UNPACKI.getOperand(3).getReg(); | |||
|
|||
if (ShouldAntecipate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it not be ShouldAnticipate
?
Is there a more descriptive name for it? Like CanJoinLoadAndUse
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name and also included a comment.
20b677f
to
1bb9385
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but before merging, we need to be a bit more careful about implicit uses/defs I think. Otherwise I think it's nice and concise code :)
auto UnsafeToMoveBefore = [&](const MachineInstr &MI) { | ||
return (isUseOf(Dest, MI)); | ||
}; | ||
return none_of(InstrRange, UnsafeToMoveBefore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice way of using none_of
🚀
1bb9385
to
11f9a74
Compare
Hi @gbossu, all comments were addressed. Thank you for the review ;-). |
11f9a74
to
c00687b
Compare
for (auto &MIUse : MI.uses()) { | ||
bool isUseOf(const MachineInstr &MI, const MachineInstr &Def) { | ||
for (auto &Defs : Def.all_defs()) { | ||
for (auto &MIUse : MI.all_uses()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test to check that the VPACK
/VUNPACK
isn't advanced when one of its implicit inputs is defined in between the load/store and the op?
E.g.
%0 = VLD ...
VST ... ; prevent delaying VLD
$crSign = ...
%1 = VPACK %0, implicit $crSign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included a test for VUNPACK with a control register write in between. I am also looking explicitly to G_INTRINSIC_W_SIDE_EFFECTS
when we try to advance an instruction.
If we can't delay VLD, we can try to antecipate UNPACK. This approach can be extended to other selection combiners.
c00687b
to
2a76196
Compare
After this change:
No side-effects were observed in QoR. |
; CHECK-NEXT: [[COPY1:%[0-9]+]]:er = COPY $r0 | ||
; CHECK-NEXT: [[COPY2:%[0-9]+]]:em = COPY [[COPY1]] | ||
; CHECK-NEXT: [[VLD_pstm_pseudo:%[0-9]+]]:vec256, [[VLD_pstm_pseudo1:%[0-9]+]]:ep = VLD_pstm_pseudo [[COPY]], [[COPY2]] :: (load (<32 x s8>)) | ||
; CHECK-NEXT: $crunpacksign = COPY [[COPY1]] | ||
; CHECK-NEXT: [[VUNPACK_S16_S8_:%[0-9]+]]:vec512 = VUNPACK_S16_S8 [[VLD_pstm_pseudo]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super-nit: we are writing to crSign
, but VUNPACK_S16_S8
does not read it. If we get smarter, I guess we will be able to combine. I guess what we should really check is VST + PACK
with a set_crsat()
in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it could be improved even more. When we are combining/selecting, this csr write in-between is just an intrinsic call with a constant operand representing the register. We can parse this call against the registers that are consumed by the final instruction (we need a mapping for this). About the test, VUNPACK_S16_S8
uses indeed the crunpacksign
, but if we change the register, the result will be the same.
As this PR solves the problems of the tracked benchmarks, we can keep it as is and look to set_csr()
in the future if we see this as a blocker in some benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
If we can't delay VLD, we can try to antecipate UNPACK. This approach can be extended to other selection combiners.