Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot add more virtual registers then number physical registers to RegisterDependencyConditions #7672

Open
BradleyWood opened this issue Feb 27, 2025 · 11 comments

Comments

@BradleyWood
Copy link
Contributor

OpenJ9 recently hit this assert when generating an inline intrinsic on 32-bit.

Assertion failed at /tmp/bld_88867/bld_linux_x86/compiler/../omr/compiler/x/codegen/OMRMachine.cpp:558: considerVirtAsSpillCandidate
VMState: 0x0005ff06
	freeBestGPRegister(): could not find any GPR spill candidates for VRF_0xce6f9070

If register dependency conditions has more pre/post conditions than available number of physical registers, register assignment fails to spill registers when needed. I was able to reproduce the issue on x86_64 by adding a few extra registers live across the vectorizedHashCode() intrinsic and adding them to the RegisterDependencyConditions. Excluding these live registers from the RegisterDependencyConditions produces register spills as expected with no crash.

@0xdaryl
Copy link
Contributor

0xdaryl commented Feb 27, 2025

The register dependency condition mechanism was never designed to handle the situation where more virtual registers are requested to be assigned than there are physically available. That situation is usually (always?) a bug in the code that created the regdep because it is asking the register assigner to do the impossible.

There are situations where, for a given virtual reg, a NoReg and a real reg dependency are added to the regdep. If that situation is possible, there is a "union" function on the register dependency conditions that should merge those constraints together.

@vijaysun-omr
Copy link
Contributor

Is there a conservative fix (or a real one if that is easy enough) that comes to mind to address the issue for 32-bit while re-enabling the original code developed for the instrinsic ?

@BradleyWood
Copy link
Contributor Author

@vijaysun-omr The simplest thing is not to do loop unrolling.

@BradleyWood
Copy link
Contributor Author

@0xdaryl I think you should be able to add as many NoReg dependencies as you please to RegisterDependencyConditions

@0xdaryl
Copy link
Contributor

0xdaryl commented Feb 27, 2025

I believe that's only true for "dummy" virtual registers whose lifetime is limited to the regdep and are used to evacuate the register assigned to a virtual. I think in that instance you only need one available register of each kind to satisfy the dummy assignment.

Is that what you have in this circumstance, or are these virtual registers that are actually used in instructions?

@vijaysun-omr
Copy link
Contributor

32-bit performance is not as critical as 64-bit performance in general. So, if a simple fix is possible to keep 32-bit functional (even disabling the new optimization, or disabling the unrolling on 32-bit should be considered seriously) while re-enabling for 64-bit, then I would encourage that.

@0xdaryl
Copy link
Contributor

0xdaryl commented Mar 3, 2025

Do you believe there is still a problem in register assignment? If so, perhaps sharing a log with the relevant depcond would help clarify your observation of incorrect behaviour. If not, please close.

@BradleyWood
Copy link
Contributor Author

Whether or not the register assigner was designed to handle this case, I think we ought to make a change that allows this. This type of situation happens a lot, especially if you plan on supporting 32-bit. In addition, if you have any sort of control flow, you are forced to use register dependency conditions even if you don't require reservation of specific registers.

Currently, there is no documentation on this topic, and I would not expect any developer to know about the quirks of RA. Lots of additional domain knowledge is required to foresee and fix these problems.

@0xdaryl
Copy link
Contributor

0xdaryl commented Mar 3, 2025

Please attach a codegen log (or a snippet with enough context) for what you think needs to be fixed. What I think you're asking for is not something our RA is capable of (assigning n register dependencies for m real registers where n > m). It may be the case that the "fix" for this is more documentation, which unfortunately is sorely lacking.

@BradleyWood
Copy link
Contributor Author

I have attached a log using the vectorized integer hashcode as an example. I allocated an additional 7 VRF registers with NoReg dependencies. They are used with paddw instruction once each before the main loop and once each just before the label marked as the end of internal control flow.

traceRA.txt

@0xdaryl
Copy link
Contributor

0xdaryl commented Mar 4, 2025

Thanks.

At L0193 there is a register dependency with 1 GPR and 17 VRFs. All of the conditions are NoReg, which means you're not asking for specific registers for any of these. If we're generating code for a processor that only supports XMMs or YMMs, there are only 16 registers maximum available for simultaneous assignment. For ZMM, while there are 32 real registers available in the architecture, the register assigner is only capable of assigning 16 anyway (zmm0-15). This is a constraint and limitation that we should fix at some point.

A regdep essentially sets the state of the RA at some point in the instruction stream. It specifies conditions that must be satisfied simultaneously. Once a condition is satisfied by the RA, that reg is "blocked" (a register state) so that it is not spilled to satisfy other conditions. It is "unblocked" after the regdep assignment is complete. Since your regdep is asking for 17 virtual VRFs registers to be assigned at L0193 but only 16 registers maximum are available for assignment, the RA will run out of registers to assign because there is nothing it can do to satisfy what you're asking for (i.e., it can't spill any of the other registers in the regdep that it has already assigned).

If you believe this is a valid state, can you describe what you think the RA should do at this point, or what the semantics should be for a regdep that requests more registers than are available in the ISA?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants