-
Notifications
You must be signed in to change notification settings - Fork 399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot add more virtual registers then number physical registers to RegisterDependencyConditions #7672
Comments
The register dependency condition mechanism was never designed to handle the situation where more virtual registers are requested to be assigned than there are physically available. That situation is usually (always?) a bug in the code that created the regdep because it is asking the register assigner to do the impossible. There are situations where, for a given virtual reg, a NoReg and a real reg dependency are added to the regdep. If that situation is possible, there is a "union" function on the register dependency conditions that should merge those constraints together. |
Is there a conservative fix (or a real one if that is easy enough) that comes to mind to address the issue for 32-bit while re-enabling the original code developed for the instrinsic ? |
@vijaysun-omr The simplest thing is not to do loop unrolling. |
@0xdaryl I think you should be able to add as many NoReg dependencies as you please to RegisterDependencyConditions |
I believe that's only true for "dummy" virtual registers whose lifetime is limited to the regdep and are used to evacuate the register assigned to a virtual. I think in that instance you only need one available register of each kind to satisfy the dummy assignment. Is that what you have in this circumstance, or are these virtual registers that are actually used in instructions? |
32-bit performance is not as critical as 64-bit performance in general. So, if a simple fix is possible to keep 32-bit functional (even disabling the new optimization, or disabling the unrolling on 32-bit should be considered seriously) while re-enabling for 64-bit, then I would encourage that. |
Do you believe there is still a problem in register assignment? If so, perhaps sharing a log with the relevant depcond would help clarify your observation of incorrect behaviour. If not, please close. |
Whether or not the register assigner was designed to handle this case, I think we ought to make a change that allows this. This type of situation happens a lot, especially if you plan on supporting 32-bit. In addition, if you have any sort of control flow, you are forced to use register dependency conditions even if you don't require reservation of specific registers. Currently, there is no documentation on this topic, and I would not expect any developer to know about the quirks of RA. Lots of additional domain knowledge is required to foresee and fix these problems. |
Please attach a codegen log (or a snippet with enough context) for what you think needs to be fixed. What I think you're asking for is not something our RA is capable of (assigning |
I have attached a log using the vectorized integer hashcode as an example. I allocated an additional 7 VRF registers with NoReg dependencies. They are used with |
Thanks. At L0193 there is a register dependency with 1 GPR and 17 VRFs. All of the conditions are NoReg, which means you're not asking for specific registers for any of these. If we're generating code for a processor that only supports XMMs or YMMs, there are only 16 registers maximum available for simultaneous assignment. For ZMM, while there are 32 real registers available in the architecture, the register assigner is only capable of assigning 16 anyway (zmm0-15). This is a constraint and limitation that we should fix at some point. A regdep essentially sets the state of the RA at some point in the instruction stream. It specifies conditions that must be satisfied simultaneously. Once a condition is satisfied by the RA, that reg is "blocked" (a register state) so that it is not spilled to satisfy other conditions. It is "unblocked" after the regdep assignment is complete. Since your regdep is asking for 17 virtual VRFs registers to be assigned at L0193 but only 16 registers maximum are available for assignment, the RA will run out of registers to assign because there is nothing it can do to satisfy what you're asking for (i.e., it can't spill any of the other registers in the regdep that it has already assigned). If you believe this is a valid state, can you describe what you think the RA should do at this point, or what the semantics should be for a regdep that requests more registers than are available in the ISA? |
OpenJ9 recently hit this assert when generating an inline intrinsic on 32-bit.
If register dependency conditions has more pre/post conditions than available number of physical registers, register assignment fails to spill registers when needed. I was able to reproduce the issue on x86_64 by adding a few extra registers live across the vectorizedHashCode() intrinsic and adding them to the RegisterDependencyConditions. Excluding these live registers from the RegisterDependencyConditions produces register spills as expected with no crash.
The text was updated successfully, but these errors were encountered: