-
Notifications
You must be signed in to change notification settings - Fork 393
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171)
Instead of patching the tag component with TVECTOR in every instruction that produces a vector value, we now use a separate IR instruction to do this. This reduces implementation redundancy, but more importantly allows for a class of optimizations: - NUM_TO_VECTOR previously patched the component unconditionally but the result was used only in MUL/DIV_VEC instructions that ignore it anyway; we can now remove this. - ADD_VEC et al can now forward the source of TAG_VECTOR instruction of either input; this shortens the latency chain and in the future could allow us to generate optimal vector instruction sequence once the temporary stores are marked as dead. - In the future on X64, ADD_VEC et al will be able to analyze the input instruction and remove tag masking conditionally. This is not part of this PR as it requires a decision around expected FP environment and/or the necessity of the existing masking to begin with. I've also renamed NUM_TO_VECTOR to NUM_TO_VEC so that "VEC" always refers to "3 float values" and for consistency with ADD/etc. Note: ADD_VEC input forwarding is currently performed unconditionally; it may or may not increase the spills that can't be reloaded from the stack. On A64 this makes the Taylor series computation a tiny bit faster (11.3ns => 11.0ns) as it removes the redundant ins instructions along the NUM_TO_VEC path. Curiously, the optimization of forwarding TAG_VECTOR input to arithmetic instructions actually has a small penalty as without it this PR runs at 10.9 ns. I don't know if this is a property of the benchmark though, as I just noticed that in this benchmark type inference actually fails to infer parts of the computation as a vector op. If desired I will happily omit this part of the change and we can explore that separately.
- Loading branch information
Showing
9 changed files
with
192 additions
and
74 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.