-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
transformations: (memref-stream-interleave) always take bigger factor #3721
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3721 +/- ##
=======================================
Coverage 91.31% 91.31%
=======================================
Files 468 468
Lines 58549 58580 +31
Branches 5652 5653 +1
=======================================
+ Hits 53463 53495 +32
- Misses 3634 3635 +1
+ Partials 1452 1450 -2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Yes this was roughly what I used as well if I recall correctly. The only thing to keep in mind is the much higher register pressure, but as you said, for the kernels we care about we are unlikely to run out of registers.
Something to be aware about in the future however. Maybe this should be a note in the pass description?
Would you suggest something scarier than this?
|
Looks good to me :) |
The search range is bound by `pipeline-depth * 2` as very large interleaving factors | ||
would cause too much register pressure, potentially running out of registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The search range is bound by `pipeline-depth * 2` as very large interleaving factors | |
would cause too much register pressure, potentially running out of registers. | |
The search range is bound by `pipeline-depth * 2` as very large interleaving factors | |
can potentially increase register pressure or even exhaust all available registers. |
I know this isn't directly related, but the wording has been annoying me for a bit.
Take it or leave it :)
When I initially implemented this pass, I assumed that it was better to err on the side of caution, to take the smallest factor above 3. In practice, this isn't the best approach, as a factor of 4 is only good in the scenario with no function cache misses, and it's a safer bet to take bigger values if possible. We also learned from the Snitch backend paper experiments that the register pressure never goes close to the limit for the kernels we care about, so this should not be that risky.
My understanding is that this change is what @zero9178 implemented in Quidditch already.
I'm currently working on automatically pruning the space of possible values, and merging this PR would help keep the logic relatively simple, while sharing some helpers between this heurist-based pass, and the new compile-and-evaluate-based-pass.