-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
start tagging kernels #1427
start tagging kernels #1427
Conversation
include/kernels/hisq_paths_force.cuh
Outdated
template <typename Param> struct AllFiveAllSevenLink | ||
{ | ||
template <typename Param> struct AllFiveAllSevenLinkOps { | ||
static constexpr int cacheLen = Param::sig_positive ? 3 : 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache_len
instead of cacheLen
. We're working towards the underscore convention instead of camelCase for simpler variables, nColor
is just a pervasive beast we're afraid to deal with thus far...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, first of all, this is a very impressive design/implementation, and I like it as a whole. I can see the approach going on here---doing all of the work at compile time---and I appreciate it for what it is. My other point of reference is the way Kokkos hands around its "scratch" memory pointers, putting all of the onus on the user to get it right, which has cost me a lot of time and sanity. I'm good on all of the plumbing work (sans one |
Thanks for taking time to review @weinbe2. I think I've addressed all the comments except for the issue of whether to make an alias for I added more documentation in the headers. Let me know if there is anything missing or could be made clearer. I also merged in the latest develop so it should mostly ready now, pending final review. |
Thanks @jcosborn , I think the documentation you added more than suffices. It's made both the underlying assumptions and proper usage clearer. The PR is good to go in my book. As for:
I noted this elsewhere, but to make the thread easier to follow: I'm ultimately fine leaving it as @maddyscientist , do you want to do a last review? |
|
||
__device__ __host__ T &operator[](int i) { return array_[i]; } | ||
__device__ __host__ const T &operator[](int i) const { return array_[i]; } | ||
__device__ __host__ inline T &operator[](const int i) { return array_[i]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set these arguments as const
? Seems rather redundant for copy by value integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real reason, I've removed them and a few unnecessary inline
s too (since member functions are inline by default).
3b71bb6
No objections from me with this PR. @weinbe2 did you do any run-time sanity checks as part of your review? |
cscs-ci run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objections from me with this PR. @weinbe2 did you do any run-time sanity checks as part of your review?
All correctness checks pass.
Runtime is good. There's a reproducible sub-sub-percentile regression on H100-80GB-HBM3 with CUDA 12.4 at a 24^4 volume, nothing I see as worth digging into. This has my approval.
@maddyscientist what are your thoughts on the cscs completing-in-time woes?
cscs-ci run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. I've re-kicked off the CSCS CI, hopefully it squeaks through this time and then we can merge.
This is an initial set of changes to start tagging kernel operations that require special resources (e.g. shared memory) or other treatment (e.g. block synchronization). This initial version only enables tagging for ThreadLocalCache and thread_array.