Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start tagging kernels #1427

Merged
merged 9 commits into from
May 17, 2024
Merged

start tagging kernels #1427

merged 9 commits into from
May 17, 2024

Conversation

jcosborn
Copy link
Contributor

@jcosborn jcosborn commented Jan 9, 2024

This is an initial set of changes to start tagging kernel operations that require special resources (e.g. shared memory) or other treatment (e.g. block synchronization). This initial version only enables tagging for ThreadLocalCache and thread_array.

@jcosborn jcosborn marked this pull request as ready for review January 10, 2024 22:27
@jcosborn jcosborn requested review from a team as code owners January 10, 2024 22:27
template <typename Param> struct AllFiveAllSevenLink
{
template <typename Param> struct AllFiveAllSevenLinkOps {
static constexpr int cacheLen = Param::sig_positive ? 3 : 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache_len instead of cacheLen. We're working towards the underscore convention instead of camelCase for simpler variables, nColor is just a pervasive beast we're afraid to deal with thus far...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weinbe2
Copy link
Contributor

weinbe2 commented Apr 25, 2024

Well, first of all, this is a very impressive design/implementation, and I like it as a whole.

I can see the approach going on here---doing all of the work at compile time---and I appreciate it for what it is. My other point of reference is the way Kokkos hands around its "scratch" memory pointers, putting all of the onus on the user to get it right, which has cost me a lot of time and sanity.

I'm good on all of the plumbing work (sans one max vs sum question, but since the code isn't breaking I'm sure there's something my brain can't handle right now). Nearly all of my comments were related to my thoughts on conventions and some requests for more detailed documentation. I emphasized conventions so much because they bounced around a bit in the different files---which I understand is just a byproduct of the initial development process---it just made it hard to follow what was going on at first, so I'd want to agree of this type of thing before it gets merged!

@jcosborn jcosborn requested a review from a team as a code owner April 29, 2024 04:10
@jcosborn
Copy link
Contributor Author

Thanks for taking time to review @weinbe2. I think I've addressed all the comments except for the issue of whether to make an alias for *this or pass it directly. I can change it to an alias if there is an overall preference to do so.

I added more documentation in the headers. Let me know if there is anything missing or could be made clearer.

I also merged in the latest develop so it should mostly ready now, pending final review.

@weinbe2
Copy link
Contributor

weinbe2 commented Apr 29, 2024

Thanks @jcosborn , I think the documentation you added more than suffices. It's made both the underlying assumptions and proper usage clearer. The PR is good to go in my book.

As for:

[...] except for the issue of whether to make an alias for *this or pass it directly.

I noted this elsewhere, but to make the thread easier to follow: I'm ultimately fine leaving it as *this.

@maddyscientist , do you want to do a last review?


__device__ __host__ T &operator[](int i) { return array_[i]; }
__device__ __host__ const T &operator[](int i) const { return array_[i]; }
__device__ __host__ inline T &operator[](const int i) { return array_[i]; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set these arguments as const? Seems rather redundant for copy by value integers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real reason, I've removed them and a few unnecessary inlines too (since member functions are inline by default).
3b71bb6

@maddyscientist
Copy link
Member

No objections from me with this PR. @weinbe2 did you do any run-time sanity checks as part of your review?

@weinbe2
Copy link
Contributor

weinbe2 commented May 7, 2024

cscs-ci run

Copy link
Contributor

@weinbe2 weinbe2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me with this PR. @weinbe2 did you do any run-time sanity checks as part of your review?

All correctness checks pass.

Runtime is good. There's a reproducible sub-sub-percentile regression on H100-80GB-HBM3 with CUDA 12.4 at a 24^4 volume, nothing I see as worth digging into. This has my approval.

@maddyscientist what are your thoughts on the cscs completing-in-time woes?

@maddyscientist
Copy link
Member

cscs-ci run

Copy link
Member

@maddyscientist maddyscientist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I've re-kicked off the CSCS CI, hopefully it squeaks through this time and then we can merge.

@maddyscientist maddyscientist merged commit 2aa17e2 into develop May 17, 2024
12 checks passed
@maddyscientist maddyscientist deleted the feature/sycl-merge branch May 17, 2024 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants