-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for work_group_memory extension #1984
base: master
Are you sure you want to change the base?
Conversation
Regarding the stability of the extension: The extension was originally introduced June last year: intel/llvm#13725 Since then, one minor fix (intel/llvm#14937) and one minor API change to constructor not used in this implementation (intel/llvm#15933) were made. The implementation was added in intel/llvm#15861 in October, so should be included in the 2025.1 oneAPI releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution @sommerlukas !
Please cite the extension document from intel/llvm or elsewhere.
It would be great to add a test exercising this kernel argument type.
Perhaps libsyclinterface/tests/test_sycl_queue_submit_local_accessor_arg
could be used to add such a test.
I am happy to work with you to build the test and add it.
Thanks for your feedback @oleksandr-pavlyk!
Extension document is here. As I had discussed with @ndgrigorian offline last week, the implementation in DPCTL focuses on the variant with runtime-specified size: |
Sure, I can look into adding a test. As we would need a kernel to execute to exercise this, what's the recommended way of writing that kernel? |
The first step would be to use DPC++, or OpenCL compiler to create SPV for a kernel that consumes Then one would need to use that SPV like it is used in the test for local_accessor argument. |
I've taken a look at The current implementation of this PR only exposes the |
Builds seem to be failing, likely because the It would probably work best in a helper file—for an example, we use the experimental We define the macro and use the header file to indirectly include |
The use in that header isn't very idiomatic. Feature test macros for SYCL extensions are not supposed to be set by the application. Instead, the implementation should set them to a value defined in the extension specification, as described in the SYCL specification. In DPC++, this is implemented through the
If CI wants to build with the 2025.0 release compiler until a newer release comes out, I need to conditionally enable the feature based on macro and throw an exception in case somebody tries to use it with DPC++ version 2025.0 or older. |
I see, good to know. I noticed that some of the experimental extensions don't need to be explicitly set, but Making it conditional based on compiler version would definitely be preferable. |
This PR needs work, I suggest we take time to get it right, and it also brought to light that I have created a branch in my fork that adds changes on top of Lukas's changes in this PR (sommerlukas#1).
What is still to do:
dpctl code to build interoperability kernel bundle may need updating. |
I'm also working on improving the implementation locally, I was planning to push as soon as I have tests ready. What I've done so far:
The reason I introduced
I had considered that, but there was no precedent in the Python interface, as support for If we add this variant, IMHO it should be in addition to the current version where the number of bytes is specified. Unless we build something complicated with Cython/ctype types, we would otherwise limit the possible types of array elements to a fixed list, whereas the DPC++ extension allows user-defined types. |
@sommerlukas The PR sommerlukas#1 was added with my changes to your branch in your fork. If you do not intend to accept it because you have already worked something out locally, please let me know, and I would break it up into smaller changes and open follow up PRs. |
Thanks for bringing that PR, I think there's some overlap with what I've done locally. Maybe it make sense to break out the changes for exposing For the changes relating to |
2356410
to
ddd8349
Compare
I've pushed my local changes now. The work group memory extension is now also added to I also added a test for the Python interface and a test for |
This may make the mapping between the class and its SYCL counter-part less confusing, though there are downstream implications. |
As I mentioned offline, there's also the possibility to replace the interop to create The other benefit is that it integrates into If there's downstream implications, maybe it makes sense to build a new Extensions:
|
My testing was done using DPC++ runtime from SYCL nightly, which may have some bugs. Using that runtime library, only |
@sommerlukas While |
Apart from details of the implementation suggested here, what is the idea behind exposing work-group-memory kernel argument in I feel that creating |
Yeah, this was not meant for this PR, more medium-term. Support for SPIR-V and OpenCL source languages should already be in the 2025.0 package released a few weeks ago, so 2025.0 should match DPCTL's existing support. SYCL source language is going to be in one of the other 2025.X releases soon. |
The E.g., users could compile a OpenCL kernel like this with DPCTL and pass a __kernel void example_kernel(__global float *input, __local float *local_data) {
...
} Our concrete use case is with SYCL kernels compiled via DPC++ that use |
I'm happy to move it to some other place if you and @ndgrigorian prefer. I'm also fine to not mark it experimental at all if that's preferred. |
@oleksandr-pavlyk @ndgrigorian Any hunch why the conda tests on CI fail? I can't reproduce the issue locally with my |
Thanks for explaining, @sommerlukas , this is useful. Please add a test for OpenCL device that uses local pointer in the kernel specified in OpenCl and that is invoked using work-group-memory. |
Extend kernel argument handling to add support for the work_group_memory extension, allowing users to dynamically allocate local memory for a kernel. Signed-off-by: Lukas Sommer <[email protected]>
b952f90
to
04efec6
Compare
I've added a test using an OpenCL kernel with |
I think I more or less agree with @oleksandr-pavlyk , putting it aside in a special namespace which marks it experimental doesn't do much for a Python-level user—it's an implementation detail and may or may not change in the compiler in the future. If there's a check for availability already present, we can instead document that the compiler version is relevant to whether the extension works. |
I'll take a look at the CI too, but the best way I would imagine to debug would be to download the artifact and install it into a Conda environment, if you can't reproduce locally. |
Ok, makes sense. I've removed the |
Removing the Building and testing on Windows seems fine. For the failure on Linux, I'm not sure this is related to my changes. The error message points to
@ndgrigorian @oleksandr-pavlyk Any idea what could be causing this? |
@oleksandr-pavlyk @sommerlukas nice set of changes. numba-dpex also supports local accessors. |
The failure looks like one I would see when building with 3.13t (free-threaded Python). Indeed, looking at the log— |
@sommerlukas Also a quick glance at gh-1991 would be appreciated, if any of it is necessary for your work—this wasn't clear to me. |
Extend kernel argument handling to add support for the work_group_memory extension, allowing users to dynamically allocate local memory for a kernel.