CUDA kernels should not make const copies of global and closure variables #9084
Noted also in the triage discussion: we will need to hold a reference to any referenced globals and closure variables to ensure their lifetime exceeds that of the dispatcher.
@gmarkall I would like to report an error I encountered. I patched my local Numba installation with your patch and it works well for my current use case. I then tried to use your test script above as a quick check of Numba's treatment of global/closure arrays. The first run of the script succeeds, but every later run throws an error like this:
Does this have something to do with the lifetime management you mentioned above?
Are you caching the kernel?
Got it. Thanks! Turning it off solves that problem.
No problem. A final patch implementing this would raise an error when attempting to cache a kernel that references globals and/or closure variables.
@gmarkall I wonder if there is a timeline for when an official patch will be released? Thanks!
@yxqd This is in the 0.59 milestone, so it should be part of that release.
0.59 seems to be aimed at 2023-12: #8971
Got it. Thanks!
Noted whilst investigating https://numba.discourse.group/t/avoid-multiple-copies-of-large-numpy-array-in-closure/2017
In many cases Numba will copy global and closure variables as constant arrays inside jitted functions and kernels (and will always attempt to do this for kernels). This is a problem because const memory is extremely limited in CUDA. The following simple example:
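The example itself was not preserved in this page; the following is a hedged sketch of the kind of failing case described above (the array sizes and names are hypothetical): a kernel reads a large global NumPy array, and Numba attempts to copy it into CUDA constant memory, which is typically only 64 KB.

```python
import numpy as np
from numba import cuda

# Hypothetical large global array: ~8 MB, far beyond the constant memory limit.
big = np.ones(1_000_000, dtype=np.float64)

@cuda.jit
def copy_kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        # Referencing the global `big` triggers an implicit const copy.
        out[i] = big[i]

out = cuda.device_array_like(big)
# Launching the kernel forces compilation, which attempts the const copy.
copy_kernel[(big.size + 255) // 256, 256](out)
```

Running this requires a CUDA-capable device; the point of the sketch is only that the implicit constant copy is made at compile time, regardless of the array's size.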
will fail with an error, because the implicit constant copy exceeds CUDA's very limited constant memory.
Since we already have a mechanism for users to specify when a const array should be created, constant arrays should never be implicitly created in CUDA kernels; they should always be opted into explicitly. Making this change is not a breaking change, because Numba makes no guarantee about whether a copy is made or not: https://numba.readthedocs.io/en/stable/reference/pysemantics.html#global-and-closure-variables
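The explicit opt-in mechanism referred to here is `cuda.const.array_like`, which copies a (suitably small) host array into constant memory only when the user asks for it. A minimal sketch, with hypothetical names:

```python
import numpy as np
from numba import cuda

# A small filter: comfortably within the constant memory limit.
coeffs = np.array([0.25, 0.5, 0.25])

@cuda.jit
def smooth(out):
    # Explicitly opt in to a constant-memory copy of `coeffs`.
    c = cuda.const.array_like(coeffs)
    i = cuda.grid(1)
    if i < out.size:
        out[i] = c[i % c.size]
```

With this API available, there is no need for the compiler to guess: a global array reference that was not wrapped in `cuda.const.array_like` would simply be a reference, not a copy.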
The following change provides a proof-of-concept that demonstrates creating references to arrays instead of const copies:
This does require the example to be modified so that the data is already on the device:
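The modified example is not preserved in this page; a hedged sketch of the change, assuming the proof-of-concept patch above and hypothetical names, is to transfer the array with `cuda.to_device` before the kernel captures it, so the kernel holds a reference to device memory rather than a host array:

```python
import numpy as np
from numba import cuda

host_data = np.arange(1_000_000, dtype=np.float64)
# Transfer once; `dev_data` is a DeviceNDArray living in device memory.
dev_data = cuda.to_device(host_data)

@cuda.jit
def scale(out, factor):
    i = cuda.grid(1)
    if i < out.size:
        # Under the proposed change, this is a reference to device memory,
        # not an implicit constant copy.
        out[i] = dev_data[i] * factor

out = cuda.device_array_like(host_data)
scale[(host_data.size + 255) // 256, 256](out, 2.0)
```

As with the other sketches, this requires a CUDA device to run.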
And now works as expected:
To complete the implementation, the following considerations need to be addressed:
The docstring of `BaseContext.add_dynamic_addr()` suggests that addition of a dynamic address will disable caching, but this does not seem to be the case, for CUDA at least.