[WIP] Support enabling PTDS via CUDA_PTDS environment variable #633
base: branch-21.06
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here.
First, are you sure this is the right place to do this? Second, is this too heavy-handed? Third, I think the env variable should not start with CUDA. It should be very explicit, clear, and dangerous-sounding: RMM_FORCE_PER_THREAD_DEFAULT_STREAM.
When you say "this is the right place", are you referring to RMM itself, to the Cython binding or to the
Why is this heavy-handed if we only allow the user to explicitly enable it?
I'm fine with a different variable name, I chose a very general-purpose one only to see how people feel about it. Should we have a very specific name like what you suggested, or have a variable name that all projects could share, not only RMM but also Numba, CuPy, etc.?
I mean the constructor of CudaStreamView. I suppose as long as there is still a way to explicitly request
There were two things I didn't like about the name. One, it starts with CUDA, which should probably be reserved for official CUDA variables. Two, it has an acronym that many people won't know (CUDA docs never use the term "PTDS" as far as I know).
Sorry, this is still not 100% clear to me. Are you saying that you want a way to explicitly request
I understand all that, my intent here is to ask for additional input on whether we want an RMM-specific variable (and consequently specific variables for other libraries), e.g.,
In Numba we'd probably want to have an env var called
This is what I thought too; although CuPy doesn't have a configuration module, it follows a similar pattern for environment variables. My suggestion with
I mean that it already exists: see rmm/include/rmm/cuda_stream_view.hpp, lines 112 to 125 at e7f0726.
I generally type env variables into scripts, not at the command line, so the length doesn't matter much. PTDS is not a well-known acronym, and it's easily confused with PTSD.
I spent some time now trying to do that and it seems indeed there's no simple, clean way to do so. In d789117 I exposed the definitions for
The solution I see is to define custom Python types, perhaps just based on
I do type variables, and explicit is much better than implicit for me. Throughout my life I've wasted countless hours on implicit definitions from configuration files, system defaults, etc., so I hate things that are easy to forget, especially as time passes.
I agree it isn't a well-known acronym, but no acronyms or names are well-known from day zero. And if it's about confusion with similar acronyms from completely different areas of knowledge, we might as well tell everyone to never use acronyms again. :)
I only have basic working experience with Cython, so I'm not sure what a better suggestion would look like :-(
Just a drive-by comment... It is not possible to set the default value of Python arguments that way, but it can be done with a cpdef enum:

```cython
cpdef enum cudaStream:
    cudaStreamLegacy = 0
    cudaStreamPerThread

cdef f(cudaStream s=cudaStreamLegacy):
    pass

cdef class XYZ:
    def __cinit__(self, cudaStream s=cudaStreamLegacy):
        pass
```
Thanks @leofang, that's indeed very informative. I'm feeling less confused now; that was pretty much the only solution I could arrive at, and your confirmation is quite helpful!
I changed
I'm firmly against PTDS in the name of the environment variable. It's inconsistent with the rest of RMM and with CUDA.
Is this 1ee52af ok for you?
Please retarget to
I updated this PR to work with the latest branch-0.18. Would appreciate some new feedback here.
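For context, a minimal usage sketch of the opt-in being discussed, assuming (as in the diff shown later in this thread) that RMM reads RMM_PER_THREAD_DEFAULT_STREAM once at import time:

```python
# Minimal usage sketch (assumption: RMM reads RMM_PER_THREAD_DEFAULT_STREAM once,
# at import time, as in the diff later in this thread).
import os

os.environ["RMM_PER_THREAD_DEFAULT_STREAM"] = "1"  # must happen before importing rmm
import rmm

buf = rmm.DeviceBuffer(size=1024)  # allocations now default to the per-thread stream
```

Equivalently, the variable could be set in the shell (e.g., `RMM_PER_THREAD_DEFAULT_STREAM=1 python script.py`), without touching user code at all.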
Trying to control the per-thread default stream with environment variables does not feel like a good idea to me. It would be one thing if there were a CUDA-provided environment variable that controlled this behavior, but trying to do it ad hoc per library just feels like a bad idea. Furthermore, the existing environment variable deviates from the standard meaning of "per-thread default stream". Normally enabling PTDS changes the meaning of the value
I mostly agree with that. However, we need to control the PTDS behavior per library anyway, i.e., we'll need to add support to all of them somehow, and we need to start somewhere. Furthermore, using environment variables has been the most reasonable approach I've seen to enable a certain behavior globally in Python without requiring changes to user code or to 3rd-party dependencies of such libraries, which is good at an initial stage and doesn't break existing code.
This is also true, and by no means is this intended as a flawless or future-proof approach, but it allows our team and other advanced users to begin testing and profiling with PTDS, particularly with Dask. For now, my current assumption is that the libraries we'll use with PTDS will rely on RMM's default, thus using
With the above said, I'm open to suggestions on improving this that will allow us to at least begin some testing, without each new user being required to modify and install RMM from source. For the future, I think we may prefer to address each piece of code individually to treat PTDS at a local level, but that certainly involves touching pretty much all Python libraries used by RAPIDS today, which would be inefficient and even more intrusive at this stage.
I don't think this is true. So long as every library exposes a stream argument to indicate on which stream an operation should occur, then one need only specify the
Yes, this is what I meant, sorry for being unclear. That's true, and probably the right move long-term, but right now this is still a bit distant, as we really need to make sure we can pass the stream to all those libraries.
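To illustrate the alternative being described, here is a sketch of the explicit-stream pattern; the class and function names are hypothetical stand-ins, not any real library's API:

```python
# Hypothetical sketch of the "every operation takes a stream argument" pattern;
# none of these names are RMM's or Numba's actual API.

class Stream:
    """Stand-in for a library's stream wrapper object."""

def copy_to_device(data, stream=None):
    """Hypothetical operation that runs on whichever stream the caller passes."""
    ...

# The caller chooses the stream explicitly, so no environment variable or global
# toggle is needed to opt into per-thread default stream behavior:
my_stream = Stream()
copy_to_device(b"payload", stream=my_stream)
```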
I think this should be re-targeted to 0.19.
Have updated the upstream branch targeted by the PR. Looks like that updated GitHub Projects automatically. Not sure if there's anything else needed in terms of retargeting.
```python
if int(os.environ.get("RMM_PER_THREAD_DEFAULT_STREAM", "0")) != 0:
    DEFAULT_STREAM = Stream._from_cudaStream_t(cuda_stream_per_thread.value())
else:
    DEFAULT_STREAM = Stream._from_cudaStream_t(cuda_stream_default.value())
```
Do we want to allow this to be changed during runtime of an application or only at import time?
I would assume at import time, as this is what I think CuPy does now.
Well, CuPy reads the env var (same as is done here), so technically there it could be changed at runtime by overwriting os.environ['CUPY_CUDA_PER_THREAD_DEFAULT_STREAM']. However, such an operation is really bad practice and is hard to fool-proof.
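To make the import-time caveat concrete, a small sketch, assuming (as in the diff above) that the environment variable is read exactly once while the module body executes:

```python
# Sketch: the env var is consulted only once, at import time, so mutating
# os.environ afterwards has no effect on which stream rmm treats as its default.
import os

os.environ["RMM_PER_THREAD_DEFAULT_STREAM"] = "0"
import rmm  # DEFAULT_STREAM is chosen here, during module import

os.environ["RMM_PER_THREAD_DEFAULT_STREAM"] = "1"  # too late: rmm never re-reads it
```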
This PR has been labeled
I haven't been able to continue working on PTDS, but hopefully can come back to it in the next month or so.
For reference, I implemented something similar in Numba (see numba/numba#6936) and went with
Thanks @gmarkall, I still think having environment variables is a great step forward at least for the short-/mid-term, as this allows us to test things before committing to big changes in the ecosystem.
Following up on a conversation with @harrism @jakirkham @rongou yesterday, I made a small change to the Cython bindings so that PTDS can be enabled via a CUDA_PTDS runtime environment variable, which is different from #480, where rebuilding the Python package is necessary. This allows us to test RMM together with other Python libraries in a non-intrusive fashion, requiring users to explicitly enable PTDS.

It's important to notice that this only works for the RMM API, for example rmm.DeviceBuffer, but using Numba to do the copying will still result in using the default stream only. To add Numba support, we may do a similar change there. Tagging @gmarkall for awareness too.

I currently used stream 2 because I didn't find an existing definition in RMM. I'm happy to change that to a proper definition, but I'm not sure where that should live yet.
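As far as I know, CUDA reserves handle 0x1 for the legacy default stream and 0x2 for the per-thread default stream, which is presumably where the literal 2 comes from; a definition could look roughly like this:

```python
# Assumed values, mirroring CUDA's reserved stream handles (CU_STREAM_LEGACY and
# CU_STREAM_PER_THREAD); the constant names here are illustrative, not RMM's.
CUDA_STREAM_LEGACY = 0x1      # the legacy (global) default stream
CUDA_STREAM_PER_THREAD = 0x2  # the per-thread default stream
```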