Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always enable per-thread default stream #11281

Closed
wants to merge 2 commits into from

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Jul 15, 2022

This PR removes all options around PTDS and enables it all the time. This change resolves #9165, which is caused by the fact that our CI builds all use the conda build.sh script that doesn't set the PTDS option. Now PTDS is simply always enabled.

@vyasr vyasr added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue Java Affects Java cuDF API. improvement Improvement / enhancement to an existing function breaking Breaking change labels Jul 15, 2022
@vyasr vyasr self-assigned this Jul 15, 2022
@vyasr vyasr requested review from a team as code owners July 15, 2022 23:02
@vyasr vyasr requested review from ttnghia and PointKernel July 15, 2022 23:02
@github-actions github-actions bot added the gpuCI label Jul 15, 2022
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive change that impacts all downstream libraries that use libcudf. I think some discussion, notice, and deprecation are needed before making a change of this scope.

Making PTDS the default and allowing someone to opt out of it feels like a much more reasonable step here and then subsequent removal of the option can be discussed.

@shwina
Copy link
Contributor

shwina commented Jul 16, 2022

cc: @jakirkham @pentschev for thoughts as well.

@pentschev
Copy link
Member

Thanks Ashwin for the ping.

Overall this looks to me like a step in the right direction, but I agree with Keith that we need to give users some grace period for the move. For instance, have we done some actual testing with the rest of RAPIDS? If so, could you please point out to what kind of testing has been done so far? This is something I would like to try out with Dask, and knowing what has been tested so far and how it was tested would be of help.

With the above said, having an opt-out feature should be the minimum and we should have broader testing before making this change mandatory, but even then I think it may be useful to allow switching back to the old behavior for debugging and testing purposes. I'm happy to test this with Dask and help in any other testing we deem necessary.

@beckernick
Copy link
Member

cc @nirandaperera (Cylon)

@pentschev
Copy link
Member

I started doing some testing with the Dask-CUDA Merge Benchmark and noticed that cuDF still doesn't specify PTDS when calling CuPy. One can override that by adding CUPY_CUDA_PER_THREAD_DEFAULT_STREAM=1 to environment variables, but if this hasn't been considered before perhaps it would be a good time to specify PTDS to CuPy internally in cuDF rather than passing that responsibility onto the user?

@kkraus14
Copy link
Collaborator

I would like to walk back my statement of making this the default to start even. For example, code could currently do something like:

  • Run some libcudf function that produces a cudf::table, called source, but does not synchronize the stream
  • Spawns a handful of downstream threads that do a combination of host work and calling libcudf functions that use source as an input

This is a race with PTDS but is valid with the current default stream behavior.

@vyasr
Copy link
Contributor Author

vyasr commented Jul 18, 2022

@kkraus14 @pentschev thanks for raising those concerns, I am happy to have a longer discussion about this. For context, this came about because we realized that the behavior with PTDS is not currently being tested (except of course by downstream consumers like Spark-RAPIDS), and that raises the possibility of silent bugs there. During the discussion of whether we should simply be adding additional tests with PTDS, there was support for simply migrating to PTDS by default, and (more limited) support for removing the option entirely.

With your concerns in mind, here are a couple of questions:

  • What should the default be for conda packages? Yes, it is possible to create a race condition with PTDS enabled that would be safe using the legacy default stream. Conversely, with the current conda packages users cannot take advantage of PTDS. That includes e.g. libcugraph.
  • @kkraus14 Regarding the default build behavior, do you simply think that it's premature to switch, or would you be worried about ever switching?

@pentschev
Copy link
Member

pentschev commented Jul 18, 2022

Would we be able to build conda packages with a runtime switch -- like a CUDF_PER_THREAD_DEFAULT_STREAM=1 environment variable -- that would allow us to at least add some tests for libraries like Dask-CUDA? This way we could begin some testing right away. Additionally, we could publicize a deprecation notice somewhere (maybe https://docs.rapids.ai/notices/rgn ?) if we intend to transition that in the next release or so.

@vyasr vyasr marked this pull request as draft July 18, 2022 15:55
@kkraus14
Copy link
Collaborator

  • What should the default be for conda packages? Yes, it is possible to create a race condition with PTDS enabled that would be safe using the legacy default stream. Conversely, with the current conda packages users cannot take advantage of PTDS. That includes e.g. libcugraph.

Ideally we could have conda packages for both legacy default stream builds and PTDS builds and use a build variant to switch between them. Could also use a conda track_features (https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#track-features) to allow controlling which one is grabbed by default when someone just asks for libcudf.

  • @kkraus14 Regarding the default build behavior, do you simply think that it's premature to switch, or would you be worried about ever switching?

I think it's premature to switch without giving any notice, having any documentation, having warnings / deprecations, etc. to make it loud and clear to downstream users of libcudf that the behavior is going to change in a subsequent release.

I'm not clear on whether it would make sense to ever only support building with per thread default stream. Given libcudf hasn't landed on a specific stream management strategy quite yet, I think it's premature to try to answer this problem as well.

@nirandaperera
Copy link

cc @nirandaperera (Cylon)

@beckernick thanks for the headsup. @ahmet-uyar FYI

@vyasr
Copy link
Contributor Author

vyasr commented Jul 18, 2022

CC @robertmaynard @jrhemstad

@bdice bdice added 5 - DO NOT MERGE Hold off on merging; see PR for details and removed 3 - Ready for Review Ready for review by team labels Jul 28, 2022
@bdice bdice changed the base branch from branch-22.08 to branch-22.10 July 28, 2022 03:31
@github-actions
Copy link

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@vyasr
Copy link
Contributor Author

vyasr commented Oct 21, 2022

I'm going to close this since we are instead moving towards explicitly exposing streams. That will allow anything interfacing with libcudf (the C++ cudf API) to pass through stream 1 when those libraries are running with PTDS enabled. The question of how to enable control over streams at the Python level is a separate question that should be addressed in discussions around other issues/PRs like rapidsai/rmm#633 and #5922.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - DO NOT MERGE Hold off on merging; see PR for details breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] gpuCI is not building cuDF with PER_THREAD_DEFAULT_STREAM
8 participants