Temporarily disable compression for communication protocols #957
base: branch-22.08
Conversation
For GPU data, compression is worse rather than better because it provokes device-to-host transfers when they are unnecessary. This is a short-term fix for rapidsai#935, in lieu of hooking up GPU-based compression algorithms.
@charlesbluca @beckernick @VibhuJawa @ayushdg @randerzander just so you're aware of this, in case it shows up in any workflows where you test TCP performance. I believe this should make things faster for Dask-CUDA workflows and not have any negative impact.
Curious how it was determined that compression was happening with GPU data? Asking because this is explicitly disabled in Distributed and has been for a while. It would be good to get a better understanding of what is happening here.
I can't really explain that. Is there any chance we're hitting another condition because TCP will force a D2H/H2D copy? Not sure if you've seen it, but this is the comment I left when I found that out: #935 (comment)
TCP currently requires host-to-device copying regardless of whether there is compression or not, so disabling compression wouldn't fix that. If we have some way to send device objects over TCP, let's discuss how we can enable this in Distributed.
Yes, I know, and this is not just currently the case but will always be, since the data needs to go over Ethernet and there's no GPUDirect RDMA in that scenario. But anyway, this is not what I was trying to say, apologies if that was unclear. I'm only imagining this is happening because it hits some other path instead of https://github.com/dask/distributed/blob/3551d1574c9cd72d60197cc84dd75702ebcfec54/distributed/protocol/cuda.py#L28 that you mentioned earlier. The thing is
My commit message might be misleading. I got lost trying to follow where the compression was coming from, so it might not be host/device copies, but rather just that on a fast-ish network compressing big chunks of data is slower than just sending them over the wire.
Gotcha, ok, that makes more sense. That said, users can still configure this themselves in those cases. Instead of having a different default (based on whether Distributed or Distributed + Dask-CUDA is used), which may be more confusing, why don't we document this and encourage users to explore different settings based on their needs?
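A minimal sketch of the user-side opt-out being suggested here; the LocalCluster setup and worker count are only illustrative:

```python
import dask
from dask.distributed import Client, LocalCluster

# Disable wire compression globally before starting the cluster so the
# setting is in place when the scheduler and workers come up
# (illustrative setup, not code from this PR).
dask.config.set({"distributed.comm.compression": None})

cluster = LocalCluster(n_workers=2)
client = Client(cluster)
```

The same opt-out can also come from standard Dask configuration rather than code, e.g. the environment variable DASK_DISTRIBUTED__COMM__COMPRESSION=None or a YAML config entry.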
I'll have a look, here are some profiled callgraphs of the two options (compression auto vs compression None).
tl;dr: the TCP protocol never calls the CUDA serialization path at all, so that explicit disabling in distributed/protocol/cuda.py only kicks in when device buffers are serialized directly (i.e. over UCX). In the TCP case, the data is already in host frames by the time compression is decided, so Distributed's normal compression logic applies.
Right, though disabling compression won't avoid the DtH/HtD transfers in the TCP case. Compression is allowed in that case since everything is on host; it just follows Distributed's default. Certainly users can disable this behavior on their own. We can also add this to our own benchmark scripts (if that is important to us). Would caution against setting this as a default in Dask-CUDA, because these kinds of implicit, library-specific defaults tend to be hard for users to discover and debug.
Here's a recent example of this kind of debugging due to a custom environment variable conda-forge added to scikit-build ( conda-forge/ctng-compiler-activation-feedstock#77 ) ( scikit-build/scikit-build#722 ).
What I find particularly frustrating are defaults that are hard to know about beforehand; the compression default itself is a great example: something changed in Dask (pulling in lz4 by default) that I had to debug before I could understand what was going on. So here we would be setting yet another implicit default that may be difficult to debug (now in two layers), so I agree with your point, John. I'm ok with setting that as a default just for the benchmark scripts, for example, if Lawrence is ok with that too.
rerun tests
# Until GPU-based compression is hooked up, turn off compression
# in communication protocols (see https://github.com/rapidsai/dask-cuda/issues/935)
dask.config.config["distributed"]["comm"]["compression"] = None
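If we do document this, a quick way for users to confirm which setting actually took effect on their cluster (the scheduler address below is hypothetical):

```python
import dask.config
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # hypothetical address

# What the local client sees:
print(dask.config.get("distributed.comm.compression"))

# What each worker sees (returns a dict keyed by worker address):
print(client.run(lambda: dask.config.get("distributed.comm.compression")))
```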
Should we move this to the benchmark script then?
I'm ok with that, but we should document it in a good, visible manner. Lately there has been a desire to make defaults more performance-friendly for newcomers, and this is a pretty significant drawback, at least for this one workflow. Keeping the current default could cause users trying out Dask to immediately rule it out, and GPUs entirely, due to the very bad performance that comes from it.
Perhaps people like @beckernick @VibhuJawa @randerzander @ayushdg could voice opinions on this matter too, especially if they have lately been running other workflows without UCX that would show whether this change is indeed significant.
If we want to change the default, would recommend raising this in Distributed for the reasons already discussed above
I am okay with changing the default in Dask-CUDA if it is well documented, but we should make sure that we don't overwrite a non-default value!
> If we want to change the default, would recommend raising this in Distributed for the reasons already discussed above
I am not opposed to raising this in Distributed if someone is interested in driving the conversation. However, the problem with discussing this in a broader scope is that the current default may make sense from a CPU standpoint, in which case we should still consider having a non-default value in Dask-CUDA if it makes sense for GPU workflows, which currently seems to be the case.
> I am okay with changing the default in Dask-CUDA if it is well documented, but we should make sure that we don't overwrite a non-default value!
I agree, we must document it well and ensure we don't overwrite a user-defined value.
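A rough sketch of what "don't overwrite a user-defined value" could look like; treating Distributed's shipped default ("auto" here) as the marker for "unset" is an assumption, not this PR's actual code:

```python
import dask.config

# Only apply the Dask-CUDA default when compression is still at Distributed's
# shipped default ("auto" is assumed here); an explicit user setting via YAML,
# environment variable, or dask.config.set is left untouched.
if dask.config.get("distributed.comm.compression", default="auto") == "auto":
    dask.config.set({"distributed.comm.compression": None})
```

One known limitation of this check: a user who explicitly sets "auto" is indistinguishable from the shipped default, which is part of why documenting the override matters.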
Let me do some profiling. I tried to match the performance loss up with a performance model but couldn't make sense of it. Suppose that a point-to-point message takes
Let's rearrange again to get
I don't have numbers for
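For what it's worth, a minimal sketch of a latency-bandwidth model of this trade-off; the symbols (message size n, latency α, per-byte wire cost β, per-byte compression cost γ, compression ratio r) are assumed notation, not necessarily the model used above:

```latex
% Assumed notation: n bytes, latency \alpha, per-byte wire cost \beta,
% per-byte compression cost \gamma, compression ratio r > 1.
\begin{align*}
  t_{\text{plain}} &= \alpha + \beta n \\
  t_{\text{comp}}  &= \alpha + \gamma n + \beta \frac{n}{r} \\
  t_{\text{comp}} < t_{\text{plain}} &\iff \gamma < \beta \left(1 - \frac{1}{r}\right)
\end{align*}
```

On a fast network (small β) or with a poor compression ratio (r close to 1) the right-hand side is small, so compression loses even before accounting for decompression on the receiving end.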
Codecov Report for #957 (base: branch-22.08): coverage 0.00%, files 16, lines 2107, hits 0, misses 2107, partials 0.