Temporarily disable compression for communication protocols #957

wence- · 2022-07-21T18:29:12Z

For GPU data, compression is worse rather than better because it
provokes device-to-host transfers when they are unnecessary.

This is a short-term fix for #935, in lieu of hooking up GPU-based
compression algorithms.

For GPU data, compression is worse rather than better because it provokes device-to-host transfers when they are unnecessary. This is a short-term fix for rapidsai#935, in lieu of hooking up GPU-based compression algorithms.

pentschev · 2022-07-21T19:12:32Z

@charlesbluca @beckernick @VibhuJawa @ayushdg @randerzander just so you're aware of this, in case this shows up somehow in any workflows where you may test TCP performance. I believe this should make things faster for Dask-CUDA workflows and not have any negative impact.

jakirkham · 2022-07-21T20:08:23Z

Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

pentschev · 2022-07-21T20:35:49Z

Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

I can't really explain that, is there any chance we're hitting another condition because TCP will force a D2H/H2D copy? Not sure if you've seen, but this is what this is the comment I left when I found that out: #935 (comment)

jakirkham · 2022-07-21T20:38:24Z

TCP currently requires host to device copying regardless of whether there is compression or not. So disabling compression wouldn't fix that. If we have some way to send device objects over TCP, let's discuss how we can enable this in Distributed.

pentschev · 2022-07-21T20:42:23Z

TCP currently requires host to device copying regardless of whether there is compression or not. So disabling compression wouldn't fix that.

Yes, I know, and this is not just currently, but will always be the case as it needs to go over Ethernet because there's no GPUDirectRDMA in that case. But anyway this is not what I was trying to say, apologies if that was unclear. I'm only imagining this is happening because it hits some other path instead of https://github.com/dask/distributed/blob/3551d1574c9cd72d60197cc84dd75702ebcfec54/distributed/protocol/cuda.py#L28 that you mentioned earlier. The thing is lz4 gets installed by default in Dask now, and that's what caused the behavior change, so I can only imagine that CUDA-specific config is being ignored/overriden.

wence- · 2022-07-21T22:17:22Z

Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

My commit message might be misleading. I got lost trying to follow where the compression was coming from, so it might not be host/device copies, but rather just that on a fast-ish network compressing big chunks of data is slower than just sending them over the wire.

jakirkham · 2022-07-21T22:21:28Z

Gotcha ok that makes more sense. That said, users can still configure this themselves in those cases. Instead of having a different default (based on whether Distributed or Distributed + Dask-CUDA is used), which may be more confusing, why don't we document this and encourage users to explore different settings based on their needs?

wence- · 2022-07-22T09:28:39Z

Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

I'll have a look, here are some profiled callgraphs of the two options (compression auto vs compression None)

wence- · 2022-07-22T10:20:44Z

Curious how was it determined compression was happening with GPU data? Asking as this is explicitly disabled in Distributed and has been this way for a while. Would be good to get a better understanding about what is happening here

I'll have a look, here are some profiled callgraphs of the two options (compression auto vs compression None)

tl;dr: using the TCP protocol never calls cuda_dumps/loads so it's not "GPU data" any more, and so those overrides don't kick in.

OK, so that explicit disabling only kicks in when to_frames is called with a serializer argument that includes "cuda". This happens when the UCX comm backend is used which explicitly sets serializers in write and read. In that case, cuda_dumps is called which produces frames that have device buffers in them (which UCX can handle).

In the tcp case, serializers is always None on call, and so tcp.write calls to_frames which calls into dask_dumps (for which there are handlers registered to deal with cudf dataframes and call host_serialize). But now the header of the message doesn't contain compression overrides, with the consequence that the host dataframe buffers are now compressed.

jakirkham · 2022-07-22T16:13:09Z

Right though disabling compression won't avoid the DtH/HtD transfers in the TCP case.

Compression is allowed in that case since everything is on host. It just follows Distributed's default.

Certainly users can disable this behavior on their own. We can also add this before our own benchmark scripts as well (if that is important to us).

Would caution against setting this as a default in Dask-CUDA because

It has different behavior than Distributed
Users will not realize adding Dask-CUDA has changed this default somehow
This may not be the right choice for other workflows
It can lead to lengthy debugging by developers building on Dask-CUDA

Here's a recent example of this kind of debugging due to a custom environment variable conda-forge added to scikit-build ( conda-forge/ctng-compiler-activation-feedstock#77 ) ( scikit-build/scikit-build#722 ).

pentschev · 2022-07-22T18:19:32Z

Particularly I find frustrating those defaults that are hard to really know beforehand, the compression default itself is a great example, something changed in Dask (pulling lz4 by default) that had to be debugged so I could understand. So here we would be setting yet-another implicit default that may be difficult to debug too (now in two layers), so I agree with your point John.

I'm ok with just setting that as a default for benchmark scripts, for example, if Lawrence is ok with that too.

pentschev · 2022-07-25T17:55:41Z

rerun tests

jakirkham · 2022-07-25T18:14:57Z

dask_cuda/__init__.py

+
+# Until GPU-based compression is hooked up, turn off compression
+# in communication protocols (see https://github.com/rapidsai/dask-cuda/issues/935)
+dask.config.config["distributed"]["comm"]["compression"] = None


Should we move this to the benchmark script then?

I'm ok with that, but we should document this in a good, visible manner. Lately there has been desire to make defaults more performance-friendly for newcomers, and this is a pretty significant drawback at least for this one workflow. This new behavior could cause users to try out Dask and immediately rule it out, as well as GPUs entirely, due to very bad performance that comes from this.

Perhaps people like @beckernick @VibhuJawa @randerzander @ayushdg could voice opinions on this matter too, especially if they are running other workflows without UCX lately that would potentially show if this change is indeed significant.

If we want to change the default, would recommend raising this in Distributed for the reasons already discussed above

I am okay with changing the default in Dask-CUDA if it is well documented but we should make sure that we don't overwrite a non-default value!

If we want to change the default, would recommend raising this in Distributed for the reasons already discussed above

I do not oppose to raising this in Distributed if someone is interested in driving the conversation. However, the problem with discussing this in a broader aspect is that the current default may make sense from a CPU standpoint, in which case we should still consider having a non-default value in Dask-CUDA if it makes sense for GPU workflows, which it currently seems to be the case.

I am okay with changing the default in Dask-CUDA if it is well documented but we should make sure that we don't overwrite a non-default value!

I agree, we must document it well and ensure we don't overwrite a user-defined value.

Let me do some profiling. I tried to match the performance loss up with a performance model but couldn't make sense of it. Suppose that a point to point message takes $T_p(b) := \alpha + \beta b$ seconds to send $b$ bytes, and that compression takes $T_c(b) = \gamma + \nu b$ seconds to (de)compress $b$ bytes, with a compression factor of $c$. Then sending a raw message costs $T_p(b)$ and sending a compressed message costs $2T_c(b) + T_p(b/c)$. So it's worthwhile to compress whenever $T_p(b) > 2T_c(b) + T_p(b/c)$. So we have $\alpha + \beta b > 2T_c(b) + \alpha + \beta b/c \Leftrightarrow \beta b/c > 2(\gamma + \nu b)$.

Let's rearrange again to get $b(\beta/c - 2\nu) > 2\gamma \Leftrightarrow b > 2\gamma / (\beta/c - 2\nu)$ (If $\gamma = 0$ then compression makes sense if $\beta/c - 2\nu > 0$). In this latter case, that means that it's worthwhile to compress if the "compression bandwidth $C := 1/\nu$" and network bandwidth $B := 1/\beta$ are related by $C > 2 c B$. e.g. for a compression ratio of 2, if we can compress more than four times as fast as we can send over the network, it's worthwhile compressing.

I don't have numbers for $\alpha$, $\beta$, $\gamma$, and $\nu$ but if they were measured you could put an adaptive compression model in and use that.

codecov-commenter · 2022-07-25T18:19:42Z

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (branch-22.08@435dae8). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
dask_cuda/__init__.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             branch-22.08    #957   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?      16           
  Lines                   ?    2107           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?    2107           
  Partials                ?       0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

github-actions · 2022-09-04T16:03:08Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-10-04T17:14:30Z

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

Temporarily disable compression for communication protocols

0e2b75b

For GPU data, compression is worse rather than better because it provokes device-to-host transfers when they are unnecessary. This is a short-term fix for rapidsai#935, in lieu of hooking up GPU-based compression algorithms.

wence- added 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Jul 21, 2022

wence- requested a review from a team as a code owner July 21, 2022 18:29

github-actions bot added the python python code needed label Jul 21, 2022

wence- mentioned this pull request Jul 21, 2022

Performance regression in cuDF merge benchmark #935

Open

pentschev added the bug Something isn't working label Jul 21, 2022

pentschev approved these changes Jul 21, 2022

View reviewed changes

jakirkham reviewed Jul 25, 2022

View reviewed changes

github-actions bot added inactive-30d and removed inactive-30d labels Sep 4, 2022

github-actions bot added the inactive-30d label Oct 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Temporarily disable compression for communication protocols #957

Temporarily disable compression for communication protocols #957

wence- commented Jul 21, 2022

pentschev commented Jul 21, 2022

jakirkham commented Jul 21, 2022

pentschev commented Jul 21, 2022

jakirkham commented Jul 21, 2022 •

edited

Loading

pentschev commented Jul 21, 2022

wence- commented Jul 21, 2022

jakirkham commented Jul 21, 2022

wence- commented Jul 22, 2022

wence- commented Jul 22, 2022

jakirkham commented Jul 22, 2022

pentschev commented Jul 22, 2022

pentschev commented Jul 25, 2022

jakirkham Jul 25, 2022

pentschev Jul 25, 2022

jakirkham Jul 25, 2022

madsbk Aug 1, 2022

pentschev Aug 2, 2022

wence- Aug 2, 2022

codecov-commenter commented Jul 25, 2022 •

edited

Loading

github-actions bot commented Sep 4, 2022

github-actions bot commented Oct 4, 2022

Temporarily disable compression for communication protocols #957

Are you sure you want to change the base?

Temporarily disable compression for communication protocols #957

Conversation

wence- commented Jul 21, 2022

pentschev commented Jul 21, 2022

jakirkham commented Jul 21, 2022

pentschev commented Jul 21, 2022

jakirkham commented Jul 21, 2022 • edited Loading

pentschev commented Jul 21, 2022

wence- commented Jul 21, 2022

jakirkham commented Jul 21, 2022

wence- commented Jul 22, 2022

wence- commented Jul 22, 2022

jakirkham commented Jul 22, 2022

pentschev commented Jul 22, 2022

pentschev commented Jul 25, 2022

jakirkham Jul 25, 2022

Choose a reason for hiding this comment

pentschev Jul 25, 2022

Choose a reason for hiding this comment

jakirkham Jul 25, 2022

Choose a reason for hiding this comment

madsbk Aug 1, 2022

Choose a reason for hiding this comment

pentschev Aug 2, 2022

Choose a reason for hiding this comment

wence- Aug 2, 2022

Choose a reason for hiding this comment

codecov-commenter commented Jul 25, 2022 • edited Loading

Codecov Report

github-actions bot commented Sep 4, 2022

github-actions bot commented Oct 4, 2022

jakirkham commented Jul 21, 2022 •

edited

Loading

codecov-commenter commented Jul 25, 2022 •

edited

Loading