Temporarily disable compression for communication protocols #957

Open · wants to merge 1 commit into base: branch-22.08
4 changes: 4 additions & 0 deletions dask_cuda/__init__.py
@@ -31,3 +31,7 @@
dask.dataframe.shuffle.shuffle_group
)
dask.dataframe.core._concat = unproxify_decorator(dask.dataframe.core._concat)
+
+# Until GPU-based compression is hooked up, turn off compression
+# in communication protocols (see https://github.com/rapidsai/dask-cuda/issues/935)
+dask.config.config["distributed"]["comm"]["compression"] = None
Comment on lines +34 to +37
Member:
Should we move this to the benchmark script then?
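If so, the override could be scoped to the benchmark run itself. A minimal sketch (the `run_benchmark` stub is a placeholder for the actual benchmark body):

```python
import dask.config


def run_benchmark():
    # Placeholder for the actual benchmark body.
    ...


# Apply the override only while the benchmark runs; dask.config.set
# used as a context manager restores any user configuration afterwards.
with dask.config.set({"distributed.comm.compression": None}):
    run_benchmark()
```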

Member:
I'm OK with that, but we should document it in a clear, visible manner. Lately there has been a push to make defaults more performance-friendly for newcomers, and this is a pretty significant drawback for at least this one workflow. That behavior could cause users to try out Dask and immediately rule it out, and perhaps GPUs entirely, due to the very bad performance it causes.

Perhaps people like @beckernick @VibhuJawa @randerzander @ayushdg could voice their opinions on this too, especially if they have lately been running other workflows without UCX that would show whether this change is indeed significant.

Member:
If we want to change the default, I would recommend raising this in Distributed, for the reasons already discussed above.

Member:
I am okay with changing the default in Dask-CUDA if it is well documented, but we should make sure that we don't overwrite a non-default value!

Member:
> If we want to change the default, I would recommend raising this in Distributed, for the reasons already discussed above.

I do not oppose raising this in Distributed if someone is interested in driving the conversation. However, the problem with discussing this in that broader context is that the current default may make sense from a CPU standpoint, in which case we should still consider a non-default value in Dask-CUDA if it makes sense for GPU workflows, which currently seems to be the case.

> I am okay with changing the default in Dask-CUDA if it is well documented, but we should make sure that we don't overwrite a non-default value!

I agree, we must document it well and ensure we don't overwrite a user-defined value.
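Something like this might do it (a sketch only; it assumes `"auto"` is the upstream default for this key, which should be checked against distributed's config schema):

```python
import dask.config

# Sketch: disable compression only if the setting is still at what we
# assume is the upstream default ("auto"); a user-defined value is
# left untouched.
if dask.config.get("distributed.comm.compression", default="auto") == "auto":
    dask.config.set({"distributed.comm.compression": None})
```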

Contributor Author:
Let me do some profiling. I tried to match the performance loss up with a performance model but couldn't make sense of it. Suppose that a point-to-point message takes $T_p(b) := \alpha + \beta b$ seconds to send $b$ bytes, and that compression takes $T_c(b) := \gamma + \nu b$ seconds to (de)compress $b$ bytes, with a compression factor of $c$. Then sending a raw message costs $T_p(b)$ and sending a compressed message costs $2T_c(b) + T_p(b/c)$, so it's worthwhile to compress whenever $T_p(b) > 2T_c(b) + T_p(b/c)$. Expanding, $\alpha + \beta b > 2T_c(b) + \alpha + \beta b/c \Leftrightarrow \beta b (1 - 1/c) > 2(\gamma + \nu b)$.

Let's rearrange again to get $b\left(\beta \frac{c-1}{c} - 2\nu\right) > 2\gamma \Leftrightarrow b > 2\gamma / \left(\beta \frac{c-1}{c} - 2\nu\right)$ (if $\gamma = 0$, then compression makes sense whenever $\beta \frac{c-1}{c} - 2\nu > 0$). In this latter case, writing the "compression bandwidth" as $C := 1/\nu$ and the network bandwidth as $B := 1/\beta$, it's worthwhile to compress whenever $C > \frac{2c}{c-1} B$. E.g. for a compression ratio of 2, if we can compress more than four times as fast as we can send over the network, compressing is worthwhile.
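To put illustrative numbers on that (assumptions for the sake of example, not measurements):

```python
# Illustrative numbers only: break-even message size under the model
# above, assuming a 1 GB/s network, a 4 GB/s compressor, a 10 us fixed
# (de)compression cost, and a compression factor of 3.
beta = 1 / 1e9   # seconds per byte on the wire (B = 1 GB/s)
nu = 1 / 4e9     # seconds per byte to (de)compress (C = 4 GB/s)
gamma = 10e-6    # fixed (de)compression cost in seconds
c = 3            # compression factor

denom = beta * (c - 1) / c - 2 * nu
if denom > 0:
    print(f"compress messages larger than ~{2 * gamma / denom / 1e3:.0f} kB")
else:
    print("compression never pays off at these rates")
# -> compress messages larger than ~120 kB
```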

I don't have numbers for $\alpha$, $\beta$, $\gamma$, and $\nu$, but if they were measured you could put an adaptive compression model in and use that.
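For instance, something along these lines (a minimal sketch; the function and where it would hook into distributed's message framing are hypothetical):

```python
def should_compress(nbytes, beta, gamma, nu, c):
    """Per-message decision under the model above: compress iff
    T_p(b) > 2*T_c(b) + T_p(b/c). The latency term alpha cancels.

    beta: seconds per byte on the wire; gamma: fixed (de)compression
    cost in seconds; nu: seconds per byte to (de)compress; c: expected
    compression factor. All would come from measurement.
    """
    raw = beta * nbytes
    compressed = 2 * (gamma + nu * nbytes) + beta * nbytes / c
    return raw > compressed


# With the illustrative numbers from above, a 1 MB message should be
# compressed while a 64 kB message should not:
print(should_compress(1_000_000, 1 / 1e9, 10e-6, 1 / 4e9, 3))  # True
print(should_compress(64_000, 1 / 1e9, 10e-6, 1 / 4e9, 3))     # False
```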