[FEA] Expose rmm maximum_pool_size to LocalCUDACluster and dask-cuda-worker API #826
Comments
Happy to take this on if we decide to do this.
Would this allow me to run multiple workers per GPU?
Exposing this as
@mmccarty I don't quite get the direct relation between this and having multiple workers per GPU. Today you can do that manually, but I don't believe that would necessarily be beneficial performance-wise. What's your use case for multiple workers per GPU?
Thanks @pentschev. Will push a PR keeping your suggestion in mind soon. I would like to contribute to dask-cuda just to get to know the code base slightly better.
@pentschev Happy to talk about the use case. Is there a better issue, or should I just create a new one?
This PR closes #826

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)

URL: #827
@mmccarty I think you should open a new issue; it may be covered somewhere, but I can't tell for sure without more details.
There was issue #571. Though yeah, it's hard to tell whether that is related without knowing more about the use case.
We should expose rmm's maximum_pool_size argument (see docs) to the LocalCUDACluster and dask-cuda-worker CLI APIs.
Why

By default, the RMM pool can use the total available memory on the GPU. This can cause problems for workflows where the pool actually grows to the total available device memory and we need some memory outside the pool.
Why we may need room for other allocations:
- Competing process: a common case is the client & worker on the same GPU.
- Competing pools/libraries: some libraries might need to allocate memory outside of the pool, like PyTorch (even NCCL/RAFT might need some room).
Giving the user the ability to set maximum_pool_size easily can circumvent those issues.
Workflow Context:

I ran into this while working on a workflow where the pool grew to 32501 MiB (out of the card's total of 32510 MiB), leaving very little memory for cuML to do some non-pool allocations and causing a failure.

Current Workaround:
Using the RMM reinitialize API.
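For reference, a sketch of that workaround, assuming a running Dask cluster: re-create the RMM pool on every worker with an explicit cap via rmm.reinitialize (the sizes and scheduler address below are placeholders):

```python
# Sketch of the current workaround: rebuild RMM's pool on each worker with an
# explicit maximum_pool_size via rmm.reinitialize (sizes are illustrative).
import rmm
from dask.distributed import Client

def set_rmm_pool():
    rmm.reinitialize(
        pool_allocator=True,
        initial_pool_size=24 * 2**30,   # 24 GiB starting pool
        maximum_pool_size=28 * 2**30,   # never let the pool grow past 28 GiB
    )

client = Client("scheduler-address:8786")  # placeholder scheduler address
# Run the reinitialization on every worker so each GPU keeps headroom for
# non-pool allocations (cuML, NCCL, PyTorch, the client process, etc.).
client.run(set_rmm_pool)
```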