create multiple dask workers per gpu #571
It's possible to start multiple compute threads per GPU by passing …
In my case, per-worker performance appears adequate with one thread, although I may need to tune the number of threads later; ideally, I need more workers. Currently I can't raise the number of workers beyond the number of physical GPUs, and I was hoping there was a workaround inside dask_cuda similar to the one in TensorFlow.
Yes, today Dask-CUDA is limited to one worker per GPU as a design choice, and we currently have no plans to extend that. Extending it would entail many complications, such as handling memory pools and spilling efficiently, so our goal is to keep working with one worker per GPU but multiple threads. If you definitely want to go ahead and test that on your own, the alternative you have is to launch …
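The exact command was not captured above; a rough sketch of the manual alternative (an assumption on my part, not an officially supported setup) is to pin several plain dask workers to the same device via `CUDA_VISIBLE_DEVICES`:

```shell
# Start a scheduler, then pin two plain dask workers to GPU 0.
# Each worker sees only that one device. Note that none of
# dask-cuda's memory-pool or spilling machinery is configured
# for you in this setup.
dask-scheduler &

CUDA_VISIBLE_DEVICES=0 dask-worker localhost:8786 --nthreads 1 --name gpu0-a &
CUDA_VISIBLE_DEVICES=0 dask-worker localhost:8786 --nthreads 1 --name gpu0-b &
```

Both workers will compete for the same device memory, which is one of the complications mentioned above.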
This issue has been labeled …
Since this is currently out of scope for Dask-CUDA, I'm closing this. If there's more you would like to discuss, please feel free to reopen this or open a new issue.
@pentschev I'm absolutely not advocating for adding this functionality. Fooling around with this setting has led to some hard-to-track-down CUDA errors. Is there a way that setting …
@bartbkr what kind of warning? Running with more than one thread per worker should work fine, although we don't expect any reasonable performance gains with it, plus you're likely to end up with OOM errors faster than you would with the default one thread per worker. If these are OOM errors you're seeing, I wouldn't be surprised. |
I'm using a `LocalCUDACluster` to process a dask dataframe; some pseudo code is below. The GPU func is much quicker than the CPU one but has relatively low memory requirements (2-4 GB), so it would be good to have more workers per GPU. Is there a way to create multiple dask workers per GPU, or to create logical GPUs out of physical GPUs (similar to what is possible in TensorFlow with `tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)`)? Or is there some other obvious route to improving this?

Hardware this will be used on:
- Local machine: 4x RTX A6000 (48 GB)
- Scaled to multi-node HPC runs with V100 (16 GB) and A100 (40 GB)
- All running inside a Docker/Singularity/Shifter container solution
Pseudo code:
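The original pseudo code was not captured here; below is a minimal sketch of the workload as described (the function name `gpu_func`, the example column, and the per-partition logic are all illustrative assumptions, not the issue author's code):

```python
import pandas as pd


def gpu_func(partition: pd.DataFrame) -> pd.DataFrame:
    # Illustrative stand-in for the real per-partition GPU function
    # described above (fast, but needing only 2-4 GB of device memory).
    out = partition.copy()
    out["y"] = out["x"] * 2
    return out


if __name__ == "__main__":
    # Requires a CUDA-capable machine with dask-cuda installed.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    import dask.dataframe as dd

    # One worker per visible GPU; threads_per_worker adds compute
    # threads within each worker, not additional workers.
    cluster = LocalCUDACluster(threads_per_worker=1)
    client = Client(cluster)

    ddf = dd.from_pandas(pd.DataFrame({"x": range(1_000)}), npartitions=8)
    result = ddf.map_partitions(gpu_func).compute()
```

With one worker per 48 GB A6000 and a 2-4 GB function, most of the device sits idle, which is exactly the motivation for the question above.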