
Why can't I run this on any CUDA device except cuda:0? #10

Open
ChaosAdmStudent opened this issue Aug 21, 2024 · 2 comments

Comments


ChaosAdmStudent commented Aug 21, 2024

I tried to modify the codebase a little to allow running it on a different CUDA device, but I always end up with an "illegal memory access was encountered" error if I use anything other than cuda:0. Any idea why this is happening and how I can fix it?

I believe the error originates in the project_gaussians_2d function. If I use the cuda:1 device and try to print xys (or any other output from this function), I get the CUDA illegal memory access error. However, if I use cuda:0, they print out just fine.

@Xinjie-Q (Owner)

I'm curious whether our original code can run on different CUDA devices on your server. If you do not set CUDA_VISIBLE_DEVICES, you need to revise this line:

self.device = torch.device("cuda:0")

In the code, we have specified that it runs on cuda:0.
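A minimal sketch of the remapping this suggests (the GPU index "1" is a placeholder; choose the physical GPU you actually want):

```python
import os

# Restrict this process to physical GPU 1 *before* any CUDA library is
# imported. Inside the process that GPU is then visible as cuda:0, so
# the hard-coded torch.device("cuda:0") targets the intended device.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # placeholder GPU index

# import torch                        # must come after the variable is set
# device = torch.device("cuda:0")     # now refers to physical GPU 1
```

Setting the variable on the command line (`CUDA_VISIBLE_DEVICES=1 python ...`) achieves the same thing without touching the code.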

@ChaosAdmStudent
Author

ChaosAdmStudent commented Aug 27, 2024

> I'm curious whether our original code can run on different CUDA devices on your server. If you do not set CUDA_VISIBLE_DEVICES, you need to revise this line:
>
> self.device = torch.device("cuda:0")
>
> In the code, we have specified that it runs on cuda:0.

In my codebase, I am just using the project_gaussians_2d and rasterize_gaussians_sum functions instead of creating a SimpleTrainer2d instance to start the training. I make sure to move all the inputs to these functions onto a user-specified device, but anything other than cuda:0 initially gave me an error.

I assumed it could be because the CUDA code runs on cuda:0 by default (for rendering). So I added cudaSetDevice(device_id); in the bindings.cu file for these two functions and re-compiled the package. After doing this it started working, but the code ran much slower on the other CUDA devices. Inspecting nvidia-smi, I could see that when the user-specified device is cuda:1 or cuda:2, part of the script is still hosted on cuda:0. I guess the slowdown comes from data being repeatedly transferred back and forth between CUDA devices. Will I have to add cudaSetDevice(device_id); to every custom CUDA kernel that is implemented?
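For reference, PyTorch CUDA extensions usually avoid scattering cudaSetDevice calls by scoping a device guard to the input tensor's device at the top of each binding. A hedged C++ sketch (the function name and signature are illustrative, not this project's actual binding, and it requires the PyTorch/CUDA toolchain to compile):

```cpp
#include <torch/extension.h>
#include <c10/cuda/CUDAGuard.h>

// Illustrative binding: switch the current CUDA device to match the
// input tensor for the duration of this scope. The guard restores the
// previous device when it goes out of scope, so every kernel launched
// here runs on the tensor's own device (cuda:1, cuda:2, ...), and no
// per-kernel cudaSetDevice call is needed.
torch::Tensor project_gaussians_2d_forward(const torch::Tensor& means2d) {
    const c10::cuda::CUDAGuard device_guard(means2d.device());
    // ... allocate outputs with means2d.options() (same device) and
    //     launch the projection kernels here ...
    return torch::empty_like(means2d);  // placeholder result
}
```

Allocating every intermediate tensor with the input's options() is equally important: a tensor silently created on the default device is a common cause of the cross-device traffic visible in nvidia-smi.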
