
Why can't I run this on any CUDA device except cuda:0? #10

Open
ChaosAdmStudent opened this issue Aug 21, 2024 · 2 comments

Comments


ChaosAdmStudent commented Aug 21, 2024

I tried to modify the codebase a little to allow running it on a different CUDA device, but I always end up with an "illegal memory access was encountered" error if I use anything other than cuda:0. Any idea why this is happening and how I can fix it?

I believe the error originates in the project_gaussians_2d function. If I use the cuda:1 device and try to print xys (or any other output from this function), I get the CUDA illegal memory access error. However, if I use cuda:0, they print out just fine.

@Xinjie-Q (Owner)

I'm curious whether our original code can run on different CUDA devices on your server. If you do not set CUDA_VISIBLE_DEVICES, you need to revise this line:

self.device = torch.device("cuda:0")

In the code, we have specified that it runs on cuda:0.
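A minimal sketch of the remapping this suggests (the GPU index "1" is a placeholder; choose the physical GPU you actually want):

```python
import os

# Restrict this process to physical GPU 1 *before* any CUDA library is
# imported. Inside the process that GPU is then visible as cuda:0, so
# the hard-coded torch.device("cuda:0") targets the intended device.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # placeholder GPU index

# import torch                        # must come after the variable is set
# device = torch.device("cuda:0")     # now refers to physical GPU 1
```

Setting the variable on the command line (`CUDA_VISIBLE_DEVICES=1 python ...`) achieves the same thing without touching the code.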

@ChaosAdmStudent
Author

ChaosAdmStudent commented Aug 27, 2024

> I'm curious whether our original code can run on different CUDA devices on your server. If you do not set CUDA_VISIBLE_DEVICES, you need to revise this line:
>
> self.device = torch.device("cuda:0")
>
> In the code, we have specified that it runs on cuda:0.

In my codebase, I am just using the project_gaussians_2d and rasterize_gaussians_sum functions instead of creating a SimpleTrainer2d instance to start the training. I make sure to move all the inputs to these functions onto a user-specified device, but anything other than cuda:0 initially gave me an error.

I assumed it could be because the CUDA code runs on cuda:0 by default (for rendering). So I added cudaSetDevice(device_id); in the bindings.cu file for these two functions and re-compiled the package. After doing this it started working, but the code ran much slower on the other CUDA devices. Inspecting nvidia-smi, I could see that when the user-specified device is cuda:1 or cuda:2, part of the script is still hosted on cuda:0. I guess the slowdown comes from data being repeatedly transferred back and forth between CUDA devices. Will I have to add cudaSetDevice(device_id); to every custom CUDA kernel that is implemented?
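For reference, PyTorch CUDA extensions usually avoid scattering cudaSetDevice calls by scoping a device guard to the input tensor's device at the top of each binding. A hedged C++ sketch (the function name and signature are illustrative, not this project's actual binding, and it requires the PyTorch/CUDA toolchain to compile):

```cpp
#include <torch/extension.h>
#include <c10/cuda/CUDAGuard.h>

// Illustrative binding: switch the current CUDA device to match the
// input tensor for the duration of this scope. The guard restores the
// previous device when it goes out of scope, so every kernel launched
// here runs on the tensor's own device (cuda:1, cuda:2, ...), and no
// per-kernel cudaSetDevice call is needed.
torch::Tensor project_gaussians_2d_forward(const torch::Tensor& means2d) {
    const c10::cuda::CUDAGuard device_guard(means2d.device());
    // ... allocate outputs with means2d.options() (same device) and
    //     launch the projection kernels here ...
    return torch::empty_like(means2d);  // placeholder result
}
```

Allocating every intermediate tensor with the input's options() is equally important: a tensor silently created on the default device is a common cause of the cross-device traffic visible in nvidia-smi.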
