
memory placement consideration for PAD #18

Open
robers97 opened this issue Jul 5, 2018 · 3 comments

Comments


robers97 commented Jul 5, 2018

For some reason the 'CUDA-U' implementation of PAD is spending a long time in the kernel, likely waiting for data. I was able to make the following change to bring the kernel time in line with the CUDA-D version.

```cpp
#ifdef CUDA_8_0
    T * h_in_out;
    cudaStatus = cudaMallocManaged(&h_in_out, in_size * sizeof(T));
    // Advise the driver that these pages should preferentially reside on device 0
    cudaMemAdvise(h_in_out, in_size * sizeof(T), cudaMemAdviseSetPreferredLocation, 0);
```

I realize that this change increases the Allocation phase of the program, but it seems more reasonable for the cost to land there. It's unclear to me how this setting may impact other results in the CUDA versions of Chai.
I'm opening this issue for discussion and consideration.
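For reference, the general pattern being proposed looks roughly like the sketch below (standalone, outside of Chai; `buf` and `n` are hypothetical names standing in for `h_in_out` and `in_size`, and a single GPU is assumed as device 0):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;
    float *buf;  // hypothetical buffer, stands in for h_in_out

    // Allocate unified (managed) memory, accessible from both host and device.
    cudaError_t status = cudaMallocManaged(&buf, n * sizeof(float));
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n",
                cudaGetErrorString(status));
        return 1;
    }

    // Hint that the preferred physical location of these pages is device 0,
    // so the driver is less inclined to migrate them back to the host.
    // Requires CUDA 8.0+; most effective on Pascal or newer GPUs.
    cudaMemAdvise(buf, n * sizeof(float), cudaMemAdviseSetPreferredLocation, 0);

    // Optionally prefetch so the pages are resident before any kernel runs,
    // shifting the migration cost into the allocation/setup phase.
    cudaMemPrefetchAsync(buf, n * sizeof(float), 0 /* device */);
    cudaDeviceSynchronize();

    cudaFree(buf);
    return 0;
}
```

With the advice plus a prefetch, the page migrations are paid for up front rather than on first touch inside the kernel, which is consistent with the Allocation-phase increase described above.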

@ielhajj
Copy link
Member

ielhajj commented Jul 6, 2018

Are you using static or dynamic partitioning?

@robers97
Copy link
Author

robers97 commented Jul 6, 2018 via email

@ielhajj
Copy link
Member

ielhajj commented Jul 6, 2018

Does the best static partitioning version also perform worse than CUDA-D? In PAD, different workers may touch the same memory locations, and with dynamic partitioning, these workers are more likely to be on different devices. That might be one of the issues.
