For some reason the 'CUDA-U' implementation of PAD is taking a long time in the kernel, likely waiting for data. I was able to make the change below to bring the kernel time in line with the CUDA-D version.
I realize that this change increases the Allocation phase of the program, but it seems more reasonable for the tax to be there. It's unclear to me how this setting may impact the other results in CUDA CHAI.
I'm opening this for discussion and consideration.
Does the best static partitioning version also perform worse than CUDA-D? In PAD, different workers may touch the same memory locations, and with dynamic partitioning, these workers are more likely to be on different devices. That might be one of the issues.
#ifdef CUDA_8_0
// Allocate the in/out buffer as managed (unified) memory, then advise the
// driver that its preferred location is device 0, so the pages are not
// migrated on demand while the kernel runs
T * h_in_out;
cudaStatus = cudaMallocManaged(&h_in_out, in_size * sizeof(T));
cudaMemAdvise(h_in_out, in_size * sizeof(T), cudaMemAdviseSetPreferredLocation, 0);
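If moving the migration cost into the Allocation phase is acceptable, an explicit prefetch can achieve the same effect more deterministically than the advice hint alone. A minimal self-contained sketch; the buffer name, element count, and device id here are illustrative placeholders, not taken from CHAI's PAD source:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t in_size = 1 << 20;   // illustrative element count
    const int dev = 0;                // target GPU for the kernel

    float *h_in_out;                  // float stands in for the template type T
    cudaError_t st = cudaMallocManaged(&h_in_out, in_size * sizeof(float));
    if (st != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(st));
        return 1;
    }

    // Hint: keep the pages resident on the GPU instead of faulting them in
    cudaMemAdvise(h_in_out, in_size * sizeof(float),
                  cudaMemAdviseSetPreferredLocation, dev);

    // Explicitly migrate the pages now, during the Allocation/Copy phase,
    // so the kernel's timed region does not pay for on-demand migration
    cudaMemPrefetchAsync(h_in_out, in_size * sizeof(float), dev, 0);
    cudaDeviceSynchronize();

    // ... launch the PAD kernel here ...

    cudaFree(h_in_out);
    return 0;
}
```

Since CPU workers also touch this buffer under dynamic partitioning, it may also be worth testing `cudaMemAdviseSetAccessedBy` with `cudaCpuDeviceId`, so host accesses map the pages rather than migrating them back.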