Measuring time spent for reduction operation in AllReduce. #1494

Open
jain-jainendra opened this issue Jan 21, 2025 · 3 comments

Comments

@jain-jainendra

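I am trying to measure the time spent in the reduction operation of RCCL AllReduce. I found that it eventually calls this part of the code in common_kernel.h:

```cpp
#pragma unroll Unroll
for (int u = 0; u < Unroll; u++) {
  // s indexes the current source buffer (outer loop not shown);
  // only the first PreOpSrcs sources get the pre-op applied
  if (s < PreOpSrcs) tmp[u] = applyPreOp(preFn, tmp[u]);
  // Fold this source's values into the running accumulator
  acc[u] = applyReduce(redFn, acc[u], tmp[u]);
}
```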

How can we measure the time spent in the applyReduce function? I tried __clock64() and wall_clock64(), but they were not helpful.
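A likely reason a single before/after timer read looks unhelpful here is that one pass of this loop issues only a few arithmetic instructions, so the delta is close to the timer's resolution (wall_clock64() counts at a fixed rate that is typically well below the shader clock), and the timer reads are not guaranteed to stay ordered relative to the surrounding loads and arithmetic. One workaround, sketched here under assumptions, is to accumulate the delta in a per-thread counter across every loop iteration, write each thread's total to a device buffer when the kernel finishes, and aggregate on the host. Only the host side is shown. `ReduceTiming`, `ticks_to_us`, and `summarize` are hypothetical helpers, and `rate_khz` is assumed to come from `hipDeviceGetAttribute` with `hipDeviceAttributeWallClockRate`, which reports the counter frequency in kHz.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side helpers. The device-side instrumentation that fills
// the per-thread tick totals is assumed, not part of RCCL.

// Summary of per-thread reduction timings, in microseconds.
struct ReduceTiming {
    double mean_us;  // average over all instrumented threads
    double max_us;   // slowest thread, which tends to bound the kernel
};

// Convert ticks of a constant-rate counter (rate given in kHz) to microseconds.
// ticks / (rate_khz * 1000 Hz) seconds == ticks * 1000 / rate_khz microseconds.
double ticks_to_us(double ticks, double rate_khz) {
    assert(rate_khz > 0.0);
    return ticks * 1000.0 / rate_khz;
}

// Aggregate the accumulated tick deltas copied back from the device.
ReduceTiming summarize(const std::vector<std::uint64_t>& per_thread_ticks,
                       double rate_khz) {
    assert(!per_thread_ticks.empty());
    double sum = 0.0;
    std::uint64_t max_ticks = 0;
    for (std::uint64_t t : per_thread_ticks) {
        sum += static_cast<double>(t);
        max_ticks = std::max(max_ticks, t);
    }
    double mean_ticks = sum / static_cast<double>(per_thread_ticks.size());
    return {ticks_to_us(mean_ticks, rate_khz),
            ticks_to_us(static_cast<double>(max_ticks), rate_khz)};
}
```

Reporting the maximum next to the mean helps because the slowest thread tends to bound the kernel. Even so, any stall that lands between the two timer reads (for example, waiting on the loads that fill `tmp[u]`) is counted, so treat the result as an approximation of compute time rather than an exact figure.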

@ppanchad-amd

Hi @jain-jainendra. An internal ticket has been created to assist with your issue. Thanks!

@huanrwan-amd

Hi @jain-jainendra ,
In the rccl-tests examples (https://github.com/ROCm/rccl-tests), the ./all_reduce_perf test can help you benchmark the time spent in the reduction operation.

For example, see https://rocm.docs.amd.com/en/develop/how-to/rocm-for-ai/training/train-a-model.html#running-the-rccl-bandwidth-test

The all_reduce_perf output shows the out-of-place and in-place times for the reduce operation.
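For context, rccl-tests is a fork of nccl-tests and derives its bandwidth columns from that same time. Following the nccl-tests performance notes, algbw is the data size divided by the time, and for AllReduce busbw = algbw * 2(n-1)/n, where n is the number of ranks. Both are derived from the end-to-end time of the collective. A minimal sketch of that arithmetic, assuming size in bytes, time in microseconds, and 1 GB = 1e9 bytes (the function names are illustrative only):

```cpp
#include <cassert>
#include <cstddef>

// Algorithm bandwidth in GB/s: bytes / (time_us * 1e-6 s) / 1e9,
// which simplifies to bytes / (time_us * 1e3).
double alg_bw_gbps(std::size_t bytes, double time_us) {
    assert(time_us > 0.0);
    return static_cast<double>(bytes) / (time_us * 1e3);
}

// Bus bandwidth for AllReduce: algbw scaled by 2(n-1)/n so the figure can be
// compared against hardware link bandwidth.
double allreduce_bus_bw_gbps(std::size_t bytes, double time_us, int nranks) {
    assert(nranks > 0);
    return alg_bw_gbps(bytes, time_us) * 2.0 * (nranks - 1) / nranks;
}
```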

Please let us know if you need any further help.

@jain-jainendra
Author

I am already using rccl-tests. However, it reports the total AllReduce time, which includes both communication and computation. I want to measure only the time spent in computation, i.e., the reduction operation.
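One way to look at computation alone is to benchmark the reduction arithmetic by itself on buffers that are already local, and compare that with the end-to-end time from rccl-tests. On the GPU this would be a standalone HIP kernel timed with hipEventRecord and hipEventElapsedTime. The sketch below only shows the structure on the host, using std::chrono, and assumes a float sum; `reduce_into` and `time_reduce_ns` are hypothetical helpers that mirror the acc/tmp loop, not RCCL code, and host timings do not predict GPU performance.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for folding one source into the accumulator:
// acc[i] += src[i], processed Unroll elements at a time like the acc[u]/tmp[u] loop.
template <int Unroll>
void reduce_into(std::vector<float>& acc, const std::vector<float>& src) {
    assert(acc.size() == src.size());
    std::size_t n = acc.size();
    std::size_t i = 0;
    for (; i + Unroll <= n; i += Unroll) {
        float tmp[Unroll];
        for (int u = 0; u < Unroll; u++) tmp[u] = src[i + u];   // load
        for (int u = 0; u < Unroll; u++) acc[i + u] += tmp[u];  // reduce (sum)
    }
    for (; i < n; i++) acc[i] += src[i];  // leftover elements
}

// Time `iters` calls of the reduction alone (no communication) and
// return the average time per call in nanoseconds.
template <int Unroll>
double time_reduce_ns(std::vector<float>& acc, const std::vector<float>& src, int iters) {
    assert(iters > 0);
    auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < iters; k++) reduce_into<Unroll>(acc, src);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iters;
}
```

Inside RCCL the reduction runs in the same loop as the loads that feed `tmp[u]` and the stores of the result, so an isolated benchmark like this measures the arithmetic plus local memory traffic, not the exact in-kernel cost.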
