-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the issue of incorrect External id for non-PyTorch activities #1001
Conversation
libkineto/src/output_json.cpp
Outdated
int external_id = 0; | ||
if (op.linkedActivity()) { | ||
external_id = op.linkedActivity()->correlationId(); | ||
} else if (op.type() == libkineto::ActivityType::CPU_OP || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These types align with https://github.com/pytorch/pytorch/blob/main/torch/csrc/profiler/collection.cpp#L1023-L1026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logically this makes sense although I was wondering if instead of having this whitelist of events that set external_id = op.correlationId(), can we just have a blacklist of events where we want to skip (ie. CUDA_RUNTIME/CUDA_KERNEL events). I ask this because there could be a type that we are missing and I would rather have extra information (even if wrong) rather than have a trace missing information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. Using a blacklist is indeed a more suitable solution.To minimize the impact, I only added the types that I can confirm can be excluded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
Hi, @aaronenyeshi , could you please take a look at whether this patch is okay? |
Overall looks reasonable |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@fwenguang has updated the pull request. You must reimport the pull request before landing. |
@fwenguang has updated the pull request. You must reimport the pull request before landing. |
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@sraikund16 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@sraikund16 merged this pull request in 955af49. |
Sorry about the incomplete patch addressing the issue of setting the external ID.
I think only the PyTorch activities should set the external id to op.correlationId(), while the external ids for the runtimes and kernels should be set to op.linkedActivity()->correlationId().
However, there may be cases where the linkedActivity() for the runtime or kernel is nullptr (for example, when directly launching a kernel using a pybind mapping interface). In such cases, the external id should be empty, not set to op.correlationId().