Ignore pop from an empty external ids stack in RoctracerLogger to avoid crash. (#1006)

Summary:
Pull Request resolved: #1006

See D62090845 for the context. This diff mimics the NVIDIA-side behavior. For a similar workload that dyno trace crashes on MI300x, a dyno trace run on H100 looks like P1666484898. Searching for the keyword `CUPTI_ERROR_QUEUE_EMPTY` and referring to [nvidia's doc](https://l.facebook.com/l.php?u=https%3A%2F%2Fdocs.nvidia.com%2Fcuda%2Farchive%2F9.2%2Fcupti%2Fgroup__CUPTI__ACTIVITY__API.html%23group__CUPTI__ACTIVITY__API_1g47395bf12ff55f30822d408b940567e3&h=AT1GbJqjqyEYga1oPxXkXPwznRcRGKnHtSlUt_708U3wxjzTel6MJbF2-o7f5yp7pdDKJ5Y_ASuojzFRECp-un81L7PU6GvesQfQ10v7419Eaqm3laLWGZIZldZpczkg37FlbFbI6zC59n6xtOdrscxX-bA), it appears that the suspected migrated fiber thread attempts to dequeue from NVIDIA's thread_local queue and fails, just as we saw on the AMD side.

Reviewed By: davidberard98

Differential Revision: D64974651

fbshipit-source-id: 56b36b0d85361ef8839225663eb6aa314a0897e2