-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Phantom] Pytorch topk perf (SWDEV-499808) #1848
Comments
machine: banff-cyxtera-s80-2 |
upstream build failed:
|
results for topk included with default rocPRIM when making choices:
|
PR: pytorch#145416 |
Verbatim from email thread, written by [email protected]: By replacing __threadfence() in line 337 of TensorTopK.hip I was getting "Memory access fault by GPU node-8 (Agent handle: 0xf914a50) on address 0x7f2a8cb5a000. Reason: Unknown." However, replacing it instead with "__builtin_amdgcn_fence(__ATOMIC_RELEASE, "agent")" worked without errors. The entries regarding this fix have "patch" values of "TensorTopK.hop:337:threadfence~>amdgcnfence". Also regarding the bug fix, added "__builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");" to line 359 of TensorTopK.hip. Such entries of "patch"es of "TensorTopK.hop:337:threadfence~>amdgcnfence+TensorTopK.hop:359:amdgcnfence". As expected, these patches help with the performance. |
Focusing on (patch + bug fix) results only, we added more shapes to get a better understanding of how native and sort compare in different scenarios: The results are included, where rows for which "sort" is better are highlighted. We observe that for data which is 1 dimensional (considering (1, 256) and (1, 1, 256) as 1 dimensional as well) and rather long, sort is the better option. We will focus our tests specifically on the number of elements in 1 the dimensional case. |
Added the changes described in #1848 (comment) to the PR: pytorch#145416. After talking to Pruthvi and Jerry on Jan 24, out focus will be on:
|
results presented on Jan 30 with Phantom (adding both for the record, the second one has 'sort' as well and subsumes the first): topk_results_jan30_2025.xlsx
|
link to issue: https://ontrack-internal.amd.com/browse/SWDEV-499808
List of tasks kindly given by Jerry:
Benchmarks:
The text was updated successfully, but these errors were encountered: