[ROCm] Use `tl.range()` in block GEMM kernels with `num_stages` set by host. #9052
Job | Run time |
---|---|
45s | |
45s |