Assume the problem size (M, N, K) is (63, 63, 63) and the tile size is (64, 64, 64). How does CUTLASS control global memory reads to prevent illegal memory accesses? I'm not clear on the details. (Maybe the key is predication in cp_async?)
Answered by
hwu36
Jul 10, 2024
Yes, predication tells you whether the load address is out of bounds, and if it is, the cp.async is not executed. The other key point is that you need to set the alignment to 1, so that each thread loads one element at a time and the load address can be arbitrary, i.e. it does not have to be aligned to a multiple of elements.
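To make the two points concrete, here is a minimal host-side C++ sketch of the idea, not CUTLASS code. The names `load_tile_predicated` and `row_starts_aligned` are hypothetical, and zero-filling the out-of-bounds slots is an assumption of this sketch (chosen so the padding contributes nothing to the product). In a real kernel each guarded copy would be a predicated cp.async issued by a thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not a CUTLASS API: copy one tile of a row-major
// matrix into a dense tile buffer, one element at a time (alignment 1).
// Every access is guarded by a predicate, so nothing outside the
// rows x cols extent is ever read.
std::vector<float> load_tile_predicated(const std::vector<float>& global,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t tile_row0, std::size_t tile_col0,
                                        std::size_t tile_rows, std::size_t tile_cols) {
    // Assumption of this sketch: out-of-bounds slots stay zero.
    std::vector<float> tile(tile_rows * tile_cols, 0.0f);
    for (std::size_t r = 0; r < tile_rows; ++r) {
        for (std::size_t c = 0; c < tile_cols; ++c) {
            const std::size_t gr = tile_row0 + r;
            const std::size_t gc = tile_col0 + c;
            const bool pred = (gr < rows) && (gc < cols);  // in-bounds predicate
            if (pred) {
                tile[r * tile_cols + c] = global[gr * cols + gc];
            }
        }
    }
    return tile;
}

// Hypothetical helper: true if every row of a densely packed row-major
// matrix starts at an element offset that is a multiple of vec_elems,
// i.e. a vec_elems-wide vector load could begin at each row start.
bool row_starts_aligned(std::size_t cols, std::size_t vec_elems) {
    return cols % vec_elems == 0;
}
```

With a 63-column matrix, `row_starts_aligned(63, 4)` is false because row 1 begins at element offset 63, so a 4-element vector load would be misaligned, while `row_starts_aligned(63, 1)` is true. That is why the alignment has to drop to 1 for this problem size.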
Answer selected by
MARD1NO