How does CUTLASS control global memory reads when ProblemSize is not divisible by TileSize? #1507

Answered by hwu36
MARD1NO asked this question in Q&A

Yes, predication tells you whether the load address is out of bounds; if it is, the cp.async is not executed.
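
To make the predicate idea concrete, here is a minimal host-side sketch in plain C++. It is not CUTLASS code: `load_tile_predicated` is a hypothetical helper, and zero-filling the skipped slots is an assumption of this sketch, chosen so that the residue tile contributes nothing to the accumulation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUTLASS API): stages one tile of `tile` elements
// starting at `tile_start`. Each element load is guarded by a predicate,
// mirroring how a guarded cp.async is skipped when the address is out of bounds.
std::vector<float> load_tile_predicated(const std::vector<float>& global,
                                        std::size_t tile_start,
                                        std::size_t tile) {
    // Assumption of this sketch: slots whose load is skipped stay zero.
    std::vector<float> staged(tile, 0.0f);
    for (std::size_t i = 0; i < tile; ++i) {
        const std::size_t idx = tile_start + i;
        const bool pred = idx < global.size();  // is this address in bounds?
        if (pred) {
            staged[i] = global[idx];            // load issued only when pred is true
        }
    }
    return staged;
}
```

With a problem size of 5 and a tile size of 4, the second tile starts at element 4, so only its first slot passes the predicate and the remaining three loads are never issued.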

The other key point is that you need to set the alignment to 1, so that every thread loads one element at a time and the load address can be arbitrary, i.e. it does not have to be aligned to a multiple of elements.
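
The alignment point can be illustrated with a small check, again as a plain C++ sketch rather than anything from CUTLASS. `vector_access_ok` is a hypothetical function, and it assumes each access covers `alignment` contiguous elements guarded by a single predicate, so an access must never straddle the end of the valid range.

```cpp
#include <cstddef>

// Hypothetical check (not a CUTLASS API): can `extent` contiguous elements that
// start at element offset `offset` be covered by accesses of `alignment`
// elements each? Assumption: one predicate guards a whole access, so both the
// start offset and the extent must be multiples of the access width.
bool vector_access_ok(std::size_t offset, std::size_t extent,
                      std::size_t alignment) {
    if (alignment == 0) {
        return false;  // an access must cover at least one element
    }
    return offset % alignment == 0 && extent % alignment == 0;
}
```

For example, `vector_access_ok(0, 130, 8)` is false because the last 8-element access would run past element 129, while `vector_access_ok(3, 131, 1)` is true: with alignment 1 every offset and extent qualifies, which is why the answer recommends it for arbitrary problem sizes.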

Answer selected by MARD1NO