Where is the XOR-permuted layout implemented? #510

Answered by hwu36
masahi asked this question in Q&A
The thread-to-data mapping is implemented in the iterators. The shared memory store iterator, RegularTileAccessIterator, is in https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h. The shared memory load iterator, MmaTensorOpMultiplicandTileIterator, is in https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h. Check their constructors for the initial mapping. I recommend inserting plenty of printf calls if you need to dig into them. The iterators are very general and cover many cases you don't care about, so you don't need to figure out every variable or template parameter.
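For readers who want a feel for what an XOR permutation does before reading the iterators, here is a minimal host-side sketch. It is a generic illustration, not CUTLASS's actual layout code: the names `kVectorsPerRow`, `swizzled_column`, `physical_offset`, `distinct_columns_for_logical_column`, and `print_layout` are hypothetical, and the 8-vectors-per-row tile shape is an assumption chosen only to keep the example small. The real mappings in the two headers above depend on element type, tile shape, and other template parameters.

```cpp
#include <cassert>
#include <cstdio>
#include <set>

// Assumption for illustration: each logical row of the tile holds 8 vectors.
constexpr int kVectorsPerRow = 8;

// Generic XOR swizzle: the physical column is the logical column XORed with
// the row index (modulo the row width). Within a row this is a permutation.
constexpr int swizzled_column(int row, int col) {
  return col ^ (row % kVectorsPerRow);
}

// Linear offset (in vectors) of a logical (row, col) element after swizzling.
constexpr int physical_offset(int row, int col) {
  return row * kVectorsPerRow + swizzled_column(row, col);
}

// Counts how many distinct physical columns are touched when one logical
// column is read across kVectorsPerRow consecutive rows. Without the swizzle
// the answer would be 1 (every access lands in the same column).
int distinct_columns_for_logical_column(int col) {
  assert(col >= 0 && col < kVectorsPerRow);
  std::set<int> seen;
  for (int row = 0; row < kVectorsPerRow; ++row) {
    seen.insert(swizzled_column(row, col));
  }
  return static_cast<int>(seen.size());
}

// Prints the physical column chosen for every logical (row, col), in the
// spirit of the "insert printfs" advice above.
void print_layout() {
  for (int row = 0; row < kVectorsPerRow; ++row) {
    for (int col = 0; col < kVectorsPerRow; ++col) {
      std::printf("%d ", swizzled_column(row, col));
    }
    std::printf("\n");
  }
}
```

Calling `print_layout()` from a `main` shows each row as a different permutation of the column indices, which is the property that lets a walk down one logical column spread across different physical columns of shared memory.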

https://github.com/NVIDIA…

Answer selected by masahi