Shared memory bank conflict problem #281
Unanswered. FindDefinition asked this question in Q&A.
-
Hmm. Does CUTLASS itself have this problem, or only your customized kernel? You can use inline PTX to force 128-bit shared memory accesses. Just implement ...
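A minimal sketch of what such an inline PTX store could look like. The helper name `store_shared_v4` and the demo kernel are hypothetical and are not part of CUTLASS. The sketch assumes the shared memory address is 16-byte aligned, which PTX requires for a `v4.u32` access. The data registers need no special alignment at the PTX level, because PTX registers are virtual and `ptxas` assigns the physical ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUTLASS API): write four 32-bit words to shared
// memory with one 128-bit st.shared.v4.u32 instruction.
// Assumption: smem_ptr points into shared memory and is 16-byte aligned.
__device__ __forceinline__ void store_shared_v4(void* smem_ptr, uint4 v) {
  // Convert the generic pointer into a 32-bit shared-state-space address.
  unsigned addr = static_cast<unsigned>(__cvta_generic_to_shared(smem_ptr));
  asm volatile("st.shared.v4.u32 [%0], {%1, %2, %3, %4};\n"
               :
               : "r"(addr), "r"(v.x), "r"(v.y), "r"(v.z), "r"(v.w)
               : "memory");
}

// Demo kernel: each thread stores 16 bytes, then reads its neighbor's slot.
__global__ void demo_kernel(uint4 const* in, uint4* out) {
  __shared__ uint4 tile[32];  // uint4 is naturally 16-byte aligned
  store_shared_v4(&tile[threadIdx.x], in[threadIdx.x]);
  __syncthreads();
  out[threadIdx.x] = tile[(threadIdx.x + 1) % 32];
}

int main() {
  uint4 h_in[32], h_out[32];
  for (unsigned i = 0; i < 32; ++i) h_in[i] = make_uint4(i, i, i, i);

  uint4 *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_out));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

  demo_kernel<<<1, 32>>>(d_in, d_out);  // one warp
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

  printf("out[0].x = %u\n", h_out[0].x);
  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

To check which store instruction is actually emitted, compile with `nvcc -ptx` and search the output for `st.shared`.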
-
Any update? Has the issue been fixed?
-
I'm debugging a Turing GEMM kernel:
TileShape: 128x256x32
WarpTileShape: 64x64x32
layout: ttt
dtypes: all cutlass::half_t
tensor op: 1688
CUDA: 11.3
When I profile this kernel in Nsight Compute, many shared memory bank conflicts are detected:
After debugging, two issues were found:
The PTX contains st.shared.u32 instead of st.shared.v4.u32.
Here is the problem:
How can I force the compiler to generate st.shared.v4.u32? Should I use inline asm? Does st.shared.v4.u32 require the register addresses to be aligned?
GEMM code:
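This is not the author's GEMM kernel. It is only a minimal sketch of one way to get a 128-bit shared memory store without inline asm: express the store as a single access through a 16-byte-aligned type such as `uint4`. The kernel name `stage_fragment`, the array sizes, and the data layout are assumptions made for illustration. The compiler usually lowers a store like this to `st.shared.v4.u32`, but that is not guaranteed, so the PTX should still be inspected.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stages eight halves (128 bits) through
// shared memory. Assumptions: blockDim.x <= 64, and `in` and `out` are
// 16-byte aligned (true for pointers returned by cudaMalloc) with at least
// blockDim.x * 8 elements each.
__global__ void stage_fragment(__half const* in, __half* out) {
  __shared__ uint4 tile[64];   // 64 slots of 16 bytes, naturally aligned
  alignas(16) __half frag[8];  // per-thread fragment, 16-byte aligned

  for (int i = 0; i < 8; ++i) {
    frag[i] = in[threadIdx.x * 8 + i];
  }

  // One 16-byte store. Because both sides are known to be 16-byte aligned,
  // the compiler is free to emit a single vector store instead of four
  // 32-bit stores.
  tile[threadIdx.x] = *reinterpret_cast<uint4 const*>(frag);
  __syncthreads();

  // Read the slot back and write it out with a 16-byte global store.
  uint4 v = tile[threadIdx.x];
  *reinterpret_cast<uint4*>(out + threadIdx.x * 8) = v;
}
```

A launch such as `stage_fragment<<<1, 64>>>(d_in, d_out)` fits the assumptions above. If the compiler cannot prove 16-byte alignment of the destination, it falls back to narrower stores, which is one possible reason for seeing st.shared.u32 in the PTX.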