Replies: 2 comments 1 reply
-
As you can see, the semaphore wait is essentially a memory load instruction, and there's a dependency on the data right inside
-
There are two ways to do split-K. One uses a semaphore, the other uses atomic add. The semaphore version guarantees deterministic results, because the partial sums are always reduced in the same order. Atomic add is faster. CUTLASS kernels usually use at most 8 warps. Deadlock won't happen because of the number of threads.
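To make the determinism point concrete, here is a minimal host-side sketch (plain C++, not CUTLASS code). It only shows that floating-point addition is order-sensitive, so a reduction with a fixed order (as the semaphore imposes) gives a repeatable result, while an arrival-order reduction (as with atomic add) may not. The function name `reduce_in_order` and the sample values are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums `partials` in the order given by `order`.
// A serial (semaphore-style) reduction always uses order 0, 1, 2, ...;
// an atomic-add reduction uses whatever order the partials happen to arrive in.
float reduce_in_order(const std::vector<float>& partials,
                      const std::vector<std::size_t>& order) {
    assert(partials.size() == order.size());
    float acc = 0.0f;
    for (std::size_t idx : order) {
        acc += partials[idx];  // each add is rounded to float precision
    }
    return acc;
}
```

With partials `{1e8f, -1e8f, 1.0f}`, the order `{0, 1, 2}` first cancels the two large values and then adds `1.0f`, while the order `{0, 2, 1}` loses the `1.0f` to rounding before the cancellation, so the two sums differ even though the inputs are identical.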
-
GEMM in CUTLASS implements SplitKSerial reduction using semaphores.
According to the following implementation code, the k-th threadblock has to wait until the (k-1)-th threadblock releases the lock. However, the order in which threadblocks are scheduled is undefined. If the device does not have enough resources and SMs to execute all the required blocks at once, there is no guarantee that the (k-1)-th block is executing or completed while the k-th block is executing, in which case a deadlock may occur.
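The wait chain described above can be modeled on the host to make the dependency explicit. This is only a sketch under stated assumptions: CPU threads stand in for threadblocks, a single `std::atomic<int>` stands in for the semaphore, and the name `run_serial_split_k` is hypothetical, not a CUTLASS API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical host-side model of a serial split-K reduction.
// Slice k spins until the lock equals k, does its part of the reduction,
// then releases the lock to slice k + 1.
std::vector<int> run_serial_split_k(int slices, const std::vector<int>& launch_order) {
    assert(static_cast<int>(launch_order.size()) == slices);
    std::atomic<int> lock{0};
    std::vector<int> reduction_order;  // only touched by the current lock holder
    std::vector<std::thread> blocks;
    for (int k : launch_order) {
        blocks.emplace_back([&, k] {
            // "Wait": repeatedly load the lock until it is this slice's turn.
            while (lock.load(std::memory_order_acquire) != k) {
                std::this_thread::yield();
            }
            reduction_order.push_back(k);  // stands in for accumulating the partial tile
            // "Release": hand the lock to the next slice.
            lock.store(k + 1, std::memory_order_release);
        });
    }
    for (auto& b : blocks) {
        b.join();
    }
    return reduction_order;
}
```

Whatever the launch order, the reduction order comes out as 0, 1, 2, ..., which is where the determinism comes from. The model always terminates because the operating system preempts spinning threads so that every thread eventually runs. The question is whether the same holds on a GPU when the (k-1)-th block is not yet resident while the k-th block is spinning.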