Replies: 2 comments
Now CUTLASS batched GEMM is integrated, too.
Convolution kernels are also integrated. Results and analysis on end-to-end models are available at apache/tvm#9746. It looks like TVM + CUTLASS is faster than TVM + cuDNN. I'm not claiming that "CUTLASS is faster than cuDNN on my GPU"; rather, the way we use cuDNN apparently does not perform as well as it should.
apache/tvm@541f9f2
In this first commit, Turing and Ampere FP16 tensor core GEMMs are added into TVM.
Great work and thank you very much, TVM community!!!