New mixed-input Hopper GEMMs supporting 16-bit x 8-bit input types with optimal performance.
New mixed-input Ampere GEMMs with support for canonical layouts (TN). The implementation supports upcast on operand B ({fp16, bf16} x {s8, u8}) and upcast on operand A ({s8, u8} x {fp16, bf16}). These kernels also include fast numeric conversion recipes and warp-level shuffles to achieve optimal performance.
New Copy Async based Hopper GEMMs, which support input tensors with less than 16B alignment (across s8/fp8/fp16/bf16/tf32 types) with optimal performance. As part of this, new kernel schedules and Copy Ops SM80_CP_ASYNC_CACHE_* were also added.
EVT Support for RELU with Aux bitmap tensor store (used in dRELU). See SM90 EVT fusions for details.
Various subbyte enhancements like tagged device ptrs, support for vectorized copy, various operators to treat subbyte iterators as pointers, and full-fledged CuTe Tensor support.
Support for Clang as a host compiler.
Support for void-C kernels and SM80 mixed-input GEMMs in the CUTLASS Python interface.
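To make the mixed-input GEMM items above concrete, here is a minimal host-side reference sketch in plain Python of what an fp16 x s8 GEMM with upcast on operand B computes. It is not CUTLASS code and does not use the CUTLASS Python interface. The function names (`to_fp16`, `upcast_s8_to_fp16`, `mixed_input_gemm_tn`) are hypothetical, TN is modeled by storing both operands with K as the contiguous dimension, and Python floats stand in for a wider accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def upcast_s8_to_fp16(v: int) -> float:
    """Upcast a signed 8-bit integer to fp16 (exact for every int8 value)."""
    if not -128 <= v <= 127:
        raise ValueError(f"{v} is not representable as s8")
    return to_fp16(float(v))


def mixed_input_gemm_tn(a_fp16, b_s8):
    """Reference for D = A * B with fp16 A and s8 B (hypothetical helper).

    a_fp16: M rows, each holding K fp16 values (row-major A).
    b_s8:   N columns, each holding K s8 values (column-major B).
    Both operands are therefore contiguous along K, which is how TN is
    modeled in this sketch. Accumulation uses Python floats as a stand-in
    for a wider accumulator type.
    """
    k = len(a_fp16[0])
    if any(len(row) != k for row in a_fp16) or any(len(col) != k for col in b_s8):
        raise ValueError("A rows and B columns must all have length K")
    d = []
    for row in a_fp16:
        out_row = []
        for col in b_s8:
            acc = 0.0
            for a_val, b_val in zip(row, col):
                # The narrow operand is upcast first, then multiplied.
                acc += to_fp16(a_val) * upcast_s8_to_fp16(b_val)
            out_row.append(acc)
        d.append(out_row)
    return d


if __name__ == "__main__":
    a = [[1.0, 0.5], [2.0, -1.0]]   # M=2, K=2
    b = [[2, 4], [-3, 8]]           # N=2 columns of length K=2
    print(mixed_input_gemm_tn(a, b))
```

The actual kernels perform the conversion in registers with the fast conversion recipes mentioned above; this sketch only pins down the arithmetic they are expected to reproduce.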
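The idea behind the ReLU with Aux bitmap store listed above can be sketched the same way: the forward pass keeps one bit per element recording whether the input was positive, and dReLU later uses that bit to pass or zero each incoming gradient. The sketch below is plain Python with hypothetical function names. The bit order (least significant bit first within each byte) and the treatment of an input of exactly zero as inactive are assumptions made for illustration, not the layout or convention the SM90 EVT fusion uses.

```python
def relu_with_bitmap(xs):
    """Apply ReLU and return (outputs, bitmap).

    The bitmap holds one bit per element, set when the input was > 0.
    Packing is LSB-first within each byte (an assumption of this sketch).
    """
    outputs = []
    bitmap = bytearray((len(xs) + 7) // 8)
    for i, x in enumerate(xs):
        if x > 0:
            outputs.append(x)
            bitmap[i // 8] |= 1 << (i % 8)
        else:
            outputs.append(0.0)
    return outputs, bytes(bitmap)


def drelu_from_bitmap(grads, bitmap):
    """Backward pass: keep the gradient where the stored bit is set, else 0."""
    if len(bitmap) * 8 < len(grads):
        raise ValueError("bitmap is too short for the gradient vector")
    return [
        g if (bitmap[i // 8] >> (i % 8)) & 1 else 0.0
        for i, g in enumerate(grads)
    ]


if __name__ == "__main__":
    x = [1.5, -2.0, 0.0, 3.0, -0.5, 2.0, 4.0, -1.0, 0.25]
    y, bits = relu_with_bitmap(x)
    print(y)
    print(drelu_from_bitmap([1.0] * len(x), bits))
```

Storing a bitmap instead of the full activation tensor is what makes the auxiliary store cheap: nine elements need two bytes here, where a copy of the fp16 inputs would need eighteen.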
This discussion was created from the release CUTLASS 3.3.0.