While looking at example 55 (cutlass/examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_bf16_gemm.cu), I was curious whether this modification would be legal:
From:

```cpp
using MmaType = cutlass::bfloat16_t;
using QuantType = cutlass::int4b_t;
```

To:

```cpp
using MmaType = cutlass::bfloat16_t;
using QuantType = cutlass::int2b_t;
```
According to the example's README.md, "For 8-bit x 4-bit or 2-bit, both inputs must be K-major." However, the internal comment states, "Only supports INT4 x { FP16, BF16 }." Furthermore, I'm having trouble finding documentation in the library on using the `int2b_t` datatype in GEMM. I apologize if this question needs to be more detailed or if I missed some part of the documentation.
That comment is inaccurate and needs updating. This kernel always converts smaller dtypes to the MMA dtype with CUDA Cores, so as long as such conversion logic exists, the combination is legal. I just checked that INT2 x BF16 compiled successfully and passed the test.
However, for optimal performance, such conversion should be low-cost; otherwise, it may negate the benefits of using mixed dtypes. Please refer to the PTX doc to verify whether the hardware natively supports this conversion, and to this header to see whether an optimized software solution is available.
Currently, only naive conversion logic exists for `int2b_t`, i.e., casting first to `int` and then to `bfloat16_t`. This makes the kernel significantly slower than INT4 x BF16.
Please let us know if this combination is important for your use case, and let us see if we can implement an optimized conversion for it.
Yes. Feel free to implement it as a partial specialization of the `NumericArrayConverter`. Please refer to the INT4 => BF16 converter as an example.
IMO, INT2 => BF16 would be essentially the same as INT4 => BF16: just move the INT2 bits into the LSBs of the target BF16 (as part of the mantissa).
Also notice that even with offline layout swizzling, `int2b_t` cannot be packed into a full register when loading from smem: `int4b_t` can be packed to satisfy LDS.32, whereas `int2b_t` can only satisfy LDS.16. We are currently experimenting with improving this.