Question about PR1084 Support for Mixed Input TensorOp #1117

MARD1NO · 2023-09-28T06:42:39Z


In include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h

FragmentShuffler shuffles registers for the int8 dtype, and I'm not very clear about the whole process. Can someone explain it?

Answered by manishucsd

hwu36 · 2023-09-28T15:40:40Z · Maintainer

@manishucsd


manishucsd · 2023-09-29T01:16:55Z


Data is loaded from shared memory (SMEM) into registers using ldmatrix. After the load, each thread owns four contiguous s8 elements (s8x4) of the operand B matrix. However, to issue mma.sync on bf16 or f16, each thread needs to own two 16-bit pairs (f16x2 x 2), and the two f16x2 pairs are separated by 6 elements. See the IMMA and HMMA layout diagrams in the GTC 2020 talk. In short, the data is loaded as if we were going to run IMMA, and the FragmentShuffler then shuffles the registers to get them ready for HMMA. To see this, put the yellow operand B from slide 21 and slide 22 of the GTC 2020 talk side by side.


MARD1NO · Oct 7, 2023 · Author

Thanks so much!!!
