[TOSA] Fold Up-Casts into MatMul when supported by tosa.matmul #198
Conversation
As you noted on your PR message, I think we should guard against i16 -> i48. Otherwise lgtm
Nice idea to get rid of the casts this way! Implementation-wise, I would have expected this to be a canonicalization on TOSA, i.e. you just lower naively from torch-mlir to tosa, and then canonicalize the tosa casts + matmul into a different matmul (then you could also easily cover the …)
Kind of tricky, because for some of them the accumulator types are not supported as input types, so it would be invalid to have that kind of op (see the sketch below). It could be done for the float types, but integers may be difficult.
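The snippet from the original comment is not preserved; as an illustration of the kind of invalid intermediate meant here, a minimal sketch with made-up shapes and SSA names, where the `i48` accumulator type would have to appear as an input element type:

```mlir
// Hypothetical, invalid configuration: i48 is only an accumulator type in the
// TOSA spec, so a tosa.matmul taking i48 inputs is not a supported profile.
%bad = tosa.matmul %lhs, %rhs : (tensor<1x4x8xi48>, tensor<1x8x16xi48>) -> tensor<1x4x16xi48>
```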
In what sense do you mean guarded? Like don't convert in this case, or when there are casting ops? Because we cannot have a legal …
Oh, I see, so you cannot actually lower the integer cases when the casts are not present; I didn't notice that this was disallowed. Makes sense to keep it here then, thanks for the explanation!
I was referring to avoiding converting a case like the one you described.
Had a look into https://github.com/llvm/torch-mlir/blob/main/include/torch-mlir/Dialect/Torch/IR/GeneratedTorchOps.td#L11547 and I am not sure it is actually disallowed as you say.
That is true, the definition allows anything, but I would find it hard to find an integer value to describe the dtype if the importer does not have it defined: https://github.com/Xilinx/torch-mlir/blob/feature/backport_ea1_ops/python/torch_mlir/extras/fx_importer.py#L176-L193
Oh I see, it seems that the importer does not allow it. Could we just harden it to make it safe for the future?
LGTM
In newer versions of Torch-MLIR using Torch 2.3 and Torch Dynamo + FxImporter, we see that a GEMM with bias operating on `bf16` and `f16` types decomposes differently than in the earlier versions using TorchScript.

This decomposition contains a `torch.aten.mm` that operates on an accumulator type, where the inputs are cast from the smaller representation to the accumulator type, e.g. `bf16 -> f32`. Since the TOSA specification allows `tosa.matmul` to support these configurations, we can simply fold the casts into the operation signature rather than keeping the casting operations. The resulting IR should look like this for the bf16 case.
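The IR from the original description is not reproduced here; the following is a minimal sketch under assumed tensor shapes and SSA names (the exact TOSA assembly syntax and attributes can differ between dialect versions):

```mlir
// Hypothetical folded form: the bf16 operands feed tosa.matmul directly and the
// op itself produces the f32 accumulator, so no explicit tosa.cast ops on the
// inputs are needed. Shapes and value names are illustrative only.
%acc = tosa.matmul %lhs, %rhs : (tensor<1x4x8xbf16>, tensor<1x8x16xbf16>) -> tensor<1x4x16xf32>
```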
One caveat is that the `i16 -> i48` `tosa.matmul` case cannot be supported at the moment, as there is no `torch.int48`-equivalent dtype to represent this special accumulator type used in TOSA.
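For illustration, a hedged sketch of the configuration that currently cannot be reached from the torch side (shapes and names are made up; the op itself is what the TOSA spec describes, the limitation is the missing 48-bit integer dtype on the torch-mlir/importer side):

```mlir
// Hypothetical i16 x i16 -> i48 matmul as described by the TOSA spec; it is not
// producible from torch-mlir because torch has no dtype corresponding to i48.
%acc = tosa.matmul %lhs, %rhs : (tensor<1x4x8xi16>, tensor<1x8x16xi16>) -> tensor<1x4x16xi48>
```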