
[TorchToLinalg] Add aten.fft_rfft and lowering #3857

Status: Open · wants to merge 11 commits into base: main
Conversation

@giacs-epic (Contributor) commented Nov 7, 2024

  • Add AtenFftRfftOp to Torch dialect.
  • Add conversion of AtenFftRfftOp to Linalg, using one linalg.matmul per output component (real and imaginary). Computing the DFT this way is O(n^2).
  • Add decomposition of AtenFftRfftOp into Torch-level ops (same paradigm as above).
  • Add unit and end-to-end tests.
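For reference, the math behind the O(n^2) approach described above can be sketched in a few lines of numpy (an illustrative sketch of the DFT-matrix idea, not code from this PR): the one-sided real FFT is two matmuls against cosine and negative-sine coefficient matrices, one per output component.

```python
import numpy as np

def rfft_via_matmul(x):
    # One-sided real FFT as two matmuls, one per output component,
    # mirroring the coefficient-matrix approach: for output index m,
    # X[m] = sum_k x[k] * (cos(2*pi*k*m/n) - i*sin(2*pi*k*m/n)).
    n = x.shape[-1]
    out = n // 2 + 1                      # one-sided output length
    k = np.arange(n)[:, None]             # input index, shape (n, 1)
    m = np.arange(out)[None, :]           # output index, shape (1, out)
    angle = 2.0 * np.pi * k * m / n
    real = x @ np.cos(angle)              # (..., n) @ (n, out)
    imag = x @ (-np.sin(angle))
    return real + 1j * imag

x = np.random.default_rng(0).standard_normal((4, 23))
assert np.allclose(rfft_via_matmul(x), np.fft.rfft(x))
```

Both coefficient matrices have shape (fftLength, outputFftDim), so each matmul over a batch of signals costs O(n^2) per transform, as noted in the description.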

@zjgarvey (Collaborator) left a comment


Before reviewing in detail, let me see if I understand correctly what your cross-repository goal is.

  1. You include this torch-to-linalg conversion as a baseline conversion, but in IREE, you intend to have a more performant lowering to linalg_ext? I suppose this pass exists before torch-to-linalg in the torch-to-iree pipeline?
  2. Would it make more sense to add this as a decomposition of the op at the torch-dialect level? That way, other backends like Tosa and StableHLO could benefit, and we can turn the op off via the backend-legal-ops option in the torch-decompose-complex-ops pass if we want to go a different route in IREE. I plan to modify the decompose-complex-ops pass this week to be more specific in the torch-to-iree pipeline, so we can specify a backend-legal-ops set there.

@giacs-epic giacs-epic changed the title [TorchToLinalg] Add aten.ffr_rfft and lowering [TorchToLinalg] Add aten.fft_rfft and lowering Nov 11, 2024
@giacs-epic (Contributor, Author) commented Nov 11, 2024

@zjgarvey

  1. Yes, that's the goal so far. Indeed I would place a pass to lower rfft to linalg_ext before torch-to-linalg in iree.
  2. That makes a lot of sense. Would it be compatible with an eventual decomposition of aten.stft? I.e. would having both decompositions yield the following behavior: aten.stft gets decomposed into aten.fft_rffts, which in turn get decomposed into matmuls?

@zjgarvey (Collaborator)

> 2. That makes a lot of sense. Would it be compatible with an eventual decomposition of `aten.stft`? I.e. would having both decompositions yield the following behavior: `aten.stft` gets decomposed into `aten.fft_rfft`s, which in turn get decomposed into matmuls?

Yeah, precisely. There are some limitations, however. Does the higher-performance path for fft_rfft to linalg_ext apply to the same cases as this conversion? If not, we will definitely need to keep this as a torch-to-linalg conversion to catch any patterns that failed to match the conversion to linalg_ext. This is because we won't be able to go back to decompose-complex-ops after trying to convert to linalg_ext.

@zjgarvey (Collaborator)

Ah, I see you already converted this to a decomposition. Perhaps we should just do both? StableHlo and Tosa would benefit from the decomposition, which we can turn off once you add the torch-to-linalg-ext path to IREE, and then the torch-to-linalg conversion would be a final fallback if the linalg_ext path doesn't apply.

@giacs-epic (Contributor, Author) commented Nov 13, 2024

@zjgarvey The higher-performance path would apply when the input signal length is a power of 2; all other cases would need to fall back to this "naive" algorithm. Do you think it's possible to branch compilation based on the input dimension size?
Otherwise I'm open to just keeping both the decomposition and the lowering to linalg.
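As an aside, when the input length is static, the predicate such a compile-time branch would key on is simple; a hypothetical helper sketching it (not part of the PR):

```python
def is_power_of_two(n: int) -> bool:
    # True iff n is a positive power of two: a power of two has exactly
    # one bit set, so n & (n - 1) clears that bit and yields zero.
    return n > 0 and (n & (n - 1)) == 0

# Static dim sizes that could take a fast radix-2 path vs. the
# naive O(n^2) matmul fallback discussed above.
assert is_power_of_two(1024)
assert not is_power_of_two(23)
```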

@zjgarvey (Collaborator)

It might be possible to mark the op as conditionally illegal for decompose-complex-ops, but I don't think we want to go that route. Let's add both the torch-to-linalg conversion and the decomposition for now.

@giacs-epic (Contributor, Author)

@zjgarvey Added the conversion back.

@zjgarvey (Collaborator) left a comment


I have a few comments after looking closer at the code.

Comment on lines 1394 to 1398:

```cpp
if (isRealPart) {
  v = cos(v);
} else {
  v = -sin(v);
}
```
Collaborator:

nit: I'd prefer a ternary expression

Suggested change:

```cpp
v = isRealPart ? cos(v) : -sin(v);
```

giacs-epic (Contributor, Author):

Changed according to your suggestion.

Comment on lines 9055 to 9071:

```cpp
BaseTensorType lhsType = cast<BaseTensorType>(lhs.getType());
assert(lhsType && lhsType.hasSizes());
const ArrayRef<int64_t> lhsShape = lhsType.getSizes();
assert(lhsShape.size() >= 2);
BaseTensorType rhsType = cast<BaseTensorType>(rhs.getType());
assert(rhsType && rhsType.hasSizes());
const ArrayRef<int64_t> rhsShape = rhsType.getSizes();
assert(rhsShape.size() >= 2);
assert(rhsShape[rhsShape.size() - 2] == lhsShape[lhsShape.size() - 1]);

SmallVector<int64_t> resShape(lhsShape);
resShape[resShape.size() - 1] = rhsShape[rhsShape.size() - 1];

Type dtype = lhsType.getOptionalDtype();

ValueTensorType resType =
    ValueTensorType::get(rewriter.getContext(), resShape, dtype);
```
Collaborator:

I think we can avoid this helper function entirely. The asserts should never fail anyway, since you are generating the DFT coefficient matrix from the input and are already reporting match failures for unsupported cases.

giacs-epic (Contributor, Author):

Agree. Removing the function and simplifying.

Comment on lines 9144 to 9171:

```cpp
Value unsqueezeDim =
    rewriter.create<ConstantIntOp>(loc, rewriter.getI64IntegerAttr(-2));
auto unsqueezed = unsqueezeTensor(rewriter, op, self, unsqueezeDim);
if (failed(unsqueezed))
  return rewriter.notifyMatchFailure(op,
                                     "cannot generate unsqueezed tensor");
Value lhs = *unsqueezed;
Type dtype = inputType.getOptionalDtype();

Value real, complex;

for (const bool isRealPart : {true, false}) {

  // coeff : (fftLength x outputFftDim)
  ValueTensorType matrixType = ValueTensorType::get(
      op.getContext(), SmallVector<int64_t>{fftLength, outputFftDim},
      dtype);
  Value coeffMatrix = getDFTMatmulCoeff(rewriter, loc, matrixType,
                                        /*isRealPart=*/isRealPart);

  // X = matmul(lhs, coeff) : (D x 1 x outputFftDim)
  Value matmulRes = createBatchMatmul(rewriter, loc, lhs, coeffMatrix);

  // Y = squeeze(X, -2) : (D x outputFftDim)
  auto squeezed = squeezeTensor(rewriter, op, loc, -2, matmulRes);
  if (failed(squeezed))
    return rewriter.notifyMatchFailure(op,
                                       "cannot generate squeezed tensor");
```
Collaborator:

I don't understand why we need to sandwich the torch.aten.matmul between an unsqueeze and a squeeze. PyTorch's matmul should do what we want regardless of the size of D: (D x fftLength) * (fftLength x outputFftDim) -> (D x outputFftDim).
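To make the broadcasting point concrete, a small numpy sketch (numpy's `@` batches over leading dimensions the same way torch.matmul does for these cases; the names and shapes here are illustrative, not from the PR):

```python
import numpy as np

# Stand-ins for the shapes in question.
fft_length, output_fft_dim = 8, 5
coeff = np.ones((fft_length, output_fft_dim))   # DFT coefficient matrix

# 2-D input: (D x fftLength) @ (fftLength x outputFftDim) -> (D x outputFftDim)
x2 = np.zeros((3, fft_length))
assert (x2 @ coeff).shape == (3, output_fft_dim)

# Batched input: leading dims broadcast, still no unsqueeze/squeeze needed.
x3 = np.zeros((2, 3, fft_length))
assert (x3 @ coeff).shape == (2, 3, output_fft_dim)
```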

giacs-epic (Contributor, Author):

You are right, we don't. Changing.


```cpp
Value real, complex;

for (const bool isRealPart : {true, false}) {
```
Collaborator:

Why is the looping variable a const bool?

giacs-epic (Contributor, Author):

This is removed in the refactoring.

```cpp
    op.getContext(), SmallVector<int64_t>{fftLength, outputFftDim},
    dtype);
Value coeffMatrix = getDFTMatmulCoeff(rewriter, loc, matrixType,
                                      /*isRealPart=*/isRealPart);
```
Collaborator:

nit: remove the arg hint.

giacs-epic (Contributor, Author):

Changed in the refactoring.

```cpp
Value lhs = *unsqueezed;
Type dtype = inputType.getOptionalDtype();

Value real, complex;
```
Collaborator:

nit: I'd rename `complex` to `imaginary`, since the latter tensor represents the imaginary part of the end result, which is complex-valued.

giacs-epic (Contributor, Author):

Yes, I misused the word `complex`; `imaginary` is the correct one. Changing.

```mlir
// CHECK: %[[INTM2:.*]] = torch.constant.int -2
// CHECK: %[[INT0:.*]] = torch.constant.int 0
// CHECK: %[[INT1:.*]] = torch.constant.int 1
// CHECK: %[[VAR2:.*]] = torch.aten.transpose.int %[[ARG0:.*]], %[[INT0:.*]], %[[INT1:.*]] : !torch.vtensor<[36,23],f32>, !torch.int, !torch.int -> !torch.vtensor<[23,36],f32>
```
Collaborator:

For lit tests, you should only do `[[NAME:.*]]` once. Every subsequent use of a variable should be `[[NAME]]`; otherwise the variable NAME gets overridden, even if it didn't match the original use in the first place.

giacs-epic (Contributor, Author):

Thanks! Changing.

Comment on lines 9190 to 9191:

```cpp
Value stack =
    rewriter.create<AtenStackOp>(loc, stackType, sequence, cstMinusOne);
```
Collaborator:

I'm not sure how much we need to beef up the decomposition here, but it would probably be most efficient to construct the real and imaginary parts of the coeff matrix as one literal tensor of shape [fftLength, outputFftDim*2], in such a way that unflattening to [fftLength, outputFftDim, 2] gives the real and imaginary split in the last dim. Then a single torch.aten.matmul suffices, and the result can be unflattened before being converted to a complex tensor. Concatenations and matmuls are expensive, so reducing those would be ideal.
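The single-matmul layout being suggested can be sketched in numpy (an illustrative sketch of the interleaving idea, not the PR's implementation): stacking the cosine and negative-sine coefficients along the last axis and flattening to (n, out*2) means one matmul produces both components, and a reshape recovers the real/imaginary split.

```python
import numpy as np

def rfft_single_matmul(x):
    # One matmul against an interleaved (n, out*2) coefficient matrix;
    # reshaping the result to (..., out, 2) splits real/imag in the
    # last dim, matching the unflatten step described above.
    n = x.shape[-1]
    out = n // 2 + 1
    k = np.arange(n)[:, None]
    m = np.arange(out)[None, :]
    angle = 2.0 * np.pi * k * m / n
    # Stack real and imaginary coefficients, then flatten so that
    # column 2*m holds cos and column 2*m+1 holds -sin for output m.
    coeff = np.stack([np.cos(angle), -np.sin(angle)], axis=-1).reshape(n, out * 2)
    y = (x @ coeff).reshape(*x.shape[:-1], out, 2)   # one matmul, then unflatten
    return y[..., 0] + 1j * y[..., 1]

x = np.random.default_rng(0).standard_normal((4, 23))
assert np.allclose(rfft_single_matmul(x), np.fft.rfft(x))
```

This halves the number of matmuls and drops the concatenation, at the cost of a slightly less readable coefficient construction.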

giacs-epic (Contributor, Author):

I erroneously assumed that optimization passes would transform it into the optimal computation you described, but they don't. I'm also not sure how an optimizing transformation for this case should be expressed. For simplicity I'll change the decomposition to the form you suggest, although it becomes slightly less readable.

Comment on lines 1502 to 1510:

```cpp
Value realMatrix =
    getDFTMatmulCoeff(rewriter, loc, matrixType, /*isRealPart=*/true);
Value real = createLinalgMatmulOnTensors(rewriter, loc, componentsType,
                                         self, realMatrix);

Value imagMatrix =
    getDFTMatmulCoeff(rewriter, loc, matrixType, /*isRealPart=*/false);
Value imag = createLinalgMatmulOnTensors(rewriter, loc, componentsType,
                                         self, imagMatrix);
```
Collaborator:

I think my comment about multiple matmuls in the decomposition applies here as well. Let me know what you think about making one DFTMatmulCoeff with both real and imaginary parts.

Collaborator:

Although, in this case, the linalg generic would need to be constructed more carefully, since you wouldn't have two tensors to iterate over (it wouldn't be elementwise anymore, and you would need to fiddle with the indexing maps). Feel free to keep this conversion as-is if it seems like too much work to make the change.

giacs-epic (Contributor, Author):

For the same reason as above I think this should also be done. I will add this in the next commit.

giacs-epic (Contributor, Author):

Refactored the conversion in the last commit.
