Optimization for Roberta unstick->reshape->transpose->reshape->stick #3056

Open · @AlexandreEichenberger wants to merge 24 commits into main
Conversation

@AlexandreEichenberger (Collaborator) commented Jan 29, 2025

In some situations, a sequence of transformations is a "no-op" under a given ztensor representation.

The pattern exploited here is, for the 3DS layout, (A, B, C*D) <-> (A*C, B, D): the two shapes are equivalent in stickified memory when B % 32 = 0 and D % 64 = 0.
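
To make this equivalence concrete, here is a small standalone C++ check (mine, not part of the PR) that brute-forces the equality of stickified offsets. It assumes the 3DS stickification map used for ztensors, (d0, d1, d2) -> (d0, d2 floordiv 64, d1 floordiv 32, d1 mod 32, d2 mod 64), linearized row-major over the tile dimensions, with no tile padding since B % 32 = 0 and D % 64 = 0.

    #include <cassert>
    #include <cstdint>

    // Linearized offset of element (i3, i2, i1) in a stickified 3DS tensor of
    // shape (e3, e2, e1): tiles of 32x64 elements, laid out as
    // [e3][e1/64][e2/32][32][64]. Assumes e2 % 32 == 0 and e1 % 64 == 0.
    int64_t offset3DS(int64_t e2, int64_t e1, int64_t i3, int64_t i2, int64_t i1) {
      int64_t tilesE1 = e1 / 64, tilesE2 = e2 / 32;
      return ((i3 * tilesE1 + i1 / 64) * tilesE2 + i2 / 32) * (32 * 64) +
             (i2 % 32) * 64 + (i1 % 64);
    }

    int main() {
      // One concrete instance with B % 32 == 0 and D % 64 == 0.
      const int64_t A = 2, B = 64, C = 3, D = 128;
      for (int64_t a = 0; a < A; ++a)
        for (int64_t b = 0; b < B; ++b)
          for (int64_t c = 0; c < C; ++c)
            for (int64_t d = 0; d < D; ++d)
              // (a, b, c*D + d) in shape (A, B, C*D) maps to the same
              // stickified location as (a*C + c, b, d) in shape (A*C, B, D).
              assert(offset3DS(B, C * D, a, b, c * D + d) ==
                     offset3DS(B, D, a * C + c, b, d));
      return 0;
    }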

The patterns detected and transformed into a zhigh/zlow reshape are the following:

[image: first detected pattern]

and

[image: second detected pattern]

A high-level proof is given below. For a detailed proof, one has to follow every step of the transformations above and show equality of memory accesses, namely that when accessing (e3, e2, e1), we get the same memory location in the original 3DS tensor as in the final 3DS tensor in the above examples.

[image: high-level proof]

In practice, this PR adds 2 rules to catch the above 2 patterns and replace them with a zhigh.Reshape, which is similar to memref.reshape in that it performs no data layout transformation; it just provides a mapping between two equivalent shapes. The ZHigh version establishes this equivalence for zTensor formats such as 3DS.
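
For illustration, below is a condensed sketch of what one such rule could look like as a C++ OpRewritePattern. The op class names (ZHighStickOp, ZHighUnstickOp, ONNXReshapeOp, ONNXTransposeOp, and the new ZHighReshapeOp) follow onnx-mlir naming, but the accessors, the permutation/shape checks, and the actual structure of the PR's rules may differ; treat it as a sketch, not the PR's code.

    // Hypothetical sketch: match stick(reshape(transpose(reshape(unstick(X)))))
    // and replace the whole chain with a single metadata-only zhigh.Reshape.
    struct RecomposeToZHighReshape : public OpRewritePattern<ZHighStickOp> {
      using OpRewritePattern<ZHighStickOp>::OpRewritePattern;

      LogicalResult matchAndRewrite(
          ZHighStickOp stickOp, PatternRewriter &rewriter) const override {
        // Walk up the chain from the final stick.
        auto reshape2 = stickOp->getOperand(0).getDefiningOp<ONNXReshapeOp>();
        if (!reshape2)
          return failure();
        auto transpose = reshape2.getData().getDefiningOp<ONNXTransposeOp>();
        if (!transpose)
          return failure();
        auto reshape1 = transpose.getData().getDefiningOp<ONNXReshapeOp>();
        if (!reshape1)
          return failure();
        auto unstick = reshape1.getData().getDefiningOp<ZHighUnstickOp>();
        if (!unstick)
          return failure();
        Value stickified = unstick->getOperand(0);
        // Static shapes only (the PR's current restriction), plus the layout
        // conditions B % 32 == 0 and D % 64 == 0 and the expected transpose
        // permutation; those checks are elided here.
        if (!cast<ShapedType>(stickified.getType()).hasStaticShape())
          return failure();
        rewriter.replaceOpWithNewOp<ZHighReshapeOp>(
            stickOp, stickOp.getResult().getType(), stickified);
        return success();
      }
    };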

The ZHigh reshape operation is lowered to the equivalent ZLow reshape operation, which is then transformed into a memref.reinterpret_cast operation after all memrefs are normalized.
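
A rough sketch of that last step, with a hypothetical helper (again, not the PR's code, and assuming the usual mlir/llvm includes and namespaces): once the input memref has been normalized to an identity layout, the target shape's row-major strides can be computed directly and the reshape folds into a memref.reinterpret_cast.

    // Illustrative only: fold a shape-only reshape into memref.reinterpret_cast,
    // assuming `input` has already been normalized to an identity layout.
    static Value reshapeViaReinterpretCast(PatternRewriter &rewriter,
        Location loc, Value input, ArrayRef<int64_t> outShape) {
      // Row-major strides for the target shape, e.g. sizes [96, 384, 64]
      // give strides [24576, 64, 1], as in the lit test excerpt further below.
      SmallVector<int64_t> strides(outShape.size(), 1);
      for (int64_t i = (int64_t)outShape.size() - 2; i >= 0; --i)
        strides[i] = strides[i + 1] * outShape[i + 1];
      auto inType = cast<MemRefType>(input.getType());
      auto resultType = MemRefType::get(outShape, inType.getElementType());
      return rewriter.create<memref::ReinterpretCastOp>(loc, resultType, input,
          /*offset=*/0, /*sizes=*/outShape, /*strides=*/strides);
    }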

The PR adds lit tests: one to catch the patterns listed above, one for the ZHigh-to-ZLow conversion, and one for the ZLow-to-memref lowering.

I checked that the values generated by Roberta with and without this PR were the same. Performance measurements show that in Roberta, the number of transposes was reduced from 48 to 12 (with stick/unstick counts also reduced by 36 operations). The time spent in transpose, stick, and unstick was reduced by 9%, 33%, and 37%, respectively. Overall (with one NNPA and one CPU), the total time was reduced by 4%.

At this time, this PR is restricted to static shapes.

Signed-off-by: Alexandre Eichenberger <[email protected]>
@@ -95,17 +95,6 @@ def replaceONNXBatchNormalizationInferenceModePattern : Pattern<
//
//===----------------------------------------------------------------------===//


@AlexandreEichenberger (Collaborator, Author) commented:

migrated the code elsewhere so that it can be reused, as it was needed to support the reshape op.

@@ -402,8 +473,8 @@ AffineMapAttr getTiling2DTo4DMap(OpBuilder &b, Value val) {
return AffineMapAttr::get(map);
}

-AffineMapAttr getTiling3DTo4DMap(OpBuilder &b, Value val) {
-  assert(isTiling3DTo4D(val) &&
+AffineMapAttr getLeftmostTiling3DTo4DMap(OpBuilder &b, Value val) {
@AlexandreEichenberger (Collaborator, Author) commented:

Many of the prior operations were not specific about whether they applied to the rightmost or leftmost position, as only one variant was needed. As I added more, I made all names more explicit.

IndexExprScope currScope(&rewriter, loc);
// Here, cannot use the shape found in the reshape op, as it is the original
// shape before memref normalization.
Value input = reshapeOp.getX();
@tungld (Collaborator) commented:

We should check if X is normalized or not before processing further, by using something like this:

    // Input must have no affine layout. In other words, it has been normalized.
    if (hasNonIdentityLayout(input.getType()))
      return failure();

Without this check, I see that, in your lit test zlow-rewrite.mlir, zlow.reshape with affine_maps is still lowered to memref.reinterpret_cast.

@AlexandreEichenberger (Collaborator, Author) replied:

Got it, I did not realize that zlow-rewrite ran twice. It's now fixed.

// CHECK-LABEL: func.func @handle_zlow_reshape
// CHECK-SAME: ([[PARAM_0_:%.+]]: memref<8x384x768xf16, #map>, [[PARAM_1_:%.+]]: memref<96x64x384xf16, #map>) -> memref<96x384x384xf16, #map> {

// CHECK-DAG: [[VAR_reinterpret_cast_:%.+]] = memref.reinterpret_cast [[PARAM_0_]] to offset: [0], sizes: [96, 384, 64], strides: [24576, 64, 1] : memref<8x384x768xf16, #map> to memref<96x384x64xf16>
@tungld (Collaborator) commented:

It does not look like what we are expecting since the input memref is not normalized.
To check this case, you can

  • add a new check for this case where we call --normalize-memrefs, by adding this line to the top of this file:
// RUN: onnx-mlir-opt --march=z16 --maccel=NNPA --normalize-memrefs --zlow-rewrite --canonicalize %s -split-input-file | FileCheck %s --check-prefix=RESHAPE
  • then, replace CHECK by RESHAPE in CHECK-DAG, CHECK-LABEL, ... since we use the prefix RESHAPE for this check.

@AlexandreEichenberger (Collaborator, Author) replied:

I simply wrote two versions of that test, one without and one with memrefs normalized, and checked that the pattern only applies when memrefs are normalized.

Signed-off-by: Alexandre Eichenberger <[email protected]>
@AlexandreEichenberger (Collaborator, Author) commented:

Thanks @tungld for the feedback, implemented both suggestions.

@tungld (Collaborator) left a comment:

LGTM!

Glad to see the performance improvement!
