
[Torch] Add decomposition for 1d torch.nonzero #3876

Draft: wants to merge 2 commits into base: main
Conversation

@AmosLewis (Collaborator) commented Nov 15, 2024

Target model: migraphx_onnx-model-zoo__gpt2-10

```mlir
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64>  attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.6.0"} {
    %0 = torch.operator "onnx.NonZero"(%arg0) : (!torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64> 
    return %0 : !torch.vtensor<[1,?],si64> 
  }
}
```

The python implementation: nonzero.py
This is meant to fix the e2e test error in Xida's previous draft #3721.
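For intuition, the 1-D decomposition can be sketched in plain NumPy (a hedged illustration of the mask / cumulative-sum / scatter approach, not the exact contents of nonzero.py; the helper name is mine):

```python
import numpy as np

def nonzero_1d(x):
    # Boolean mask of the nonzero elements.
    mask = x != 0
    # Inclusive cumulative sum gives each nonzero element its
    # 1-based position among the nonzero elements.
    cumsum = np.cumsum(mask.astype(np.int64))
    total = cumsum[-1] if len(cumsum) else 0
    # Scatter the original indices into the output slots.
    out = np.zeros(total, dtype=np.int64)
    idx = np.arange(len(x))
    out[(cumsum - 1)[mask]] = idx[mask]
    # ONNX NonZero returns shape [rank, num_nonzero]; here rank == 1.
    return out.reshape(1, -1)
```

The scatter step is what becomes the dynamic-shaped tensor ops that the later comments debug.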

Here is the bug and reproducer mlir: https://gist.github.com/AmosLewis/92717dbe4847649afefc915425629124

```
Running AtenNonzero1DModule_one_nonzero...
mismatched size for broadcast
./build_tools/ci/test_posix.sh: line 12: 3770074 Aborted                 (core dumped) python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero
```

@AmosLewis commented Nov 18, 2024

```mlir
module {
  func.func @nonzero_graph(%arg0: !torch.vtensor<[?],f32>) -> !torch.vtensor<[1,?],si64> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.NonZero"(%arg0) : (!torch.vtensor<[?],f32>) -> !torch.vtensor<[1,?],si64> 
    return %0 : !torch.vtensor<[1,?],si64>
  }
}
```

The IREE linalg path gets through the compilation phases input > abi > preprocessing > global-optimization > dispatch > flow, then hits a BUG:

```shell
torch-mlir-opt -pass-pipeline='builtin.module(func.func(convert-torch-onnx-to-torch),torch-lower-to-backend-contract,func.func(cse,canonicalize),torch-backend-to-linalg-on-tensors-backend-pipeline)' NonZero.default.torch-onnx.mlir > NonZero.default.onnx.linalg.mlir
iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu NonZero.default.onnx.linalg.mlir --dump-compilation-phases-to=./dispatch
```

```
NonZero.default.onnx.linalg.mlir:85:13: error: 'stream.async.dispatch' op has invalid Read access range [0 to -4 for -4] of resource %7 with size -4; start > end
    %26:2 = linalg.generic {indexing_maps = [#map2, #map], iterator_types = ["parallel"]} outs(%23, %25 : tensor<?x1xi32>, tensor<?xi64>) {
            ^
NonZero.default.onnx.linalg.mlir:10:3: note: called from
  func.func @nonzero_graph(%arg0: tensor<?xf32>) -> tensor<1x1xi64> {
  ^
NonZero.default.onnx.linalg.mlir:85:13: note: see current operation: %18:2 = "stream.async.dispatch"(%5, %6, %14, %15, %5, %6, %9, %0, %2, %2, %9, %0, %9, %0, %9, %9) <{affinity = #hal.device.affinity<@__device_0>, entry_points = [@nonzero_graph_dispatch_5::@nonzero_graph_dispatch_5_elementwise_broadcast_D_i32], operandSegmentSizes = array<i32: 2, 4, 2, 2, 2, 2, 2>, tied_operands = [-1 : index, -1 : index]}> : (index, index, !stream.resource<transient>, !stream.resource<transient>, index, index, index, index, index, index, index, index, index, index, index, index) -> (!stream.resource<transient>, !stream.resource<transient>)
    %26:2 = linalg.generic {indexing_maps = [#map2, #map], iterator_types = ["parallel"]} outs(%23, %25 : tensor<?x1xi32>, tensor<?xi64>) {
```

If I delete the torch.aten.remainder.Tensor op, it lowers through IREE successfully:

```mlir
%25 = torch.aten.remainder.Tensor %24, %2 : !torch.vtensor<[?,1],si64>, !torch.vtensor<[1],si64> -> !torch.vtensor<[1,1],si64>
```

The mismatched size for broadcast might be because the linalg return type is static (`return %transposed : tensor<1x1xi64>`) while the input ONNX return type is dynamic (`return %0 : !torch.vtensor<[1,?],si64>`).

@AmosLewis commented:
Interesting: after I deleted the test for the op that uses nonzero, the e2e suite still gets stuck when running the other tests.

@AmosLewis commented Nov 20, 2024

CI failed at MaskedScatterStaticBasic_basic_nonzerofailed.mlir, which lowers to onnx.NonZero:

```mlir
%2 = torch.operator "onnx.NonZero"(%1) : (!torch.vtensor<[4,4],i1>) -> !torch.vtensor<[2,?],si64>
```

```
Running MaskedScatterStaticBasic_basic...
ERROR: Runtime op verification failed
"memref.store"(%589, %550, %552, %554) <{nontemporal = false}> : (i64, memref<2x2xi64>, index, index) -> ()
^ out-of-bounds access
```
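The 2-D case presumably recovers (row, col) coordinates from flat indices with floor-division and remainder, which is where the torch.aten.remainder.Tensor op mentioned earlier comes from; a hedged NumPy sketch (the helper name is mine):

```python
import numpy as np

def nonzero_2d(x):
    # Find flat indices of the nonzero elements, then convert each
    # flat index back to (row, col) with floor-division and remainder.
    rows, cols = x.shape
    flat = x.reshape(-1)
    flat_idx = np.arange(flat.size)[flat != 0]
    # ONNX NonZero layout: shape [rank, num_nonzero].
    return np.stack([flat_idx // cols, flat_idx % cols])
```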

@AmosLewis commented Nov 20, 2024

The issue probably arises from the end = %int-1 in the torch.aten.arange op for dynamic input. Need to figure out a way to fix it.

```mlir
%8 = torch.aten.arange.start_step %int0, %int-1, %int1, %none, %none, %none, %none : !torch.int, !torch.int, !torch.int, !torch.none, !torch.none, !torch.none, !torch.none -> !torch.vtensor<[?],si64>
```

```c++
// The arange end comes from the static shape; for a dynamic dim,
// getSizes()[0] is the unknown-size sentinel (printed as -1),
// which produces arange(0, -1, 1) and an empty range.
Value rangeTensor = rewriter.create<AtenArangeStartStepOp>(
    loc, cumulativeSumType, c(0),
    rewriter.create<ConstantIntOp>(
        loc, rewriter.getI64IntegerAttr(flattenedInputType.getSizes()[0])),
    one, noneCst, noneCst, noneCst, noneCst);
```
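Conceptually, the fix is to take the arange end from the runtime size of the flattened input rather than from the compile-time shape. In eager PyTorch terms (a sketch under that assumption; the helper name is hypothetical):

```python
import torch

def index_range(x):
    # Build the index range from the runtime length of the flattened
    # input, never from a static shape (which may be dynamic / -1).
    flat = x.reshape(-1)
    end = flat.size(0)  # concrete at runtime, never -1
    return torch.arange(0, end, 1, dtype=torch.int64)
```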

@AmosLewis commented Nov 20, 2024

After adding dynamic support for AtenArangeStartStepOp, the IREE bug moves forward to:
input > abi > preprocessing > global-optimization > dispatch > flow > stream > executable-sources > executable-config > BUG
error_after_fix_dynamic_end.mlir

```shell
iree-compile --iree-hal-target-backends=llvm-cpu model.linalg.mlir -o model.vmfb --dump-compilation-phases-to=./tmp/
```

```
failed to translate executables
model.linalg.mlir:21:10: error: 'memref.alloca' op expected no unbounded stack allocations
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:10:3: note: called from
  func.func @main_graph(%arg0: tensor<?xi1>) -> tensor<1x1xi64> {
  ^
model.linalg.mlir:21:10: note: see current operation: %14 = "memref.alloca"(%11) <{alignment = 64 : i64, operandSegmentSizes = array<i32: 1, 0>}> : (index) -> memref<?xi64>
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:32:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
    %7:2 = tm_tensor.scan dimension(0) inclusive(true) ins(%2 : tensor<?xi64>) outs(%4, %6 : tensor<?xi64>, tensor<i64>) {
```

If testing only with torch-mlir:

```shell
python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero
```

```
****** Failed tests - 1 tests
    FAIL - "AtenNonzero1DModule_one_nonzero"
        @ trace item #0 - call to "forward"
        @ output of call to "forward"
        ERROR: value (Tensor with shape=[1, 1], dtype=torch.int64, min=+0.0, max=+0.0, mean=+0.0) is not close to golden value (Tensor with shape=[1, 1], dtype=torch.int64, min=+2.0, max=+2.0, mean=+2.0)

Summary:
    Failed: 1
```
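For reference, eager PyTorch produces the golden value here: a 1-D input whose only nonzero element sits at index 2 should yield [[2]] once transposed into the ONNX [rank, num_nonzero] layout (the input values below are illustrative, not the test's exact data):

```python
import torch

x = torch.tensor([0.0, 0.0, 5.0, 0.0])
# torch.nonzero returns [num_nonzero, rank]; ONNX NonZero is its transpose.
golden = torch.nonzero(x).t()
print(golden.tolist())  # [[2]]
```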
