
[Torch] Add decomposition for 1d torch.nonzero #3876

Draft: wants to merge 2 commits into base: main
Conversation

@AmosLewis (Collaborator) commented Nov 15, 2024

Target model: migraphx_onnx-model-zoo__gpt2-10

```mlir
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64>  attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.6.0"} {
    %0 = torch.operator "onnx.NonZero"(%arg0) : (!torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64> 
    return %0 : !torch.vtensor<[1,?],si64> 
  }
}
```

The python implementation: nonzero.py
This is meant to fix the e2e test error in Xida's previous draft #3721.
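For intuition, the 1-D decomposition can be sketched in plain NumPy (a hedged illustration of the mask / cumulative-sum / scatter approach, not the exact contents of nonzero.py; the helper name is mine):

```python
import numpy as np

def nonzero_1d(x):
    # Boolean mask of the nonzero elements.
    mask = x != 0
    # Inclusive cumulative sum gives each nonzero element its
    # 1-based position among the nonzero elements.
    cumsum = np.cumsum(mask.astype(np.int64))
    total = cumsum[-1] if len(cumsum) else 0
    # Scatter the original indices into the output slots.
    out = np.zeros(total, dtype=np.int64)
    idx = np.arange(len(x))
    out[(cumsum - 1)[mask]] = idx[mask]
    # ONNX NonZero returns shape [rank, num_nonzero]; here rank == 1.
    return out.reshape(1, -1)
```

The scatter step is what becomes the dynamic-shaped tensor ops that the later comments debug.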

Here is the bug and reproducer mlir: https://gist.github.com/AmosLewis/92717dbe4847649afefc915425629124

```
Running AtenNonzero1DModule_one_nonzero...
mismatched size for broadcast
./build_tools/ci/test_posix.sh: line 12: 3770074 Aborted                 (core dumped) python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero
```

@AmosLewis commented Nov 18, 2024

```mlir
module {
  func.func @nonzero_graph(%arg0: !torch.vtensor<[?],f32>) -> !torch.vtensor<[1,?],si64> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.NonZero"(%arg0) : (!torch.vtensor<[?],f32>) -> !torch.vtensor<[1,?],si64> 
    return %0 : !torch.vtensor<[1,?],si64>
  }
}
```

The IREE linalg path gets through the compilation phases input > abi > preprocessing > global-optimization > dispatch > flow, then hits a BUG:

```shell
torch-mlir-opt -pass-pipeline='builtin.module(func.func(convert-torch-onnx-to-torch),torch-lower-to-backend-contract,func.func(cse,canonicalize),torch-backend-to-linalg-on-tensors-backend-pipeline)' NonZero.default.torch-onnx.mlir > NonZero.default.onnx.linalg.mlir
iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu NonZero.default.onnx.linalg.mlir --dump-compilation-phases-to=./dispatch
```

```
NonZero.default.onnx.linalg.mlir:85:13: error: 'stream.async.dispatch' op has invalid Read access range [0 to -4 for -4] of resource %7 with size -4; start > end
    %26:2 = linalg.generic {indexing_maps = [#map2, #map], iterator_types = ["parallel"]} outs(%23, %25 : tensor<?x1xi32>, tensor<?xi64>) {
            ^
NonZero.default.onnx.linalg.mlir:10:3: note: called from
  func.func @nonzero_graph(%arg0: tensor<?xf32>) -> tensor<1x1xi64> {
  ^
NonZero.default.onnx.linalg.mlir:85:13: note: see current operation: %18:2 = "stream.async.dispatch"(%5, %6, %14, %15, %5, %6, %9, %0, %2, %2, %9, %0, %9, %0, %9, %9) <{affinity = #hal.device.affinity<@__device_0>, entry_points = [@nonzero_graph_dispatch_5::@nonzero_graph_dispatch_5_elementwise_broadcast_D_i32], operandSegmentSizes = array<i32: 2, 4, 2, 2, 2, 2, 2>, tied_operands = [-1 : index, -1 : index]}> : (index, index, !stream.resource<transient>, !stream.resource<transient>, index, index, index, index, index, index, index, index, index, index, index, index) -> (!stream.resource<transient>, !stream.resource<transient>)
    %26:2 = linalg.generic {indexing_maps = [#map2, #map], iterator_types = ["parallel"]} outs(%23, %25 : tensor<?x1xi32>, tensor<?xi64>) {
```

If I delete the torch.aten.remainder.Tensor op, it lowers through IREE successfully:

```mlir
%25 = torch.aten.remainder.Tensor %24, %2 : !torch.vtensor<[?,1],si64>, !torch.vtensor<[1],si64> -> !torch.vtensor<[1,1],si64>
```

The mismatched size for broadcast might be because the linalg return type is static (`return %transposed : tensor<1x1xi64>`) while the input ONNX return type is dynamic (`return %0 : !torch.vtensor<[1,?],si64>`).

@AmosLewis commented:
Interesting: after I deleted the test for the op that uses nonzero, the e2e suite still gets stuck when running the other tests.

@AmosLewis commented Nov 20, 2024

CI failed at MaskedScatterStaticBasic_basic_nonzerofailed.mlir, which lowers to onnx.NonZero:

```mlir
%2 = torch.operator "onnx.NonZero"(%1) : (!torch.vtensor<[4,4],i1>) -> !torch.vtensor<[2,?],si64>
```

```
Running MaskedScatterStaticBasic_basic...
ERROR: Runtime op verification failed
"memref.store"(%589, %550, %552, %554) <{nontemporal = false}> : (i64, memref<2x2xi64>, index, index) -> ()
^ out-of-bounds access
```
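The 2-D case presumably recovers (row, col) coordinates from flat indices with floor-division and remainder, which is where the torch.aten.remainder.Tensor op mentioned earlier comes from; a hedged NumPy sketch (the helper name is mine):

```python
import numpy as np

def nonzero_2d(x):
    # Find flat indices of the nonzero elements, then convert each
    # flat index back to (row, col) with floor-division and remainder.
    rows, cols = x.shape
    flat = x.reshape(-1)
    flat_idx = np.arange(flat.size)[flat != 0]
    # ONNX NonZero layout: shape [rank, num_nonzero].
    return np.stack([flat_idx // cols, flat_idx % cols])
```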

@AmosLewis commented Nov 20, 2024

The issue probably arises from the end = %int-1 in the torch.aten.arange op for dynamic input. Need to figure out a way to fix it.

```mlir
%8 = torch.aten.arange.start_step %int0, %int-1, %int1, %none, %none, %none, %none : !torch.int, !torch.int, !torch.int, !torch.none, !torch.none, !torch.none, !torch.none -> !torch.vtensor<[?],si64>
```

```c++
// The arange end comes from the static shape; for a dynamic dim,
// getSizes()[0] is the unknown-size sentinel (printed as -1),
// which produces arange(0, -1, 1) and an empty range.
Value rangeTensor = rewriter.create<AtenArangeStartStepOp>(
    loc, cumulativeSumType, c(0),
    rewriter.create<ConstantIntOp>(
        loc, rewriter.getI64IntegerAttr(flattenedInputType.getSizes()[0])),
    one, noneCst, noneCst, noneCst, noneCst);
```
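Conceptually, the fix is to take the arange end from the runtime size of the flattened input rather than from the compile-time shape. In eager PyTorch terms (a sketch under that assumption; the helper name is hypothetical):

```python
import torch

def index_range(x):
    # Build the index range from the runtime length of the flattened
    # input, never from a static shape (which may be dynamic / -1).
    flat = x.reshape(-1)
    end = flat.size(0)  # concrete at runtime, never -1
    return torch.arange(0, end, 1, dtype=torch.int64)
```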

@AmosLewis commented Nov 20, 2024

After adding dynamic support for AtenArangeStartStepOp, the IREE bug moves forward to:
input > abi > preprocessing > global-optimization > dispatch > flow > stream > executable-sources > executable-config > BUG
error_after_fix_dynamic_end.mlir

```shell
iree-compile --iree-hal-target-backends=llvm-cpu model.linalg.mlir -o model.vmfb --dump-compilation-phases-to=./tmp/
```

```
failed to translate executables
model.linalg.mlir:21:10: error: 'memref.alloca' op expected no unbounded stack allocations
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:10:3: note: called from
  func.func @main_graph(%arg0: tensor<?xi1>) -> tensor<1x1xi64> {
  ^
model.linalg.mlir:21:10: note: see current operation: %14 = "memref.alloca"(%11) <{alignment = 64 : i64, operandSegmentSizes = array<i32: 1, 0>}> : (index) -> memref<?xi64>
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:32:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
    %7:2 = tm_tensor.scan dimension(0) inclusive(true) ins(%2 : tensor<?xi64>) outs(%4, %6 : tensor<?xi64>, tensor<i64>) {
```

If testing only with torch-mlir:

```shell
python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero
```

```
****** Failed tests - 1 tests
    FAIL - "AtenNonzero1DModule_one_nonzero"
        @ trace item #0 - call to "forward"
        @ output of call to "forward"
        ERROR: value (Tensor with shape=[1, 1], dtype=torch.int64, min=+0.0, max=+0.0, mean=+0.0) is not close to golden value (Tensor with shape=[1, 1], dtype=torch.int64, min=+2.0, max=+2.0, mean=+2.0)

Summary:
    Failed: 1
```
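For reference, eager PyTorch produces the golden value here: a 1-D input whose only nonzero element sits at index 2 should yield [[2]] once transposed into the ONNX [rank, num_nonzero] layout (the input values below are illustrative, not the test's exact data):

```python
import torch

x = torch.tensor([0.0, 0.0, 5.0, 0.0])
# torch.nonzero returns [num_nonzero, rank]; ONNX NonZero is its transpose.
golden = torch.nonzero(x).t()
print(golden.tolist())  # [[2]]
```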
