[Experimental][Kleidi] Add GEMM operator tests #1638

Merged 4 commits on Jan 30, 2025

@digantdesai digantdesai commented Jan 29, 2025

  • Adds GEMM op tests for the new Kleidi i8mm kernels, building on top of [Experimental] Add Kleidi i8mm gemm kernels #1295
  • Adds Android cross-compile support
  • Adds a new GEMM test generator script for Kleidi kernels
  • Updates the Kleidi submodule from 0.4.0 to 1.2.0 (the latest release, 1 week old)
$ ./test_linear_8bit_act_xbit_weight
Running main() from /tmp/cmake-out-android/torch_ao/tests/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 131 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 131 tests from test_linear_8bit_act_xbit_weight
[ RUN      ] test_linear_8bit_act_xbit_weight.Standard
[       OK ] test_linear_8bit_act_xbit_weight.Standard (13 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasWeightZeros
[       OK ] test_linear_8bit_act_xbit_weight.HasWeightZeros (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasBias
[       OK ] test_linear_8bit_act_xbit_weight.HasBias (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasClamp
[       OK ] test_linear_8bit_act_xbit_weight.HasClamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.SmallDimension
[       OK ] test_linear_8bit_act_xbit_weight.SmallDimension (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize
[       OK ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16
[       OK ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32 (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64 (9 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32 (6 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32 (5 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32 (5 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64 (7 ms)
[----------] 131 tests from test_linear_8bit_act_xbit_weight (137 ms total)

[----------] Global test environment tear-down
[==========] 131 tests from 1 test suite ran. (137 ms total)
[  PASSED  ] 131 tests.
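
The Kleidi test names in the log above appear to encode the kernel variant and the GEMM problem shape. The following is a minimal sketch for decoding them, assuming the pattern `Kleidi_<isa>_<tile>_m<M>xn<N>xk<K>xg<G>[_bias][_clamp]`; reading `g` as the quantization group size is an assumption, and the helper names `parse_test_name` and `count_by_kernel` are hypothetical, not part of the PR.

```python
import re
from collections import Counter

# Assumed naming scheme, inferred from the log output only:
#   Kleidi_<isa>_<tile>_m<M>xn<N>xk<K>xg<G>[_bias][_clamp]
NAME_RE = re.compile(
    r"Kleidi_(?P<isa>[a-z0-9]+)_(?P<tile>\d+x\d+x\d+)"
    r"_m(?P<m>\d+)xn(?P<n>\d+)xk(?P<k>\d+)xg(?P<g>\d+)"
    r"(?P<bias>_bias)?(?P<clamp>_clamp)?$"
)


def parse_test_name(name):
    """Return the fields encoded in a Kleidi test name, or None if it does not match."""
    match = NAME_RE.search(name)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "kernel": f"{fields['isa']}_{fields['tile']}",
        "m": int(fields["m"]),
        "n": int(fields["n"]),
        "k": int(fields["k"]),
        "g": int(fields["g"]),  # assumed to be the group size
        "has_bias": fields["bias"] is not None,
        "has_clamp": fields["clamp"] is not None,
    }


def count_by_kernel(log_text):
    """Count passing tests per kernel variant from gtest '[       OK ]' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if "[       OK ]" not in line:
            continue
        # The test name is the first token after the status marker.
        name = line.split("]", 1)[1].split()[0]
        parsed = parse_test_name(name)
        if parsed is not None:
            counts[parsed["kernel"]] += 1
    return counts
```

Feeding the captured console output to `count_by_kernel` gives a quick per-variant tally, which makes it easier to spot a kernel variant whose tests were accidentally dropped from the generated file.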

The generator script emits a fixed combination of C++ tests. The C++ test file has to be
updated manually by copying the script output.
Running the tests is disabled when cross compiling.
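
As a rough illustration of what a generator for a fixed combination of C++ tests could look like, here is a minimal sketch. It is not the script from this PR: the parameter grid is a small made-up subset, and `run_kleidi_gemm_test` is a hypothetical C++ helper used only to give the generated test bodies some shape. Only the test suite name and the test naming pattern are taken from the log above.

```python
from itertools import product

# Hypothetical parameter grid; the real script's kernels and shapes are not shown here.
KERNELS = ["dotprod_1x4x32", "i8mm_4x8x32"]
SHAPES = [(1, 2, 32, 32), (4, 8, 32, 32)]  # (m, n, k, group_size)
FLAGS = [(False, False), (True, True)]     # (has_bias, has_clamp)


def make_test_name(kernel, m, n, k, g, has_bias, has_clamp):
    """Build a test name following the pattern seen in the gtest log."""
    name = f"Kleidi_{kernel}_m{m}xn{n}xk{k}xg{g}"
    if has_bias:
        name += "_bias"
    if has_clamp:
        name += "_clamp"
    return name


def render_test(kernel, m, n, k, g, has_bias, has_clamp):
    """Render one gtest case. run_kleidi_gemm_test is a hypothetical C++ helper."""
    name = make_test_name(kernel, m, n, k, g, has_bias, has_clamp)
    return (
        f"TEST(test_linear_8bit_act_xbit_weight, {name}) {{\n"
        f'  run_kleidi_gemm_test("{kernel}", /*m=*/{m}, /*n=*/{n}, /*k=*/{k},\n'
        f"                       /*group_size=*/{g}, /*has_bias=*/{str(has_bias).lower()},\n"
        f"                       /*has_clamp=*/{str(has_clamp).lower()});\n"
        f"}}\n"
    )


def generate():
    """Return the C++ source for every combination in the fixed grid."""
    return "\n".join(
        render_test(kernel, *shape, *flags)
        for kernel, shape, flags in product(KERNELS, SHAPES, FLAGS)
    )


if __name__ == "__main__":
    # The output is meant to be pasted into the C++ test file by hand.
    print(generate())
```

Because the grid is fixed, the output is deterministic, so regenerating and diffing against the checked-in C++ file shows whether the two have drifted apart.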

$ export ANDROID_NDK=/path/to/ndk/
$ bash build_and_run_tests.sh android # note the positional arg
$ adb push /tmp/cmake-out-android/torch_ao/tests/test_linear_8bit_act_xbit_weight /data/local/tmp/

Also add i8mm support for op tests.
@digantdesai requested a review from metascroy January 29, 2025 05:57
@facebook-github-bot added the CLA Signed label Jan 29, 2025

pytorch-bot bot commented Jan 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1638

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ce71632 with merge base e151d6a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@digantdesai added the "topic: not user facing" and "cpu" labels Jan 29, 2025
@digantdesai marked this pull request as draft January 29, 2025 14:02
* Adds proper GEMM tests with i8mm, using the new test generator
* Fixes a small bug in the weight pointer calculation
* Tested on an S24; the tests occasionally exceed the absolute tolerance (ATOL), which still needs investigation
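
For context on the ATOL remark, an absolute-tolerance check compares each kernel output against a reference value and fails when any element differs by more than a fixed threshold. The sketch below shows that comparison under stated assumptions: the function names and the tolerance value in the usage note are illustrative, and the actual comparison and threshold used by the C++ tests are not shown in this PR description.

```python
def max_abs_error(actual, expected):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return max(abs(a - e) for a, e in zip(actual, expected))


def within_atol(actual, expected, atol):
    """True if every element of actual is within atol of the matching expected value."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return all(abs(a - e) <= atol for a, e in zip(actual, expected))
```

For example, `within_atol(kernel_out, reference_out, 1e-3)` with a hypothetical threshold of `1e-3` fails as soon as one output element is off by more than that amount. Logging `max_abs_error` on an intermittent failure helps distinguish a marginal miss from a real indexing bug.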

$ ./test_linear_8bit_act_xbit_weight
Running main() from /tmp/cmake-out-android/torch_ao/tests/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 131 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 131 tests from test_linear_8bit_act_xbit_weight
[ RUN      ] test_linear_8bit_act_xbit_weight.Standard
[       OK ] test_linear_8bit_act_xbit_weight.Standard (13 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasWeightZeros
[       OK ] test_linear_8bit_act_xbit_weight.HasWeightZeros (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasBias
[       OK ] test_linear_8bit_act_xbit_weight.HasBias (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.HasClamp
[       OK ] test_linear_8bit_act_xbit_weight.HasClamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.SmallDimension
[       OK ] test_linear_8bit_act_xbit_weight.SmallDimension (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize
[       OK ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16
[       OK ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32 (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64 (9 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32 (6 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32 (5 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64 (7 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32 (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32 (5 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32 (0 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN      ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64
[       OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64 (7 ms)
[----------] 131 tests from test_linear_8bit_act_xbit_weight (137 ms total)

[----------] Global test environment tear-down
[==========] 131 tests from 1 test suite ran. (137 ms total)
[  PASSED  ] 131 tests.
@digantdesai digantdesai marked this pull request as ready for review January 29, 2025 20:00
@digantdesai digantdesai changed the title [Experimental][Kleidi] Add GEMM operators [Experimental][Kleidi] Add GEMM operator tests Jan 29, 2025

@metascroy metascroy left a comment

LGTM!

@digantdesai digantdesai merged commit b559c6d into main Jan 30, 2025
20 checks passed
@digantdesai digantdesai deleted the i8mm_kleidi branch January 30, 2025 03:12
def main():
    kleidi_template = Template(
        """
/*****************/

A little late, but consider adding a header stating that this file is autogenerated, by which script, and how to regenerate it.

@digantdesai digantdesai Jan 30, 2025

It is there; see line 106 in this file.

As for the how: the script currently dumps C++ code to stdout, followed by a manual copy-pasta 🍝
I added this as a note in this commit, FWIW. We should improve this, but it is on the back burner for now.
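
For anyone picking this up later, here is a minimal sketch of a generator that puts the "autogenerated" header first and then emits one test per case. It is not the script from this PR. The file names `generate_tests.py` and `tests.cpp`, the header wording, and the helpers `make_test_name` and `generate` are all hypothetical; only the test naming scheme is taken from the log in the PR description.

```python
from string import Template

# Hypothetical header text; the wording in the real script is not shown here.
HEADER = Template(
    "// AUTO-GENERATED by $script. Do not edit by hand.\n"
    "// Regenerate with: python $script > $output\n"
)

# Hypothetical test skeleton; the generated body is elided on purpose.
TEST_CASE = Template(
    "TEST(test_linear_8bit_act_xbit_weight, $name) {\n"
    "  // ... generated test body elided ...\n"
    "}\n"
)


def make_test_name(kernel, mr, nr, kr, m, n, k, g, has_bias=False, has_clamp=False):
    """Build a name such as Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp."""
    name = f"Kleidi_{kernel}_{mr}x{nr}x{kr}_m{m}xn{n}xk{k}xg{g}"
    if has_bias:
        name += "_bias"
    if has_clamp:
        name += "_clamp"
    return name


def generate(script, output, cases):
    """Return the generated C++ source as one string, header first."""
    parts = [HEADER.substitute(script=script, output=output)]
    for case in cases:
        parts.append(TEST_CASE.substitute(name=make_test_name(**case)))
    return "\n".join(parts)


if __name__ == "__main__":
    # Two illustrative cases; the real script enumerates many more.
    cases = [
        dict(kernel="dotprod", mr=1, nr=4, kr=32, m=31, n=2, k=32, g=32),
        dict(kernel="i8mm", mr=4, nr=8, kr=32, m=1, n=4, k=32, g=32,
             has_bias=True, has_clamp=True),
    ]
    print(generate("generate_tests.py", "tests.cpp", cases))
```

Because `generate` returns a string, the same function could write straight to the test file instead of stdout, which would remove the manual copy-paste step.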

@digantdesai digantdesai restored the i8mm_kleidi branch January 30, 2025 04:21
Labels: CLA Signed, cpu, topic: not user facing
4 participants