[Experimental][Kleidi] Add GEMM operator tests #1638
digantdesai
commented
Jan 29, 2025
- Adds GEMM op tests against the new Kleidi i8mm kernels, building on top of [Experimental] Add Kleidi i8mm gemm kernels #1295
- Adds Android cross-compile support
- Adds a new GEMM test generator script for Kleidi kernels
- Updates the kleidi submodule from 0.4.0 to 1.2.0 (latest, 1 week old)
The generator script produces a fixed combination of C++ tests. One has to manually update the C++ file by copying the script output.
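A minimal sketch of how such a generator could work, assuming it uses Python's `string.Template` (as the reviewed snippet suggests) and prints C++ to stdout. The template text, the `run_kleidi_test_*` helper, `KERNELS`, and `CASES` are hypothetical and chosen only to reproduce the test-name pattern seen in the log (for example `Kleidi_dotprod_1x4x32_m1xn2xk32xg32`); the real script's contents are not shown on this page.

```python
from itertools import product
from string import Template

# Hypothetical template: the real script's template is not shown in this PR page.
TEST_TEMPLATE = Template(
    """TEST(test_linear_8bit_act_xbit_weight, Kleidi_${kernel}_m${m}xn${n}xk${k}xg${g}${suffix}) {
  // Hypothetical helper; the real test body would go here.
  run_kleidi_test_${kernel}(/*m=*/${m}, /*n=*/${n}, /*k=*/${k}, /*group_size=*/${g}, /*has_bias=*/${has_bias}, /*has_clamp=*/${has_clamp});
}
"""
)

# Illustrative subsets only, not the full fixed combination used by the PR.
KERNELS = ["dotprod_1x4x32", "i8mm_4x8x32"]
# Each case is (m, n, k, group_size, has_bias, has_clamp).
CASES = [
    (1, 2, 32, 32, False, False),
    (1, 6, 32, 32, True, False),
    (4, 8, 32, 32, True, True),
]


def name_suffix(has_bias, has_clamp):
    """Build the optional '_bias' / '_clamp' part of the test name."""
    return ("_bias" if has_bias else "") + ("_clamp" if has_clamp else "")


def generate_tests():
    """Return C++ source with one TEST per (kernel, case) pair."""
    chunks = []
    for kernel, (m, n, k, g, has_bias, has_clamp) in product(KERNELS, CASES):
        chunks.append(
            TEST_TEMPLATE.substitute(
                kernel=kernel, m=m, n=n, k=k, g=g,
                suffix=name_suffix(has_bias, has_clamp),
                has_bias=str(has_bias).lower(),   # C++ bool literal
                has_clamp=str(has_clamp).lower(),
            )
        )
    return "\n".join(chunks)


if __name__ == "__main__":
    # Dump to stdout; the output is then copied into the C++ test file by hand.
    print(generate_tests())
```

Keeping the case list as plain data makes the generated set deterministic, which matches the "fixed combination" described above.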
Disable running when cross compiling.
$ export ANDROID_NDK=/path/to/ndk/
$ bash build_and_run_tests.sh android # note the positional arg
$ adb push /tmp/cmake-out-android/torch_ao/tests/test_linear_8bit_act_xbit_weight /data/local/tmp/
Also add i8mm support for op tests.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1638
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit ce71632 with merge base e151d6a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
572f309 to 0c205fa
* Adds proper GEMM tests with i8mm, using the new test generator
* Fixes a small bug with weight pointer calculation
* Tested on S24; occasionally hits ATOL, need to investigate
$ ./test_linear_8bit_act_xbit_weight
Running main() from /tmp/cmake-out-android/torch_ao/tests/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 131 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 131 tests from test_linear_8bit_act_xbit_weight
[ RUN ] test_linear_8bit_act_xbit_weight.Standard
[ OK ] test_linear_8bit_act_xbit_weight.Standard (13 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.HasWeightZeros
[ OK ] test_linear_8bit_act_xbit_weight.HasWeightZeros (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.HasBias
[ OK ] test_linear_8bit_act_xbit_weight.HasBias (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.HasClamp
[ OK ] test_linear_8bit_act_xbit_weight.HasClamp (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.SmallDimension
[ OK ] test_linear_8bit_act_xbit_weight.SmallDimension (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize
[ OK ] test_linear_8bit_act_xbit_weight.KNotDivisibleByGroupSize (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16
[ OK ] test_linear_8bit_act_xbit_weight.GroupSizeNotDivisibleBy16 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn222xk32xg32 (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m1xn34xk128xg64 (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m2xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m31xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m32xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m7xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m41xn222xk32xg32 (7 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m19xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m23xn22xk128xg32_bias (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x4x32_m101xn34xk128xg64 (9 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn222xk32xg32 (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn22xk128xg32_bias (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m1xn34xk128xg64 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m2xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m31xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m32xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m7xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn102xk32xg32_clamp (2 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m41xn222xk32xg32 (6 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m19xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_dotprod_1x8x32_m101xn34xk128xg64 (7 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn222xk32xg32 (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn22xk128xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m1xn34xk128xg64 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m2xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m31xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m32xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m33xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m7xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m17xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m41xn222xk32xg32 (5 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m19xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m23xn22xk128xg32_bias (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_4x8x32_m101xn34xk128xg64 (7 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn4xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn102xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn222xk32xg32 (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn22xk128xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn26xk64xg64_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m1xn34xk128xg64 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m2xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m4xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m3xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m31xn2xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m32xn4xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m33xn6xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m34xn8xk32xg32_bias_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m35xn6xk32xg32_clamp (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m7xn22xk32xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m17xn26xk32xg32_bias (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn102xk32xg32_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m41xn222xk32xg32 (5 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m19xn14xk64xg32 (0 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m23xn22xk128xg32_bias (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m29xn26xk64xg64_clamp (1 ms)
[ RUN ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64
[ OK ] test_linear_8bit_act_xbit_weight.Kleidi_i8mm_8x4x32_m101xn34xk128xg64 (7 ms)
[----------] 131 tests from test_linear_8bit_act_xbit_weight (137 ms total)
[----------] Global test environment tear-down
[==========] 131 tests from 1 test suite ran. (137 ms total)
[ PASSED ] 131 tests.
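The 131 tests in the log above break down as 7 base tests plus 31 generated shape/option cases for each of the 4 Kleidi kernel variants (7 + 4 × 31 = 131). A minimal sketch for checking that breakdown from a captured gtest log is below. The function names and the grouping rule are assumptions made for illustration and are not part of the PR.

```python
import re
from collections import Counter

# Matches gtest "OK" result lines; tolerates variable padding inside the brackets.
OK_LINE = re.compile(r"\[\s*OK\s*\]\s+(\w+)\.(\w+)\s+\((\d+)\s+ms\)")

# Generated tests are named Kleidi_<kernel>_<tile>_m<M>xn<N>...; capture the kernel part.
KERNEL_PREFIX = re.compile(r"(Kleidi_[a-z0-9]+_\d+x\d+x\d+)_m\d+")


def group_of(test_name):
    """Return 'Kleidi_<kernel>_<tile>' for generated tests, 'base' otherwise."""
    match = KERNEL_PREFIX.match(test_name)
    return match.group(1) if match else "base"


def summarize(log_text):
    """Count passed tests and sum reported milliseconds per group."""
    counts, millis = Counter(), Counter()
    for _suite, name, ms in OK_LINE.findall(log_text):
        group = group_of(name)
        counts[group] += 1
        millis[group] += int(ms)
    return counts, millis


if __name__ == "__main__":
    sample = (
        "[ OK ] test_linear_8bit_act_xbit_weight.Standard (13 ms)\n"
        "[ OK ] test_linear_8bit_act_xbit_weight."
        "Kleidi_i8mm_8x4x32_m101xn34xk128xg64 (7 ms)\n"
    )
    counts, millis = summarize(sample)
    for group in sorted(counts):
        print(group, counts[group], "tests,", millis[group], "ms")
```

Grouping by kernel prefix also makes it easy to see which variant a tolerance failure belongs to when one shows up intermittently.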
0c205fa to ce71632
LGTM!
def main():
    kleidi_template = Template(
        """
/*****************/
A little late, but consider adding a header stating that this is autogenerated by this particular script, and how.
It is there, see line 106 in this file.
As for how, the script dumps C++ code to stdout right now, and then it's manual copy-pasta 🍝
Added this as a note in this commit, FWIW. We should improve this, but it's on the back burner, I guess.
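One possible way to replace the copy-paste step, sketched under assumptions: the generator writes the file itself, prepends a header that says how to regenerate it, and offers a check mode that reports whether the checked-in file is stale. The header wording, the `--output` flag mentioned in it, and the function names are all hypothetical and not taken from the PR.

```python
from pathlib import Path

# Hypothetical header wording; "--output" is an assumed flag, not one the script has today.
HEADER_TEMPLATE = (
    "// @generated by {script}. Do not edit by hand.\n"
    "// To regenerate: python {script} --output {target}\n\n"
)


def render(generated_cpp, script, target):
    """Prepend the autogeneration header to the generated C++ source."""
    return HEADER_TEMPLATE.format(script=script, target=target) + generated_cpp


def write_or_check(generated_cpp, script, target, check=False):
    """Write the file, or in check mode return whether it is already up to date."""
    content = render(generated_cpp, script, target)
    path = Path(target)
    if check:
        # Stale or missing files are reported as False without touching the disk.
        return path.exists() and path.read_text() == content
    path.write_text(content)
    return True
```

The check mode could run in CI so that a forgotten regeneration shows up as a failure instead of silently diverging from the script.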