This repo contains scripts and a tool to reproduce the NNAPI delegate issue with models containing Concatenation and Dense nodes. The issue appears on Snapdragon 888 (we could not reproduce the issue on Snapdragon 855 and Snapdragon 865). Here is a summary of our findings:
- On devices with Snapdragon 888 (tested with Android 12), the NNAPI delegate always crashes when there is a Quantize node right before a Concatenation node. For instance, consider the following model:
When we quantize this model (INT8), the tflite converter adds two Quantize nodes right after the inputs (right before the Concatenation node):
Our experiments show that on Snapdragon 888, the NNAPI delegate crashes whenever there is a Quantize node before a Concatenation node. For this specific case, we can avoid the crash by adding two identity nodes after the model's inputs. We created several dummy models, and it turned out that MaxPool2D(1,1) is the identity node that resolves the issue (other identity nodes such as Relu did not help):
The above-mentioned workaround only works when the Concatenation node comes directly after the model inputs. However, we also faced cases where the tflite converter adds a Quantize node before a Concatenation node in the middle of the model:
In such cases, the only solution we could find is to edit the FlatBuffer binary of the tflite model and replace the Quantize node with an identity node (e.g. Relu). Unfortunately, this workaround can affect model accuracy (especially in lightweight models).
- On devices with Snapdragon 888 (tested with Android 12), the INT8 tflite version of a model always crashes with the NNAPI delegate when the kernel of at least one Dense layer is too large. We tried different parameters for the Dense layers, and the threshold for the kernel size turned out to lie between 1024x3920 and 1024x4096: a Dense layer with a 1024x3920 kernel does not crash, but a Dense layer with a 1024x4096 kernel crashes on Snapdragon 888.
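To put the reported bracket in perspective, the sketch below computes the number of weights for the two kernel shapes. It assumes one byte per weight for an INT8 kernel and ignores biases and quantization parameters. The helper names are hypothetical, and comparing kernels by element count is a simplification: whether the crash is actually tied to the size of the weight buffer has not been established.

```python
# Sketch: size of the Dense kernels that bracket the observed crash threshold.
# Assumption: an INT8 kernel stores one byte per weight (bias and quantization
# parameters are ignored). Function names are illustrative only.

SAFE_SHAPE = (1024, 3920)      # reported as not crashing
CRASHING_SHAPE = (1024, 4096)  # reported as crashing


def kernel_elements(shape):
    """Return the number of weights in a Dense kernel of the given shape."""
    rows, cols = shape
    return rows * cols


def classify(shape):
    """Place a kernel relative to the observed bracket (by element count)."""
    n = kernel_elements(shape)
    if n <= kernel_elements(SAFE_SHAPE):
        return "at or below largest size observed to work"
    if n >= kernel_elements(CRASHING_SHAPE):
        return "at or above smallest size observed to crash"
    return "inside the untested bracket"


if __name__ == "__main__":
    for shape in (SAFE_SHAPE, CRASHING_SHAPE, (1024, 4000)):
        n = kernel_elements(shape)
        mib = n / 2**20  # one byte per weight, so bytes == weights
        print(f"{shape[0]}x{shape[1]}: {n} weights, {mib:.2f} MiB -> {classify(shape)}")
```

The 1024x3920 kernel holds 4,014,080 weights and the 1024x4096 kernel holds 4,194,304 (exactly 2^22), so under the one-byte assumption the bracket sits just below 4 MiB of weights.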
- The model_files folder contains simple models representing the above-mentioned issues.
- You can also use generate_dummy_model.py to build the models and convert_model.py to convert them to tflite.
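After converting a model, it can be useful to check whether the converter placed a Quantize node directly before a Concatenation node. The following is a minimal sketch of such a check. It assumes the operators have already been extracted into a list of dictionaries with `op_name`, `inputs`, and `outputs` keys; this layout and the function name are hypothetical and not part of the scripts in this repo.

```python
# Sketch: flag Quantize ops whose output feeds a Concatenation op.
# Assumption: the model's operators have already been extracted into a list of
# dicts with "op_name", "inputs", and "outputs" (tensor indices). This layout is
# hypothetical; adapt it to whatever your model-inspection tooling produces.


def find_quantize_before_concat(ops):
    """Return (quantize_index, concat_index) pairs for the risky pattern."""
    # Map each tensor index to the index of the op that produces it.
    producer = {}
    for idx, op in enumerate(ops):
        for tensor in op["outputs"]:
            producer[tensor] = idx

    pairs = []
    for idx, op in enumerate(ops):
        if op["op_name"] != "CONCATENATION":
            continue
        for tensor in op["inputs"]:
            src = producer.get(tensor)  # None for model inputs and constants
            if src is not None and ops[src]["op_name"] == "QUANTIZE":
                pairs.append((src, idx))
    return pairs


if __name__ == "__main__":
    # Toy graph: two inputs are quantized, concatenated, then fed to a Dense layer.
    toy_ops = [
        {"op_name": "QUANTIZE", "inputs": [0], "outputs": [2]},
        {"op_name": "QUANTIZE", "inputs": [1], "outputs": [3]},
        {"op_name": "CONCATENATION", "inputs": [2, 3], "outputs": [4]},
        {"op_name": "FULLY_CONNECTED", "inputs": [4, 5, 6], "outputs": [7]},
    ]
    print(find_quantize_before_concat(toy_ops))
```

Any pair returned marks a place where one of the workarounds described above may be needed on Snapdragon 888.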
We have implemented a small tool to feed an input to our sample INT8 tflite models using the NNAPI delegate.
- Linux host computer
- Connectivity to the target device via adb
- Android NDK, version 22 or later
- CMake 3.18 or later
- Unzip the tensorflow_lite_cpp_2_9_1_static.zip file inside the tflite_inference_tool folder.
- In a terminal, from the tflite_inference_tool folder:
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" \
    -DCMAKE_SYSTEM_NAME=Android \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_STL=c++_shared \
    -DANDROID_NATIVE_API_LEVEL=27 \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    -DCMAKE_TOOLCHAIN_FILE=<path-to-ndk>/build/cmake/android.toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DTensorFlowLite_ROOT=../tensorflow_lite_cpp_2_9_1_static ..
$ make
- Here, you must replace <path-to-ndk> with the absolute path of the NDK installed on your computer. If you installed the NDK through Android Studio, it is typically located at /home/<username>/Android/Sdk/ndk/<version>/ on Linux.
- tensorflow_lite_cpp_2_9_1_static is the TensorFlow Lite library package.
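If several NDK versions are installed, a small helper can pick one that satisfies the "version 22 or later" requirement. The sketch below assumes that each NDK lives in a sub-folder named after its version (for example `25.1.8937393`) under the `ndk` directory of the typical Android Studio layout mentioned above; the function names are hypothetical.

```python
# Sketch: pick an installed NDK (version 22 or later) to use as <path-to-ndk>.
# Assumption: NDK versions live in sub-folders named like "25.1.8937393" under
# an "ndk" directory, as in the typical Android Studio layout.
from pathlib import Path


def version_key(name):
    """Turn "25.1.8937393" into (25, 1, 8937393); return None if not numeric."""
    parts = name.split(".")
    if not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def find_ndk(ndk_root, min_major=22):
    """Return the newest NDK folder with major version >= min_major, or None."""
    root = Path(ndk_root)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        key = version_key(child.name)
        if child.is_dir() and key is not None and key[0] >= min_major:
            candidates.append((key, child))
    if not candidates:
        return None
    # Compare by the numeric version tuple, not by the folder name string.
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    ndk = find_ndk(Path.home() / "Android" / "Sdk" / "ndk")
    if ndk is None:
        print("No NDK >= 22 found; set the toolchain path manually.")
    else:
        toolchain = ndk / "build" / "cmake" / "android.toolchain.cmake"
        print(f"-DCMAKE_TOOLCHAIN_FILE={toolchain}")
```

The printed value can be pasted into the cmake command in place of the `-DCMAKE_TOOLCHAIN_FILE` argument.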
WARNING: This step will write to the /data/local/tmp folder on your device. Please make sure existing files in that folder are backed up as needed.
In a terminal, from the tflite_inference_tool folder:
$ adb push ./build/model_test /data/local/tmp
$ adb push ./model_files /data/local/tmp
To run the tool, pass the name of the model you would like to run via the --model option. The output of the tool when running on Snapdragon 888 is listed below:
Sample 1:
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model=int8_quantize_concat.tflite"
INFO: Created TensorFlow Lite delegate for NNAPI.
INFO: Initialized TensorFlow Lite runtime.
INFO: Replacing 5 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 1 partitions.
ERROR: NN API returned error ANEURALNETWORKS_OP_FAILED at line 4650 while completing NNAPI compilation.
ERROR: Node number 5 (TfLiteNnapiDelegate) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
Segmentation fault
Sample 2:
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model=int8_large_Dense.tflite"
INFO: Created TensorFlow Lite delegate for NNAPI.
INFO: Initialized TensorFlow Lite runtime.
INFO: Replacing 3 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 1 partitions.
ERROR: NN API returned error ANEURALNETWORKS_OP_FAILED at line 4650 while completing NNAPI compilation.
ERROR: Node number 3 (TfLiteNnapiDelegate) failed to prepare.
ERROR: Restored original execution plan after delegate application failure.
Segmentation fault
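When repeating these runs across several models or devices, it can help to classify the captured output automatically. The following is a minimal sketch that looks for the marker strings seen in the logs above. It assumes the tool's output (including the shell's "Segmentation fault" message) has already been captured as text; the function name and the returned dictionary layout are hypothetical.

```python
# Sketch: classify the output of a model_test run.
# Assumption: the tool's stdout/stderr has been captured as one text string
# (for example from an `adb shell` call); the marker strings are taken from
# the sample logs in this document.
import re

NNAPI_ERROR = re.compile(r"NN API returned error (\w+)")


def summarize_run(output):
    """Return a dict describing whether the NNAPI delegate failed."""
    match = NNAPI_ERROR.search(output)
    return {
        "delegate_created": "Created TensorFlow Lite delegate for NNAPI" in output,
        "nnapi_error": match.group(1) if match else None,
        "segfault": "Segmentation fault" in output,
    }


if __name__ == "__main__":
    sample = (
        "INFO: Created TensorFlow Lite delegate for NNAPI.\n"
        "INFO: Initialized TensorFlow Lite runtime.\n"
        "ERROR: NN API returned error ANEURALNETWORKS_OP_FAILED at line 4650 "
        "while completing NNAPI compilation.\n"
        "Segmentation fault\n"
    )
    print(summarize_run(sample))
```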