trt-lightnet is a CNN implementation optimized for edge AI devices that combines the advantages of LightNet [1] and TensorRT [2]. LightNet is a lightweight, high-performance neural network framework designed for edge devices, while TensorRT is a high-performance deep learning inference engine developed by NVIDIA for optimizing and running deep learning models on GPUs. trt-lightnet uses the Network Definition API provided by TensorRT to integrate LightNet into TensorRT, allowing it to run efficiently and in real time on edge devices. This is a reproduction of trt-lightnet [6], which generates a TensorRT engine from the ONNX format.
trt-lightnet uses 2:4 structured sparsity [3] to further optimize the network. 2:4 structured sparsity means that at least two values must be zero in each contiguous block of four values, which halves the number of nonzero weights. This technique allows the network to use fewer weights and computations while maintaining accuracy.
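The 2:4 pattern described above can be illustrated with a minimal Python sketch. It assumes simple magnitude-based pruning (keep the two largest-magnitude values in each block of four); this is only one way to reach the pattern and is not necessarily how trt-lightnet's models are trained. The function names `prune_2_4` and `is_2_4_sparse` are hypothetical.

```python
def prune_2_4(weights):
    """Zero the two smallest-magnitude values in every block of four.

    Assumption: plain magnitude pruning, used here only to show the pattern.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        block = weights[start:start + 4]
        # Indices of the two largest-magnitude values in this block.
        keep = sorted(range(4), key=lambda i: abs(block[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(block))
    return pruned


def is_2_4_sparse(weights):
    """Check that every contiguous block of four has at least two zeros."""
    return len(weights) % 4 == 0 and all(
        sum(1 for v in weights[s:s + 4] if v == 0) >= 2
        for s in range(0, len(weights), 4)
    )
```

For example, `prune_2_4([0.1, -0.9, 0.5, 0.05])` keeps only `-0.9` and `0.5`. In practice a pruned network usually needs fine-tuning to recover accuracy, as discussed in [3].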
trt-lightnet also supports running the neural network on the NVIDIA Deep Learning Accelerator (NVDLA) [4], a free and open architecture that provides high performance and low power consumption for deep learning inference on edge devices. By using NVDLA, trt-lightnet can further improve the efficiency and performance of the network on edge devices.
In addition to post-training quantization [5], trt-lightnet also supports multi-precision quantization, which allows the network to use different precisions for weights and activations. By using mixed precision, trt-lightnet can further reduce the memory usage and computational requirements of the network while still maintaining accuracy. You can set the precision for each layer of your CNN in the CFG file.
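As a rough illustration of what INT8 quantization does to a tensor, the sketch below applies symmetric quantization with a given dynamic range. It assumes the range `amax` is already known; in TensorRT this range comes from calibration, which is not reproduced here. The function names are hypothetical and are not part of trt-lightnet.

```python
def quantize_int8(values, amax):
    """Symmetric quantization: map [-amax, amax] onto integers in [-127, 127].

    Assumption: amax (the dynamic range) is supplied by the caller.
    Values outside the range are clamped.
    """
    if amax <= 0:
        raise ValueError("amax must be positive")
    return [max(-127, min(127, round(v * 127 / amax))) for v in values]


def dequantize_int8(quantized, amax):
    """Map quantized integers back to approximate real values."""
    return [q * amax / 127 for q in quantized]
```

Each value is stored in one byte instead of four (FP32) or two (FP16), at the cost of rounding and clamping error. That error is why sensitive layers may be kept at a higher precision.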
trt-lightnet also supports multitask execution, allowing the network to perform both object detection and segmentation tasks simultaneously. This enables the network to perform multiple tasks efficiently on edge devices, saving computational resources and power.
- CUDA 11.0 or later
- TensorRT 8.5 or 8.6
- cnpy for debug of tensors
This repository has been tested with the following environments:
- CUDA 11.7 + TensorRT 8.5.2 on Ubuntu 22.04
- CUDA 12.2 + TensorRT 8.6.0 on Ubuntu 22.04
- CUDA 11.4 + TensorRT 8.6.0 on Jetson JetPack5.1
- CUDA 11.8 + TensorRT 8.6.1 on Ubuntu 22.04
- gcc <= 11.x
- Docker
- NVIDIA Container Toolkit
This repository has been tested with the following environments:
- Docker 24.0.7 + NVIDIA Container Toolkit 1.14.3 on Ubuntu 20.04
- Clone the repository and its dependent packages.
$ git clone --recurse-submodules git@github.com:tier4/trt-lightnet.git
$ cd trt-lightnet
- Install libraries.
$ sudo apt update
$ sudo apt install libgflags-dev
$ sudo apt install libboost-all-dev
$ sudo apt install libopencv-dev
$ sudo apt-get install libeigen3-dev
$ sudo apt install nlohmann-json3-dev
- Compile the TensorRT implementation.
$ mkdir build && cd build
$ cmake ../
$ make -j
- Clone the repository.
$ git clone --recurse-submodules git@github.com:tier4/trt-lightnet.git
$ cd trt-lightnet
- Build the docker image.
# For x86
$ docker build -f Dockerfile_x86 -t trt-lightnet:latest .
# For aarch64
$ docker build -f Dockerfile_aarch64 -t trt-lightnet:latest .
- Run the docker container.
# For x86
$ docker run -it --gpus all trt-lightnet:latest
# For aarch64
$ docker run -it --runtime=nvidia trt-lightnet:latest
T.B.D
Build FP32 engine
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision fp32
Build FP16(HALF) engine
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision fp16
Build INT8 engine
(You need to prepare a list for calibration in "configs/calibration_images.txt".)
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision int8 --first true
The first layer is much more sensitive to quantization. Therefore, "--first true" leaves the first layer unquantized.
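The calibration list required for INT8 builds can be generated with a small script. The sketch below assumes the list is a plain text file with one image path per line and that calibration images are JPEG or PNG files; check both assumptions against the repository before relying on it. The function name `write_calibration_list` is hypothetical.

```python
from pathlib import Path

# Assumption: calibration images use these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_calibration_list(image_dir, list_path):
    """Write the absolute path of every image under image_dir, one per line.

    Returns the number of images written.
    """
    images = sorted(
        p.resolve()
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in images))
    return len(images)
```

A call such as `write_calibration_list("path/to/calibration_images", "configs/calibration_images.txt")`, run from the repository root, would produce the list; the image directory name here is a placeholder.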
Build DLA engine (supported only on Xavier and Orin)
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision int8 --first true --dla [0/1]
Inference from images
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision [fp32/fp16/int8] --first true {--dla [0/1]} --d DIRECTORY
Inference from video
$ ./trt-lightnet --flagfile ../configs/CONFIGS.txt --precision [fp32/fp16/int8] --first true {--dla [0/1]} --v VIDEO
The following are some of the most commonly used options for trt-lightnet. For the full set of implemented flags, please refer to src/config_parser.cpp.
- --flagfile <path> (required):
  - The path to the config file, which contains some basic options (e.g. onnx, thresh).
  - Note that the options in the config file can be overridden from the command line.
  - Example: ../configs/CONFIGS.txt
- --precision <level> (required):
  - Specifies the quantization level used when building the inference engine. Available options are:
    - fp32: Full precision inference engine
    - fp16: Half precision inference engine
    - int8: int8 precision inference engine
  - Note that if int8 is picked, it requires calibration_images.txt in the configs/ directory.
  - Example: int8
- --first (optional):
  - A boolean flag that controls whether the first layer is excluded from quantization.
  - Example: true
  - In general, the first layer is sensitive, and quantizing it may lead to a drop in accuracy. Setting --first to true to skip its quantization is therefore recommended.
- --d <path> (optional):
  - The path to the directory of images.
  - Example: ../sample_data/images
  - During inference, the user can press space to jump to the next image.
- --v <path> (optional):
  - The path to the video file.
  - Example: ../sample_data/sample.mp4
- --save-detections (optional):
  - A boolean flag that chooses whether to save the detection results.
  - Example: true
- --save-detections-path (optional):
  - The output directory used when --save-detections is set to true.
  - Example: ../workspace/detections_result
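When scripting several builds or inference runs, the options above can be assembled programmatically. The following Python sketch only builds the argument list from the flags documented in this section; the helper `build_command` and its parameter names are hypothetical, and it does not check flag combinations beyond the documented value sets.

```python
PRECISIONS = ("fp32", "fp16", "int8")


def build_command(flagfile, precision, first=True, dla=None,
                  image_dir=None, video=None, binary="./trt-lightnet"):
    """Assemble a trt-lightnet command line as a list of arguments."""
    if precision not in PRECISIONS:
        raise ValueError(f"precision must be one of {PRECISIONS}")
    cmd = [binary, "--flagfile", flagfile, "--precision", precision]
    if first:
        # Skip quantization of the first layer, as recommended above.
        cmd += ["--first", "true"]
    if dla is not None:
        if dla not in (0, 1):
            raise ValueError("dla must be 0 or 1")
        cmd += ["--dla", str(dla)]
    if image_dir is not None:
        cmd += ["--d", image_dir]
    if video is not None:
        cmd += ["--v", video]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the build directory, for example to build engines for each precision in turn.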
trt-lightnet is built on the LightNet framework and integrates with TensorRT using the Network Definition API. The implementation is based on the following repositories:
- LightNet: https://github.com/daniel89710/lightNet
- TensorRT: https://github.com/NVIDIA/TensorRT
- NVIDIA DeepStream SDK: https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/restructure
- YOLO-TensorRT: https://github.com/enazoe/yolo-tensorrt
- trt-yoloXP: https://github.com/tier4/trt-yoloXP
trt-lightnet is an efficient implementation of CNNs for edge AI. With features such as structured sparsity, mixed-precision quantization, DLA support, and multitask execution, together with its TensorRT integration, it is well suited to real-time object detection and semantic segmentation applications on edge devices.
[1]. LightNet
[2]. TensorRT
[3]. Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
[4]. NVDLA
[5]. Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT
[6]. lightNet-TR