- 🔥 Support TensorRT for YOLOX.
- 🚀 Easy: detection and inference are isolated into separate classes.
- ⚡ Fast: preprocessing and postprocessing run as CUDA kernels.
- 2023-09-05: Support GPU preprocess and postprocess
- 2023-08-30: Support CPU preprocess and postprocess
- 2023-08-28: Support easy TensorRT via the `Infer` class
Please use the official export script to export the ONNX file, then use `trtexec` to convert the ONNX file to a TensorRT engine. We also provide an ONNX model: place it in the workspace folder, unzip it, and convert it into a TensorRT engine.
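If you export the model yourself, the YOLOX repository's `tools/export_onnx.py` script can be invoked roughly like this (the checkpoint path and model name below are illustrative):

```shell
python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pth
```

Then convert the ONNX file to a TensorRT engine with `trtexec`: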
```shell
trtexec \
    --onnx=yolox_s.onnx \
    --saveEngine=yolox_s.trt \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:1x3x640x640 \
    --maxShapes=images:1x3x640x640 \
    --memPoolSize=workspace:2048MiB
```
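Since `minShapes`, `optShapes`, and `maxShapes` are all `1x3x640x640`, the resulting engine is effectively fixed-shape. You can sanity-check and benchmark the generated engine with:

```shell
trtexec --loadEngine=yolox_s.trt
```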
Please install TensorRT and CUDA first. Then set `CUDA_DIR`, `CUDNN_DIR`, `TENSORRT_DIR`, and `CUDA_ARCH` in the CMakeLists.txt file. `CUDA_ARCH` is the GPU architecture; for example, the Jetson AGX Orin's `CUDA_ARCH` is `sm_87`. You can look yours up on the CUDA_ARCH page.
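For reference, the variables might look like this in CMakeLists.txt (the paths below are illustrative defaults, not the project's required values; adjust them to your installation):

```cmake
# Hypothetical values; substitute your actual install paths and GPU architecture.
set(CUDA_DIR     /usr/local/cuda)
set(CUDNN_DIR    /usr/local/cuda)
set(TENSORRT_DIR /usr/local/TensorRT)
set(CUDA_ARCH    sm_87)  # e.g. Jetson AGX Orin
```

Then build the project: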
```shell
mkdir build && cd build
cmake ..
make -j
```
We isolate detection and inference into two classes: `Infer` and `Yolo`.
The 'Infer' class is designed to make learning TensorRT easier, especially when it comes to setting parameters, managing memory, and performing inference.
You only need to load the TRT engine and prepare the memory.
```cpp
// create the infer object from a serialized engine
Infer* infer = new Infer(engine_dir, ILogger::Severity::kWARNING);

// copy the input from host memory to device memory (binding index 0)
infer->CopyFromHostToDevice(input, 0);

// run inference
infer->Forward();

// copy the output from device memory back to host memory (binding index 1)
infer->CopyFromDeviceToHost(output, 1);
```
The 'Yolo' class is designed for fast detection and deployment on edge devices.
We perform all preprocessing and postprocessing using CUDA operations, achieving fast detection.
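To give a feel for what the GPU path does, here is a minimal sketch of a preprocessing-style kernel (a packed uint8 HWC image cast into a planar float CHW tensor). This is an illustration only, not the repository's actual kernel:

```cuda
#include <cstdint>

// Minimal sketch: convert a packed HWC uint8 image into a planar CHW float
// tensor, one thread per pixel. Real preprocessing also handles resizing,
// padding, and channel order; this shows only the layout conversion.
__global__ void HwcToChwKernel(const uint8_t* src, float* dst, int height, int width) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int num_pixels = height * width;
    if (idx >= num_pixels) return;
    for (int c = 0; c < 3; ++c) {
        dst[c * num_pixels + idx] = static_cast<float>(src[idx * 3 + c]);
    }
}

// Example launch: one thread per pixel.
// HwcToChwKernel<<<(num_pixels + 255) / 256, 256>>>(d_src, d_dst, height, width);
```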
- Inference on the GPU, with preprocessing and postprocessing on the GPU:

```cpp
#define USE_DEVICE true

// create yolo; INPUT_H and INPUT_W are the input shape of the engine
Yolo* yolo = new Yolo(INPUT_H, INPUT_W, NUM_CLASSES, BBOX_CONF_THRESH, IOU_THRESH, USE_DEVICE);

// prepare to save the objects
std::vector<Object> objects;

// load image
cv::Mat img = cv::imread(input_image_path);

// preprocess and inference
yolo->PreProcessDevice(img);
infer->CopyFromDeviceToDeviceIn(yolo->mInputCHWHost, 0);
infer->Forward();
infer->CopyFromDeviceToDeviceOut(yolo->mOutputSrcHost, 1);
yolo->PostProcessDevice(objects);
```
- Inference on the GPU, with preprocessing and postprocessing on the CPU:

```cpp
#define USE_DEVICE false

// create yolo; INPUT_H and INPUT_W are the input shape of the engine
Yolo* yolo = new Yolo(INPUT_H, INPUT_W, NUM_CLASSES, BBOX_CONF_THRESH, IOU_THRESH, USE_DEVICE);

// prepare to save the objects
std::vector<Object> objects;

// load image
cv::Mat img = cv::imread(input_image_path);

// preprocess and inference
yolo->PreProcess(img);
infer->CopyFromHostToDevice(yolo->mInputCHWHost, 0);
infer->Forward();
infer->CopyFromDeviceToHost(yolo->mOutputSrcHost, 1);
yolo->PostProcess(objects);
```
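After postprocessing, `objects` holds the detections. Assuming a YOLOX-demo-style `Object` layout with `rect`, `label`, and `prob` fields (an assumption; check the actual struct in this repository), the results can be drawn like this:

```cpp
#include <opencv2/opencv.hpp>

// Hypothetical consumption of the results; the Object fields (rect, label,
// prob) are assumed to follow the common YOLOX demo layout.
for (const Object& obj : objects) {
    cv::rectangle(img, obj.rect, cv::Scalar(0, 255, 0), 2);
    cv::putText(img, std::to_string(obj.label) + ": " + std::to_string(obj.prob),
                cv::Point(obj.rect.x, obj.rect.y - 5),
                cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
}
cv::imwrite("result.jpg", img);
```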