🔥🔥🔥 Simple and fast YOLOX deployment based on TensorRT 🚀🚀🚀

TensorRT-YOLOXu

[Badges: CUDA · TensorRT · Ubuntu]

  • 🔥 Support TensorRT for YOLOX.
  • 🚀 Easy: detection and inference are isolated into separate classes.
  • ⚡ Fast: preprocessing and postprocessing run as CUDA kernels.

[Demo: YOLOX detection result on a street image]

NEWS

  • 2023-09-05 Support GPU preprocessing and postprocessing
  • 2023-08-30 Support CPU preprocessing and postprocessing
  • 2023-08-28 Support easy TensorRT inference via the Infer class

ONNX2TRT

Please use the official YOLOX export script to export the ONNX file, then use trtexec to convert the ONNX file to a TensorRT engine.

We provide an ONNX model; download it, place it in the workspace folder, unzip it, and convert it to a TensorRT engine with the command below.

trtexec \
--onnx=yolox_s.onnx \
--saveEngine=yolox_s.trt \
--minShapes=images:1x3x640x640 \
--optShapes=images:1x3x640x640 \
--maxShapes=images:1x3x640x640 \
--memPoolSize=workspace:2048MiB

Build

Please install TensorRT and CUDA first.

Then set CUDA_DIR, CUDNN_DIR, TENSORRT_DIR, and CUDA_ARCH in the CMakeLists.txt file. CUDA_ARCH is the GPU compute architecture; for example, the Jetson AGX Orin uses sm_87. You can look it up on the CUDA_ARCH page.

mkdir build && cd build
cmake ..
make -j

Infer

We separate inference and detection into two classes: 'Infer' and 'Yolo'.

The 'Infer' class is designed to make learning TensorRT easier, especially when it comes to setting parameters, managing memory, and performing inference.

You only need to load the TRT engine and prepare the memory.

// create the Infer object from a serialized TensorRT engine
Infer* infer = new Infer(engine_dir, ILogger::Severity::kWARNING);
// copy the input from host to device (binding 0)
infer->CopyFromHostToDevice(input, 0);
// run inference
infer->Forward();
// copy the output from device to host (binding 1)
infer->CopyFromDeviceToHost(output, 1);
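
For context, here is a minimal sketch of the host memory preparation around these calls, extending the snippet above. The engine path and buffer sizes are illustrative (they assume a COCO-pretrained yolox_s engine with a 1x3x640x640 input and an 8400x85 output), and the copy methods are assumed to take raw float pointers; check your own engine's bindings and the repository's headers.

// illustrative buffer sizes for a 640x640 YOLOX-s engine (adjust to your engine)
std::vector<float> input(1 * 3 * 640 * 640);   // binding 0: CHW input tensor
std::vector<float> output(1 * 8400 * 85);      // binding 1: 8400 predictions x (4 box + 1 obj + 80 classes)

// ... fill `input` with the preprocessed image (resized, normalized, CHW) ...

Infer* infer = new Infer("workspace/yolox_s.trt", ILogger::Severity::kWARNING);
infer->CopyFromHostToDevice(input.data(), 0);   // upload the input to binding 0
infer->Forward();                               // run inference
infer->CopyFromDeviceToHost(output.data(), 1);  // download the raw predictions from binding 1
delete infer;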

Yolo

The 'Yolo' class is designed for fast detection and deployment on edge devices.

All preprocessing and postprocessing can run as CUDA kernels, which keeps detection fast; a sketch of consuming the resulting detections follows the examples below.

  • Inference on the GPU, with preprocessing and postprocessing on the GPU:

      #define USE_DEVICE true
      // create yolo
      Yolo* yolo = new Yolo(INPUT_H, INPUT_W, NUM_CLASSES, BBOX_CONF_THRESH, IOU_THRESH, USE_DEVICE);
      // INPUT_H and INPUT_W are the height and width of the engine's input
    
      // prepare to save the objects
      std::vector<Object> objects;
    
      // load image
      cv::Mat img = cv::imread(input_image_path);
      // preprocess and inference
      yolo->PreProcessDevice(img);
      infer->CopyFromDeviceToDeviceIn(yolo->mInputCHWHost, 0);
      infer->Forward();
      infer->CopyFromDeviceToDeviceOut(yolo->mOutputSrcHost, 1);
      yolo->PostProcessDevice(objects);
  • Inference on the GPU, with preprocessing and postprocessing on the CPU:

    #define USE_DEVICE false
    // create yolo 
    Yolo* yolo = new Yolo(INPUT_H, INPUT_W, NUM_CLASSES, BBOX_CONF_THRESH, IOU_THRESH, USE_DEVICE);
    // INPUT_H and INPUT_W are the height and width of the engine's input
    
    // prepare to save the objects
    std::vector<Object> objects;
    
    // load image
    cv::Mat img = cv::imread(input_image_path);
    // preprocess and inference
    yolo->PreProcess(img);
    infer->CopyFromHostToDevice(yolo->mInputCHWHost, 0);
    infer->Forward();
    infer->CopyFromDeviceToHost(yolo->mOutputSrcHost, 1);
    yolo->PostProcess(objects);
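
Once PostProcess or PostProcessDevice has filled objects, the detections can be consumed with ordinary OpenCV calls. The sketch below assumes Object follows the usual YOLOX demo layout (a cv::Rect_<float> rect, an int label, and a float prob); check the repository's header for the actual definition.

// assumed layout, verify against this repository's header:
// struct Object { cv::Rect_<float> rect; int label; float prob; };

void DrawObjects(cv::Mat& img, const std::vector<Object>& objects)
{
    for (const Object& obj : objects)
    {
        // draw the bounding box and a "label: confidence" caption
        cv::rectangle(img, obj.rect, cv::Scalar(0, 255, 0), 2);
        cv::putText(img, cv::format("%d: %.2f", obj.label, obj.prob),
                    cv::Point(obj.rect.x, obj.rect.y - 5),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
    }
    cv::imwrite("result.jpg", img);  // output path is illustrative
}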

Reference

  1. TensorRT-Alpha
  2. tiny-tensorrt
  3. tensorRT_Pro
