v3.0.0 #19

HolyWu · 2022-11-06T16:41:15Z

HolyWu
Nov 6, 2022
Maintainer

Add model parameter to support v4.0~v4.6 models.
Add ensemble parameter to smooth predictions in areas where the estimation is uncertain.
Fix corruption with FP16 mode on 4K video.
Replace multi parameter with factor_num, factor_den, fps_num and fps_den for rational frame rate change.
Add sc and sc_threshold parameters for scene change detection.
Add cuda_graphs parameter to use CUDA Graphs.
Add fusion parameter to enable fusion through nvFuser.
Remove device_type parameter. No one bothers to run deep learning inference on CPU anyway.
Add num_streams parameter for parallel execution.
Remove fp16 parameter; precision is now controlled by the format of the clip. RGBH format uses FP16 mode and RGBS format uses FP32 mode.
Add trt, trt_max_workspace_size, and trt_cache_path parameters for TensorRT support.
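
How the factor and fps parameters combine can be sketched with exact rational arithmetic. This is only an illustration, not the plugin's code: `target_fps` is a hypothetical helper, the default factor of 2/1 is an assumption, and so is the rule that an explicit `fps_num`/`fps_den` pair takes precedence over `factor_num`/`factor_den`.

```python
from fractions import Fraction
from typing import Optional


def target_fps(src_fps: Fraction,
               factor_num: int = 2, factor_den: int = 1,
               fps_num: Optional[int] = None,
               fps_den: Optional[int] = None) -> Fraction:
    """Hypothetical helper: output frame rate as an exact fraction."""
    if fps_num is not None and fps_den is not None:
        # Assumption: an explicit target rate overrides the factor.
        return Fraction(fps_num, fps_den)
    # Otherwise scale the source rate by factor_num / factor_den.
    return src_fps * Fraction(factor_num, factor_den)


ntsc_film = Fraction(24000, 1001)  # 23.976 fps source
print(target_fps(ntsc_film, factor_num=5, factor_den=2))  # 60000/1001
print(target_fps(ntsc_film, fps_num=60, fps_den=1))       # 60
```

Keeping the rate as a numerator/denominator pair avoids the rounding drift that a single floating-point multiplier would introduce for rates such as 24000/1001.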

With TensorRT, it should run at least 40~50% faster than the previous version or the RIFE-ncnn-Vulkan implementation when using FP16 mode on GPUs with Tensor Cores. For ease of installation on Windows, you can download the CUDA 7z file, which contains the required runtime libraries and the Python wheel file. Either add the unzipped directory to your system PATH or copy the DLL files to a directory that is already in your system PATH. Finally, pip install the Python wheel file.

This discussion was created from the release v3.0.0.

HolyWu · 2022-11-11T13:29:24Z

HolyWu
Nov 11, 2022
Maintainer Author

Benchmark

Configuration: NVIDIA RTX 3050, driver 526.47, Windows 10 21H2, VS R60, Python 3.10.8, 1080p FP16, model 4.6
Data format: FPS / VRAM usage
Plugin/Package version: RIFE-ncnn-Vulkan r9, vs-mlrt v12, vs-rife v3.0.0

RIFE-ncnn-Vulkan

Args	Result
gpu_thread=1	14.10 fps / 531,380 K
gpu_thread=2	30.71 fps / 971,028 K
gpu_thread=3	30.54 fps / 1,408,532 K

vsmlrt-ORT_CUDA

Args	Result
num_streams=1	14.96 fps / 924,608 K
num_streams=1, use_cuda_graph=True	14.89 fps / 5,952,508 K

vsmlrt-TRT

Args	Result
workspace=1024, num_streams=1	24.84 fps / 451,512 K
workspace=1024, num_streams=2	33.90 fps / 791,708 K
workspace=1024, num_streams=3	33.75 fps / 1,131,904 K
workspace=1024, num_streams=1, use_cuda_graph=True	25.05 fps / 457,656 K
workspace=1024, num_streams=2, use_cuda_graph=True	34.35 fps / 803,996 K
workspace=1024, num_streams=3, use_cuda_graph=True	33.66 fps / 1,150,336 K

vs-rife with PyTorch 1.13.0+cu117 + cuDNN 8.6.0 + TensorRT 8.5.1.7 + Torch-TensorRT 1.2.0

Args	Result
num_streams=1	24.02 fps / 815,856 K
num_streams=2	27.14 fps / 1,401,588 K
num_streams=3	25.59 fps / 1,856,248 K
num_streams=1, fusion=True, cuda_graphs=True	27.76 fps / 760,552 K
num_streams=2, fusion=True, cuda_graphs=True	29.72 fps / 1,327,852 K
num_streams=3, fusion=True, cuda_graphs=True	33.71 fps / 1,915,636 K
num_streams=1, trt=True	33.21 fps / 793,316 K
num_streams=2, trt=True	41.93 fps / 1,422,056 K
num_streams=3, trt=True	45.91 fps / 2,050,796 K
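
As a rough check of the "40~50% faster" estimate against the figures above, the relative speedups can be computed directly. The fps values are copied from the tables; `speedup_percent` is an illustrative helper, and the comparison simply takes the fastest listed configuration of each implementation.

```python
def speedup_percent(new_fps: float, old_fps: float) -> float:
    """Relative throughput gain of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


# Fastest result listed for each implementation (1080p FP16, model 4.6).
best_fps = {
    "RIFE-ncnn-Vulkan (gpu_thread=2)": 30.71,
    "vsmlrt-TRT (num_streams=2, use_cuda_graph=True)": 34.35,
    "vs-rife (num_streams=3, trt=True)": 45.91,
}

vs_rife = best_fps["vs-rife (num_streams=3, trt=True)"]
for name, fps in best_fps.items():
    if fps != vs_rife:
        gain = speedup_percent(vs_rife, fps)
        print(f"vs-rife TensorRT vs {name}: +{gain:.1f}%")
```

On this single RTX 3050 setup, that works out to a gain of about 49% over RIFE-ncnn-Vulkan and about 34% over vsmlrt-TRT; other GPUs and resolutions may behave differently.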
