How do I enable GPU support in onnxruntime? #10

xinsuinizhuan · 2021-08-01T14:14:19Z

No description provided.

DefTruth · 2021-08-01T14:21:02Z

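You need a separate GPU build of the onnxruntime library. I haven't tested this specifically; the code doesn't need to change, but the library you link against does.

The official prebuilt onnxruntime package is here: onnxruntime-win-gpu-x64-1.8.1.zip

Also, someone tried running on GPU in #9, and it appears to work.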

xinsuinizhuan · 2021-08-01T15:02:13Z

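You need a separate GPU build of the onnxruntime library. I haven't tested this specifically; the code doesn't need to change, but the library you link against does.
* The official prebuilt onnxruntime package is here: [onnxruntime-win-gpu-x64-1.8.1.zip](https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-win-gpu-x64-1.8.1.zip)
Also, someone tried running on GPU in #9, and it appears to work.
That said, it's great that onnxruntime now ships official builds; previously you had to build from source~ 🙃🙃🙃

This is the 1.8.1 library. Does this project support it?
Also, I had been using the onnxruntime 1.6.0 library, running single-threaded by default, and yolov5 and yolox both took about 95 ms. After I swapped in the 1.7.0 library directly, that rose to about 247 ms. Setting the thread count to 8 brought it back to about 95 ms.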

DefTruth · 2021-08-01T15:07:07Z

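My default thread count is 1; you can try other values.

class LITE_EXPORTS YoloX : public BasicOrtHandler
{
  public:
    explicit YoloX(const std::string &_onnx_path, unsigned int _num_threads = 1) :  // thread count defaults to 1
        BasicOrtHandler(_onnx_path, _num_threads)
    {};
};

You can change it to a different thread count:

auto *yolox = new lite::cv::detection::YoloX(onnx_path, 8);  // 8 threads.

I haven't tested the 1.8.1 library or other newer versions yet, but in general these libraries are backward compatible. You can try swapping in the 1.8.1 library and headers directly. Also, you may need to check the official documentation for its CUDA version requirements. Hope it works for you~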
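
To compare thread counts on equal terms, as with the 1-thread and 8-thread timings mentioned in this thread, it can help to average several runs after a warm-up. The sketch below is only an illustration under stated assumptions: `average_ms` is a hypothetical helper, not part of lite.ai.toolkit or onnxruntime, and the callable it receives stands for one inference call on an already constructed model object.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time of `run` in milliseconds.
// `warmup` calls are executed first and not timed; `repeats` calls are timed.
double average_ms(const std::function<void()>& run, int warmup, int repeats) {
    assert(warmup >= 0 && repeats > 0);
    for (int i = 0; i < warmup; ++i) run();  // untimed warm-up runs
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) run();  // timed runs
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Wrapping one inference call in a lambda and calling `average_ms(run, 3, 20)` once for a model built with 1 thread and once for a model built with 8 threads would give two averages that can be compared directly.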

xinsuinizhuan · 2021-08-01T15:09:20Z

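OK. I just downloaded the 1.8.1 GPU library, rebuilt, and tried it, but it made no difference: it behaves the same as the 1.7.0 CPU version, or even slightly worse, at 107 ms per image with 8 threads. Using the GPU presumably requires some setting or variable.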

DefTruth · 2021-08-01T15:25:24Z

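You can try setting the CUDA provider by modifying session_options in ort_handler.cpp.

// GPU compatible. Try adding the following two lines. I don't have a Windows environment, so I haven't actually tried this.
OrtCUDAProviderOptions provider_options; // C API struct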
session_options.AppendExecutionProvider_CUDA(provider_options);  
// 1. session
ort_session = new Ort::Session(ort_env, onnx_path, session_options);

You can also refer to the official example fns_candy_style_transfer.c, which shows how to set up different providers. For example:

#ifdef USE_CUDA
void enable_cuda(OrtSessionOptions* session_options) {
  ORT_ABORT_ON_ERROR(OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0));
}
#endif

That example uses OrtSessionOptionsAppendExecutionProvider_CUDA. The corresponding C++ API, in onnxruntime_cxx_inline.h, is:

// The C++ API call forwards to the C API
inline SessionOptions& SessionOptions::AppendExecutionProvider_CUDA(const OrtCUDAProviderOptions& provider_options) {
  ThrowOnError(GetApi().SessionOptionsAppendExecutionProvider_CUDA(p_, &provider_options));
  return *this;
}
// OrtCUDAProviderOptions holds the default settings
typedef struct OrtCUDAProviderOptions {
  int device_id;                                  // cuda device with id=0 as default device.
  OrtCudnnConvAlgoSearch cudnn_conv_algo_search;  // cudnn conv algo search option
  size_t cuda_mem_limit;                          // default cuda memory limitation to maximum finite value of size_t.
  int arena_extend_strategy;                      // default area extend strategy to KNextPowerOfTwo.
  int do_copy_in_default_stream;
  int has_user_compute_stream;
  void* user_compute_stream;
} OrtCUDAProviderOptions;  
// Internal implementation in cuda_provider_factory.cc; only the GPU build enables it, while the CPU build returns an error directly
ORT_API_STATUS_IMPL(OrtApis::SessionOptionsAppendExecutionProvider_CUDA,
                    _In_ OrtSessionOptions* options, _In_ const OrtCUDAProviderOptions* cuda_options) {
  CUDAExecutionProviderInfo info{};
  info.device_id = gsl::narrow<OrtDevice::DeviceId>(cuda_options->device_id); // device_id defaults to 0
  info.cuda_mem_limit = cuda_options->cuda_mem_limit;
  info.arena_extend_strategy = static_cast<onnxruntime::ArenaExtendStrategy>(cuda_options->arena_extend_strategy);
  info.cudnn_conv_algo_search = cuda_options->cudnn_conv_algo_search;
  info.do_copy_in_default_stream = cuda_options->do_copy_in_default_stream;
  info.has_user_compute_stream = cuda_options->has_user_compute_stream;
  info.user_compute_stream = cuda_options->user_compute_stream;
  options->provider_factories.push_back(onnxruntime::CreateExecutionProviderFactory_CUDA(info));

  return nullptr;
}
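
The struct quoted above is a plain C struct with no default member initializers, so a declaration such as `OrtCUDAProviderOptions provider_options;` leaves its fields indeterminate unless they are set explicitly. The sketch below shows one way to fill every field before use. It is an illustration only: `MockCudaProviderOptions` and `MockCudnnConvAlgoSearch` are stand-ins that mirror the fields quoted in this thread, not the real onnxruntime types, `make_cuda_options` is a hypothetical helper, and the chosen values follow the comments in the quoted struct and are assumptions rather than verified library defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Stand-in enum; the real header defines OrtCudnnConvAlgoSearch.
enum MockCudnnConvAlgoSearch { MOCK_EXHAUSTIVE = 0, MOCK_HEURISTIC = 1, MOCK_DEFAULT = 2 };

// Stand-in for OrtCUDAProviderOptions, mirroring the fields quoted above.
struct MockCudaProviderOptions {
    int device_id;
    MockCudnnConvAlgoSearch cudnn_conv_algo_search;
    std::size_t cuda_mem_limit;
    int arena_extend_strategy;
    int do_copy_in_default_stream;
    int has_user_compute_stream;
    void* user_compute_stream;
};

// Hypothetical helper: returns options with every field explicitly set.
MockCudaProviderOptions make_cuda_options(int device_id) {
    assert(device_id >= 0);
    MockCudaProviderOptions opts{};  // value-initialize: all fields start at zero / nullptr
    opts.device_id = device_id;
    opts.cudnn_conv_algo_search = MOCK_EXHAUSTIVE;
    // The quoted comment describes the default limit as the maximum value of size_t.
    opts.cuda_mem_limit = std::numeric_limits<std::size_t>::max();
    opts.arena_extend_strategy = 0;      // 0 is assumed to mean "next power of two"
    opts.do_copy_in_default_stream = 1;
    return opts;                         // no user compute stream is configured
}
```

With the real type, the same pattern would be to value-initialize the struct and assign each field before passing it to `AppendExecutionProvider_CUDA`.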

1VeniVediVeci1 · 2021-08-01T16:28:39Z

GPU inference requires adding OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0); in initialize_handler() of ort_handler.

1VeniVediVeci1 · 2021-08-01T16:31:26Z

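Apart from installing via NuGet, the GPU build needs no extra environment configuration, because the CUDA installer has already added its paths to the system environment variables.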

xinsuinizhuan · 2021-08-02T01:11:37Z

 ORT_ABORT_ON_ERROR(OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0));

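That doesn't work. After I added this line, it crashes:
How does this need to match the CUDA and cuDNN versions?

After adding
// GPU compatible. Try adding the following two lines. I don't have a Windows environment, so I haven't actually tried this.
OrtCUDAProviderOptions provider_options; // C API struct
session_options.AppendExecutionProvider_CUDA(provider_options);
// 1. session
ort_session = new Ort::Session(ort_env, onnx_path, session_options);

new Ort::Session crashes.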

DefTruth · 2021-08-02T13:18:16Z

I wouldn't say the call itself is wrong; more likely something about how it is used is off. This version of my code doesn't support GPU yet, so you will probably need to modify a few places. First, make sure the GPU build of the library and the matching CUDA version are correct. We can work out the usage from the source. In onnxruntime_c_api.cc there is this section:

#include "core/session/onnxruntime_c_api.h"
// ... some headers omitted
#ifdef USE_CUDA
#include "core/providers/cuda/cuda_provider_factory.h"
#endif

As you can see, when the GPU build defines USE_CUDA, cuda_provider_factory.h is included. That header holds the function signature of OrtSessionOptionsAppendExecutionProvider_CUDA:

/**
 * \param device_id cuda device id, starts from zero.
 */
ORT_API_STATUS(OrtSessionOptionsAppendExecutionProvider_CUDA, _In_ OrtSessionOptions* options, int device_id);

Note that the required argument is an OrtSessionOptions pointer, not an Ort::SessionOptions. In fact, though:

struct SessionOptions : Base<OrtSessionOptions> {};
template <typename T>
struct Base<const T> {
  using contained_type = const T;

  Base() = default;
  Base(const T* p) : p_{p} {
    if (!p)
      ORT_CXX_API_THROW("Invalid instance ptr", ORT_INVALID_ARGUMENT);
  }
  ~Base() = default;

  operator const T*() const { return p_; }  // Note: this defines a conversion operator to T*.
// ...
};

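Because Ort::SessionOptions defines a conversion operator, you can pass it straight to OrtSessionOptionsAppendExecutionProvider_CUDA; at the call site, session_options is implicitly converted, as if by (OrtSessionOptions*)session_options. The other way is to use the C++ wrapper around that call. The two are essentially the same. In onnxruntime_cxx_inline.h: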

inline SessionOptions& SessionOptions::AppendExecutionProvider_CUDA(const OrtCUDAProviderOptions& provider_options) {
  // This pointer p_ is the OrtSessionOptions*; OrtCUDAProviderOptions was covered in the earlier analysis, so it is not repeated here
  ThrowOnError(GetApi().SessionOptionsAppendExecutionProvider_CUDA(p_, &provider_options));
  return *this;
}

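For more analysis of Ort::Session, Ort::SessionOptions, and Ort::Env, see #8, where someone posted a GPU version of the code that you can look at. Alternatively, post the exact error log so we can see what is actually going wrong. As for environment issues, you may need to sort those out yourself~ good luck ~ 🙃🙃🙃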
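
The implicit conversion described above can be reproduced with stand-in types. In this sketch, `MockCOptions`, `mock_append_cuda`, `MockBase`, and `MockSessionOptions` are all hypothetical names that imitate the shape of the C handle, the C function, and the C++ wrapper; none of them is onnxruntime code. It only illustrates why a wrapper object can be passed where a raw pointer is expected.

```cpp
#include <cassert>

// Stand-in for the opaque C handle (OrtSessionOptions in the real API).
struct MockCOptions { int cuda_device = -1; };

// Stand-in for a C-style function that takes the raw handle.
// Returns 0 on success and 1 on invalid arguments.
int mock_append_cuda(MockCOptions* options, int device_id) {
    if (options == nullptr || device_id < 0) return 1;
    options->cuda_device = device_id;
    return 0;
}

// Stand-in for the C++ wrapper base: holds the raw pointer (without owning it)
// and exposes it through a conversion operator.
template <typename T>
class MockBase {
  public:
    explicit MockBase(T* p) : p_(p) { assert(p != nullptr); }
    operator T*() { return p_; }  // enables implicit conversion to T*
  protected:
    T* p_;
};

struct MockSessionOptions : MockBase<MockCOptions> {
    using MockBase<MockCOptions>::MockBase;  // inherit the constructor
};
```

Given `MockCOptions raw; MockSessionOptions wrapped(&raw);`, the call `mock_append_cuda(wrapped, 0)` compiles because the compiler applies the conversion operator to obtain a `MockCOptions*`.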

xinsuinizhuan · 2021-08-02T13:49:20Z

I followed what the person above suggested, but the OrtSessionOptionsAppendExecutionProvider_CUDA interface can't be found, so I can't use it:
if (m_gpuIdx != -1)
{
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, m_gpuIdx));
}

xinsuinizhuan · 2021-08-02T13:49:59Z

One question: when exporting the model, is there a distinction between CPU and GPU? Can a model exported for CPU not be loaded on GPU?

xinsuinizhuan · 2021-08-02T13:53:52Z

My environment: CUDA 11.3, cuDNN 8.0.1.6, VS2019, onnxruntime 1.8.1 GPU build.
Error:

Could you take a look and see if you can spot anything?

DefTruth · 2021-08-02T13:56:43Z

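Model export makes no distinction between CPU and GPU; the same model works on both. Are you using the GPU build's library and headers? Try including these headers directly. onnxruntime_c_api.h is already included by onnxruntime_cxx_api.h, so the C API OrtSessionOptionsAppendExecutionProvider_CUDA should be found. Your situation is a bit odd.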

#include <onnxruntime/core/providers/cuda/cuda_provider_factory.h>
#include <onnxruntime/core/session/onnxruntime_cxx_api.h>

xinsuinizhuan · 2021-08-02T14:00:34Z

I downloaded it from https://github.com/microsoft/onnxruntime/releases, and the package from there doesn't seem to have that cuda folder.

DefTruth · 2021-08-02T14:00:37Z

Are you sure your machine has an NVIDIA graphics card? Only NVIDIA cards are supported.

xinsuinizhuan · 2021-08-02T14:02:26Z

Graphics card:

xinsuinizhuan · 2021-08-02T14:08:29Z

Thanks a lot, it works now. Phew.
Add these headers:

#include "onnxruntime_c_api.h"

#ifdef USE_CUDA
#include "cuda_provider_factory.h"
#endif

Then, in initialize_handler(), add:
#ifdef USE_CUDA
//OrtCUDAProviderOptions provider_options; // C API struct
//session_options.AppendExecutionProvider_CUDA(provider_options);
//OrtCUDAProviderOptions options;
//options.device_id = 0;
//options.arena_extend_strategy = 0;
//options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::EXHAUSTIVE;
//options.do_copy_in_default_stream = 1;
//session_options.AppendExecutionProvider_CUDA(options);

OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
#endif
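
The snippet above discards the value returned by the append call. In the C API that function returns a status pointer, with a null pointer meaning success, so a slightly more defensive variant would check it and stay on the CPU path when the CUDA provider cannot be added. The sketch below imitates that pattern with stand-in types only: `MockStatus`, `MockSessionOptions`, `mock_append_cuda`, and `configure_provider` are hypothetical names, and the provider strings are used purely as labels.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; these are not onnxruntime types.
struct MockStatus { std::string message; };
struct MockSessionOptions { std::string provider = "CPUExecutionProvider"; };

// Mimics a C-style call: returns nullptr on success, a status object on failure.
MockStatus* mock_append_cuda(MockSessionOptions* options, int device_id) {
    assert(options != nullptr);
    static MockStatus bad_device{"invalid device id"};
    if (device_id < 0) return &bad_device;
    options->provider = "CUDAExecutionProvider";
    return nullptr;
}

// Tries to enable CUDA only when the build defines USE_CUDA; otherwise, or on
// failure, the options keep their CPU default. Returns the selected provider.
std::string configure_provider(MockSessionOptions& options, int device_id) {
#ifdef USE_CUDA
    if (MockStatus* status = mock_append_cuda(&options, device_id)) {
        // Real code would log status->message and release the status here.
        (void)status;
    }
#else
    (void)device_id;  // CUDA support is compiled out in this build
#endif
    return options.provider;
}
```

Keeping the fallback inside one function means a single source file can serve both the CPU and GPU builds, which matches the `#ifdef USE_CUDA` guard used in the comment above.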

DefTruth · 2021-08-02T14:08:50Z

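I can see that cuda_provider_factory.h is there, so it should be the same. The header directory layout in the official package differs from that of a self-built one. The include directory in the official package looks like this: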
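
Since the two layouts place cuda_provider_factory.h at different paths, one possible way to handle both in a single source file is the C++17 `__has_include` preprocessor check. The sketch below only detects which path is visible and reports it; in a real project the matching `#include` would go inside each branch. `CUDA_FACTORY_LAYOUT` and `cuda_factory_layout` are hypothetical names introduced for this illustration, and the two paths are the ones mentioned in this thread.

```cpp
#include <cassert>
#include <string>

// 1 = source-tree layout, 2 = flat layout (as in the official package), 0 = not found.
#if defined(__has_include)
#  if __has_include(<onnxruntime/core/providers/cuda/cuda_provider_factory.h>)
#    define CUDA_FACTORY_LAYOUT 1  // a real project would #include that path here
#  elif __has_include(<cuda_provider_factory.h>)
#    define CUDA_FACTORY_LAYOUT 2  // a real project would #include the flat path here
#  endif
#endif
#ifndef CUDA_FACTORY_LAYOUT
#  define CUDA_FACTORY_LAYOUT 0
#endif

// Reports which header layout the preprocessor found on the include path.
std::string cuda_factory_layout() {
    assert(CUDA_FACTORY_LAYOUT >= 0 && CUDA_FACTORY_LAYOUT <= 2);
    switch (CUDA_FACTORY_LAYOUT) {
        case 1: return "source-tree";
        case 2: return "flat";
        default: return "missing";
    }
}
```

On a machine without onnxruntime on the include path, `cuda_factory_layout()` reports that the header is missing, which is the situation described earlier in the thread when the header could not be found.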

xinsuinizhuan · 2021-08-02T14:09:58Z

GPU run time: 22 ms:

CPU run time: 97 ms

xinsuinizhuan · 2021-08-02T14:10:39Z

Thanks, thanks. It's just that the other two approaches don't work; it has to be OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);

DefTruth · 2021-08-02T14:10:56Z

That was a tough one for you~ haha ~

xinsuinizhuan · 2021-08-02T14:11:34Z

No problem. As long as it works in the end, the effort was worth it. Haha. You can add this line now to support GPU.

DefTruth · 2021-08-02T14:12:09Z

Got it ~

xinsuinizhuan · 2021-08-02T14:12:28Z

Next, though, I hope you can add support for the yolor model.

DefTruth · 2021-08-02T14:13:29Z

We'll see whether time allows; I'm fairly busy with my regular work. This project should be maintained long-term, though~

xinsuinizhuan · 2021-08-02T14:14:34Z

OK. Keep it up. I hope you can set up a QQ group; that would make it easier to communicate.

DefTruth · 2021-08-02T14:18:28Z

My bandwidth is limited, so there's no plan for that at the moment, haha~

xinsuinizhuan · 2021-08-02T14:31:14Z

Also, if the model input is not 640*640, what do I need to change?

DefTruth · 2021-08-02T14:38:39Z

See issue #9, where this is discussed. yolox.cpp now supports non-square input, so it won't be a problem. You need to download the latest Lite.AI code, or just copy the new yolox.cpp and yolox.h over the old ones.

hejian01 · 2022-03-10T13:38:12Z

After adding #include <cuda_provider_factory.h>, the header file cannot be found:

DefTruth · 2022-03-10T14:24:52Z

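The latest code has fixed this. Take a look at ort_config.h and ort_defs.h. When building, you can now pass -D ENABLE_ONNXRUNTIME_CUDA=ON to build the GPU version. This assumes that the onnxruntime library you downloaded is the CUDA build and that your machine already has the required environment set up. For building on Windows, see:

🔥 Setting up the lite.ai.toolkit library on Windows10 VS2019 CUDA 11.1 #207

The two relevant header files:

https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/ort/core/ort_defs.h

https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/ort/core/ort_config.h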

hejian01 · 2022-03-11T00:43:33Z

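The latest code has fixed this. Take a look at ort_config.h and ort_defs.h. When building, you can now pass -D ENABLE_ONNXRUNTIME_CUDA=ON to build the GPU version. This assumes that the onnxruntime library you downloaded is the CUDA build and that your machine already has the required environment set up. For building on Windows, see:

🔥 Setting up the lite.ai.toolkit library on Windows10 VS2019 CUDA 11.1 #207

The two relevant header files: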

https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/ort/core/ort_defs.h

https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/ort/core/ort_config.h

Thanks. I read through all the replies, and they helped me solve my problem to some degree, hahaha

SonwYang · 2022-08-26T07:18:12Z

Hello, if I use onnxruntime-gpu, where do I change the code that links the library?

github-actions · 2024-05-05T00:14:13Z

This issue is stale because it has been open for 30 days with no activity.

github-actions · 2024-05-12T00:14:32Z

This issue was closed because it has been inactive for 7 days since being marked as stale.

DefTruth added GPU question Further information is requested labels Aug 1, 2021

DefTruth closed this as completed Aug 2, 2021

DefTruth added a commit that referenced this issue Aug 2, 2021

add GPU Compatibility for CUDAExecutionProvider (#10)

ab42962

DefTruth reopened this Aug 2, 2021

DefTruth mentioned this issue Aug 2, 2021

How do I deploy and test on Win10? #3

Closed

DefTruth added the Windows label Aug 2, 2021

DefTruth added a commit that referenced this issue Aug 4, 2021

add GPU Compatibility for CUDAExecutionProvider (#10)

059f74c

DefTruth added a commit that referenced this issue Aug 4, 2021

add GPU Compatibility for CUDAExecutionProvider (#10)

9598d8a

DefTruth added a commit that referenced this issue Aug 4, 2021

add GPU Compatibility for CUDAExecutionProvider (#10)

e2cba79

This was referenced Sep 28, 2021

Hello, how do I use an f16 model? I got the f32 model running, but the speed isn't ideal: the 3060 Ti isn't fully utilized, and only a single CPU core is used DefTruth/RVM-Inference#2

Closed

Not working under windows 10 #32

Closed

This was referenced Oct 6, 2021

How to build for windows 10 ? Many errors. #39

Closed

👉 Announcement: plan for prebuilt Mac/Linux/Windows/Android libraries #48

Closed

DefTruth mentioned this issue Oct 27, 2021

Win10 VS2019: LNK1104 cannot open file "opencv_highgui.lib" #83

Closed

DefTruth pinned this issue Feb 11, 2022

This was referenced Feb 16, 2022

Error when calling OnnxRuntime.dll from VS2019 on Windows 10! #195

Closed

👉Windows10 build error(For Windows users) #196

Closed

DefTruth unpinned this issue Feb 16, 2022

This was referenced Feb 23, 2022

Mac build issue: ffmpeg dependency and OpenCV dynamic libraries #203

Closed

🔥 Setting up the lite.ai.toolkit library on Windows10 VS2019 CUDA 11.1 #207

Closed

DefTruth self-assigned this Mar 26, 2022

DefTruth mentioned this issue Apr 5, 2022

Linux Build Error #271

Closed

github-actions bot added the stale label May 5, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 12, 2024
