```cpp
std::vector<const char*> input_node_names = { "src", "r1i", "r2i", "r3i", "r4i" };
std::vector<const char*> output_node_names = { "fgr", "pha", "r1o", "r2o", "r3o", "r4o" };

r1i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r1i");
r2i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r2i");
r3i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r3i");
r4i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r4i");

mnn_interpreter->resizeTensor(r1i_tensor, { 1, 16, 135, 240 });
mnn_interpreter->resizeTensor(r2i_tensor, { 1, 20, 68, 120 });
mnn_interpreter->resizeTensor(r3i_tensor, { 1, 40, 34, 60 });
mnn_interpreter->resizeTensor(r4i_tensor, { 1, 64, 17, 30 });

std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);

auto device_r1o_ptr = output_tensors.at("r1o");
auto device_r2o_ptr = output_tensors.at("r2o");
auto device_r3o_ptr = output_tensors.at("r3o");
auto device_r4o_ptr = output_tensors.at("r4o");
device_r1o_ptr->copyToHostTensor(r1i_tensor);
device_r2o_ptr->copyToHostTensor(r2i_tensor);
device_r3o_ptr->copyToHostTensor(r3i_tensor);
device_r4o_ptr->copyToHostTensor(r4i_tensor);
```
I don't see where `input_node_names` and `output_node_names` are actually used — and what are these `rxi_tensor` variables for?

Also, I get a crash during initialization, at this line:

```cpp
std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
```

The crash ultimately lands here:

```cpp
// Copyright (c) Microsoft Corporation.
// xutility.h
template <class _DestTy>
void _Fill_zero_memset(_DestTy* const _Dest, const size_t _Count) {
    _CSTD memset(_Dest, 0, _Count * sizeof(_DestTy));
}
```

This is just a zero-initialization, so I don't see how it could possibly fail. If I comment it out, the program runs fine — any idea what the cause might be?

Also, inference with Vulkan actually takes longer than on the CPU. Why would that be? The Python GPU version seems fast.
`input_node_names` and `output_node_names` are not used; they are just there for reference. The `rxi_tensor` inputs are required, though — they are RVM's recurrent hidden-state (context) tensors; see the RVM paper for the underlying principle. As for the performance, it might be an I/O issue? I'm not sure...
OK. What puzzles me is this: when I ran on the CPU before, usage would spike to 100% instantly. Now with the GPU, during the run the CPU sits at around 10%, same as idle; the integrated GPU is at 1% and the discrete 1050 Ti is at 0%, also same as usual. Why does it seem like the Vulkan GPU was successfully invoked but never runs at full load? Is I/O stalling it? Even so, I'd expect to see at least a few spikes.
It's probably the CPU-GPU I/O taking most of the time. I defined the `rxi_tensor` variables on the CPU (host); I haven't tried placing them on the GPU — you could look into how to move them there.
You could also try the GPU build of onnxruntime.