
I only see rxi_tensor being copied in the code, nowhere is it actually used. #25

Open
yyl9510 opened this issue Mar 14, 2022 · 4 comments

yyl9510 commented Mar 14, 2022

std::vector<const char*> input_node_names = { "src", "r1i", "r2i", "r3i", "r4i" };
std::vector<const char*> output_node_names = { "fgr", "pha", "r1o", "r2o", "r3o", "r4o" };

// fetch the rXi input tensors from the session
r1i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r1i");
r2i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r2i");
r3i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r3i");
r4i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r4i");

// resize them to the shapes the model expects
mnn_interpreter->resizeTensor(r1i_tensor, { 1, 16, 135, 240 });
mnn_interpreter->resizeTensor(r2i_tensor, { 1, 20, 68, 120 });
mnn_interpreter->resizeTensor(r3i_tensor, { 1, 40, 34, 60 });
mnn_interpreter->resizeTensor(r4i_tensor, { 1, 64, 17, 30 });

// zero-initialize them
std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);

// later, after a forward pass, grab the rXo output tensors
auto device_r1o_ptr = output_tensors.at("r1o");
auto device_r2o_ptr = output_tensors.at("r2o");
auto device_r3o_ptr = output_tensors.at("r3o");
auto device_r4o_ptr = output_tensors.at("r4o");

// and copy each one back into the matching rXi input tensor
device_r1o_ptr->copyToHostTensor(r1i_tensor);
device_r2o_ptr->copyToHostTensor(r2i_tensor);
device_r3o_ptr->copyToHostTensor(r3i_tensor);
device_r4o_ptr->copyToHostTensor(r4i_tensor);

I can't see where input_node_names and output_node_names are ever used. What are these rxi_tensor variables actually for?
Also, I hit an error during initialization, at this line:
std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
The crash ultimately surfaces here:

// Copyright (c) Microsoft Corporation.
// xutility.h
template <class _DestTy>
void _Fill_zero_memset(_DestTy* const _Dest, const size_t _Count) {
    _CSTD memset(_Dest, 0, _Count * sizeof(_DestTy));
}

This is just an initialization; no matter how I look at it, it shouldn't fail, so why does it crash? If I comment that line out the program runs normally, and I can't figure out what the cause might be.
Also, with Vulkan the inference actually takes longer than on the CPU. Why would that be? The Python GPU version looked fast to me.
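One plausible cause of the std::fill_n crash (an assumption inferred from the snippets above, not verified against this repo): MNN requires a resizeSession() after resizeTensor() before the buffers are valid again, and under a GPU backend host<float>() can return nullptr, in which case the fill becomes a memset through an invalid pointer. A defensive sketch that also sizes the fill from the tensor itself rather than a cached r1i_size:

#include <algorithm>  // std::fill_n

mnn_interpreter->resizeTensor(r1i_tensor, { 1, 16, 135, 240 });
// ... resize r2i / r3i / r4i likewise ...
mnn_interpreter->resizeSession(mnn_session);  // reallocate buffers after the resizes

if (float* r1i_data = r1i_tensor->host<float>()) {
  // elementSize() always matches the tensor's current shape,
  // unlike a precomputed r1i_size that may be stale
  std::fill_n(r1i_data, r1i_tensor->elementSize(), 0.f);
}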

@DefTruth (Owner)

input_node_names and output_node_names are not used; they are only there as a reference. The rxi_tensor inputs are required, though: they are the recurrent context hidden-state tensors of the RVM model. You'd need to read the RVM paper to understand the underlying mechanism. As for the performance, maybe it's an I/O problem? I'm not sure...
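For anyone puzzled by the same thing, a minimal sketch of the per-frame loop, reusing the member names from the snippet above (fill_src() is a hypothetical stand-in for this repo's real preprocessing). The copy is the use: each frame's rXo output becomes the next frame's rXi input.

#include <MNN/Interpreter.hpp>  // MNN::Interpreter, MNN::Tensor

void run_video(const std::vector<cv::Mat>& frames) {
  for (const cv::Mat& frame : frames) {
    fill_src(src_tensor, frame);              // write the current frame into "src"
    mnn_interpreter->runSession(mnn_session);

    const auto& output_tensors = mnn_interpreter->getSessionOutputAll(mnn_session);
    // carry the recurrent state forward: rXo of frame t is rXi of frame t+1
    output_tensors.at("r1o")->copyToHostTensor(r1i_tensor);
    output_tensors.at("r2o")->copyToHostTensor(r2i_tensor);
    output_tensors.at("r3o")->copyToHostTensor(r3i_tensor);
    output_tensors.at("r4o")->copyToHostTensor(r4i_tensor);
    // "fgr" / "pha" are the actual matting outputs consumed downstream
  }
}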

@yyl9510 (Author) commented Mar 14, 2022

OK. What puzzles me is this: when I ran on the CPU before, usage jumped to 100% instantly. Now that I'm using the GPU, CPU usage stays around 10% during the run, the same as idle; the integrated GPU sits at 1% and the discrete 1050 Ti at 0%, also the same as usual. Why does it look like the Vulkan GPU is invoked successfully yet never runs at full load? Is I/O stalling it? Even so, I'd expect at least a few utilization spikes.

@DefTruth
Copy link
Owner

可能是cpu和gpu的io花的时间较长,rxi_tensor这几个变量我是定义在cpu的(host),目前没有试过定义在gpu,你可以看看怎么改在gpu。
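To make the suspected cost concrete, a sketch of the host round-trip a GPU backend forces per state tensor per frame (names from this thread; createHostTensorFromDevice / copyFromHostTensor are MNN's standard host-device bridge, but I have not profiled this repo). Two copies per rXi/rXo pair every frame can easily cancel out the Vulkan speedup:

#include <MNN/Tensor.hpp>
#include <memory>

// one-time: build a host mirror shaped like the device-side state tensor
std::unique_ptr<MNN::Tensor> r1_host(
    MNN::Tensor::createHostTensorFromDevice(r1i_tensor, /*copyData=*/false));

// per frame: download r1o, then re-upload it as the next r1i
auto* r1o = mnn_interpreter->getSessionOutput(mnn_session, "r1o");
r1o->copyToHostTensor(r1_host.get());            // GPU -> CPU
r1i_tensor->copyFromHostTensor(r1_host.get());   // CPU -> GPU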

@DefTruth (Owner)

You could also try the GPU build of onnxruntime.
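For completeness, a hedged sketch of enabling onnxruntime's CUDA execution provider via the C++ API (requires the GPU build of onnxruntime; the model filename here is a placeholder, not necessarily what this repo ships):

#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "rvm");
Ort::SessionOptions opts;
OrtCUDAProviderOptions cuda_opts{};            // device 0 by default
opts.AppendExecutionProvider_CUDA(cuda_opts);  // throws if the CUDA provider is unavailable
Ort::Session session(env, ORT_TSTR("rvm_mobilenetv3_fp32.onnx"), opts);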
