
I only see rxi_tensor being copied in the code, nowhere is it actually used. #25

Open
yyl9510 opened this issue Mar 14, 2022 · 4 comments

yyl9510 commented Mar 14, 2022

std::vector<const char*> input_node_names = { "src", "r1i", "r2i", "r3i", "r4i" };
std::vector<const char*> output_node_names = { "fgr", "pha", "r1o", "r2o", "r3o", "r4o" };

// fetch the rXi input tensors from the session
r1i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r1i");
r2i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r2i");
r3i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r3i");
r4i_tensor = mnn_interpreter->getSessionInput(mnn_session, "r4i");

// resize them to the shapes the model expects
mnn_interpreter->resizeTensor(r1i_tensor, { 1, 16, 135, 240 });
mnn_interpreter->resizeTensor(r2i_tensor, { 1, 20, 68, 120 });
mnn_interpreter->resizeTensor(r3i_tensor, { 1, 40, 34, 60 });
mnn_interpreter->resizeTensor(r4i_tensor, { 1, 64, 17, 30 });

// zero-initialize them
std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
std::fill_n(r2i_tensor->host<float>(), r2i_size, 0.f);
std::fill_n(r3i_tensor->host<float>(), r3i_size, 0.f);
std::fill_n(r4i_tensor->host<float>(), r4i_size, 0.f);

// later, after a forward pass, grab the rXo output tensors
auto device_r1o_ptr = output_tensors.at("r1o");
auto device_r2o_ptr = output_tensors.at("r2o");
auto device_r3o_ptr = output_tensors.at("r3o");
auto device_r4o_ptr = output_tensors.at("r4o");

// and copy each one back into the matching rXi input tensor
device_r1o_ptr->copyToHostTensor(r1i_tensor);
device_r2o_ptr->copyToHostTensor(r2i_tensor);
device_r3o_ptr->copyToHostTensor(r3i_tensor);
device_r4o_ptr->copyToHostTensor(r4i_tensor);

I can't see where input_node_names and output_node_names are ever used. What are these rxi_tensor variables actually for?
Also, I hit an error during initialization, at this line:
std::fill_n(r1i_tensor->host<float>(), r1i_size, 0.f);
The crash ultimately surfaces here:

// Copyright (c) Microsoft Corporation.
// xutility.h
template <class _DestTy>
void _Fill_zero_memset(_DestTy* const _Dest, const size_t _Count) {
    _CSTD memset(_Dest, 0, _Count * sizeof(_DestTy));
}

This is just an initialization; no matter how I look at it, it shouldn't fail, so why does it crash? If I comment that line out the program runs normally, and I can't figure out what the cause might be.
Also, with Vulkan the inference actually takes longer than on the CPU. Why would that be? The Python GPU version looked fast to me.
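One plausible cause of the std::fill_n crash (an assumption inferred from the snippets above, not verified against this repo): MNN requires a resizeSession() after resizeTensor() before the buffers are valid again, and under a GPU backend host<float>() can return nullptr, in which case the fill becomes a memset through an invalid pointer. A defensive sketch that also sizes the fill from the tensor itself rather than a cached r1i_size:

#include <algorithm>  // std::fill_n

mnn_interpreter->resizeTensor(r1i_tensor, { 1, 16, 135, 240 });
// ... resize r2i / r3i / r4i likewise ...
mnn_interpreter->resizeSession(mnn_session);  // reallocate buffers after the resizes

if (float* r1i_data = r1i_tensor->host<float>()) {
  // elementSize() always matches the tensor's current shape,
  // unlike a precomputed r1i_size that may be stale
  std::fill_n(r1i_data, r1i_tensor->elementSize(), 0.f);
}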

@DefTruth (Owner)

input_node_names and output_node_names are not used; they are only there as a reference. The rxi_tensor inputs are required, though: they are the recurrent context hidden-state tensors of the RVM model. You'd need to read the RVM paper to understand the underlying mechanism. As for the performance, maybe it's an I/O problem? I'm not sure...
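For anyone puzzled by the same thing, a minimal sketch of the per-frame loop, reusing the member names from the snippet above (fill_src() is a hypothetical stand-in for this repo's real preprocessing). The copy is the use: each frame's rXo output becomes the next frame's rXi input.

#include <MNN/Interpreter.hpp>  // MNN::Interpreter, MNN::Tensor

void run_video(const std::vector<cv::Mat>& frames) {
  for (const cv::Mat& frame : frames) {
    fill_src(src_tensor, frame);              // write the current frame into "src"
    mnn_interpreter->runSession(mnn_session);

    const auto& output_tensors = mnn_interpreter->getSessionOutputAll(mnn_session);
    // carry the recurrent state forward: rXo of frame t is rXi of frame t+1
    output_tensors.at("r1o")->copyToHostTensor(r1i_tensor);
    output_tensors.at("r2o")->copyToHostTensor(r2i_tensor);
    output_tensors.at("r3o")->copyToHostTensor(r3i_tensor);
    output_tensors.at("r4o")->copyToHostTensor(r4i_tensor);
    // "fgr" / "pha" are the actual matting outputs consumed downstream
  }
}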

@yyl9510 (Author) commented Mar 14, 2022

OK. What puzzles me is this: when I ran on the CPU before, usage jumped to 100% instantly. Now that I'm using the GPU, CPU usage stays around 10% during the run, the same as idle; the integrated GPU sits at 1% and the discrete 1050 Ti at 0%, also the same as usual. Why does it look like the Vulkan GPU is invoked successfully yet never runs at full load? Is I/O stalling it? Even so, I'd expect at least a few utilization spikes.

@DefTruth
Copy link
Owner

可能是cpu和gpu的io花的时间较长,rxi_tensor这几个变量我是定义在cpu的(host),目前没有试过定义在gpu,你可以看看怎么改在gpu。
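To make the suspected cost concrete, a sketch of the host round-trip a GPU backend forces per state tensor per frame (names from this thread; createHostTensorFromDevice / copyFromHostTensor are MNN's standard host-device bridge, but I have not profiled this repo). Two copies per rXi/rXo pair every frame can easily cancel out the Vulkan speedup:

#include <MNN/Tensor.hpp>
#include <memory>

// one-time: build a host mirror shaped like the device-side state tensor
std::unique_ptr<MNN::Tensor> r1_host(
    MNN::Tensor::createHostTensorFromDevice(r1i_tensor, /*copyData=*/false));

// per frame: download r1o, then re-upload it as the next r1i
auto* r1o = mnn_interpreter->getSessionOutput(mnn_session, "r1o");
r1o->copyToHostTensor(r1_host.get());            // GPU -> CPU
r1i_tensor->copyFromHostTensor(r1_host.get());   // CPU -> GPU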

@DefTruth (Owner)

You could also try the GPU build of onnxruntime.
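For completeness, a hedged sketch of enabling onnxruntime's CUDA execution provider via the C++ API (requires the GPU build of onnxruntime; the model filename here is a placeholder, not necessarily what this repo ships):

#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "rvm");
Ort::SessionOptions opts;
OrtCUDAProviderOptions cuda_opts{};            // device 0 by default
opts.AppendExecutionProvider_CUDA(cuda_opts);  // throws if the CUDA provider is unavailable
Ort::Session session(env, ORT_TSTR("rvm_mobilenetv3_fp32.onnx"), opts);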
