
How to reduce memory usage during MNN inference #3086

Open
lxh0510 opened this issue Nov 16, 2024 · 2 comments
Labels
User: the user asks a question about how to use MNN, or uses MNN incorrectly and causes a bug.

Comments


lxh0510 commented Nov 16, 2024

At the moment --fp16 has cut the model size in half, but memory usage during inference has not changed. How should I change things to reduce runtime memory?

jxt1234 added the User label Nov 18, 2024
jxt1234 (Collaborator) commented Nov 18, 2024

The --fp16 option used during model conversion has nothing to do with whether fp16 is used at inference time. To switch on fp16 inference: build MNN with MNN_ARM82 enabled, and set precision to low when creating the session or module; fp16 optimization is then enabled if the device supports it.
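
A minimal sketch of what this looks like with the standard MNN C++ Session API; it assumes MNN was built with -DMNN_ARM82=ON, and "model.mnn" is a placeholder path:

```cpp
// Minimal sketch: request fp16 compute via Precision_Low.
// Assumes MNN was built with -DMNN_ARM82=ON; "model.mnn" is a placeholder.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;

    MNN::BackendConfig backendConfig;
    // Precision_Low requests fp16 compute; it only takes effect if the
    // device supports it (e.g. ARMv8.2 CPUs with MNN_ARM82 enabled).
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... fill inputs, call net->runSession(session), read outputs ...
    return 0;
}
```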

In addition, you can consider dynamic quantization:

  1. Convert the model with --weightQuantBits=8 to quantize the weights
  2. Build MNN with the MNN_LOW_MEMORY macro enabled
  3. Set memory = low when creating the session or module (see the sketch after this list)
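
A minimal sketch of step 3, again with the Session API; it assumes a model already converted with --weightQuantBits=8 and an MNN build with MNN_LOW_MEMORY enabled, and "model_int8.mnn" is a placeholder path:

```cpp
// Minimal sketch: keep weights quantized at runtime via Memory_Low.
// Assumes MNN was built with the MNN_LOW_MEMORY macro enabled and the model
// was converted with --weightQuantBits=8; "model_int8.mnn" is a placeholder.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model_int8.mnn"));

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;

    MNN::BackendConfig backendConfig;
    // Memory_Low asks the backend to keep the int8 weights quantized in
    // memory and compute with them directly, rather than expanding them
    // back to float, which is what reduces runtime memory.
    backendConfig.memory = MNN::BackendConfig::Memory_Low;
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... run inference as usual ...
    return 0;
}
```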

lxh0510 (Author) commented Nov 21, 2024

Thank you. One more question: after using dynamic quantization to convert the model to int8, is it also the case that only the model size shrinks, while the weights are dequantized during inference so runtime memory stays the same?
