[Bug] After upgrading to 0.2.0, loading the AWQ-quantized qwen-14b or internlm2-chat-20b model reports a GPU out-of-memory error #998
Comments
Hi, I've run into the same problem and wanted to ask: can you serve normally with the previous version? Also, when deploying
I was still using 0.1.0 this morning, then saw that 0.2.0 is the first version to support `internlm2-chat-20b-4bits`, so I just upgraded, and it immediately errored out. Even the quantized qwen-14b model that used to run now reports the same error. After downgrading back to 0.1.0, qwen works again.
I don't think that's it.
Then how do I get that file? The earlier models don't have it either. By the way, there was a recent MR that fixes an OOM issue; it may well fix this one. #973
I haven't figured out what the problem is yet, so I'll hold off on testing internlm2-chat-20b-4bits and wait for the maintainers to look into a fix. I'll stay on 0.1.0 for now.
Which GPU model are you on? And which command did you use?
A800, with model_path = '/home/mingqiang/model/model_file/qwen-14b-chat-finetune-4bit/'
Mine is a 4090. The command follows the HF model card (https://huggingface.co/internlm/internlm2-chat-20b-4bits): lmdeploy serve api_server internlm/internlm2-chat-20b-4bits --backend turbomind --model-format awq
Try adding the option --cache-max-entry-count 0.4.
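For example, appended to the api_server command quoted above (the flag caps the share of GPU memory TurboMind reserves for the KV cache, leaving more headroom for the model weights):

lmdeploy serve api_server internlm/internlm2-chat-20b-4bits --backend turbomind --model-format awq --cache-max-entry-count 0.4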
The internlm2-chat-20b model updated its special_tokens last week; accordingly, you need lmdeploy v0.2.1.
I just upgraded to 0.2.1 and this is what it printed:
LMDeploy: 0.2.1+
It works now. I just went in and updated the tokenizer_config.json inside the internlm2-chat-20b model.
Checklist
Describe the bug
Exception in thread Thread-132 (_create_model_instance):
Traceback (most recent call last):
File "/home/mingqiang/.conda/envs/qwen/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
self.run()
File "/home/mingqiang/.conda/envs/qwen/lib/python3.10/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
File "/home/mingqiang/.conda/envs/qwen/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 486, in _create_model_instance
model_inst = self.tm_model.model_comm.create_model_instance(
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/allocator.h:231
Reproduction
import lmdeploy
model_path = '/home/mingqiang/model/model_file/qwen-14b-chat-finetune-4bit/'
pipe = lmdeploy.pipeline(model_path, model_name='qwen-14b')
response = pipe(["你是谁", "你在哪"])
print(response)
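For anyone hitting the same OOM through the Python pipeline, here is a minimal sketch of the --cache-max-entry-count workaround via the v0.2.x API. It assumes TurbomindEngineConfig and its cache_max_entry_count field as shipped in lmdeploy v0.2.x; the 0.4 value and the model path mirror this thread:

import lmdeploy
from lmdeploy import TurbomindEngineConfig

# Reserve a smaller fraction of GPU memory for the KV cache so the
# AWQ weights have room to load; 0.4 mirrors the CLI suggestion above.
backend_config = TurbomindEngineConfig(cache_max_entry_count=0.4)
pipe = lmdeploy.pipeline(
    '/home/mingqiang/model/model_file/qwen-14b-chat-finetune-4bit/',
    model_name='qwen-14b',
    backend_config=backend_config)
response = pipe(["你是谁", "你在哪"])
print(response)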
Environment
Error traceback
No response