[Bug] How to load a turbomind model with Python code and chat with it (a beginner asking for help) #2098
Replies: 4 comments 1 reply
-
We suggest using the pipeline interface rather than the turbomind interface.
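A minimal sketch of that pipeline usage, reusing the model path from the reproduction below; the `model_format='awq'` setting is an assumption that the 4-bit weights were produced with AWQ quantization (adjust it otherwise):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Assumption: the 4-bit model at this path was quantized with AWQ.
pipe = pipeline(
    '/root/autodl-tmp/internlm2-4b',
    backend_config=TurbomindEngineConfig(model_format='awq'),
)

# The pipeline applies the chat template and tokenization internally.
responses = pipe(['hello'])
print(responses[0].text)
```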
-
Can the pipeline do multi-turn conversation? I don't seem to see it.
-
Yes. Please read the "An example for OpenAI format prompt input" example in the LLM pipeline user guide.
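A sketch of that OpenAI-format, multi-turn pattern, assuming the same local model path as the reproduction below; the message contents are only illustrative:

```python
from lmdeploy import pipeline

pipe = pipeline('/root/autodl-tmp/internlm2-4b')

# One conversation in OpenAI message format, passed as a batch of one.
messages = [
    {'role': 'system', 'content': '你是书生。'},
    {'role': 'user', 'content': 'hello'},
]
first = pipe([messages])[0]
print(first.text)

# To continue the dialogue, append the assistant reply and the next user turn,
# then call the pipeline again with the full history.
messages += [
    {'role': 'assistant', 'content': first.text},
    {'role': 'user', 'content': 'Please introduce yourself in more detail.'},
]
second = pipe([messages])[0]
print(second.text)
```

Each call is stateless, so the caller keeps the conversation history and resends it on every turn.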
-
@quanfeifan I think this is not a bug but rather a Q&A. It has been converted to a discussion so that follow-up questions can be discussed there.
-
Describe the bug
I followed https://github.com/InternLM/lmdeploy/issues/1835#issue-2369484615.
I looked at the EngineOutput struct, but I'm not sure how to modify the code. Could someone help me correct it so that the input and output are handled properly? The model I'm using is a 4-bit quantized internlm2-7b.
Also, my input is just a simple "hello", yet it takes several seconds before any output comes back, and a result only appears at step=520. Is that normal?
Reproduction
```python
import json
from lmdeploy import turbomind as tm

tm_model = tm.TurboMind.from_pretrained('/root/autodl-tmp/internlm2-4b')
generator = tm_model.create_instance()


def chat(prompt):
    input_ids = tm_model.tokenizer.encode(prompt)
    token_ids = []
    # stream_infer yields EngineOutput objects; keep the latest token_ids.
    for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]):
        token_ids = outputs.token_ids
    # Decode the generated token ids back into text.
    response = tm_model.tokenizer.decode(token_ids)
    return prompt, response, {'prompt': prompt, 'response': response}


# import tool                         # local helper module; its system_prompt
# system_prompt = tool.system_prompt  # is overwritten right below anyway
system_prompt = "hello"
system_prompt_template = """<|im_start|>system
你是书生。<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
"""
prompt, response, response_dict = chat(system_prompt_template.format(system_prompt))
print(response)
```
Environment
Error traceback
No response