Environment: verified on both V100 and T4 GPUs.
Is there an existing issue for this?
Current Behavior
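With both chatglm2-6b and chatglm2-6b-int4, first-token latency grows by multiples as the input gets longer: going from an input length of 512 to 2048, first-token latency rises from 500 ms to 1.8 s.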
Expected Behavior
The input (prompt) portion should be processed in parallel, so why is the growth this pronounced?
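For context on the question above, here is a rough back-of-the-envelope sketch of how prompt-processing compute could scale with input length. It is not a profile of the actual model: the parameter count, layer count, and hidden size are assumed values for chatglm2-6b, and the cost formula is a common approximation (about 2 FLOPs per parameter per token, plus an attention term that is quadratic in sequence length). The point is only that parallel processing removes the sequential dependency between prompt tokens, while the total amount of arithmetic still grows with the token count.

```python
# Rough prefill cost model; all constants are assumptions, not measured values.
N_PARAMS = 6.2e9   # assumed parameter count for chatglm2-6b
N_LAYERS = 28      # assumed number of transformer layers
D_MODEL = 4096     # assumed hidden size


def prefill_flops(n_tokens: int) -> float:
    """Approximate FLOPs to process a prompt of n_tokens in one forward pass."""
    # Weight matmuls: about 2 FLOPs per parameter per token (linear in n_tokens).
    linear = 2.0 * N_PARAMS * n_tokens
    # Attention scores (QK^T) and weighted sum (AV): about 4 * d * n^2 per layer.
    attention = 4.0 * N_LAYERS * D_MODEL * n_tokens ** 2
    return linear + attention


def flops_ratio(short: int, long: int) -> float:
    """How many times more compute the longer prompt needs than the shorter one."""
    return prefill_flops(long) / prefill_flops(short)


if __name__ == "__main__":
    print(f"estimated compute ratio, 2048 vs 512 tokens: {flops_ratio(512, 2048):.2f}")
    print(f"reported latency ratio, 1.8 s vs 500 ms:     {1.8 / 0.5:.2f}")
```

Under these assumptions the estimated compute for 2048 tokens is a little over 4x that for 512 tokens, which is in the same range as the reported 3.6x latency increase. If the GPU is already compute-bound at 512 tokens, roughly proportional growth would not be surprising.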
Steps To Reproduce
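import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# base_model_name_or_path, peft_model_id and logits_processor are defined elsewhere in the reporter's script (not shown).
tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path, trust_remote_code=True)
# revision expects a branch, tag, or commit string, so the original revision=True is dropped.
# Loading in fp16 and moving to the GPU here is an assumption; the report does not show device placement.
base_model = AutoModel.from_pretrained(base_model_name_or_path, trust_remote_code=True, torch_dtype=torch.float16).cuda()
model = PeftModel.from_pretrained(base_model, peft_model_id).eval()
text = "测试文本"  # "test text"; renamed from str to avoid shadowing the builtin
pt_data = tokenizer(text, return_tensors="pt", padding=True).to("cuda")
# max_length = prompt length + 1 generates exactly one new token, so the call time is the first-token latency.
# top_p and temperature have no effect with do_sample=False (greedy decoding); kept as in the report.
gen_kwargs = {"max_length": pt_data["input_ids"].shape[-1] + 1, "num_beams": 1, "do_sample": False,
              "top_p": 0.8, "temperature": 0, "logits_processor": logits_processor}
outputs = model.generate(**pt_data, **gen_kwargs)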
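The report does not show how the 500 ms and 1.8 s figures were timed. As a hedged sketch of one way to measure them: CUDA kernels launch asynchronously, so a wall-clock timer should be bracketed by a synchronization call, and a few warm-up runs help exclude one-time setup cost. The helper below is hypothetical (the name `first_token_latency` and its parameters are not from the report) and is written without a hard dependency on torch, so the synchronization function is passed in.

```python
import time
from statistics import median
from typing import Callable


def first_token_latency(
    run_once: Callable[[], object],
    sync: Callable[[], None] = lambda: None,
    warmup: int = 2,
    repeats: int = 5,
) -> float:
    """Return the median wall-clock time in seconds of run_once().

    run_once: hypothetical callable that generates exactly one new token.
    sync: called before and after each timed run; on a GPU this would be
          torch.cuda.synchronize so that pending kernels are included.
    """
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        sync()  # make sure earlier GPU work is finished before starting the clock
        start = time.perf_counter()
        run_once()
        sync()  # wait for the generation kernels to complete
        samples.append(time.perf_counter() - start)
    return median(samples)


# Usage with the reproduction script (requires a GPU and the loaded model):
#   latency = first_token_latency(
#       lambda: model.generate(**pt_data, **gen_kwargs),
#       sync=torch.cuda.synchronize,
#   )
```

Taking the median over several runs is a design choice to reduce the influence of occasional outliers; the mean or minimum would also be reasonable.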
Environment
Anything else?
No response