Feature request / 功能建议
Hello CogAgent Team,
I hope this message finds you well. I am currently working with the cogagent-9b-20241220 model and have noted that, as mentioned in the README file, there is currently no support for inference with the vLLM framework.
I would like to inquire if the vLLM inference process for the cogagent model is similar to that of the glm-4v-9b model. If there are significant differences, could you please provide guidance on what modifications might be necessary to adapt the cogagent model for vLLM inference?
Specifically, I am interested in understanding:
The differences in model inputs and outputs between the two models.
Any changes required in the model architecture or configuration for vLLM compatibility.
Any known issues or limitations when adapting cogagent for vLLM inference.
Thank you in advance for your assistance. I look forward to your insights and any recommendations you might have for enabling vLLM inference with the cogagent model.
Motivation / 动机
from PIL import Image
from vllm import LLM, SamplingParams

model_name = "THUDM/glm-4v-9b"

llm = LLM(model=model_name,
          tensor_parallel_size=1,
          max_model_len=8192,
          trust_remote_code=True,
          enforce_eager=True)

# End-of-turn token ids for the GLM-4 tokenizer.
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.2,
                                 max_tokens=1024,
                                 stop_token_ids=stop_token_ids)

prompt = "What's the content of the image?"
image = Image.open("your image").convert('RGB')  # replace with your image path

inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
    },
}

outputs = llm.generate(inputs, sampling_params=sampling_params)
for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)
Your contribution / 您的贡献
xx
Hello, thank you for your interest in vLLM support for CogAgent. Regarding the differences between modeling_chatglm.py of THUDM/cogagent-9b-20241220 and that of GLM-4v-9b: after auditing the open-source code of both models, I found that the main differences lie in two aspects.
CogAgent uses non-interleaved rotary position embeddings, while GLM-4v-9b uses interleaved ones.
The position_ids of GLM-4v-9b insert num_patches identical values in the middle of the sequence, while CogAgent uses a monotonically increasing sequence covering [0, len(input_ids) + num_patches - 1], where num_patches = (image_size // patch_size // 2) ** 2 (see the sketch after this list).
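The following toy sketch illustrates the two position_id schemes. The num_patches formula is quoted from above; the concrete image_size/patch_size values, the assumption of a single image placeholder token at index boi_pos, and the exact value of the repeated position are illustrative assumptions, not the models' actual preprocessing code.

# Toy illustration only; the real logic lives in each model's preprocessing code.
image_size, patch_size = 1120, 14                   # assumed example values
num_patches = (image_size // patch_size // 2) ** 2  # formula quoted above

def glm4v_position_ids(input_ids, boi_pos):
    # GLM-4v-9b style: the num_patches vision tokens share one repeated
    # position inserted in the middle (the repeated value is an assumption).
    return (list(range(boi_pos + 1))
            + [boi_pos + 1] * num_patches
            + list(range(boi_pos + 2, len(input_ids) + 1)))

def cogagent_position_ids(input_ids):
    # CogAgent style: one monotonically increasing sequence over text + patches,
    # i.e. positions 0 .. len(input_ids) + num_patches - 1.
    return list(range(len(input_ids) + num_patches))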
These differences mean their inference code in vLLM also differs. In vLLM's multi-modal model code (see this PR: vllm-project/vllm#11742), the CogAgent change mainly amounts to setting is_neox_style to True. This allows vLLM inference to run smoothly; however, the inference performance is not yet as good as using transformers directly. We are exploring ways to improve CogAgent's inference performance in vLLM.
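For context, that flag is passed to vLLM's rotary-embedding helper when the attention layers are built. A hedged sketch of the toggle, where the head_size, rotary_dim, max_position, and base values are placeholders rather than the models' actual configuration:

from vllm.model_executor.layers.rotary_embedding import get_rope

# Interleaved (GPT-J style) rotary embedding, as used for GLM-4v-9b:
rope_glm4v = get_rope(head_size=128, rotary_dim=64, max_position=8192,
                      base=10000, is_neox_style=False)

# Non-interleaved (NeoX style) rotary embedding, as described for CogAgent:
rope_cogagent = get_rope(head_size=128, rotary_dim=64, max_position=8192,
                         base=10000, is_neox_style=True)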
If you want a minimal demo of CogAgent inference with vLLM, you can refer to the code below. As stated in the README, CogAgent requires strict control over the input fields, including task, platform, format, and history_step, and an image is mandatory. The output is also formatted; please refer to the format specification in the README for guidance.
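A minimal sketch along those lines, mirroring the glm-4v-9b example above. The exact prompt wording (the Task/Platform/Answer-in-format lines), the stop-token ids, and the screenshot path are assumptions and should be checked against the cogagent-9b-20241220 model card and README.

from PIL import Image
from vllm import LLM, SamplingParams

model_name = "THUDM/cogagent-9b-20241220"

llm = LLM(model=model_name,
          tensor_parallel_size=1,
          max_model_len=8192,
          trust_remote_code=True,
          enforce_eager=True)

# Assumed to match the GLM-4 family end-of-turn tokens; verify against the tokenizer.
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.2,
                                 max_tokens=1024,
                                 stop_token_ids=stop_token_ids)

# CogAgent expects a strictly formatted query (task, platform, output format,
# history steps) plus a mandatory screenshot; the wording below is illustrative.
task = "Search for the weather in Beijing"
platform = "Mac"
output_format = "Action-Operation-Sensitive"
history = ""  # previous steps, if any
prompt = (f"Task: {task}{history}\n"
          f"(Platform: {platform})\n"
          f"(Answer in {output_format} format.)")

image = Image.open("your_screenshot.png").convert("RGB")  # path is a placeholder
inputs = {
    "prompt": prompt,
    "multi_modal_data": {"image": image},
}

outputs = llm.generate(inputs, sampling_params=sampling_params)
for o in outputs:
    print(o.outputs[0].text)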