[Bug]: Cannot Run Qwen2 Embedding Model on Gaudi #583
Comments
Hi @rvoleti89, from the collect_env printout I see that you are using a vLLM version different from what is supported in 1.18. Can you try: … and see if that helps?
I have tried the v0.5.3.post1+Gaudi-1.18.0 tagged version as well; I don't believe embeddings were supported at all in that version. The version I am trying here was built from the latest commit on the habana_main branch.
To be clear, this is the commit I used for the build: … My commit hash in the printout is different because I forked the repository to automate the build with a CI/CD pipeline, but no other modifications were made. EDIT: I also tried a build from the latest commit, 8754e17, and hit the same issue.
@rvoleti89 thanks for reporting the issue. We can reproduce it with the SHA-1 you mentioned. In upstream vLLM v0.6.4.post1, the embedding task is handled by embedding_model_runner.py; the vllm-fork's hpu_model_runner.py is missing the embedding task handling. We will find a solution to support it and keep you updated.
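For context, a paraphrased sketch of the runner selection that upstream vLLM performs (the module and class names come from v0.6.4.post1 as referenced above; the helper function below is illustrative, not the actual upstream code):

```python
# Paraphrased sketch of upstream vLLM's worker-side runner selection
# (v0.6.4.post1). pick_model_runner is a hypothetical helper; the real
# selection lives inside the worker setup code.
from vllm.worker.model_runner import ModelRunner
from vllm.worker.embedding_model_runner import EmbeddingModelRunner

def pick_model_runner(task: str):
    # Embedding models need a runner that pools hidden states instead of
    # sampling next tokens; the fork's hpu_model_runner.py has no
    # equivalent branch, which is why --task embedding crashes on Gaudi.
    if task == "embedding":
        return EmbeddingModelRunner
    return ModelRunner
```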
@libinta thank you for your responsiveness regarding this issue. I look forward to future updates on embedding support, and I'll take a look at the v0.6.4.post1 implementation upstream to see if there's a workaround I can use in the meantime.
@libinta Is there any update on embedding model support in vLLM for Gaudi? Since I opened this issue, an official 1.19.0 release has come out. Will future support for embeddings require 1.19.0, or can we expect it on 1.18.0 as well? Is there some sort of timeline for this? Much appreciated. Thanks!
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I am trying to serve the following embedding model in a Kubernetes pod on a Gaudi2 Node with Habana 1.18:
Alibaba-NLP/gte-Qwen2-7B-instruct
The pod is running the model with the following serving command:
```bash
vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct --task embedding --trust-remote-code
```
I see the server start successfully, load the weights, and complete warm-up as expected. I have set up a k8s Service to access the pod externally via curl or Python requests with the OpenAI client.
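For completeness, this is roughly how the endpoint is queried with the openai Python client (the base URL host and port below are placeholders for my k8s Service, not the actual values):

```python
# Minimal embedding request via the OpenAI-compatible API that vLLM serves.
# "embedding-svc:8000" is a placeholder for the k8s Service address.
from openai import OpenAI

client = OpenAI(base_url="http://embedding-svc:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input="The quick brown fox jumps over the lazy dog",
)
print(len(response.data[0].embedding))  # embedding dimension
```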
Example curl request:
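A representative request against the OpenAI-compatible /v1/embeddings endpoint looks like this (again, host and port are placeholders for the Service):

```bash
curl -s http://embedding-svc:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Alibaba-NLP/gte-Qwen2-7B-instruct",
        "input": "The quick brown fox jumps over the lazy dog"
      }'
```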
Unfortunately, this just crashes my pod, with the following traceback in the server logs:
As a sanity check, I ran this exact same command with vLLM v0.6.4.post1 on a 20 GB NVIDIA A100 MIG slice, and it worked perfectly, so this issue seems Gaudi-specific to me.