lmdeploy serve is more than two times slower than normal transformers code #2248
Unanswered
paniabhisek asked this question in Q&A
Replies: 1 comment · 7 replies
I am using the Phi-3-vision model. When I run the example snippet from the Hugging Face model card, the response takes around 5 seconds.
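For reference, the Hugging Face example follows roughly this pattern (a sketch based on the public model card; the image URL and question are placeholders, not the ones actually used):

```python
# Sketch of the Phi-3-vision Hugging Face example; eager attention matches
# the configuration described in this question (flash attention disabled).
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder image and query.
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

output_ids = model.generate(
    **inputs, max_new_tokens=500, eos_token_id=processor.tokenizer.eos_token_id
)
output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```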
However, if I serve the model with
lmdeploy serve api_server microsoft/Phi-3-vision-128k-instruct --server-port 23333
and then use the following code to get the response, it takes around 10 to 12 seconds.
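A minimal sketch of such a client, assuming the OpenAI-compatible /v1 endpoint that lmdeploy api_server exposes and the openai Python package (the image URL and question are placeholders, not the ones actually used):

```python
# Sketch of an OpenAI-style client against the lmdeploy api_server started above.
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id  # name of the served model

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
```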
Why is there so much delay? In both cases the same image and query are used, and I have disabled flash attention and set eager attention mode in config.json.