Randomly distribute traffic across multiple workers of the same model #2857
Launch the same model with replica=2; the model will then have 2 replicas on 2 workers.
In my situation, I intend to launch multiple GPU Docker instances, each of which automatically starts one xinference worker. Is this scenario suitable for the replica configuration?
That should work well.
Start the supervisor:
Start worker 2:
How can I modify my deployment method?
I have tested this: if all GPU Docker instances are ready and all workers have started, launching the model once with replica set to 2 works. However, in my scenario I want to dynamically add workers to a model that is already running. Is there any way to achieve this?
Oh, you mean dynamically scaling the replica count, e.g. from 1 to 2 and then to 3?
Yes, the replica count may need to be adjusted after the initial model launch in response to traffic. I am hoping for support to add and remove workers, and to increase and decrease model replicas, after the first model launch.
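Dynamic scaling of this kind needs some rule for turning observed traffic into a replica count. A minimal sketch, assuming a fixed per-replica capacity in requests per second; the function `desired_replicas`, the capacity figure, and the min/max bounds are all hypothetical and not part of Xinference. It returns the smallest replica count that keeps each replica at or below its assumed capacity, clamped to a configured range.

```python
import math


def desired_replicas(requests_per_sec: float, capacity_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Return how many replicas are needed for the observed load.

    Hypothetical sizing rule: enough replicas that none exceeds its
    assumed capacity, clamped to [min_replicas, max_replicas].
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Smallest integer n with n * capacity_per_replica >= requests_per_sec.
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    # Never drop below the floor (keeps the model available at zero load)
    # and never exceed the number of workers we are willing to use.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Assume each replica comfortably handles 4 requests/s.
    for rps in (2.0, 9.0, 50.0):
        print(rps, "->", desired_replicas(rps, capacity_per_replica=4.0))
```

With an assumed capacity of 4 requests/s per replica, a load of 9 requests/s calls for 3 replicas, while very high load is capped at `max_replicas`.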
Sorry, this is a feature of the enterprise version.
Got it, thank you for your kind reply.
Feature request
I have deployed one supervisor and two qwen2-vl-7b-instruct workers. However, I've noticed that clients can currently only query a model by its model_id. I would like to query models by name, such as qwen2-vl-7b-instruct, and have traffic randomly distributed across the multiple workers serving that model. Is this currently supported?
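One possible client-side workaround is sketched below, assuming each worker serves its own copy of the model under a distinct model UID. The `ReplicaRouter` class and the UIDs are hypothetical and not an Xinference API: the router maps a model name to the UIDs registered for it and picks one uniformly at random per request, with add/remove methods so replicas can join or leave at runtime.

```python
import random
from typing import Dict, List, Optional


class ReplicaRouter:
    """Hypothetical client-side router: model name -> list of model UIDs.

    Each UID is assumed to be one running replica of the same model
    (e.g. one per worker); each request picks one uniformly at random.
    """

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._replicas: Dict[str, List[str]] = {}
        # An injectable RNG makes the random choice reproducible in tests.
        self._rng = rng or random.Random()

    def add_replica(self, model_name: str, model_uid: str) -> None:
        """Register a replica; registering the same UID twice is a no-op."""
        uids = self._replicas.setdefault(model_name, [])
        if model_uid not in uids:
            uids.append(model_uid)

    def remove_replica(self, model_name: str, model_uid: str) -> None:
        """Unregister a replica; drop the model name once none are left."""
        uids = self._replicas.get(model_name, [])
        if model_uid in uids:
            uids.remove(model_uid)
        if not uids:
            self._replicas.pop(model_name, None)

    def pick(self, model_name: str) -> str:
        """Return the UID of a randomly chosen replica for model_name."""
        uids = self._replicas.get(model_name)
        if not uids:
            raise KeyError(f"no replicas registered for {model_name!r}")
        return self._rng.choice(uids)


if __name__ == "__main__":
    router = ReplicaRouter(rng=random.Random(0))
    # Hypothetical UIDs, one per worker serving the same model.
    router.add_replica("qwen2-vl-7b-instruct", "qwen2-vl-uid-worker1")
    router.add_replica("qwen2-vl-7b-instruct", "qwen2-vl-uid-worker2")
    for _ in range(3):
        print(router.pick("qwen2-vl-7b-instruct"))
```

Uniform random choice is the simplest policy; round-robin or least-loaded selection could be swapped in inside `pick` without changing callers.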

Motivation
Additional workers should be deployed for models that require more resources.
Your contribution
None