-
Hi,
I am trying out LiteLLM for load balancing requests to Azure OpenAI.
I understand that you can have multiple deployments of models and use LiteLLM to route requests to specific deployments.
However, is it possible to have a single deployment of an OpenAI model, send it multiple requests, and have LiteLLM, with the help of Redis, throttle those requests to avoid RateLimitErrors? That way we would not need to retry requests; each request could simply wait until it is safe to send.
Thanks in advance for the reply.
-
Yes - use |
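For reference, here is a minimal sketch of how such a setup could be wired up with LiteLLM's Router. It assumes the per-deployment `rpm`/`tpm` limits and the Redis-backed usage tracking described in the LiteLLM routing docs; the deployment name, endpoint, and Redis host below are placeholders. It shows how the rate budget is tracked and shared across instances, not a guarantee that requests will queue and wait rather than error out when the limit is hit; that behavior depends on the routing strategy and retry settings you choose.

```python
# Sketch only: the deployment name, endpoint, and Redis host are placeholders.
import os

from litellm import Router

model_list = [
    {
        "model_name": "gpt-4",  # the alias callers will use
        "litellm_params": {
            "model": "azure/my-gpt4-deployment",  # the single Azure deployment
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_base": "https://my-endpoint.openai.azure.com/",
            "api_version": "2024-02-01",
            "rpm": 60,       # requests-per-minute budget for this deployment
            "tpm": 100_000,  # tokens-per-minute budget
        },
    }
]

router = Router(
    model_list=model_list,
    # A shared Redis lets several LiteLLM instances count usage
    # against the same rpm/tpm budget.
    redis_host="localhost",
    redis_port=6379,
    # "usage-based-routing-v2" performs pre-call rpm/tpm checks.
    routing_strategy="usage-based-routing-v2",
    num_retries=3,  # retries still apply if a call exceeds the budget
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```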