Sending a batch of requests #3541
-
I'm working with a lot of the different LLMs that LiteLLM supports, and the router makes it very easy for me to set up the connections to them: I have a YAML config file with all the API keys, endpoints, and tpm & rpm settings, and I just pass it to the Router constructor as model_list. However, I'm looking for an easy way to send many requests to the same / different LLMs and to get the responses back as soon as possible (subject to the rpm, tpm & other constraints). I would also like an ETA estimate and a progress bar. I've seen that there is batch completion, but it looks like it's separate from the router, and I can't seem to use batch completion with the router. Can anybody suggest the right way to use LiteLLM for my use case? Should I just use the router and implement the batching, ETA & progress bar myself with tqdm or something?
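For reference, this is roughly how I build the router today (a simplified sketch; the models.yaml file name and the top-level model_list key are just how my config happens to be laid out):

```python
import yaml
import litellm

# The YAML file holds the API keys, endpoints and tpm/rpm settings;
# "models.yaml" and the "model_list" key are illustrative of my own layout.
with open("models.yaml") as f:
    config = yaml.safe_load(f)

router = litellm.Router(model_list=config["model_list"])
```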
-
@anterart you can now do this on the router too:

import litellm

router = litellm.Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
            },
        },
        {
            "model_name": "groq-llama",
            "litellm_params": {
                "model": "groq/llama3-8b-8192",
            },
        },
    ]
)

# abatch_completion is async, so call it from inside an async function
response = await router.abatch_completion(
    models=["gpt-3.5-turbo", "groq-llama"],
    messages=[
        {"role": "user", "content": "is litellm becoming a better product ?"}
    ],
    max_tokens=15,
)
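If you're calling this from a regular (non-async) script, a minimal sketch of wrapping it is below; it assumes abatch_completion returns one response per entry in models, in the same order:

```python
import asyncio
import litellm

model_list = [
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo"}},
    {"model_name": "groq-llama", "litellm_params": {"model": "groq/llama3-8b-8192"}},
]

async def main():
    router = litellm.Router(model_list=model_list)
    responses = await router.abatch_completion(
        models=["gpt-3.5-turbo", "groq-llama"],
        messages=[{"role": "user", "content": "is litellm becoming a better product ?"}],
        max_tokens=15,
    )
    # Assumed: one response per model in `models`, in the same order.
    for model, resp in zip(["gpt-3.5-turbo", "groq-llama"], responses):
        print(model, resp.choices[0].message.content)

asyncio.run(main())
```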
-
Thanks for the quick reply @ishaan-jaff! It doesn't seem to work for me with these deployments, though:

Azure OpenAI gpt4-turbo:

VertexAI gemini-1:

However, it works when I'm using my own code to send batches:
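For reference, a minimal sketch of that kind of DIY batching through the Router (assuming router.acompletion plus tqdm.asyncio for the progress bar and ETA; the model name and prompts here are illustrative):

```python
import asyncio
import litellm
from tqdm.asyncio import tqdm

# Illustrative model list and prompts; in practice these come from my YAML config.
router = litellm.Router(
    model_list=[{"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo"}}]
)
prompts = ["question 1", "question 2", "question 3"]

async def run_batch():
    tasks = [
        router.acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    # tqdm.gather renders a progress bar (with ETA) as the coroutines complete;
    # the Router takes the configured tpm/rpm limits into account underneath.
    return await tqdm.gather(*tasks)

responses = asyncio.run(run_batch())
```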