Sending a batch of requests #3541
-
I'm working with a lot of the different LLMs that LiteLLM supports, and the router makes it very easy for me to set up the connections to them: I have a YAML config file with all the API keys, endpoints, and tpm & rpm settings, and I just pass it to the Router constructor as model_list. However, I'm looking for an easy way to send many requests to the same / different LLMs and to get the responses back as soon as possible (subject to the rpm, tpm & other constraints). I would also like an ETA estimate and a progress bar. I've seen that there is batch completion, but it looks like it's separate from the router, and I can't seem to use batch completion with the router. Can anybody suggest the right way to use LiteLLM for my use case? Should I just use the router and implement the batching, ETA & progress bar myself with tqdm or something?
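For reference, this is roughly how I build the router today (a simplified sketch; the models.yaml file name and the top-level model_list key are just how my config happens to be laid out):

```python
import yaml
import litellm

# The YAML file holds the API keys, endpoints and tpm/rpm settings;
# "models.yaml" and the "model_list" key are illustrative of my own layout.
with open("models.yaml") as f:
    config = yaml.safe_load(f)

router = litellm.Router(model_list=config["model_list"])
```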
-
@anterart you can now do this on the router too:

import litellm

router = litellm.Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
            },
        },
        {
            "model_name": "groq-llama",
            "litellm_params": {
                "model": "groq/llama3-8b-8192",
            },
        },
    ]
)

# abatch_completion is async, so call it from inside an async function
response = await router.abatch_completion(
    models=["gpt-3.5-turbo", "groq-llama"],
    messages=[
        {"role": "user", "content": "is litellm becoming a better product ?"}
    ],
    max_tokens=15,
)
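If you're calling this from a regular (non-async) script, a minimal sketch of wrapping it is below; it assumes abatch_completion returns one response per entry in models, in the same order:

```python
import asyncio
import litellm

model_list = [
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo"}},
    {"model_name": "groq-llama", "litellm_params": {"model": "groq/llama3-8b-8192"}},
]

async def main():
    router = litellm.Router(model_list=model_list)
    responses = await router.abatch_completion(
        models=["gpt-3.5-turbo", "groq-llama"],
        messages=[{"role": "user", "content": "is litellm becoming a better product ?"}],
        max_tokens=15,
    )
    # Assumed: one response per model in `models`, in the same order.
    for model, resp in zip(["gpt-3.5-turbo", "groq-llama"], responses):
        print(model, resp.choices[0].message.content)

asyncio.run(main())
```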
-
Thanks for the quick reply @ishaan-jaff! It doesn't seem to work for me with these deployments, though:

Azure OpenAI gpt4-turbo:

VertexAI gemini-1:

However, it works when I'm using my own code to send batches:
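For reference, a minimal sketch of that kind of DIY batching through the Router (assuming router.acompletion plus tqdm.asyncio for the progress bar and ETA; the model name and prompts here are illustrative):

```python
import asyncio
import litellm
from tqdm.asyncio import tqdm

# Illustrative model list and prompts; in practice these come from my YAML config.
router = litellm.Router(
    model_list=[{"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo"}}]
)
prompts = ["question 1", "question 2", "question 3"]

async def run_batch():
    tasks = [
        router.acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    # tqdm.gather renders a progress bar (with ETA) as the coroutines complete;
    # the Router takes the configured tpm/rpm limits into account underneath.
    return await tqdm.gather(*tasks)

responses = asyncio.run(run_batch())
```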