Replies: 1 comment
Hi! This is actually possible. You probably won't need this after such a long time, but I decided to share my work in case somebody lands here by googling the issue, like I did. Here's a working code snippet that I created based on this answer:
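(A minimal sketch of the approach rather than the exact original snippet: it assumes `ChatOpenAI`'s `n` parameter together with `generate()`; the model name and prompt are placeholders, and older LangChain versions import `ChatOpenAI` from `langchain.chat_models` instead of `langchain_openai`.)

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# n controls how many completions the API returns for a single prompt.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9, n=5)

# generate() takes a list of message lists and returns an LLMResult;
# generations[0] holds the n candidate completions for the first prompt.
result = llm.generate([[HumanMessage(content="Suggest a name for a coffee shop.")]])
answers = [gen.text for gen in result.generations[0]]
print(answers)  # 5 different answers from a single API call
```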
But I discovered that if I want more control over how UNIQUE or DIVERSE the answers are, it's better to ask for a list of answers (just like you do in the chat interface). The output can then be parsed with Pydantic to get a Python list to work with. Here's the code:
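(Again a sketch rather than the exact code: it assumes `PydanticOutputParser` from `langchain_core` and a hypothetical `Answers` model; adjust imports to your LangChain and Pydantic versions.)

```python
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Answers(BaseModel):
    answers: List[str] = Field(description="Distinct answers to the question")


parser = PydanticOutputParser(pydantic_object=Answers)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Answer the user's question.\n{format_instructions}"),
        ("human", "{question}\nGive 5 diverse answers."),
    ]
).partial(format_instructions=parser.get_format_instructions())

chain = prompt | ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9) | parser

result = chain.invoke({"question": "Suggest a name for a coffee shop."})
print(result.answers)  # a plain Python list of strings
```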
Feature request
In the OpenAI API, the chat.completions.create function has a parameter called "n", which controls the number of generations in response to the given prompt. Because the output is non-deterministic, there are many applications in which you'd like to generate and compare multiple responses to the same input. Here is a basic example of how someone might use this parameter using openai (no langchain):
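(A sketch using the v1 `openai` client; the model and prompt are placeholders.)

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    n=5,  # ask for 5 completions in a single request
)

responses = [choice.message.content for choice in completion.choices]
```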
Note this would return a list of 5 different responses to the prompt.
LangChain doesn't natively support this. The workaround is to use the "batch" method of a ChatModel and copy the same prompt multiple times. But the OpenAI models don't override the default batch implementation of the LangChain Runnable, so separate calls are made to the OpenAI API, which is unnecessary. Here is an example of how one would do this in LangChain:
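(A sketch of the workaround; model and prompt are placeholders, and the import path depends on your LangChain version.)

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9)
prompt = "Suggest a name for a coffee shop."

# batch() fans the same prompt out as 5 separate requests, i.e. 5 round
# trips to the OpenAI API instead of one request with n=5.
responses = llm.batch([prompt] * 5)
texts = [message.content for message in responses]
```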
Motivation
This is a common need, which is why the OpenAI API supports it natively. It is used in research, in metric computation (for example, self-consistency), etc.
I profiled the examples I gave in the request description: doing the batch calls with LangChain increases latency by 20%!
Proposal (If applicable)
Probably the best way to tackle this would be to add an "n" parameter to the invoke method for Runnables. By default, the method would check whether n > 1; if not, it would do the same thing it already does now. If n > 1, it would call a separate method. In the default Runnable implementation, this separate method would use the batching workaround described above, but specific models, like the OpenAI ones, would override it to use the "n" feature of the OpenAI API. A rough sketch of the idea:
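(Purely illustrative; none of these names exist in LangChain today.)

```python
from typing import Any, List


class RunnableWithN:
    def batch(self, inputs: List[Any]) -> List[Any]:
        # Stand-in for the existing Runnable.batch implementation.
        return [self._call_once(i) for i in inputs]

    def invoke(self, input: Any, *, n: int = 1) -> Any:
        if n <= 1:
            return self._call_once(input)  # existing behaviour, unchanged
        return self.generate_n(input, n)

    def generate_n(self, input: Any, n: int) -> List[Any]:
        # Default implementation: fall back to the batching workaround,
        # i.e. n separate calls with the same input.
        return self.batch([input] * n)

    def _call_once(self, input: Any) -> Any:
        raise NotImplementedError


# A provider wrapper like ChatOpenAI would override generate_n to pass n
# straight through to the OpenAI API in a single request.
```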