Replies: 1 comment
Hi! This is actually possible. You probably won't need this after such a long time, but I decided to share my work in case somebody lands here by googling the issue, like I did. Here's a working code snippet that I created based on this answer:
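(A minimal sketch of the approach rather than the exact original snippet: it assumes `ChatOpenAI`'s `n` parameter together with `generate()`; the model name and prompt are placeholders, and older LangChain versions import `ChatOpenAI` from `langchain.chat_models` instead of `langchain_openai`.)

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# n controls how many completions the API returns for a single prompt.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9, n=5)

# generate() takes a list of message lists and returns an LLMResult;
# generations[0] holds the n candidate completions for the first prompt.
result = llm.generate([[HumanMessage(content="Suggest a name for a coffee shop.")]])
answers = [gen.text for gen in result.generations[0]]
print(answers)  # 5 different answers from a single API call
```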
But I discovered that if I want more control over how UNIQUE or DIVERSE the answers are, it's better to ask for a list of answers (just like you do in the chat interface). The output can then be parsed with Pydantic to get a Python list to work with. Here's the code:
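(Again a sketch rather than the exact code: it assumes `PydanticOutputParser` from `langchain_core` and a hypothetical `Answers` model; adjust imports to your LangChain and Pydantic versions.)

```python
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Answers(BaseModel):
    answers: List[str] = Field(description="Distinct answers to the question")


parser = PydanticOutputParser(pydantic_object=Answers)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Answer the user's question.\n{format_instructions}"),
        ("human", "{question}\nGive 5 diverse answers."),
    ]
).partial(format_instructions=parser.get_format_instructions())

chain = prompt | ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9) | parser

result = chain.invoke({"question": "Suggest a name for a coffee shop."})
print(result.answers)  # a plain Python list of strings
```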
Feature request
In the OpenAI API, the chat.completions.create function has a parameter called "n", which controls the number of generations in response to the given prompt. Because the output is non-deterministic, there are many applications in which you'd like to generate and compare multiple responses to the same input. Here is a basic example of how someone might use this parameter using openai (no langchain):
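(A sketch using the v1 `openai` client; the model and prompt are placeholders.)

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    n=5,  # ask for 5 completions in a single request
)

responses = [choice.message.content for choice in completion.choices]
```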
Note this would return a list of 5 different responses to the prompt.
LangChain doesn't natively support this. The workaround is to use the "batch" method of a ChatModel and copy the same prompt multiple times. But the OpenAI models don't override the default batch implementation of the LangChain Runnable, so separate calls are made to the OpenAI API, which is unnecessary. Here is an example of how one would do this in LangChain:
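(A sketch of the workaround; model and prompt are placeholders, and the import path depends on your LangChain version.)

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.9)
prompt = "Suggest a name for a coffee shop."

# batch() fans the same prompt out as 5 separate requests, i.e. 5 round
# trips to the OpenAI API instead of one request with n=5.
responses = llm.batch([prompt] * 5)
texts = [message.content for message in responses]
```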
Motivation
This is a common need, which is why the OpenAI API supports it natively. It is used in research, in metric computation (for example, self-consistency), etc.
I profiled the examples I gave in the request description: doing the batch calls with LangChain increases latency by 20%!
Proposal (If applicable)
Probably the best way to tackle this would be to add an "n" parameter to the invoke method for Runnables. By default, the method would check whether n > 1; if not, it would do the same thing it already does now. If n > 1, it would call a separate method. In the default Runnable implementation, this separate method would use the batching workaround described above, but specific models, like the OpenAI ones, would override it to use the "n" feature of the OpenAI API. A rough sketch of the idea:
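(Purely illustrative; none of these names exist in LangChain today.)

```python
from typing import Any, List


class RunnableWithN:
    def batch(self, inputs: List[Any]) -> List[Any]:
        # Stand-in for the existing Runnable.batch implementation.
        return [self._call_once(i) for i in inputs]

    def invoke(self, input: Any, *, n: int = 1) -> Any:
        if n <= 1:
            return self._call_once(input)  # existing behaviour, unchanged
        return self.generate_n(input, n)

    def generate_n(self, input: Any, n: int) -> List[Any]:
        # Default implementation: fall back to the batching workaround,
        # i.e. n separate calls with the same input.
        return self.batch([input] * n)

    def _call_once(self, input: Any) -> Any:
        raise NotImplementedError


# A provider wrapper like ChatOpenAI would override generate_n to pass n
# straight through to the OpenAI API in a single request.
```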