Replies: 1 comment
🤖 Hello,

Based on the information you've provided and the context from the LangChain repository, it seems like you're running into rate-limit issues when processing large amounts of text with the ada-embeddings-02 model. A similar issue was reported in the LangChain repository under the title "Langchain QA over large documents results in Rate limit errors". However, LangChain does not appear to have a built-in mechanism for handling rate limits when making requests to OpenAI.

To handle this, you can implement a backoff mechanism in your code so you stop short of the OpenAI token limit, for example by modifying the `_generate` and `_agenerate` methods of the OpenAI LLM class. Here's how you could modify `_generate`:

```python
def _generate(
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> LLMResult:
    params = self._invocation_params
    params = {**params, **kwargs}
    sub_prompts = self.get_sub_prompts(params, prompts, stop)
    choices = []
    token_usage: Dict[str, int] = {}
    _keys = {"completion_tokens", "prompt_tokens", "total_tokens"}
    for _prompts in sub_prompts:
        if self.streaming:
            if len(_prompts) > 1:
                raise ValueError("Cannot stream results with multiple prompts.")
        response = completion_with_retry(
            self, prompt=_prompts, run_manager=run_manager, **params
        )
        choices.extend(response["choices"])
        update_token_usage(_keys, response, token_usage)
        # Check whether the token limit is about to be exceeded
        if token_usage["total_tokens"] > SOME_TOKEN_LIMIT:
            # Pause for some time (placeholder constants; requires `import time`)
            time.sleep(SOME_BACKOFF_TIME)
    return self.create_llm_result(choices, prompts, token_usage)
```

And here's how you could modify `_agenerate`:

```python
async def _agenerate(
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> LLMResult:
    params = self._invocation_params
    params = {**params, **kwargs}
    sub_prompts = self.get_sub_prompts(params, prompts, stop)
    choices = []
    token_usage: Dict[str, int] = {}
    _keys = {"completion_tokens", "prompt_tokens", "total_tokens"}
    for _prompts in sub_prompts:
        if self.streaming:
            if len(_prompts) > 1:
                raise ValueError("Cannot stream results with multiple prompts.")
        response = await acompletion_with_retry(
            self, prompt=_prompts, run_manager=run_manager, **params
        )
        choices.extend(response["choices"])
        update_token_usage(_keys, response, token_usage)
        # Check whether the token limit is about to be exceeded
        if token_usage["total_tokens"] > SOME_TOKEN_LIMIT:
            # Pause without blocking the event loop (requires `import asyncio`)
            await asyncio.sleep(SOME_BACKOFF_TIME)
    return self.create_llm_result(choices, prompts, token_usage)
```

In both methods, the running `token_usage` total is checked after every batch of sub-prompts, and the loop pauses for `SOME_BACKOFF_TIME` seconds once the total passes `SOME_TOKEN_LIMIT`; both are placeholders you would tune to your quota. Please note that these modifications would need to be made in LangChain's OpenAI LLM implementation itself (or in a subclass of it), since `_generate` and `_agenerate` are internal methods. A lighter-weight alternative that avoids patching LangChain internals is sketched just below.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best,
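For the embeddings path specifically, that alternative could look like the following sketch: batch the documents yourself, pause between batches, and retry with exponential backoff when a request is rejected. This assumes the `tenacity` package; `BATCH_SIZE`, `PAUSE_BETWEEN_BATCHES`, and the wait bounds are illustrative placeholders, not LangChain defaults.

```python
import time

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from tenacity import retry, stop_after_attempt, wait_exponential

BATCH_SIZE = 100            # documents sent per burst; tune to your tokens-per-minute quota
PAUSE_BETWEEN_BATCHES = 10  # seconds to wait between bursts

@retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(6))
def _add_batch(vectorstore, batch):
    # If the embedding request still hits a rate limit, the exception propagates
    # here and tenacity retries it with exponential backoff.
    vectorstore.add_documents(batch)

def index_documents(docs, persist_directory="db"):
    vectorstore = Chroma(
        embedding_function=OpenAIEmbeddings(),
        persist_directory=persist_directory,
    )
    for i in range(0, len(docs), BATCH_SIZE):
        _add_batch(vectorstore, docs[i : i + BATCH_SIZE])
        time.sleep(PAUSE_BETWEEN_BATCHES)  # spread requests out to stay under the quota
    return vectorstore
```

Spacing the batches out keeps you under the tokens-per-minute quota, while the retry decorator only fires when a request is actually rejected, so blind retries are far less likely to exhaust the daily request limit.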
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
Hi everyone,
I've been working with langchain for a short time, but recently, when trying to vectorize and store multiple documents in chromadb, I've hit the token-per-minute limit for the ada-embeddings-02 model. So, I tried implementing a simple backoff for this (even tried a solution from this discussion), but I ended up exceeding the daily request limit as well. I'm wondering if langchain has already introduced a class or method to handle this. If not, I'd appreciate any assistance from someone who's faced this issue and managed to resolve it.
Initially, in my project, users can upload several PDFs at once. So, if one exceeds the 300,000 tokens-per-minute limit, it's obvious the others won't get processed.
I based my code on a tutorial, so there might be some remnants from it. Below is a sketch of the section where I process a file, clean it, extract its text and pages, and store it using the Chroma.from_documents() method:
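(Simplified sketch; `_process_document`, `_clean_text`, and `_text_to_docs` are the helpers I describe below, but the exact parameters and the `store_file` wrapper shown here are only illustrative.)

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def store_file(uploaded_file, persist_directory="db"):
    # Extract per-page text plus document-level metadata from the uploaded PDF.
    pages, metadata = _process_document(uploaded_file)
    pages = [(page_number, _clean_text(text)) for page_number, text in pages]

    # Turn the cleaned pages into chunked Document objects.
    docs = _text_to_docs(pages, metadata)

    # Embed every chunk and persist it to Chroma in a single call;
    # this is the step that runs into the tokens-per-minute limit.
    return Chroma.from_documents(
        docs, OpenAIEmbeddings(), persist_directory=persist_directory
    )
```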
So, I believe the other important method is `_text_to_docs`. This is where the text is turned into chunks, sketched roughly below:
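(I'm showing `RecursiveCharacterTextSplitter` with placeholder chunk sizes here for illustration; the real values differ.)

```python
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

def _text_to_docs(pages, metadata):
    """Split each cleaned page into overlapping chunks, one Document per chunk."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = []
    for page_number, text in pages:
        for i, chunk in enumerate(splitter.split_text(text)):
            docs.append(
                Document(
                    page_content=chunk,
                    metadata={**metadata, "page": page_number, "chunk": i},
                )
            )
    return docs
```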
_process_document extracts metadata, and _clean_text, as its name suggests, cleans text using a loop.
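In sketch form, assuming `pypdf` for the PDF reading (my real versions differ in the details):

```python
import re

from pypdf import PdfReader

def _process_document(uploaded_file):
    """Return ([(page_number, raw_text), ...], metadata) for an uploaded PDF."""
    reader = PdfReader(uploaded_file)
    pages = [(i + 1, page.extract_text() or "") for i, page in enumerate(reader.pages)]
    metadata = {"source": getattr(uploaded_file, "name", "uploaded.pdf")}
    return pages, metadata

def _clean_text(text):
    """Normalise whitespace line by line and drop empty lines."""
    cleaned_lines = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            cleaned_lines.append(line)
    return " ".join(cleaned_lines)
```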
I believe that covers all the important code. Does anyone have a solution for this?
Thanks in advance!