community: YandexGPT embeddings rate quota limit handling #19773

mkhludnev · 2024-03-29T21:23:28Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

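from langchain_community.embeddings.yandex import YandexGPTEmbeddings
from langchain_community.vectorstores import Qdrant

# qdrant_client, qcollection and splits are defined elsewhere in the app
vectorstore = Qdrant(qdrant_client,
                     collection_name=qcollection,
                     embeddings=YandexGPTEmbeddings(folder_id="cafebabe"))  # no sleep_interval
vectorstore.add_texts(texts=[doc.page_content for doc in splits])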

Context: #14767

Error Message and Stack Trace (if applicable)

Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:158.160.54.160:443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:40:55.529921+03:00"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:40:57.02899+03:00"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {created_time:"2024-03-29T23:40:59.671796+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {created_time:"2024-03-29T23:41:04.443389+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:41:13.526651+03:00"}"
>.
Traceback (most recent call last):
  File "/.venv/lib/python3.9/site-packages/gradio/queueing.py", line 522, in process_events
    response = await route_utils.call_process_api(
  File "/.venv/lib/python3.9/site-packages/gradio/route_utils.py", line 260, in call_process_api
    output = await app.get_blocks().process_api(
  File "venv/lib/python3.9/site-packages/gradio/blocks.py", line 1689, in process_api
    result = await self.call_function(
  File ".venv/lib/python3.9/site-packages/gradio/blocks.py", line 1255, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "...venv/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "...venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "...venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "...venv/lib/python3.9/site-packages/gradio/utils.py", line 750, in wrapper
    response = f(*args, **kwargs)
  File "..l/yyyy.py", line 38, in upload_file
    out = vectorstore.add_texts(texts=[doc.page_content for doc in splits],
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 187, in add_texts
    for batch_ids, points in self._generate_rest_batches(
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 2118, in _generate_rest_batches
    batch_embeddings = self._embed_texts(batch_texts)
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 2058, in _embed_texts
    embeddings = self.embeddings.embed_documents(list(texts))
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 110, in embed_documents
    return _embed_with_retry(self, texts=texts)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 146, in _embed_with_retry
    return _completion_with_retry(**kwargs)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "..python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "..python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 144, in _completion_with_retry
    return _make_request(llm, **_kwargs)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 170, in _make_request
    res = stub.TextEmbedding(request, metadata=self._grpc_metadata)  # type: ignore[attr-defined]
  File "...venv/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "...venv/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:158.160.54.160:443 {created_time:"2024-03-29T23:41:30.793786+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>

Description

If I use YandexGPTEmbeddings() without sleep_interval, it fails after a sequence of retries.
Perhaps my server-side quota is too small and I need to top up my account; I don't know. Nevertheless:

How can I configure a rate limit on the client side?
Is it reasonable to handle the rate-limit exception with a limited number of retries?
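On the second question: a bounded retry with exponential backoff is reasonable for occasional quota errors, but it cannot help if every attempt sends requests faster than the quota allows. The log above shows five retries with delays doubling from 1 s to 16 s before the final error. The sketch below reproduces that schedule in plain Python and retries only on a quota error. `QuotaExceeded` and `call_with_backoff` are hypothetical names; in real code the exception to catch would presumably be the gRPC error whose status is `StatusCode.RESOURCE_EXHAUSTED`, which this sketch does not import.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a RESOURCE_EXHAUSTED (rate quota) error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on QuotaExceeded wait 1, 2, 4, ... seconds and retry.

    max_attempts counts the first call, so 6 attempts means 5 retries.
    Any other exception propagates immediately without a retry.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except QuotaExceeded:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last quota error
            # exponential backoff: base_delay * 2**(attempt - 1), capped at max_delay
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))


# Example: a fake call that hits the quota five times, then succeeds.
failures = iter(range(5))


def flaky_embed() -> list:
    if next(failures, None) is not None:
        raise QuotaExceeded("rate quota limit exceed: allowed 10 requests")
    return [0.1, 0.2, 0.3]


waits = []  # record the delays instead of really sleeping
print(call_with_backoff(flaky_embed, sleep=waits.append), waits)
# prints [0.1, 0.2, 0.3] [1.0, 2.0, 4.0, 8.0, 16.0]
```

Retrying only on the quota error keeps genuine failures, such as bad credentials or a malformed request, from being retried pointlessly.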

cc @tyumentsev4

System Info

$ pip show yandexcloud
Name: yandexcloud
Version: 0.248.0


tyumentsev4 · 2024-03-30T07:20:23Z

There is currently a default quota of 10 text vectorization requests per second.

If you need more resources, contact support and tell us which quotas you need to increase and by how much.

Therefore, set sleep_interval=0.1, i.e. one request every 1/10 s:

vectorstore = Qdrant(qdrant_client,
                     collection_name=qcollection,
                     embeddings=YandexGPTEmbeddings(folder_id="cafebabe", sleep_interval=0.1))  # delay between API requests
vectorstore.add_texts(texts=[doc.page_content for doc in splits])
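The value follows from the quota: 10 requests per second leaves 1/10 = 0.1 s per request. The same pacing can be applied on the client side to any per-text call, independently of LangChain; the sketch below spaces the start of each call at least 1/rate seconds after the previous one, with no bursts. `MinIntervalLimiter` and `map_throttled` are hypothetical names, not part of LangChain or the Yandex SDK, and the injectable clock and sleep exist only so the pacing can be checked without waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class MinIntervalLimiter:
    """Keep calls to roughly `rate` per second by spacing their starts 1/rate s apart."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next call may start, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


def map_throttled(
    fn: Callable[[T], R], items: Iterable[T], limiter: MinIntervalLimiter
) -> List[R]:
    """Apply fn to each item (e.g. one embedding request per text) under the limiter."""
    results = []
    for item in items:
        limiter.wait()
        results.append(fn(item))
    return results


# Example: three fake "embedding requests" at 10 requests per second,
# using a simulated clock so nothing actually sleeps.
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


limiter = MinIntervalLimiter(rate=10, clock=lambda: now[0], sleep=fake_sleep)
vectors = map_throttled(lambda text: [float(len(text))], ["a", "bb", "ccc"], limiter)
print(vectors, round(now[0], 3))  # prints [[1.0], [2.0], [3.0]] 0.2
```

Pacing trades throughput for reliability: at 0.1 s per text, embedding 1,000 chunks takes at least 100 s. Network latency adds to any fixed delay, so the effective rate stays somewhat below the nominal 10 per second; if several processes share one quota, each needs a proportionally smaller rate.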

dosubot bot added the labels Ɑ: embeddings, 🔌: qdrant and 🤖:bug on Mar 29, 2024

mkhludnev changed the title ~~community: YandexGPT embeddings #14767 rate quota limit handling~~ community: YandexGPT embeddings rate quota limit handling Mar 29, 2024

mkhludnev closed this as completed Mar 30, 2024

mkhludnev mentioned this issue on Apr 7, 2024 in community: Fix YandexGPT embeddings #19720 (Merged)
