community: YandexGPT embeddings rate quota limit handling #19773

Closed
5 tasks done
mkhludnev opened this issue Mar 29, 2024 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module 🔌: qdrant Primarily related to Qdrant vector store integration

Comments

@mkhludnev
Contributor

mkhludnev commented Mar 29, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

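from langchain_community.embeddings import YandexGPTEmbeddings
from langchain_community.vectorstores import Qdrant

# qdrant_client, qcollection and splits are created elsewhere in the application
vectorstore = Qdrant(qdrant_client,
                     collection_name=qcollection,
                     embeddings=YandexGPTEmbeddings(folder_id="cafebabe")  #hell
                     )
# same call shape as in the traceback below
vectorstore.add_texts(texts=[doc.page_content for doc in splits])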

Context: #14767

Error Message and Stack Trace (if applicable)

Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:158.160.54.160:443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:40:55.529921+03:00"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:40:57.02899+03:00"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {created_time:"2024-03-29T23:40:59.671796+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {created_time:"2024-03-29T23:41:04.443389+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>.
Retrying langchain_community.embeddings.yandex._embed_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4::443 {grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests", grpc_status:8, created_time:"2024-03-29T23:41:13.526651+03:00"}"
>.
Traceback (most recent call last):
  File "/.venv/lib/python3.9/site-packages/gradio/queueing.py", line 522, in process_events
    response = await route_utils.call_process_api(
  File "/.venv/lib/python3.9/site-packages/gradio/route_utils.py", line 260, in call_process_api
    output = await app.get_blocks().process_api(
  File "venv/lib/python3.9/site-packages/gradio/blocks.py", line 1689, in process_api
    result = await self.call_function(
  File ".venv/lib/python3.9/site-packages/gradio/blocks.py", line 1255, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "...venv/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "...venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "...venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "...venv/lib/python3.9/site-packages/gradio/utils.py", line 750, in wrapper
    response = f(*args, **kwargs)
  File "..l/yyyy.py", line 38, in upload_file
    out = vectorstore.add_texts(texts=[doc.page_content for doc in splits],
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 187, in add_texts
    for batch_ids, points in self._generate_rest_batches(
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 2118, in _generate_rest_batches
    batch_embeddings = self._embed_texts(batch_texts)
  File "...venv/lib/python3.9/site-packages/langchain_community/vectorstores/qdrant.py", line 2058, in _embed_texts
    embeddings = self.embeddings.embed_documents(list(texts))
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 110, in embed_documents
    return _embed_with_retry(self, texts=texts)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 146, in _embed_with_retry
    return _completion_with_retry(**kwargs)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "..python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "..python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "...venv/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 144, in _completion_with_retry
    return _make_request(llm, **_kwargs)
  File "...venv/lib/python3.9/site-packages/langchain_community/embeddings/yandex.py", line 170, in _make_request
    res = stub.TextEmbedding(request, metadata=self._grpc_metadata)  # type: ignore[attr-defined]
  File "...venv/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "...venv/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:158.160.54.160:443 {created_time:"2024-03-29T23:41:30.793786+03:00", grpc_status:8, grpc_message:"ai.embeddingsTextEmbeddingRequestsPerSecond.rate rate quota limit exceed: allowed 10 requests"}"
>

Description

If I use YandexGPTEmbeddings() without sleep_interval, it fails after a sequence of retries.
Perhaps my server-side quota is too small and I need to add funds; I don't know. Nevertheless:

  1. How can I configure the rate limit on the client side?
  2. Is it reasonable to handle the rate-limit exception with a limited number of retries?
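
For reference on question 2, the log above shows delays of 1, 2, 4, 8 and 16 seconds before the error is re-raised. Below is a minimal, stdlib-only sketch of that kind of bounded retry with doubling delays. It is not the LangChain or tenacity implementation: `QuotaExceeded`, `call_with_backoff` and their parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a RESOURCE_EXHAUSTED error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on QuotaExceeded with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last error
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Retries of this kind mainly absorb short bursts. If the steady request rate stays above the quota, every attempt can hit the limit again, so pacing the requests on the client side is still needed.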

cc @tyumentsev4

System Info

$ pip show yandexcloud
Name: yandexcloud
Version: 0.248.0
@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🔌: qdrant Primarily related to Qdrant vector store integration 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Mar 29, 2024
@mkhludnev mkhludnev changed the title community: YandexGPT embeddings #14767 rate quota limit handling community: YandexGPT embeddings rate quota limit handling Mar 29, 2024
@tyumentsev4
Contributor

tyumentsev4 commented Mar 30, 2024

There is currently a default quota of 10 text vectorization requests per second.

If you need more resources, contact support
and tell us which quotas you need to increase and by how much.

Therefore, it is necessary to set sleep_interval=0.1, i.e. a 0.1-second pause between requests, which keeps the client within 10 requests per second:

vectorstore = Qdrant(qdrant_client,
                     collection_name=qcollection,
                     embeddings=YandexGPTEmbeddings(folder_id="cafebabe", sleep_interval=0.1)
                     )
vectorstore.add_texts()
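
The value 0.1 follows from the quota: 10 requests per second means at least 1/10 s between consecutive requests. A minimal sketch of that pacing, independent of LangChain, is shown here. `Throttle` and its parameters are hypothetical names for illustration and are not part of `YandexGPTEmbeddings`; the injectable `clock` and `sleep` exist only to make the sketch testable.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Spaces calls at least 1 / max_per_second seconds apart."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # no call made yet

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve its slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Usage: call throttle.wait() before each request.
throttle = Throttle(max_per_second=10)  # min_interval is 0.1 s
```

If the server measures the rate in a way that is sensitive to timing jitter, an interval slightly above 0.1 s may give more headroom; that is an assumption, not something confirmed in this thread.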
