# Retrieval-augmented generation in Yandex Cloud services

You can use YandexGPT to implement scenarios for answering questions about documentation, regulatory policies, knowledge bases, and so on. We recommend employing retrieval-augmented generation (RAG), so that the model answers based on a relevant corpus of documents rather than only the data it was trained on.

*Retrieval-augmented generation architecture*

How it works:

  1. A knowledge base (collection of documents) is split into fragments (chunks) for YandexGPT to vectorize. These representations (embeddings) are stored in a vector database, e.g., OpenSearch, ChromaDB, or LanceDB.
  2. A user sends a text query to the system.
  3. YandexGPT vectorizes the query.
  4. The vector database is searched for the chunks closest to the user's query. Depending on the chunk size, the top *n* most relevant chunks are selected.
  5. These chunks, the user query, and the task statement (prompt) are passed to YandexGPT, which generates the final response returned to the user (see the sketch below).
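
As a minimal sketch of this flow, the snippet below wires the same five steps together with LangChain's community integrations. The classes (`YandexGPTEmbeddings`, `YandexGPT`, `OpenSearchVectorSearch`) exist in `langchain_community`, but the credentials, hosts, and index name are placeholders, and parameter names may vary between LangChain versions; the notebook in this repository uses its own adapter instead (see `YaGPT.py`).

```python
# Minimal RAG sketch; all credentials, hosts, and file names are placeholders.
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import YandexGPTEmbeddings
from langchain_community.llms import YandexGPT
from langchain_community.vectorstores import OpenSearchVectorSearch

# Step 1: split the knowledge base into chunks and vectorize them.
raw_text = open("knowledge_base.txt").read()  # e.g., downloaded from Object Storage
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(raw_text)

embeddings = YandexGPTEmbeddings(api_key="<API_KEY>", folder_id="<FOLDER_ID>")
vectorstore = OpenSearchVectorSearch.from_texts(
    chunks,
    embeddings,
    opensearch_url="https://<OPENSEARCH_HOST>:9200",
    http_auth=("admin", "<PASSWORD>"),  # adjust auth/TLS options to your cluster
    index_name="kb-chunks",
)

# Steps 2-4: vectorize the user query and fetch the top-n closest chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Step 5: pass the retrieved chunks, the query, and the prompt to the LLM.
llm = YandexGPT(api_key="<API_KEY>", folder_id="<FOLDER_ID>")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
print(qa.invoke({"query": "How do I create an OpenSearch cluster?"})["result"])
```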

In this guide, we will use the following Yandex Cloud services to demonstrate this scenario:

  1. YandexGPT: Large language model for creating document embeddings and answering questions.
  2. Yandex Managed Service for OpenSearch: Service for managing OpenSearch clusters. We will use this to store pairs of document chunks and vector representations of those chunks.
  3. Yandex Object Storage: Object storage where the knowledge base files are initially stored.
  4. Yandex DataSphere: Python IDE to train ML models and work with YandexGPT and OpenSearch.

The vector database and YandexGPT will be managed with LangChain, a popular open-source framework.

For detailed comments on working with the components, see the project files (we recommend you open them in Yandex DataSphere).

## Step-by-step guide

  1. Go to OpenSearch and create a cluster as per this guide. Create a group of OpenSearch hosts and a group of virtual dashboard hosts. The OpenSearch cluster, Object Storage, and DataSphere must be on the same subnet, e.g., default-ru-central1-a.
  2. Go to Yandex Object Storage and create a new bucket. Upload the documents you want to answer questions about to this bucket.
  3. Go to Yandex DataSphere and create a community and project to run Python code in.
  4. In the DataSphere project, create a connector to Object Storage S3. Activate the connector for JupyterLab (a short access sketch follows this list).
  5. In the created project, go to the Settings tab and specify:
  • Default folder: Folder where you created your Yandex Managed Service for OpenSearch and YandexGPT services.
  • Service account: Service account to access other services from your DataSphere project. The service account must be assigned the following roles: ai.languageModels.user to access the YandexGPT model, managed-opensearch.admin to work with OpenSearch, and vpc.user.
  • Subnet: Specify the subnet where OpenSearch and YandexGPT services are located.
  6. Open your project in JupyterLab and clone this repository with Git. We recommend launching JupyterLab in dedicated mode.
  7. In your project, open the YandexGPT_OpenSearch.ipynb notebook and run all the code cells.
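
To illustrate step 4: once the connector and service account are set up, the notebook can read the knowledge-base files straight from the bucket, for example with `boto3` against the Object Storage endpoint. The bucket and object names below are placeholders:

```python
# Illustrative only: read knowledge-base files from Yandex Object Storage.
# The bucket and key are placeholders; credentials come from the DataSphere
# connector / service account or from static access keys.
import boto3

s3 = boto3.session.Session().client(
    service_name="s3",
    endpoint_url="https://storage.yandexcloud.net",  # Object Storage S3 endpoint
)

# List the uploaded documents and download one of them.
for obj in s3.list_objects_v2(Bucket="<YOUR_BUCKET>").get("Contents", []):
    print(obj["Key"])

document = s3.get_object(Bucket="<YOUR_BUCKET>", Key="<DOCUMENT_KEY>")["Body"].read()
```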

## Key steps in the YandexGPT_OpenSearch notebook

  1. Documents from the object storage are split into small fragments (chunks) of size `chunk_size`. Choose `chunk_size` with these factors in mind (see the budget check after this list):
    • The context length allowed by the embedding model: YandexGPT embeddings accept up to 2,048 tokens.
    • The size of the LLM context window: if we want to use the top 3 search results in a query, then `3*chunk_size + prompt_size + response_size` must not exceed the model's context length.
  2. Next, with the Yandex GPT Embedding API, we generate vector embeddings from our text chunks. For the adapter that handles embeddings and the YandexGPT model, see `YaGPT.py`.
  3. We add the obtained vectors to OpenSearch.
  4. We test whether chunks returned for a query are relevant.
  5. We shape the retrieval-augmented generation pipeline and verify that it works correctly.
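
The constraint from step 1 can be made concrete with a small budget check. Apart from the 2,048-token embedding limit quoted above, all numbers are illustrative assumptions; substitute your model's actual context window and prompt sizes:

```python
# Token-budget check for choosing chunk_size; numbers are illustrative.
EMBEDDING_CONTEXT = 2048  # max tokens the embedding model accepts (from the docs above)
LLM_CONTEXT = 8000        # assumed LLM context window; check your model's limit
TOP_N = 3                 # number of retrieved chunks inserted into the prompt
PROMPT_SIZE = 500         # assumed token budget for the task statement (prompt)
RESPONSE_SIZE = 1000      # assumed token budget reserved for the answer

chunk_size = 1000  # candidate chunk size, in tokens

assert chunk_size <= EMBEDDING_CONTEXT, "each chunk must fit the embedding model"
assert TOP_N * chunk_size + PROMPT_SIZE + RESPONSE_SIZE <= LLM_CONTEXT, \
    "retrieved chunks + prompt + answer must fit the LLM context window"
```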