You can use YandexGPT to implement scenarios for answering questions on documentation, regulatory policies, knowledge bases, etc. We recommend employing retrieval-augmented generation (RAG), so that the model answers based on a relevant corpus of documents rather than only the data it was trained on.
- A knowledge base (a collection of documents) is split into fragments (chunks), which YandexGPT vectorizes. The resulting vector representations (embeddings) are stored in a vector database, e.g., OpenSearch, ChromaDB, or LanceDB.
- A user sends a text query to the system.
- YandexGPT vectorizes the query.
- The vector database is searched for the chunks closest to the user’s query. Depending on the chunk size, the top n most relevant chunks are selected.
- These chunks, the user query, and the task statement (prompt) are passed to YandexGPT, which generates the final response to return to the user.
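To make this flow concrete, here is a minimal, self-contained sketch of the query-time steps. Every component in it is a toy stand-in, not the real YandexGPT or OpenSearch API; the actual services are wired up later in this guide:

```python
# Toy sketch of the query-time RAG flow; the embedder and index are stand-ins.
from math import sqrt

def embed(text: str) -> list[float]:
    # Stand-in embedding: normalized letter frequencies.
    # In the real pipeline, YandexGPT produces these vectors.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

chunks = ["OpenSearch stores the embeddings.", "Object Storage holds the source files."]
index = [(embed(c), c) for c in chunks]           # vectorize the knowledge base chunks

query = "Where are embeddings stored?"
q_vec = embed(query)                              # vectorize the user query
best = max(index, key=lambda item: cosine(item[0], q_vec))  # nearest chunk wins

prompt = f"Context: {best[1]}\nQuestion: {query}"  # final prompt for the LLM
print(prompt)
```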
In this guide, we will use the following Yandex Cloud services to demonstrate this scenario:
- YandexGPT: Large language model for creating document embeddings and answering questions.
- Yandex Managed Service for OpenSearch: Service for managing OpenSearch clusters. We will use this to store pairs of document chunks and vector representations of those chunks.
- Yandex Object Storage: Object storage where the knowledge base files are initially stored.
- Yandex DataSphere: Python IDE to train ML models and work with YandexGPT and OpenSearch.
The vector database and YandexGPT will be managed with LangChain, a popular open-source framework.
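As a rough idea of what that wiring looks like, here is a hedged sketch using the `YandexGPTEmbeddings` and `YandexGPT` integrations from `langchain_community` together with the `OpenSearchVectorSearch` vector store. The project itself ships its own adapter in YaGPT.py, and exact parameter names vary across LangChain versions:

```python
# Hedged sketch: community LangChain integrations for YandexGPT and OpenSearch.
# Parameter names (iam_token, folder_id, opensearch_url, ...) are assumptions
# that may differ in your LangChain version; the repo uses YaGPT.py instead.
from langchain_community.embeddings import YandexGPTEmbeddings
from langchain_community.llms import YandexGPT
from langchain_community.vectorstores import OpenSearchVectorSearch

embeddings = YandexGPTEmbeddings(iam_token="<IAM_TOKEN>", folder_id="<FOLDER_ID>")
llm = YandexGPT(iam_token="<IAM_TOKEN>", folder_id="<FOLDER_ID>")

vector_store = OpenSearchVectorSearch(
    opensearch_url="https://<OPENSEARCH_HOST>:9200",
    index_name="docs",
    embedding_function=embeddings,
    http_auth=("admin", "<PASSWORD>"),
    use_ssl=True,
    verify_certs=False,
)
```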
For detailed comments on working with the components, see the project files (we recommend you open them in Yandex DataSphere).
- Go to OpenSearch and create a cluster as per this guide. Create a group of OpenSearch hosts and a group of Dashboards hosts. The OpenSearch cluster, Object Storage, and DataSphere must be on the same subnet, e.g., default-ru-central1-a.
- Go to Yandex Object Storage and create a new bucket. Upload to this bucket the documents you want to answer questions about.
- Go to Yandex DataSphere and create a community and project to run Python code in.
- In the DataSphere project, create an S3 connector for Object Storage. Activate the connector for use in JupyterLab.
- In the created project, go to the Settings tab and specify:

  Default folder
  : The folder where you created the Yandex Managed Service for OpenSearch and YandexGPT services.

  Service account
  : The service account used to access other services from your DataSphere project. It must be assigned the following roles: ai.languageModels.user to access the YandexGPT model, managed-opensearch.admin to work with OpenSearch, and vpc.user to use the subnet.

  Subnet
  : The subnet where the OpenSearch and YandexGPT services are located.
- Open your project in JupyterLab and clone this repository with Git. We recommend running JupyterLab in dedicated mode.
- In your project, open the YandexGPT_OpenSearch.ipynb notebook and run all the code cells.
- Documents from the object storage are split into small fragments (chunks) of `chunk_size` each. Make sure to define `chunk_size` with these factors in mind (see the budget-check sketch after this list):
  - The allowable context length for the embedding model. YandexGPT embeddings support 2,048 tokens.
  - The allowable size of the LLM context window. If we want to use the top 3 search results in a query, then `3*chunk_size + prompt_size + response_size` must not exceed the model’s context length.
- Next, we use the Yandex GPT Embedding API to generate vector embeddings from the text chunks. The adapter that wraps the embedding and text generation models is implemented in YaGPT.py. (A condensed sketch of this and the following steps appears after this list.)
- We add the obtained vectors to OpenSearch.
- We test whether the chunks returned for a query are relevant.
- We assemble the retrieval-augmented generation pipeline and verify that it works correctly.
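For the chunk-size factors above, a quick budget check is easy to script. All the numbers below except the 2,048-token embedding limit are illustrative placeholders, not measured values:

```python
# Back-of-the-envelope context budget check for chunk_size (illustrative numbers).
EMBEDDING_LIMIT = 2048      # tokens the YandexGPT embedding model accepts per chunk
LLM_CONTEXT = 8000          # assumed LLM context window; check your model's limit

chunk_size = 1000           # tokens per chunk (must not exceed EMBEDDING_LIMIT)
top_n = 3                   # search results included in the prompt
prompt_size = 300           # task statement, in tokens (placeholder)
response_size = 2000        # room reserved for the answer (placeholder)

assert chunk_size <= EMBEDDING_LIMIT, "chunk exceeds the embedding model's limit"
budget = top_n * chunk_size + prompt_size + response_size
assert budget <= LLM_CONTEXT, f"prompt budget {budget} exceeds context {LLM_CONTEXT}"
print(f"OK: {budget} of {LLM_CONTEXT} tokens used")
```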
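And here is a condensed, hedged sketch of the split, index, retrieve, and answer steps with the community LangChain stack. Again, the notebook relies on the adapter in YaGPT.py rather than the `YandexGPT*` classes used here, the file name is a placeholder, and parameter names may differ across LangChain versions:

```python
# Hedged end-to-end sketch: split, index in OpenSearch, retrieve, and answer.
# Class and parameter names are assumptions based on langchain_community.
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import YandexGPTEmbeddings
from langchain_community.llms import YandexGPT
from langchain_community.vectorstores import OpenSearchVectorSearch

embeddings = YandexGPTEmbeddings(iam_token="<IAM_TOKEN>", folder_id="<FOLDER_ID>")
llm = YandexGPT(iam_token="<IAM_TOKEN>", folder_id="<FOLDER_ID>")

# Split the knowledge base into chunks sized for the embedding model
# (chunk_size here is in characters, not tokens).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("knowledge_base.txt").read())  # placeholder file

# Vectorize the chunks and store (chunk, embedding) pairs in OpenSearch.
vector_store = OpenSearchVectorSearch.from_texts(
    chunks,
    embeddings,
    opensearch_url="https://<OPENSEARCH_HOST>:9200",
    index_name="docs",
    http_auth=("admin", "<PASSWORD>"),
    use_ssl=True,
    verify_certs=False,
)

# Sanity-check retrieval: are the nearest chunks actually relevant?
for doc in vector_store.similarity_search("How do I reset my password?", k=3):
    print(doc.page_content[:80])

# Assemble the RAG pipeline: retrieve the top-3 chunks, then generate the answer.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
)
print(qa.invoke({"query": "How do I reset my password?"}))
```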