Welcome to the STACKIT RAG Template! This is a basic example of how to use the RAG-API libraries, designed to help you get started with building AI-powered chatbots and document management systems 📖 (see main.py, container.py and chat_endpoint.py).
- Document Management: Supports PDFs, DOCX, PPTX, XML, and Confluence documents.
- AI Integration: Multiple LLM and embedder providers for flexibility.
- Tracing & Evaluation: Tools for monitoring and assessing system performance.
- Frontends: User-friendly interfaces for easy interaction.
- Security: Basic authentication for secure access.
- Deployment: Options for both local and production environments.
The template supports multiple LLM (Large Language Model) providers, such as STACKIT and Ollama, giving you flexibility in choosing the best fit for your project. It also integrates with Langfuse for enhanced monitoring and analytics, and uses S3 object storage for document management. 📁
A `Tiltfile` is provided to get you started 🚀. If Tilt is new to you and you want to learn more about it, please take a look at the Tilt guides.
This repository contains the following components:
- rag-backend: The main component of the RAG.
- admin-backend: Manages user documents and Confluence spaces; interacts with the document-extractor and rag-backend.
- document-extractor: Extracts content from documents and Confluence spaces.
- frontend: Frontend for both, chat and admin APIs.
- rag-infrastructure: Contains the helm-chart and other files related to infrastructure and deployment. Please consult this README for further information.
- rag-core-library: Contains the API-libraries that are used to construct the backend-services in this repository. For further information, please consult this README.
The backend is the main component of the RAG. It handles all connections to the vector database, as well as chatting.
All components are provided by the rag-core-api. For further information on endpoints and requirements, please consult this README.
The admin backend manages user-provided documents and Confluence spaces. It communicates with the document-extractor to extract the content from documents and Confluence spaces, and with the rag-backend to store the document chunks in the vector database. The documents themselves are stored in S3 object storage. It also acts as an interface that reports the current status of the documents and Confluence spaces in the RAG.
All components are provided by the admin-api-lib. For further information on endpoints and requirements, please consult this README.
The document extractor extracts the content from documents and Confluence spaces.
All components are provided by the extractor-api-lib. For further information on endpoints and requirements, please consult this README.
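As a sketch of how these components fit together: a client (such as the frontend) uploads a document to the admin backend, which then drives extraction and ingestion. The snippet below builds, but does not send, such an authenticated upload request using only the Python standard library. The endpoint path `/api/upload_file` and the credentials are hypothetical placeholders; consult the admin-api-lib README for the real routes.

```python
import base64
import urllib.request


def build_upload_request(
    base_url: str, pdf_bytes: bytes, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated document-upload request.

    The endpoint path below is a hypothetical placeholder; the real
    routes are documented in the admin-api-lib README.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/upload_file",  # hypothetical path
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/pdf",
        },
    )


req = build_upload_request("http://localhost:8080", b"%PDF-1.7 ...", "foo", "bar")
print(req.get_method(), req.full_url)  # POST http://localhost:8080/api/upload_file
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) is left out on purpose; the point is only to show the shape of an authenticated call against the deployed services.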
📝 Windows users: make sure you use WSL for infrastructure setup & orchestration.
Every package contains a `pyproject.toml` with the required Python packages. Poetry is used for requirement management. To ensure the requirements are consistent, you have to update the `poetry.lock` in addition to the `pyproject.toml` when updating/changing requirements. Additional requirements like black and flake8 are provided for development. You can install them with `poetry install --with dev` inside the package directory.
📝 Do not update the requirements in the `pyproject.toml` manually. Doing so will invalidate the `poetry.lock`. Use the poetry application for this.
Run `poetry add --lock <package>` inside the package directory in order to add new packages. This will automatically update the `pyproject.toml` and the `poetry.lock`. System requirements have to be added manually to the `Dockerfile`.
This example of the rag-template includes a WebUI for document management, as well as for the chat. After following the setup instructions for either the local installation or the installation on a server, the WebUI is accessible via the configured ingress. After uploading a file in the document-management WebUI, you can start asking questions about your document in the chat WebUI.
For a complete documentation of the available REST-APIs, please consult the README of the rag-core-library.
If you want to replace some dependencies with your own, see the rag-backend folder, especially `main.py`, `container.py` and `chat_endpoint.py`.
The following is a list of the dependencies. If you are missing one of the dependencies, click on its name and follow the installation instructions.
For local deployment, a few environment variables need to be provided via a `.env` file in the root of this git-repository. The `.env` file needs to contain the following values:
```
BASIC_AUTH=Zm9vOiRhcHIxJGh1VDVpL0ZKJG10elZQUm1IM29JQlBVMlZ4YkpUQy8K
S3_ACCESS_KEY_ID=...
S3_SECRET_ACCESS_KEY=...
VITE_AUTH_USERNAME=...
VITE_AUTH_PASSWORD=...
RAGAS_OPENAI_API_KEY=...
STACKIT_VLLM_API_KEY=...
STACKIT_EMBEDDER_API_KEY=...

# ONLY necessary if no init values are set. If init values are set,
# the following two values should match the init values, be commented out,
# or be created via the Langfuse UI.
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
This results in a basic auth with username=`foo` and password=`bar`.
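To see why, note that the `BASIC_AUTH` value is simply a base64-encoded htpasswd entry. Decoding it with the Python standard library reveals the username and an apr1 password hash:

```python
import base64

BASIC_AUTH = "Zm9vOiRhcHIxJGh1VDVpL0ZKJG10elZQUm1IM29JQlBVMlZ4YkpUQy8K"

entry = base64.b64decode(BASIC_AUTH).decode().strip()
user, pw_hash = entry.split(":", 1)
print(user)     # foo
print(pw_hash)  # $apr1$... (an htpasswd apr1-MD5 hash of "bar")
```

To generate your own value, you can run `htpasswd -nb <user> <password> | base64` (assuming the `htpasswd` tool from apache2-utils is available).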
📝 NOTE: All values containing `...` are placeholders and have to be replaced with real values.

This deployment comes with multiple options. You can change `global.config.envs.rag_class_types.RAG_CLASS_TYPE_LLM_TYPE` in the helm-deployment to one of the following values:

- `stackit`: Uses an OpenAI-compatible LLM, like the STACKIT model serving service.
- `ollama`: Uses Ollama as an LLM provider.
Optionally you can set the following values in the `.env` file:
```
# Instead of generating the org, project, user, public key
# and secret key through the UI, you can set INIT values for them.
LANGFUSE_INIT_ORG_ID=...
LANGFUSE_INIT_PROJECT_ID=...
LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
LANGFUSE_INIT_USER_EMAIL=...
LANGFUSE_INIT_USER_NAME=...
LANGFUSE_INIT_USER_PASSWORD=...

# If you want to extract content from a Confluence space, you need to provide the following values
CONFLUENCE_URL=...
CONFLUENCE_TOKEN=...
CONFLUENCE_SPACE_KEY=...
```
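As a small sanity check before starting the deployment, a script along these lines (standard library only, written for this README rather than part of the template) can verify that the required keys from the listing above are present in your `.env`:

```python
# Required keys taken from the .env listing in this README.
REQUIRED_KEYS = [
    "BASIC_AUTH",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "VITE_AUTH_USERNAME",
    "VITE_AUTH_PASSWORD",
    "RAGAS_OPENAI_API_KEY",
    "STACKIT_VLLM_API_KEY",
    "STACKIT_EMBEDDER_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


sample = "BASIC_AUTH=abc\n# a comment\nS3_ACCESS_KEY_ID=xyz\n"
print(missing_keys(parse_env(sample)))
```

Note that this simple parser does not handle quoting or multi-line values; for anything beyond plain `KEY=VALUE` pairs, a library such as python-dotenv is the safer choice.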
In the following, the k3d cluster setup and the setup inside k3d will be explained.
For a detailed explanation of the k3d setup, please consult the rag-infrastructure README.
If this is the first time you are starting the `Tiltfile`, you have to build the helm chart first. This can be done with the following command from the root of the git-repository:

```shell
cd rag-infrastructure/rag; helm dependency update; cd ../..
```
📝 NOTE: The configuration of the `Tiltfile` requires `features.frontend.enabled=true`, `features.keydb.enabled=true`, `features.langfuse.enabled=true` and `features.qdrant.enabled=true`.
After the initial build of the helm chart, Tilt is able to update the files.
The following will spin up the microservices in k3d. For the following steps, it is assumed your current working directory is the root of the git-repository.

```shell
tilt up
```
Environment variables are loaded from the `.env` file in the root of this git-repository.
The Tilt UI is available at http://localhost:10350/
If you want to access Qdrant etc., just click the resource in the Tilt UI. The link to access the resource is shown in the upper corner.
To enable debugging, start Tilt with the following command:

```shell
tilt up -- --debug=true
```
The backend will wait until your debugger is connected before it will fully start.
The debugger used is `debugpy`, which is compatible with VS Code. To connect the debugger, you can use the following `launch.json`:
```jsonc
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "rag_backend",
      "type": "python",
      "request": "attach",
      "host": "localhost",
      "port": 31415,
      "justMyCode": false,
      "env": {
        "PYDEVD_WARN_EVALUATION_TIMEOUT": "600",
        "PYDEVD_THREAD_DUMP_ON_WARN_EVALUATION_TIMEOUT": "600"
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}/rag-backend",
          "remoteRoot": "/app/rag-backend"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/rag-core-lib",
          "remoteRoot": "/app/rag-core-library/rag-core-lib"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/rag-core-api",
          "remoteRoot": "/app/rag-core-library/rag-core-api"
        },
        // avoid tilt warning of missing path mapping
        {
          "localRoot": "${workspaceFolder}/rag-core-library/admin-api-lib",
          "remoteRoot": "/app/rag-core-library/admin-api-lib"
        }
      ]
    },
    {
      "name": "document_extractor",
      "type": "python",
      "request": "attach",
      "host": "localhost",
      "port": 31416,
      "justMyCode": false,
      "env": {
        "PYDEVD_WARN_EVALUATION_TIMEOUT": "600",
        "PYDEVD_THREAD_DUMP_ON_WARN_EVALUATION_TIMEOUT": "600"
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}/document-extractor",
          "remoteRoot": "/app/document-extractor"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/extractor-api-lib",
          "remoteRoot": "/app/rag-core-library/extractor-api-lib"
        },
        // avoid tilt warning of missing path mapping
        {
          "localRoot": "${workspaceFolder}/rag-core-library/rag-core-api",
          "remoteRoot": "/app/rag-core-library/rag-core-api"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/admin-api-lib",
          "remoteRoot": "/app/rag-core-library/admin-api-lib"
        }
      ]
    },
    {
      "name": "rag_admin_backend",
      "type": "python",
      "request": "attach",
      "host": "localhost",
      "port": 31417,
      "justMyCode": false,
      "env": {
        "PYDEVD_WARN_EVALUATION_TIMEOUT": "600",
        "PYDEVD_THREAD_DUMP_ON_WARN_EVALUATION_TIMEOUT": "600"
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}/admin-backend",
          "remoteRoot": "/app/admin-backend"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/rag-core-lib",
          "remoteRoot": "/app/rag-core-library/rag-core-lib"
        },
        {
          "localRoot": "${workspaceFolder}/rag-core-library/admin-api-lib",
          "remoteRoot": "/app/rag-core-library/admin-api-lib"
        },
        // avoid tilt warning of missing path mapping
        {
          "localRoot": "${workspaceFolder}/rag-core-library/rag-core-api",
          "remoteRoot": "/app/rag-core-library/rag-core-api"
        }
      ]
    }
  ]
}
```
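On the service side, the wait-for-attach behavior described above can be sketched as follows. This is an illustrative sketch, not the template's actual start-up code; it assumes `debugpy` is installed in the development image and that the chosen port matches the `port` in your `launch.json`:

```python
def wait_for_debugger(port: int = 31415) -> None:
    """Block start-up until a VS Code debugger attaches (illustrative only)."""
    import debugpy  # assumed to be installed in the development image

    debugpy.listen(("0.0.0.0", port))  # port must match launch.json
    debugpy.wait_for_client()  # the service waits here until a client attaches
```

The `wait_for_client()` call is what makes a backend block until VS Code attaches, matching the behavior described above.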
The following will delete everything deployed with the `tilt up` command:

```shell
tilt down
```
A detailed explanation of how to access a service via ingress can be found in the rag-infrastructure README.
The RAG template requires at least:
- A Kubernetes Cluster
- S3 ObjectStorage
Provided is an example Terraform script, using the STACKIT Terraform Provider:
```terraform
resource "stackit_ske_project" "rag-ske" {
  project_id = var.stackit_project_id
}

resource "stackit_ske_cluster" "rag-ske" {
  project_id         = stackit_ske_project.rag-ske.id
  name               = "rag"
  kubernetes_version = "1.27"
  node_pools = [
    {
      name               = "rag-node1"
      machine_type       = "g1.4"
      max_surge          = 1
      minimum            = "1"
      maximum            = "1"
      availability_zones = ["eu01-1"]
      os_version         = "3815.2.5"
      volume_size        = 320
      volume_type        = "storage_premium_perf1"
    }
  ]
  maintenance = {
    enable_kubernetes_version_updates    = true
    enable_machine_image_version_updates = true
    start                                = "01:00:00Z"
    end                                  = "02:00:00Z"
  }
}

resource "stackit_objectstorage_credentials_group" "credentials-group" {
  project_id = stackit_ske_project.rag-ske.id
  name       = "credentials-group"
  depends_on = [stackit_ske_project.rag-ske, stackit_objectstorage_bucket.docs]
}

resource "stackit_objectstorage_credential" "misc-creds" {
  depends_on           = [stackit_objectstorage_credentials_group.credentials-group]
  project_id           = stackit_objectstorage_credentials_group.credentials-group.project_id
  credentials_group_id = stackit_objectstorage_credentials_group.credentials-group.credentials_group_id
  expiration_timestamp = "2027-01-02T03:04:05Z"
}

resource "stackit_objectstorage_bucket" "docs" {
  project_id = stackit_ske_project.rag-ske.id
  name       = "docs"
}
```
For further information, please consult the STACKIT Terraform Provider documentation.
Further requirements for the server can be found here.
A detailed description regarding the configuration of Langfuse can be found here.
The example `Tiltfile` provides triggered linting and testing. The linting settings can be changed in the `rag-backend/pyproject.toml` file under the `tool.flake8` section.
This use case example contains 2 git submodules, the `rag-infrastructure` and the `rag-core-library`.
In order to contribute please consult the CONTRIBUTING.md.