
Quarkus LLM Routing Service

Quarkus service that routes requests to different RAG sources and LLMs.

Architecture

Components

  • Assistants - An assistant is the top-level component that describes how all of the components below are connected (see the illustrative sketch after the note below).
  • Content Retrievers - The RAG (Retrieval-Augmented Generation) connection information used to retrieve the data that will be included in the message to the LLM.
    • Embedding Models - The models used to convert the data that is stored in and retrieved from a vector database, a common pattern for RAG datasources.
  • LLMs - The connection information for the Runtime Serving Environment hosting the Large Language Model.
  • AI Services - The component orchestrating the calls to the Content Retrievers and LLMs.

Note

We are currently looking for a less restrictive alternative to AI Services for orchestrating these calls.
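
To make the wiring concrete, here is a minimal sketch of how an assistant could tie the components together. The field names are assumptions for illustration only, loosely modeled on the direct chat request shown later; they are not the actual persisted schema (see the admin flow docs for the real creation flow):

{
  "name": "example_assistant",
  "retrieverConnection": {
    "index": "exampleIndex",
    "host": "weaviate.example.com"
  },
  "llmConnection": {
    "modelType": "servingRuntime",
    "modelName": "mistral-instruct"
  }
}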

Chat Bot Endpoints

Assistant

The /assistant/chat/streaming endpoint is the primary entry point into the application. It is used to chat with a specified assistant.

Example Message

{
  "message": "User Message",
  "assistantName": "assistant_name"
}

Default Assistants

The following assistants are loaded into the application by default using Liquibase and the changelog file:

| Assistant Name       | Description                                               |
| -------------------- | --------------------------------------------------------- |
| default_ocp          | Default assistant for OpenShift Container Platform (OCP)  |
| default_rhel         | Default assistant for Red Hat Enterprise Linux (RHEL)     |
| default_rho_2025_faq | Default assistant for RHO 2025 FAQ                        |
| default_ansible      | Default assistant for Ansible automation                  |
| default_rhoai        | Default assistant for RHO AI                              |
| default_assistant    | General default assistant                                 |

Direct Chat

The /chatbot/chat/stream endpoint allows connection information to be specified directly and can be used for initial testing of connections.

{
  "message": "User Message",
  "context": "Message history",
  "retriverRequest": {
    "index": "weveIndex",
    "scheme": "weveScheme",
    "host": "weavHost.com",
    "apiKey": "xxx"
  },
  "modelRequest": {
    "modelType": "servingRuntime",
    "apiKey": "xxxxx",
    "modelName": "mistral-instruct"
  }
}
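
For example, a test request against this endpoint might look like the following curl call (all connection values are placeholders to swap for your own):

curl -X 'POST' 'http://localhost:8080/chatbot/chat/stream' -H 'Content-Type: application/json' -d '{
  "message": "What is this product?",
  "retriverRequest": {
    "index": "weaviateIndex",
    "scheme": "http",
    "host": "localhost:8086",
    "apiKey": "xxx"
  },
  "modelRequest": {
    "modelType": "servingRuntime",
    "apiKey": "xxxxx",
    "modelName": "mistral-instruct"
  }
}' -N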

Local Development

Use the following commands to run locally:

mvn clean install
# Set the profile to use the `application-local.properties` file, as explained below
mvn quarkus:dev -Dquarkus.profile=local

Tip

It is recommended that the properties below be set in the application-local.properties file, which is gitignored. This prevents accidental check-ins of secret information.

LLM Connection

The following properties should be set in order to properly connect to your LLM running behind an OpenAI-compatible endpoint:

openai.default.url=<RUNTIME_URL>/v1
# Only required if the endpoint enforces an API key
openai.default.apiKey=<API_KEY>
openai.default.modelName=<MODEL_NAME>
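
As a minimal sketch, an application-local.properties for a locally served model might look like the following (the URL, key, and model name are placeholder values, assuming an OpenAI-compatible runtime listening on port 8000):

openai.default.url=http://localhost:8000/v1
openai.default.apiKey=changeme
openai.default.modelName=mistral-instruct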

Tip

The default_assistant can be used without having to configure a RAG data source.

Weaviate Setup

The default assistants all assume a connection to a Weaviate DB for RAG purposes.

A locally hosted Weaviate instance can be deployed and used; more information can be found here (TBD).

If a remote instance of Weaviate exists on an OpenShift cluster and has the correct indexes, that instance can be used with the following port-forward commands:

# Switch to the project (namespace) containing the Weaviate deployment
oc project $PROJECT
# Forward the HTTP port (local 8086 -> service 8080) and the gRPC port (50051)
oc port-forward service/weaviate-vector-db 8086:8080 50051:50051

Once forwarded, the following properties can be set:

weaviate.default.scheme=http
weaviate.default.host=localhost:8086
weaviate.default.apiKey=<API KEY>

If using the App of Apps repo, the API key can be retrieved from the autogenerated secret weaviate-api-key-secret.
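
For example, assuming the secret stores the key under a field named api-key (the field name is an assumption; check the secret's actual keys with oc describe secret weaviate-api-key-secret), it could be read with:

oc get secret weaviate-api-key-secret -o jsonpath='{.data.api-key}' | base64 -d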

Embedding Model

Currently, the supported models are added to the resources folder and loaded directly. We would like to move this logic to pull these models using Maven, as seen here.

Important

The embedding model is too large to check into our repo. Download it from Hugging Face, or here if internal to Red Hat. Then add it to resources/embedding/nomic with the name model.onnx; it should be gitignored if done correctly. The download can be performed by running the download-nomic-embeddings-model.sh script.
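
If downloading manually instead of using the script, a sketch of the steps might look like the following. Both the standard Maven resources path and the Hugging Face repository path are assumptions; verify them against the download-nomic-embeddings-model.sh script before use:

# Create the target directory (assuming the standard Maven layout)
mkdir -p src/main/resources/embedding/nomic
# Fetch the ONNX model (the Hugging Face repo path is an assumption)
curl -L -o src/main/resources/embedding/nomic/model.onnx \
  https://huggingface.co/nomic-ai/nomic-embed-text-v1/resolve/main/onnx/model.onnx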

Local Curl

If the LLM connection has been set up correctly, the following curl command should stream a response from your LLM:

curl -X 'POST'   'http://localhost:8080/assistant/chat/streaming' -H 'Content-Type: application/json'   -d '{
  "message": "What is this product?",
  "assistantName": "default_assistant"
}' -N

The assistantName can be swapped out for other assistants in the table above, but the other assistants will require a connection to a Weaviate DB with the correct indexes. The App Of Apps repository contains a validation script that can be used to show which indexes currently exist.

Local Curl with File Upload

To send a local curl request with an uploaded file, the following command may be used:

curl 'http://localhost:8080/assistant/chat/streaming' -F 'jsonRequest={
  "message": "Please summarize the document that I uploaded",
  "assistantName": "default_assistant"
};type=application/json' -F 'document=@/path/to/my/file.txt'

Admin Flow

Information about the creation/updating of Assistants, ContentRetrievers, and LLMs can be found in the admin flow docs

Authentication

Authentication is disabled by default. It can be enabled by setting the environment variable DISABLE_AUTHORIZATION=false. If enabled in dev mode, a Keycloak instance will be spun up locally and populated with a default realm. Authentication can be tested with the following steps:


# Keycloak runs on a random port; retrieve the port and update the curl command accordingly.
docker ps

export access_token=$(\
    curl --insecure -X POST http://localhost:32886/realms/quarkus/protocol/openid-connect/token \
    --user backend-service:secret \
    -H 'content-type: application/x-www-form-urlencoded' \
    -d 'username=alice&password=alice&grant_type=password' | jq --raw-output '.access_token' \
 )

curl -X 'POST'  'http://localhost:8080/assistant/chat/streaming' -H 'Authorization: Bearer '$access_token -H 'Content-Type: application/json'   -d '{
  "message": "What is this product?",
  "assistantName": "default_assistant"
}' -N -v

See https://quarkus.io/guides/security-keycloak-authorization if an external Keycloak instance is required.
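
When pointing at an external Keycloak, the standard Quarkus OIDC properties would look something like the following (the URL, client ID, and secret are placeholders that must match your realm and client; see the guide above for the authoritative configuration):

quarkus.oidc.auth-server-url=https://keycloak.example.com/realms/quarkus
quarkus.oidc.client-id=backend-service
quarkus.oidc.credentials.secret=secret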

Contributing

We welcome contributions from the community! Here's how you can get involved:

1. Find an Issue:

  • Check the issue tracker or jira board for open issues that interest you.
  • If you find a bug or have a feature request, please open a new issue with a clear description.

Note

Currently, this project primarily tracks work using Red Hat's internal Jira.

2. Fork the Repository:

  • Fork this repository to your own GitHub account.

3. Create a Branch:

  • Create a new branch for your changes. Use a descriptive name that reflects the issue or feature you're working on (e.g., fix-issue-123 or add-new-feature).

4. Make Changes:

  • Make your desired changes to the codebase.
  • Follow the existing code style and conventions.
  • Write clear commit messages that explain the purpose of your changes.

5. Test Your Changes:

  • Thoroughly test your changes to ensure they work as expected.
  • If there are existing tests, make sure they all pass. Consider adding new tests to cover your changes.

6. Code Style and Formatting:

  • Ensure your code adheres to the project's established code style guidelines.
  • This project uses Checkstyle for automated code style checks. Your code must pass these checks before it can be merged.

Tip

Checkstyle can be run locally using mvn site, which creates a report page under target/site.

It is also recommended that you use a Checkstyle tool in your IDE, such as this VS Code plugin, in order to adhere to the guidelines as you code.

7. Open a Pull Request:

  • Push your branch to your forked repository.
  • Open a pull request to the main repository.
  • In the pull request description, clearly explain the changes you've made and reference the related issue (if applicable).
  • Validate that all automated checks are passing.

8. Review Process:

  • Your pull request will be reviewed by the project maintainers.
    • Feel free to ping in our general Slack channel to ask for approval/assistance.
  • Be prepared to address any feedback or questions.
  • Once your code has passed all automated checks and received at least one approval from a maintainer, it will be merged.

Important Notes:

  • All code contributions must pass automated code scanning checks before they can be merged.
  • At least one approval from a maintainer is required for all pull requests.

Thank you for your contributions!

Security Scanning

The OWASP Dependency-Check plugin is not required to pass but is included; we ask that the scanner be run if any changes are made to the dependencies.

mvn validate -P security-scanner
