Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR: Chat Module Architecture #185

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .spellcheck-en-custom.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ agentic
Akash
AMDGPU
Anil
architected
arge
args
arXiv
Expand Down Expand Up @@ -138,6 +139,7 @@ Miniforge
MiniLM
Mixtral
mixtral
modifiability
MLX
MMLU
modularize
Expand Down Expand Up @@ -242,6 +244,7 @@ tatsu
TBD
templating
Tesla
testability
th
tl
TODO
Expand All @@ -253,6 +256,7 @@ Triagers
triagers
UI
ui
understandability
unquantized
unstaged
URI
Expand Down
31 changes: 31 additions & 0 deletions docs/adr-chat-module-architecture.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# InstructLab Chat Module Architecture

## Context

InstructLab recently gained the ability of [retrieval augmented generation (RAG) for chat](https://github.com/instructlab/instructlab/pull/2886). The initial implementation is the simplest possible, inserting retrieved document chunks in the chat session history before submitting the user query to the target model. This is done by adding a retriever instance into the chat class which is called by submitting the user query as-is to the vector store as a search query to retrieve those chunks.

Tuning retrieval systems requires significant expertise and testing, and there are many techniques that may be needed specifically and only for that use case. For example, query reformulation or agentic approaches may be necessary to achieve reliable retrieval quality while being redundant and expensive in a non-retrieval scenario.

Having a single code path that conditionally executes various strategies based on a mixture of configuration, application state, and user input can quickly become difficult to maintain, debug, and even understand. That chat module in InstructLab would benefit from a principled approach; that is, to be purposefully architected before becoming a [big ball of mud](http://www.laputan.org/mud/?ref=blog.codinghorror.com).

Prompt template management, model interaction pattern, chat session history, and any interactions with third party systems can vary by use case and so should be logically grouped for purposes of at least maintainability, understandability, modifiability, and testability.

## Decision

The InstructLab chat module will adopt a [strategy pattern](https://en.wikipedia.org/wiki/Strategy_pattern).

## Status

Accepted

## Consequences

* A refactor to the chat module will be necessary before additional feature work.
* Encapsulation of different sets of logic as distinct pattern should decrease the risk of regression in one area when implementing another.
* Division of responsibility will be clear from the code and project structure.
* Testing different chat strategies will become simpler, with a smaller total testing surface.
* Development velocity of RAG-specific improvements should be higher.
* Code review will become simpler.
* Risk of unsustainable maintenance overhead should be decreased.
* As blocks of logic common to multiple strategies emerge over time, there will be a natural path to a pipeline approach in the future.
* Configuration might become more complex; care should be taken in designing it to be flexible and to avoid one-way doors.