
DOCHAT

This repository contains a proof of concept (POC) for building a domain-specific retrieval-augmented generation (RAG) bot.

The process begins by collecting documents as strings in a Pandas DataFrame. Each document is then chunked and embedded into a ChromaDB collection. Upon receiving a query, the bot retrieves the most relevant chunks based on their embedding distance from the query and passes them to the large language model (LLM) as context, along with the original question.
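
As a rough illustration of that flow (not the repository's actual code; the collection name, chunking logic and helper names are assumptions), the chunk/embed/retrieve steps with the chromadb Python client could look like this:

import chromadb
import pandas as pd

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size character chunking; the actual project may split differently.
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = pd.DataFrame({"text": ["first document ...", "second document ..."]})

client = chromadb.Client()
collection = client.create_collection("knowledge_base")

# Add every chunk; ChromaDB embeds documents with its default embedding function.
for doc_id, text in docs["text"].items():
    pieces = chunk(text)
    collection.add(
        documents=pieces,
        ids=[f"{doc_id}-{i}" for i in range(len(pieces))],
    )

# At query time, retrieve the chunks closest to the question in embedding space.
query = "What does the first document say?"
results = collection.query(query_texts=[query], n_results=3)
context = "\n".join(results["documents"][0])
# `context` and `query` are then combined into the prompt sent to the LLM.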

The application runs as a Docker Compose project, with one container for the application and one for the database.

Starting from this project, a domain-specific RAG bot can be built by writing preprocessing functions that produce the required knowledge-base format and by defining any post-generation functions and behaviors.
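
For instance, the two customization points could be a pair of functions like the ones sketched below (hypothetical names and signatures, not the repository's API):

import pandas as pd

def preprocess(paths: list[str]) -> pd.DataFrame:
    # Hypothetical domain-specific extraction: one string per document.
    texts = [open(p, encoding="utf-8").read() for p in paths]
    return pd.DataFrame({"text": texts})

def postprocess(answer: str) -> str:
    # Hypothetical post-generation behavior, e.g. trimming or adding citations.
    return answer.strip()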

Usage guide

To start the application, copy and paste the following commands into your terminal (Docker Desktop must be running on your machine, and your OpenAI API key must be stored in a .env file in the project root directory):

git clone https://github.com/apiraccini/dochat.git
cd dochat
docker-compose up
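
The .env file can be as small as a single line (assuming the conventional OPENAI_API_KEY variable name; replace the placeholder with your own key):

OPENAI_API_KEY=your-openai-api-key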

TODO list

  • Implement control flows that filter query results based on a threshold distance.
  • Add token count and cost estimation (with tiktoken; see the sketch after this list).
  • Improve the UI to show the prompt and logs.
  • Improve the embedding/retrieval steps (chunk size vs. number of results vs. longer context).
  • Use hallucination prompts to improve retrieval? See here.
  • Make it a chatbot? With LangChain or without?
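
For the tiktoken item above, a minimal token-count helper could start from something like this (the model name is an assumption):

import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    # Count tokens the way the target model's tokenizer would.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt_tokens = count_tokens("question plus retrieved context ...")
# A cost estimate would multiply the counts by the provider's per-token prices.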
