This is a starter repo for an easy and quick Retrieval-Augmented Generation (RAG) system.
This application demonstrates a basic RAG implementation: it answers questions by retrieving relevant passages from a custom knowledge base and passing them to a large language model. It uses Streamlit for the frontend interface.
A Jupyter Notebook, rag_tutorial.ipynb, is available if needed.
- Python 3.7+
- Create a new virtual environment
- Install the required Python libraries:
pip install -r requirements.txt
- Add your OpenAI API key to the environment variable:
OPENAI_API_KEY
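As a quick sanity check before running the scripts, the snippet below confirms that the key is visible to Python. It is a minimal sketch: `require_api_key` is a hypothetical helper and not part of this repo.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of this repo.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the app.")
    return key
```

Calling `require_api_key()` at the top of a script fails early with a readable message instead of a later authentication error.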
Run loader.py to create the vector database and vectorize your PDF documents:
python app/loader.py
This script creates a ChromaDB instance. ChromaDB is an open-source embedding database; for more information, see https://docs.trychroma.com/
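The loader's code is not reproduced here. As a rough illustration of a step that commonly precedes embedding, the sketch below splits extracted text into overlapping chunks. The function name `chunk_text` and the default sizes are assumptions, and app/loader.py may handle this differently.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters.

    Illustrative only; the sizes are arbitrary assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.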
To start the Streamlit app, run:
streamlit run app/app.py
- PDF document ingestion and vectorization
- Natural language querying of the knowledge base
- Integration with OpenAI's language models for response generation
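The querying feature can be pictured as ranking stored chunks by similarity to the question and handing the best matches to the language model as context. The sketch below uses short Python lists as stand-in embeddings. In the app, storage is handled by ChromaDB and generation by OpenAI's models, so the names `cosine_similarity`, `top_k`, and `build_prompt` are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k entries most similar to query_vec.

    `store` is a list of (text, vector) pairs standing in for a vector database.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, contexts):
    """Assemble a prompt that places the retrieved chunks before the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )
```

The resulting prompt string is what would be sent to the language model; the model call itself is omitted here.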
You can customize this template by:
- Adding more document types for ingestion
- Implementing different embedding models
- Enhancing the user interface with additional Streamlit components
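One possible way to add more document types is to dispatch on file extension. This is a hypothetical sketch: `LOADERS`, `load_document`, and `load_text_file` are illustrative names, and no PDF entry is shown because the repo's actual PDF parsing code is not reproduced here.

```python
from pathlib import Path


def load_text_file(path):
    """Read a UTF-8 text file and return its contents."""
    return Path(path).read_text(encoding="utf-8")


# Map lowercase file extensions to loader functions (illustrative entries).
LOADERS = {
    ".txt": load_text_file,
    ".md": load_text_file,
}


def load_document(path):
    """Pick a loader by file extension and return the document text."""
    suffix = Path(path).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {suffix or '(none)'}") from None
    return loader(path)
```

Supporting a new format then means writing one function that returns text and registering it under its extension.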
- Currently only supports PDF documents
- Requires an OpenAI API key
This readme was written by Claude 3.5 and checked by me.