Privatized chatbots based on RAG and Llama3.
- RAG (Retrieval-Augmented Generation) optimizes the output of a large language model by letting it consult an authoritative knowledge base outside its training data before generating a response. Large language models (LLMs) are trained on massive amounts of data and use billions of parameters to generate output for tasks such as answering questions, translating language, and completing sentences. RAG extends these already powerful capabilities to domain-specific or organization-internal knowledge bases, all without retraining the model. It is a cost-effective way to keep LLM output relevant, accurate, and useful in a variety of contexts.
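The retrieve-then-generate loop described above can be sketched in a few lines. The keyword-overlap scoring and in-memory document list here are toy stand-ins for the embedding search and vector store used in this project:

```python
# Minimal sketch of the RAG pattern: retrieve relevant documents,
# then ground the prompt in them before calling the LLM.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (a real
    system would use vector embeddings and a vector store)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved context into the prompt so the model
    answers from it rather than from its training data alone."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"
```

The LLM then sees the retrieved passages inline, which is why no retraining is needed when the knowledge base changes.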
- Ollama: a framework for getting large models up and running quickly.
- Llama3 8B: Meta's open-source model.
- LangChain: helps developers easily build applications based on large language models (LLMs).
- Install Python dependencies:
pip install dspy gradio langchain langchain_community langchain_core langchain_huggingface pypdf fastembed chromadb sentence-transformers pandas openpyxl
- Ollama
see: https://github.com/ollama/ollama
# 1. install Ollama: https://github.com/ollama/ollama
# 2. Run ollama
ollama serve
# 3. Download llama3 8B
ollama pull llama3
# 4. Run llama3
ollama run llama3
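The steps above stand up a local server on Ollama's default port, 11434. A small sketch of querying it from Python over the REST API (the endpoint and payload shape follow Ollama's documented `/api/generate` interface; nothing here is specific to this project):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3") -> bytes:
    # non-streaming generate request, per Ollama's /api/generate schema
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(prompt: str, host: str = "http://localhost:11434") -> str:
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running:
# print(ask("Reply with one word: hello"))
```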
Supported data types:
- json (training_jsons)
- pdf (training_pdfs)
- xlsx (training_xlsx)
- tweets @see https://github.com/chainupcloud/twitter-scan
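A loader for these formats plausibly dispatches on file extension, along these lines (the function name and the pypdf usage are illustrative, not this repo's actual code; xlsx files are converted to json first, as the next step shows):

```python
import json
from pathlib import Path

def load_text(path: str) -> str:
    """Return the raw text of one training file, chosen by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        # parse and re-dump to normalize the JSON text
        return json.dumps(json.loads(Path(path).read_text(encoding="utf-8")))
    if suffix == ".pdf":
        from pypdf import PdfReader  # imported lazily: only PDFs need it
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    raise ValueError(f"unsupported file type: {suffix}")
```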
# Convert xlsx file to json
python training_xlsx_to_json.py
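One plausible shape for this conversion (the real training_xlsx_to_json.py may differ): read the sheet with pandas, then write one JSON record per row.

```python
import json
import pandas as pd

def xlsx_to_json(xlsx_path: str, json_path: str) -> None:
    df = pd.read_excel(xlsx_path)  # openpyxl handles the .xlsx parsing
    records = df.to_dict(orient="records")  # one dict per spreadsheet row
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```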
# Create a local ChromaDB (stored in db/ in the project root)
python create_chroma_collection.py
# Load training data to ChromaDB (new files can be added at any time)
python load_data.py
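Loading typically means splitting each file's text into overlapping chunks and upserting them into the collection. Below is a toy fixed-window chunker plus the shape of the Chroma insert; the collection name, chunk sizes, and metadata keys are assumptions, and the real script likely embeds with fastembed or sentence-transformers from the pip list rather than Chroma's default model:

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Overlapping fixed-size windows, so text spanning a chunk
    boundary still appears whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def load(text: str, source: str, db_path: str = "db") -> None:
    import chromadb  # deferred so chunk() is usable on its own
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name="training_data")
    pieces = chunk(text)
    collection.add(  # Chroma embeds the documents with its default model
        ids=[f"{source}-{i}" for i in range(len(pieces))],
        documents=pieces,
        metadatas=[{"source": source}] * len(pieces),
    )
```

Stable ids derived from the source file make re-running the loader on new files safe.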
# Start the chatbot
python chatbot.py
Once it is running, open http://localhost:7860 in a browser to use the built-in web UI.