Welcome to the repository for the Bournemouth University Chatbot Project! This project aims to build a robust, context-based chatbot designed to provide accurate and conversational responses to user queries. Leveraging state-of-the-art natural language processing (NLP) techniques, this chatbot connects large language models (LLMs) to external data sources, ensuring reliable and real-time information retrieval.
- Data Collection: Automates data collection by web scraping information from university websites and extracts useful information from PDF files using PyPDF2. The extracted data undergoes cleaning, processing, and transformation for use in the chatbot.
- LLM Connection: Uses Langchain to integrate large language models (OpenAI, Gemini) with external data sources for accurate query resolution.
- Vector Database: Implements Pinecone for efficient storage and fast retrieval of data, enabling the chatbot to perform accurate and quick searches.
- Multilingual Support: Provides responses in multiple languages (e.g., Spanish, Italian) to support diverse user groups.
- Interactive UI: Built using Streamlit, the web interface offers an intuitive and user-friendly platform for engaging with the chatbot.
- Evaluation Metrics: Checks response quality with evaluation metrics such as BERTScore and other LLM-specific evaluation measures.
- Deployment: Deployed using Render for free hosting, with Docker for containerization and potential cloud integration (e.g., Azure).
- CI/CD: Utilizes GitHub Actions for continuous integration and delivery, ensuring streamlined collaboration and project updates.
- Performance Monitoring: Includes logging mechanisms to track the chatbot's performance and identify areas for improvement.
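The Data Collection feature above mentions that extracted text is cleaned and transformed before the chatbot uses it. A minimal sketch of what that step could look like, using only the standard library. The function names `clean_text` and `chunk_text`, and the default chunk size and overlap, are illustrative assumptions and not the project's actual code:

```python
import re


def clean_text(raw: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap, so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping chunks are a common choice for retrieval pipelines because a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks. Suitable sizes depend on the embedding model in use.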
Category | Tool/Library |
---|---|
LLM Models | OpenAI, Gemini |
Data Processing | Pandas, Matplotlib, Plotly |
Data Retrieval | Langchain |
Vector Database | Pinecone |
Code Formatting | Ruff, Black |
Web Application | Streamlit |
Collaboration | GitHub, GitHub Actions |
Deployment | Render, Docker, Azure |
Automation (Optional) | Airflow |
- Clone the Repository
git clone https://github.com/username/university-chatbot.git
cd university-chatbot
- Create a Virtual Environment and activate it
python -m venv venv
source venv/bin/activate # Linux/Mac OS
venv\Scripts\activate # Windows
- Install Dependencies and required packages
pip install -r requirements.txt
- Create a .env file and configure the environment variables (refer to the .env.example file for the required configuration settings).
- Get the API keys for OpenAI, Pinecone, and Render, and add them to the .env file.
- Run the Chatbot Application
streamlit run app.py
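Because the application depends on API keys from the .env file, it can help to check that they are present before launching. The sketch below is one hedged way to do that. The variable names `OPENAI_API_KEY` and `PINECONE_API_KEY` are assumptions; use whatever names .env.example actually defines. The helper `missing_env_vars` is hypothetical and not part of the repository:

```python
import os

# Assumed variable names; replace with the names listed in .env.example.
REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment, so the .env file must already have been loaded by whatever mechanism the project uses. An empty list means every required key is set.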
- Data Extraction: Scrape university websites and extract PDF content.
- Data Processing: Transform data using Pandas and store it in a Pinecone vector database for quick retrieval.
- LLM Integration: Use Langchain to connect the stored data to an LLM for context-aware query resolution.
- UI Deployment: Create a user-friendly interface with Streamlit.
- Evaluation: Measure the chatbot’s accuracy using metrics like BERTScore and other LLM evaluation tools.
- CI/CD: Use GitHub Actions for continuous integration and delivery, ensuring streamlined collaboration and project updates.
- Deployment: Deploy the chatbot on Render or Docker, with optional CI/CD pipelines using GitHub Actions.
- Performance Monitoring: Includes logging mechanisms to track the chatbot's performance and identify areas for improvement.
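The retrieval and LLM integration steps in the workflow above can be illustrated with a toy, in-memory sketch. It deliberately makes no Pinecone, Langchain, or OpenAI calls: word-count vectors and cosine similarity stand in for a real embedding model and vector database, and the prompt is only assembled, not sent. All function names here are hypothetical:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query (stand-in for a vector search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the context-grounded prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )
```

In the real pipeline, `embed` would be replaced by an embedding model and `retrieve` by a query against the Pinecone index. The overall shape (embed the query, fetch the nearest chunks, place them in the prompt) is the general retrieval-augmented pattern the workflow describes.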
We welcome contributions from the community! If you're interested in contributing to this project, please follow these steps:
- Fork the repository
- Clone the forked repository (`git clone repository`)
- Create a new branch (`git checkout -b feature/improvement`)
- Make your changes and commit them (`git commit -am 'Add new feature'`)
- Push the changes to your branch (`git push origin feature/improvement`)
- Open a pull request against the main repository on GitHub, with a detailed description of the changes made and the reason for them. (`git pull-request` is not a built-in Git command; it works only when the `hub` CLI is aliased to `git`.)
This project is licensed under the MIT License - see the LICENSE file for details.
For further information or inquiries, please contact the project maintainers:
- (mailto: [email protected])
- LinkedIn: Tamunonengiyeofori Kenn-Wariso
We would like to acknowledge the following resources and references that inspired and guided this project:
- Streamlit Documentation
- Pinecone Documentation
- OpenAI API Documentation
- Langchain Documentation
- Render Documentation
- Docker Documentation
- Azure Documentation
- GitHub Actions Documentation
- PyPDF2 Documentation