Trending Research Papers - Summary and AI Chat

Project Overview

This project automates scraping trending research papers from paperswithcode.com, downloading the papers from the arXiv URLs found in their metadata, and processing them to generate insights. The paper abstracts are presented on a Streamlit page for easy viewing, and topic modeling is applied to generate relevant keywords for the set of papers. The arXiv articles are also embedded into a vector database to support retrieval-augmented generation (RAG).
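The download step depends on turning an arXiv link from the scraped metadata into a PDF link. The sketch below shows one way to do that with the standard library only. It assumes the metadata carries new-style arXiv abstract URLs (for example https://arxiv.org/abs/2401.12345); the function name arxiv_pdf_url is hypothetical and is not taken from get_data.py.

```python
import re
from urllib.parse import urlparse

# New-style arXiv identifiers look like 2401.12345 or 2401.12345v2.
_ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(url: str) -> str:
    """Return the PDF URL for an arXiv abstract (or PDF) URL.

    Hypothetical helper for illustration; raises ValueError on anything
    that does not look like a new-style arXiv link.
    """
    parsed = urlparse(url)
    if parsed.netloc not in {"arxiv.org", "www.arxiv.org"}:
        raise ValueError(f"not an arXiv URL: {url}")

    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] not in {"abs", "pdf"}:
        raise ValueError(f"unexpected arXiv path: {url}")

    # Accept links that already point at the PDF, with or without ".pdf".
    paper_id = parts[1].removesuffix(".pdf")
    if not _ARXIV_ID.match(paper_id):
        raise ValueError(f"unrecognized arXiv identifier: {paper_id}")

    return f"https://arxiv.org/pdf/{paper_id}"
```

The returned URL can then be fetched with any HTTP client. Validating the identifier first keeps malformed metadata entries from producing failed or misdirected downloads.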

Features

Trending Research Paper Scraping: Automatically scrapes trending research papers from paperswithcode.com.
PDF Retrieval: Downloads research papers in PDF format from arXiv.
Streamlit Summaries: Writes summaries of the papers to a Streamlit page for interactive exploration.
Keyword Generation: Applies topic modeling to abstracts to generate relevant keywords.
Vector Database Embedding: Embeds articles into ChromaDB for enhanced retrieval capabilities.
Q&A on Research Papers: Leverages retrieval-augmented generation (RAG) for question answering based on the paper content.
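The retrieval step behind the Q&A feature can be illustrated without any external services. The sketch below is a toy stand-in, not the project's rag.py: it replaces ChromaDB and a real embedding model with word-count vectors and cosine similarity, purely to show the shape of "embed, rank, build a prompt". All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )
```

In the actual pipeline the vector database performs the ranking and an LLM consumes the prompt; only the overall flow carries over from this sketch.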

Installation

Using Python 3.10.13:

git clone https://github.com/clmangham/SME_LLM.git
cd SME_LLM
pip install -r requirements.txt

Configuration

An OpenAI API key should be stored in a .env file in the root of the repository, like so:

OPENAI_API_KEY="API KEY HERE"
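As a rough illustration of how the key might be read at startup, the sketch below parses a .env file with the standard library and fails early if the key is missing or still set to the placeholder. This is an assumption about usage, not the project's actual loading code (a package such as python-dotenv is a common alternative); load_env_file and get_openai_key are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (minimal sketch)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get(
        "OPENAI_API_KEY", ""
    )
    if not key or key == "API KEY HERE":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Checking for the placeholder value catches the case where the file was created but never filled in.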

Usage

# Pull and save data to /data
python get_data.py

# Run streamlit app
streamlit run main.py

Potential Improvements

Pull new data automatically on a regular schedule (get_data.py)
Improve topic modeling to provide keywords for each abstract individually (topic_modeling.py)
Experiment with different topic models
Add chat history to RAG to create a chatbot that can answer follow-up questions (rag.py)
- https://python.langchain.com/docs/use_cases/question_answering/chat_history
Experiment with different LLMs and how best to evaluate their performance, including prompt engineering, question generation for retrieval evaluation, and metrics such as output relevance and latency
Containerize and move to a local server for development and deployment
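To make the chat-history idea concrete, here is a minimal sketch of one possible approach: keep the last few question/answer pairs and fold them into the text passed to the retrieval and generation step, so that a follow-up such as "what about its limitations?" carries its context. This does not use the LangChain API from the link above; the ChatHistory class and its methods are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChatHistory:
    """Rolling buffer of recent (question, answer) turns."""

    max_turns: int = 3
    turns: deque = field(default_factory=deque)

    def add(self, question: str, answer: str) -> None:
        """Record a completed turn, dropping the oldest beyond max_turns."""
        self.turns.append((question, answer))
        while len(self.turns) > self.max_turns:
            self.turns.popleft()

    def contextualize(self, question: str) -> str:
        """Prefix a follow-up question with the recent conversation."""
        if not self.turns:
            return question
        lines = [f"User: {q}\nAssistant: {a}" for q, a in self.turns]
        return (
            "Conversation so far:\n"
            + "\n".join(lines)
            + f"\n\nFollow-up question: {question}"
        )
```

Capping the number of stored turns keeps the prompt size bounded; a more elaborate version could ask the LLM to rewrite the follow-up into a standalone question before retrieval.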
