From Words to Worth: Newborn Article Impact Prediction with LLM

Using LLM as Academic Article Impact Predictor

🚀 Update Log

240808 - Early Access
- We have released the Early Access version of our code!
241126 - V1.0 We’re thrilled to announce the end of Early Access and the official release of V1.0! ✨
- The codebase is now more organized and easier to navigate! 🧹
- Updated and streamlined README with detailed instructions for setup and usage. 💡
- Decoupled the dataset, added more LoRA adapter weight download links, and more! 🔄
- Known Issues: The functionality for building the NAID dataset has not been tested on other machines, which may lead to potential issues. We plan to replace this function with a more powerful framework in a separate codebase.

Introduction

This repository contains the official implementation for the paper "From Words to Worth: Newborn Article Impact Prediction with LLM". The tool applies parameter-efficient fine-tuning (PEFT) to LLMs so that they can predict the future impact of articles.

Quick Try (for most researchers)

First, clone the repo and run the following commands in the console:

cd ScImpactPredict
pip install -r requirements.txt

To begin with the default settings, request access to the LLaMA-3 pretrained weights on the official Hugging Face site and download them. Then, download the provided LLaMA-3 LoRA weights (runs_dir) here.

After that, set the paths to the model weights in demo.py, and run python demo.py in the console.
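Before editing demo.py, it can help to confirm that both weight folders are complete. The sketch below is not part of the repository. It assumes the base checkpoint follows the usual Hugging Face layout (a config.json at the top level) and that the LoRA folder contains an adapter_config.json, as PEFT adapters normally do; the downloaded runs_dir may be laid out differently. The function names and the placeholder paths are hypothetical.

```python
from pathlib import Path


def missing_files(directory, required):
    """Return the names in `required` that are not regular files in `directory`."""
    d = Path(directory)
    return [name for name in required if not (d / name).is_file()]


def check_weights(base_dir, lora_dir):
    """Collect missing marker files for the base checkpoint and the LoRA adapter.

    Assumption: config.json marks a Hugging Face checkpoint and
    adapter_config.json marks a PEFT adapter folder.
    """
    problems = {}
    base_missing = missing_files(base_dir, ["config.json"])
    lora_missing = missing_files(lora_dir, ["adapter_config.json"])
    if base_missing:
        problems[str(base_dir)] = base_missing
    if lora_missing:
        problems[str(lora_dir)] = lora_missing
    return problems


if __name__ == "__main__":
    # Placeholder paths; replace them with the folders you downloaded.
    report = check_weights("path_to_huggingface_LLaMA3", "path_to_runs_dir")
    if report:
        for folder, names in report.items():
            print(f"{folder}: missing {', '.join(names)}")
    else:
        print("Both weight folders look complete.")
```

An empty result only means the marker files exist; it does not verify that the weights themselves load.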

Fine-tuning (to reproduce, optional)

For fine-tuning, you may manually modify the 'xxxForSequenceClassification' class in the transformers package (see llama_for_naip/NAIP_LLaMA.py for more details). Alternatively, follow the instructions to use the custom code.

Then, prepare a train.sh bash script like the one below:

DATA_PATH="ScImpactPredict/NAID/NAID_train_extrainfo.csv"
TEST_DATA_PATH="ScImpactPredict/NAID/NAID_test_extrainfo.csv"

OMP_NUM_THREADS=1 accelerate launch offcial_train.py \
    --total_epochs 5 \
    --learning_rate 1e-4 \
    --data_path "$DATA_PATH" \
    --test_data_path "$TEST_DATA_PATH" \
    --runs_dir official_runs/LLAMA3 \
    --checkpoint path_to_huggingface_LLaMA3

Finally, run sh train.sh in the console and wait for training to finish.
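If you want to vary the hyperparameters without editing train.sh each time, the same command can be assembled in Python. This is a minimal sketch, not code from the repository: the helper name build_train_command is hypothetical, and it only uses the script name and flags that appear in the train.sh example above.

```python
import shlex


def build_train_command(data_path, test_data_path, runs_dir, checkpoint,
                        total_epochs=5, learning_rate="1e-4"):
    """Return the argument list that mirrors the train.sh example."""
    return [
        "accelerate", "launch", "offcial_train.py",
        "--total_epochs", str(total_epochs),
        "--learning_rate", str(learning_rate),
        "--data_path", data_path,
        "--test_data_path", test_data_path,
        "--runs_dir", runs_dir,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        data_path="ScImpactPredict/NAID/NAID_train_extrainfo.csv",
        test_data_path="ScImpactPredict/NAID/NAID_test_extrainfo.csv",
        runs_dir="official_runs/LLAMA3",
        checkpoint="path_to_huggingface_LLaMA3",  # placeholder, as in train.sh
    )
    # Print the command for inspection instead of launching it.
    print(shlex.join(cmd))
    # To launch for real, pass the list to subprocess.run with
    # OMP_NUM_THREADS=1 added to the environment, as train.sh does.
```

Building the command as a list avoids shell quoting problems when paths contain spaces.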

Testing (to reproduce, optional)

Similar to fine-tuning, prepare test.sh as below:

python inference.py \
 --data_path ScImpactPredict/NAID/NAID_test_extrainfo.csv \
 --weight_dir path_to_runs_dir

Then, type sh test.sh.

Model Weights

We also offer the weights of other models for download.

LLM	Size	MAE	NDCG	Memory	Download Link
Phi-3	3.8B	0.226	0.742	6.2GB	Download
Falcon	7B	0.231	0.740	8.9GB	Download
Qwen-2	7B	0.223	0.774	12.6GB	Download
Mistral	7B	0.220	0.850	15.4GB	Download
Llama-3	8B	0.216	0.901	9.4GB	Download
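For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them from true and predicted impact scores. It is an illustration under stated assumptions, not the evaluation code used for the table: NDCG here uses the true score as a linear gain with a log2 position discount and an optional cutoff k, and the paper's exact variant may differ. The function names are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain and log2 discount."""
    top = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(top))


def ndcg(y_true, y_pred, k=None):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Order articles by predicted score, highest first.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    ranked_true = [y_true[i] for i in order]
    ideal_dcg = dcg(sorted(y_true, reverse=True), k)
    return dcg(ranked_true, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    truth = [0.9, 0.1, 0.5]
    preds = [0.8, 0.2, 0.4]
    print("MAE:", mean_absolute_error(truth, preds))
    print("NDCG:", ndcg(truth, preds))
```

Lower MAE means the predicted scores are closer to the true ones, while higher NDCG means the predicted ordering places high-impact articles nearer the top.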

Compare with Previous Methods

The code for the previous methods we compare against is in the previous_methods folder. With a few adjustments for your specific needs, it should work fine. Since these models train very quickly (within a few minutes on a single RTX 3080), we do not provide their trained weights.

Repo Structure Description

Folders like furnace, database, and tools are used for building the NAID and TKPD datasets. They have no direct connection to training or inference.

We are confident in our methodology and experiments, and you should be able to reproduce any of the results reported in our paper within an acceptable margin.

BibTex

@article{Zhao2024NAIP,
  title={From Words to Worth: Newborn Article Impact Prediction with LLM},
  author={Penghai Zhao and Qinghua Xing and Kairan Dou and Jinyu Tian and Ying Tai and Jian Yang and Ming-Ming Cheng and Xiang Li},
  journal={ArXiv},
  year={2024},
  volume={abs/2408.03934},
  url={https://api.semanticscholar.org/CorpusID:271744831}
}

ssocean/NAIP