- (2024.09.26) Our paper has been accepted by NeurIPS 2024.
- (2024.06.11) Paper released on arXiv.
We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.
Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K).
To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with “Never Miss A Beat”.
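To make the index manipulation concrete, here is a minimal, illustrative sketch of the sampling idea — not the released implementation; the head/middle/tail split, the σ choice, and the helper name are all assumptions. The fine-tuning window keeps continuous indices at the two ends of the target range, while the middle segment is relocated to an offset drawn from a truncated Gaussian centred on the middle of the target context:

```python
import torch

def cream_position_ids(train_len=4096, target_len=262144, sigma_ratio=0.25):
    """Hypothetical helper (not the repo's code): return `train_len` position
    indices that span a `target_len` context. Head and tail stay continuous
    at the two ends; the middle segment's offset comes from a truncated
    Gaussian centred on the middle of the target range."""
    seg = train_len // 3
    head = torch.arange(seg)                           # 0 .. seg-1
    tail = torch.arange(target_len - seg, target_len)  # last seg indices

    # Truncated Gaussian: resample until the middle segment lands strictly
    # between the head and the tail.
    mid_len = train_len - 2 * seg
    lo, hi = seg, target_len - seg - mid_len
    mean, std = target_len / 2.0, sigma_ratio * target_len
    offset = -1
    while not (lo <= offset <= hi):
        offset = int(torch.normal(mean, std, size=(1,)).item())
    middle = torch.arange(offset, offset + mid_len)

    return torch.cat([head, middle, tail])  # shape: (train_len,)

# A 4K fine-tuning window whose indices cover a 256K target context.
print(cream_position_ids().shape)
```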
# clone project
git clone [email protected]:wutong4012/CREAM.git
cd CREAM
# create conda environment
conda create -n cream python=3.9
conda activate cream
# install requirements
pip install -r requirements.txt
# FlashAttention 2 wheel built for CUDA 12.3, PyTorch 2.2, and Python 3.9;
# grab the wheel matching your environment from the flash-attention releases
pip install flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
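As a quick sanity check of the environment (an optional snippet, not part of the repo), verify that PyTorch sees the GPU and that flash-attn imports cleanly:

```python
import torch
import flash_attn

# Both should print without errors; CUDA must be available for flash-attn.
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn", flash_attn.__version__)
```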
You can download all of the fine-tuning data and evaluation data from Hugging Face.
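If you use the huggingface_hub library, a snapshot download looks like this (the repo id below is a placeholder — substitute the dataset repo actually linked by the authors):

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: replace with the dataset repo linked above.
snapshot_download(repo_id="some-org/cream-data", repo_type="dataset",
                  local_dir="data")
```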
Train model
bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM
bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM
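The `linear` argument presumably selects linear positional interpolation. For reference, this is the standard way to enable linear RoPE scaling when loading a Llama model with Hugging Face transformers (the factor below is illustrative; use target length ÷ pre-trained length):

```python
from transformers import AutoModelForCausalLM

# Linear RoPE scaling: position indices are divided by `factor` at inference.
# factor = target context / pre-trained context (e.g. 32768 / 4096 = 8).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 8.0},
)
```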
Evaluate model
bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000
bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000
bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000
bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000
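The last two scripts compute perplexity. To reproduce a comparable number by hand, below is a minimal sliding-window perplexity sketch with Hugging Face transformers — illustrative only, not the repo's eval script; the checkpoint, input file, window, and stride are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"          # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
).eval()

ids = tok(open("long_document.txt").read(),      # placeholder input file
          return_tensors="pt").input_ids.to(model.device)

window, stride = 4096, 2048
nll, n_tokens, prev_end = 0.0, 0, 0
for start in range(0, ids.size(1), stride):
    end = min(start + window, ids.size(1))
    chunk = ids[:, start:end]
    labels = chunk.clone()
    labels[:, : -(end - prev_end)] = -100        # score only the new tokens
    with torch.no_grad():
        loss = model(chunk, labels=labels).loss
    scored = (labels != -100).sum().item()
    nll += loss.item() * scored
    n_tokens += scored
    prev_end = end
    if end == ids.size(1):
        break

print("perplexity:", torch.exp(torch.tensor(nll / n_tokens)).item())
```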
Evaluation benchmarks:
- LongChat-Lines
- Lost in the Middle
- Needle in a Haystack
- LongBench
Please cite our paper if you use CREAM in your work:
@misc{wu2024missbeatefficientrecipe,
    title={Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement},
    author={Tong Wu and Yanpeng Zhao and Zilong Zheng},
    year={2024},
    eprint={2406.07138},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2406.07138},
}