- (2024.09.26) Our paper has been accepted by NeurIPS 2024.
- (2024.06.11) Paper released on arXiv.
We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.
Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K).
To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with “Never Miss A Beat”.
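To make the index manipulation concrete, here is a minimal, illustrative sketch of the sampling idea — not the released implementation; the head/middle/tail split, the σ choice, and the helper name are all assumptions. The fine-tuning window keeps continuous indices at the two ends of the target range, while the middle segment is relocated to an offset drawn from a truncated Gaussian centred on the middle of the target context:

```python
import torch

def cream_position_ids(train_len=4096, target_len=262144, sigma_ratio=0.25):
    """Hypothetical helper (not the repo's code): return `train_len` position
    indices that span a `target_len` context. Head and tail stay continuous
    at the two ends; the middle segment's offset comes from a truncated
    Gaussian centred on the middle of the target range."""
    seg = train_len // 3
    head = torch.arange(seg)                           # 0 .. seg-1
    tail = torch.arange(target_len - seg, target_len)  # last seg indices

    # Truncated Gaussian: resample until the middle segment lands strictly
    # between the head and the tail.
    mid_len = train_len - 2 * seg
    lo, hi = seg, target_len - seg - mid_len
    mean, std = target_len / 2.0, sigma_ratio * target_len
    offset = -1
    while not (lo <= offset <= hi):
        offset = int(torch.normal(mean, std, size=(1,)).item())
    middle = torch.arange(offset, offset + mid_len)

    return torch.cat([head, middle, tail])  # shape: (train_len,)

# A 4K fine-tuning window whose indices cover a 256K target context.
print(cream_position_ids().shape)
```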
# clone project
git clone [email protected]:wutong4012/CREAM.git
cd CREAM
# create conda environment
conda create -n cream python=3.9
conda activate cream
# install requirements
pip install -r requirements.txt
# FlashAttention 2 wheel built for CUDA 12.3, PyTorch 2.2, and Python 3.9;
# grab the wheel matching your environment from the flash-attention releases
pip install flash_attn-2.6.3+cu123torch2.2cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
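As a quick sanity check of the environment (an optional snippet, not part of the repo), verify that PyTorch sees the GPU and that flash-attn imports cleanly:

```python
import torch
import flash_attn

# Both should print without errors; CUDA must be available for flash-attn.
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn", flash_attn.__version__)
```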
You can download all of the fine-tuning data and evaluation data from Hugging Face.
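If you use the huggingface_hub library, a snapshot download looks like this (the repo id below is a placeholder — substitute the dataset repo actually linked by the authors):

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: replace with the dataset repo linked above.
snapshot_download(repo_id="some-org/cream-data", repo_type="dataset",
                  local_dir="data")
```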
Train model
bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM
bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM
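The `linear` argument presumably selects linear positional interpolation. For reference, this is the standard way to enable linear RoPE scaling when loading a Llama model with Hugging Face transformers (the factor below is illustrative; use target length ÷ pre-trained length):

```python
from transformers import AutoModelForCausalLM

# Linear RoPE scaling: position indices are divided by `factor` at inference.
# factor = target context / pre-trained context (e.g. 32768 / 4096 = 8).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 8.0},
)
```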
Evaluate model
bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000
bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000
bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000
bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000
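The last two scripts compute perplexity. To reproduce a comparable number by hand, below is a minimal sliding-window perplexity sketch with Hugging Face transformers — illustrative only, not the repo's eval script; the checkpoint, input file, window, and stride are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"          # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
).eval()

ids = tok(open("long_document.txt").read(),      # placeholder input file
          return_tensors="pt").input_ids.to(model.device)

window, stride = 4096, 2048
nll, n_tokens, prev_end = 0.0, 0, 0
for start in range(0, ids.size(1), stride):
    end = min(start + window, ids.size(1))
    chunk = ids[:, start:end]
    labels = chunk.clone()
    labels[:, : -(end - prev_end)] = -100        # score only the new tokens
    with torch.no_grad():
        loss = model(chunk, labels=labels).loss
    scored = (labels != -100).sum().item()
    nll += loss.item() * scored
    n_tokens += scored
    prev_end = end
    if end == ids.size(1):
        break

print("perplexity:", torch.exp(torch.tensor(nll / n_tokens)).item())
```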
Evaluation benchmarks:
- LongChat-Lines
- Lost in the Middle
- Needle in a Haystack
- LongBench
Please cite our paper if you use CREAM in your work:
@misc{wu2024missbeatefficientrecipe,
    title={Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement},
    author={Tong Wu and Yanpeng Zhao and Zilong Zheng},
    year={2024},
    eprint={2406.07138},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2406.07138},
}