This repository contains the code and data for the paper:
Lost in the Distance: Large Language Models Struggle to Capture Long-Distance Relational Knowledge
We expose the "Lost in the Distance" phenomenon, where the performance of large language models in capturing relational knowledge degrades significantly when the relational information is separated by noise, i.e., unrelated sentences that interfere with solving the task.
Ensure you have the following dependencies installed:
- Python 3.10.11 or above
- Required Python packages (can be installed via requirements.txt).
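As a quick sanity check before running anything, the interpreter version can be compared against the minimum listed above. The following is only a sketch: the helper name `python_is_supported` and the constant `MIN_VERSION` are illustrative and not part of this repository. The packages themselves would typically be installed with `pip install -r requirements.txt`.

```python
import sys

# Minimum interpreter version listed in the dependency section above.
MIN_VERSION = (3, 10, 11)


def python_is_supported(version=None, minimum=MIN_VERSION):
    """Return True if `version` (major, minor, micro) meets the minimum.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


if not python_is_supported():
    found = ".".join(map(str, sys.version_info[:3]))
    needed = ".".join(map(str, MIN_VERSION))
    print(f"Python {found} found, but {needed} or above is expected.")
```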
Due to upload constraints, only a subset of representative predictions is included. We will release the full set of predictions via Google Drive. All logs are included in this repository. You can run the following script to reproduce all the figures from the paper:
cd script
python3 plot.py
To reproduce the main experiments, you can run the following scripts:
- Main Experiment:
bash lost_in_the_distance.sh
bash lost_in_the_distance_sonnet.sh
bash no_distance_no_degradation.sh
- Ablation Study:
bash ablation_study_noise.sh
bash ablation_study_task.sh
- revname: Name2Description
- revcause: Cause2Effect
- revparent: Parent2Child
- qa: AB
- qna: ANB
- qnna: ANNB
- qnnna: ANNNB
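One way to read the pattern strings AB, ANB, ANNB, and ANNNB is as two relational statements, A and B, separated by zero to three blocks of noise, N. This reading is an interpretation based on the description of the phenomenon at the top of this file, not something confirmed by the scripts. Under that assumption, a context could be laid out as in the sketch below, where `build_context` and the example sentences are hypothetical and do not come from the repository's data.

```python
def build_context(pattern, a, b, noise):
    """Lay out statements according to a pattern such as "ANNB".

    "A" and "B" are replaced by the two relational statements, and each
    "N" consumes the next unrelated (noise) sentence, in order.
    Hypothetical helper for illustration only.
    """
    needed = pattern.count("N")
    if len(noise) < needed:
        raise ValueError(f"pattern {pattern!r} needs {needed} noise sentences")

    noise_iter = iter(noise)
    parts = []
    for symbol in pattern:
        if symbol == "A":
            parts.append(a)
        elif symbol == "B":
            parts.append(b)
        elif symbol == "N":
            parts.append(next(noise_iter))
        else:
            raise ValueError(f"unknown symbol {symbol!r} in pattern")
    return " ".join(parts)


# Made-up example sentences: the two facts end up two noise sentences apart.
context = build_context(
    "ANNB",
    a="Alice is the parent of Bob.",
    b="Bob is the parent of Carol.",
    noise=["The train was late.", "It rained all day."],
)
print(context)
```

With the pattern "AB" the two statements are adjacent, which corresponds to the no-distance setting. Each added N moves them one noise sentence further apart.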