Making Long-Context Language Models Better Multi-Hop Reasoners

This repository contains the code and data for "Making Long-Context Language Models Better Multi-Hop Reasoners" (ACL 2024). In this paper, we introduce Reasoning with Attributions, a prompting technique that improves the multi-hop reasoning of long-context language models. We also collect attribution annotations for MuSiQue, a popular multi-hop reasoning dataset, to facilitate future research.

MuSiQue-Attribute Dataset

MuSiQue-Attribute is a subset of the original MuSiQue dataset with additional attribution annotations; it contains 1,358 training examples. The data follows the same format as MuSiQue, with an additional reasoning_steps field. Below is an example of reasoning_steps:

{
    ...,
    "reasoning_steps": [
        {
            "paragraphs": [
                {
                    "title": "CIMI-FM",
                    "text_substring": "She Did It"
                }
            ],
            "cot_sent": "The performer of \"She Did It\" is Eric Carmen"
        },
        {
            "paragraphs": [
                {
                    "title": "CIMI-FM",
                    "text_substring": "The Definitive Collection is a 1997 greatest hits compilation album of all the singles released by Cleveland, Ohio singer-songwriter Eric Carmen"
                }
            ],
            "cot_sent": "Eric Carmen was born in Cleveland, Ohio"
        },
        {
            "paragraphs": [
                {
                    "title": "Quebec Winter Carnival",
                    "text_substring": "Cleveland is a suburb of Chicago, located southwest of the city. It shares borders with the city in two areas, but is surrounded mostly by other suburbs"
                }
            ],
            "cot_sent": "The county that shares a border with Cuyahoga County, where Cleveland is located, is Lake County"
        }
    ]
}

The dataset is available at assets/MuSiQue-Attribute.zip.
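After unzipping, each line of train.jsonl is one JSON record in the format shown above. A minimal sketch of walking the reasoning_steps field (only the fields from the example are assumed; the inline record below is a reduced copy of it, not a full dataset line):

```python
import json

def iter_reasoning_steps(record):
    """Yield (paragraph attributions, chain-of-thought sentence) pairs
    from one record's reasoning_steps field."""
    for step in record.get("reasoning_steps", []):
        paragraphs = [(p["title"], p["text_substring"]) for p in step["paragraphs"]]
        yield paragraphs, step["cot_sent"]

# One line of train.jsonl, reduced to the fields shown above.
line = json.dumps({
    "reasoning_steps": [{
        "paragraphs": [{"title": "CIMI-FM", "text_substring": "She Did It"}],
        "cot_sent": "The performer of \"She Did It\" is Eric Carmen",
    }]
})
record = json.loads(line)
for paragraphs, cot_sent in iter_reasoning_steps(record):
    print(paragraphs, "->", cot_sent)
```

Each pair ties a chain-of-thought sentence to the paragraph(s) that support it, which is the attribution signal used for training.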

Reproduction

To reproduce the results of our LoRA fine-tuned model, use the following commands to train and test it. Our code is based on FastChat.

Installation

git clone https://github.com/lm-sys/FastChat.git
cd FastChat
git checkout 722ab0299fd10221fa4686267fe068a688bacd4c
pip install --upgrade pip  # enable PEP 660 support
pip install -e ".[model_worker,llm_judge]"
pip install pytablewriter rouge_score nltk rapidfuzz jsonnet
cd ..
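A quick way to confirm the environment is complete before moving on (a sketch, not part of the repo; the import names are assumed to match the pip packages above):

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Import names assumed to match the packages installed above.
required = ["fastchat", "pytablewriter", "rouge_score", "nltk", "rapidfuzz"]
print(missing_modules(required))  # [] means the installation succeeded
```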

Data Preparation

First, you should obtain all raw data for fine-tuning and evaluation:

  1. Download the Alpaca-52K instruction tuning data from https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json and save it to data/alpaca_data.json.
  2. Unzip MuSiQue-Attribute data into MuSiQue-Attribute/train.jsonl.
    unzip assets/MuSiQue-Attribute.zip
  3. Download the subsampled dev & test sets from IRCoT:
    git clone https://github.com/StonyBrookNLP/ircot.git
    ./ircot/download/processed_data.sh
    ./ircot/download/raw_data.sh

Then, run the following command to convert Alpaca-52K to FastChat format.

python -m fastchat.data.convert_alpaca --in data/alpaca_data.json --out data/alpaca_data_fschat.json

Finally, run the following commands to get our fine-tuning data.

python attach_multihop_train.py --original-data data/alpaca_data_fschat.json --random-n 7200 --multihop-data MuSiQue-Attribute/train.jsonl --template prompts/ao.json --max-context-per-instance 2 --attached-data data/alpaca-7200-musique-ao-2.json
python attach_multihop_train.py --original-data data/alpaca-7200-musique-ao-2.json --multihop-data MuSiQue-Attribute/train.jsonl --template prompts/cot.json --max-context-per-instance 2 --attached-data data/alpaca-7200-musique-ao-2-cot-2.json
python attach_multihop_train.py --original-data data/alpaca-7200-musique-ao-2-cot-2.json --multihop-data MuSiQue-Attribute/train.jsonl --template prompts/coc.json --max-context-per-instance 2 --attached-data data/alpaca-7200-musique-ao-2-cot-2-coc-2.json
python attach_multihop_train.py --original-data data/alpaca-7200-musique-ao-2-cot-2-coc-2.json --multihop-data MuSiQue-Attribute/train.jsonl --auxiliary-tasks quotation_identification_all:prompts/qia.json --max-context-per-instance 1 --attached-data data/alpaca-7200-musique-ao-2-cot-2-coc-2-qia-1.json
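Each attach_multihop_train.py run reads the file produced by the previous run and appends multihop examples rendered with one prompt template, so the four commands build a single cumulative training mix. A toy sketch of that accumulation (the render callback stands in for the actual prompts/*.json templates, and the field names below are hypothetical, not the real schema):

```python
def attach_multihop(accumulated, multihop_examples, render, max_per_instance=2):
    """Toy sketch of one pass: keep the accumulated instruction data and
    append each multihop example rendered with one prompt template,
    capped at max_per_instance renderings per example."""
    out = list(accumulated)
    for example in multihop_examples:
        out.extend(render(example)[:max_per_instance])
    return out

# Hypothetical template: turns a QA example into one instruction record.
render = lambda ex: [{"prompt": f"Q: {ex['question']}", "response": ex["answer"]}]

mix = attach_multihop(
    [{"prompt": "instruction...", "response": "output..."}],  # e.g. the Alpaca subset
    [{"question": "Who performed \"She Did It\"?", "answer": "Eric Carmen"}],
    render,
)
print(len(mix))  # each chained pass grows the mix, like the commands above
```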

For the evaluation data, run the following commands to convert the subsampled, preprocessed MuSiQue test set to LLM Judge format.

python convert_data_format.py --raw-datadir raw_data/musique --annotation-dir ircot/prompt_generator/data_annotations --subsampled-path processed_data/musique/test_subsampled.jsonl --output-dir data/musique
python convert_multihop_test_to_llm_judge.py --raw-data data/musique/processed_test_subsampled.jsonl --bench-name musique-coc --template prompts/coc.json
python convert_multihop_test_to_llm_judge.py --raw-data data/musique/processed_test_subsampled.jsonl --bench-name musique-cot --template prompts/cot.json
python convert_multihop_test_to_llm_judge.py --raw-data data/musique/processed_test_subsampled.jsonl --bench-name musique-ao --template prompts/ao.json

Test data will be stored in fastchat/llm_judge/data/{bench-name}.
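Each benchmark file holds one JSON object per line in FastChat's LLM Judge question layout: a question_id plus a single-turn turns list. A sketch of one such record (the category value here is an assumption, not taken from the conversion script):

```python
import json

def to_llm_judge_question(question_id, prompt_text, category="musique"):
    """One line of a question file in the single-turn layout LLM Judge expects."""
    return {"question_id": question_id, "category": category, "turns": [prompt_text]}

line = json.dumps(to_llm_judge_question(1, "Answer the question given the paragraphs..."))
print(line)
```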

Fine-tuning

You can use the following command to replicate our fine-tuned model on 8 NVIDIA A100 80GB GPUs:

bash scripts/train.sh data/alpaca-7200-musique-ao-2-cot-2-coc-2-qia-1.json <MODEL_NAME>

We also release the model weights of AttrLoRA in assets/AttrLoRA.zip:

unzip assets/AttrLoRA.zip -d output 
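The AttrLoRA archive contains only the adapter weights: LoRA keeps each base weight matrix frozen and learns a low-rank additive update, which is why the archive is small. A NumPy sketch of the generic LoRA formulation (not code from this repo):

```python
import numpy as np

# LoRA replaces a frozen weight W (d_out x d_in) with
#   W_eff = W + (alpha / r) * B @ A,
# where A is (r, d_in), B is (d_out, r), and r << min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 16
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # B is zero-initialised, so training starts at W_eff == W
W_eff = W + (alpha / r) * B @ A
print(np.allclose(W_eff, W))
```

At inference the update can be merged into W once, so the adapted model runs at the same cost as the base model.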

Evaluation

To evaluate the model on MuSiQue, run the command:

bash eval.sh <MODEL_NAME>

To run inference with our released model, first unzip the weights archive, then use AttrLoRA as the model name in the script.

Citation

Please cite our paper if you use our data, code, or models in your work:

@inproceedings{li2024making,
   title={Making Long-Context Language Models Better Multi-Hop Reasoners},
   author={Li, Yanyang and Liang, Shuo and Lyu, Michael and Wang, Liwei},
   year={2024},
   booktitle={Annual Meeting of the Association for Computational Linguistics (ACL)},
}
