# LLM-QLM

The official repository for the paper *Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking*, Shengyao Zhuang, Bing Liu, Bevan Koopman, and Guido Zuccon, Findings of EMNLP 2023.

## Requirements

- Python 3.10
- pyserini 0.21.0
- transformers 4.29.1
- ranx 0.3.10

## LLM-QLM inference

The results in our paper can be reproduced by running the commands below. All model and data files are downloaded and cached automatically (see the `--cache_dir` flag).

```bash
HF_MODEL=huggyllama/llama-7b
RUN_NAME=llama-7b
DATASET=trec-covid  # choose from trec-covid, dbpedia-entity, fiqa, robust04

python -m torch.distributed.run \
    --nproc_per_node $GPUS_PER_NODE --nnodes $SLURM_NNODES --node_rank $SLURM_PROCID \
    --master_addr $MASTER_ADDR --master_port $MASTER_PORT run.py \
    --model_name_or_path ${HF_MODEL} \
    --index beir-v1.0.0-${DATASET}.flat \
    --topics beir-v1.0.0-${DATASET}-test \
    --output run.beir-bm25-${RUN_NAME}.${DATASET}.txt \
    --output_format trec \
    --output_dir runs/${DATASET} \
    --hits 100 \
    --remove_query True \
    --per_device_eval_batch_size 6 \
    --eval_accumulation_steps 2 \
    --dataloader_drop_last False \
    --bf16 True \
    --query_max_length 128 \
    --doc_max_length 512 \
    --save_first_stage_run True \
    --in_context False \
    --cache_dir cache

# Optional: set --in_context True for GBQ few-shot inference.
# Optional: add --deepspeed ds_config_s3.json to enable DeepSpeed ZeRO-3.
```
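
Under the hood, `run.py` re-ranks the BM25 candidates by query likelihood: a document's score is the log-likelihood the LLM assigns to the query tokens, conditioned on a prompt that contains the document. The following is a minimal, self-contained sketch of this scoring; the prompt template and the sum-over-tokens aggregation are assumptions for illustration, not the repo's exact implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def qlm_score(query: str, doc: str) -> float:
    """Log-likelihood of the query given the document (higher = better)."""
    # Prompt template is an assumption for illustration.
    prompt = f"Passage: {doc}\nPlease write a question based on this passage.\nQuestion:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(" " + query, return_tensors="pt",
                          add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i predicts token i + 1, so the logits that score the query
    # tokens start at the last prompt position.
    log_probs = torch.log_softmax(
        logits[0, prompt_ids.size(1) - 1 : -1].float(), dim=-1)
    token_log_probs = log_probs.gather(1, query_ids[0].unsqueeze(1)).squeeze(1)
    return token_log_probs.sum().item()
```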


Fuse the LLM run with the first-stage BM25 run:

```bash
python3 fuse.py \
    --run1 runs/${DATASET}/run.bm25.txt \
    --run2 runs/${DATASET}/run.beir-bm25-${RUN_NAME}.${DATASET}.txt \
    --weight1 0.2 --weight2 0.8 \
    --output runs/${DATASET}/run.beir-bm25-${RUN_NAME}.${DATASET}.txt.fuse
```
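
`fuse.py` computes the weighted-sum (wsum) fusion reported in the table below: per query, each run's scores are combined as `weight1 * BM25 + weight2 * LLM`. A rough sketch of that computation, assuming min-max score normalisation per query (check `fuse.py` for the exact scheme):

```python
from collections import defaultdict

def read_trec(path):
    """Parse a TREC run file into {qid: {docid: score}}."""
    run = defaultdict(dict)
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run[qid][docid] = float(score)
    return run

def minmax(scores):
    """Scale one query's scores to [0, 1]; normalisation is an assumption."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / ((hi - lo) or 1.0) for d, s in scores.items()}

def wsum(run1, run2, w1=0.2, w2=0.8):
    """fused(d) = w1 * norm(run1) + w2 * norm(run2), per query."""
    fused = {}
    for qid in set(run1) | set(run2):
        a, b = minmax(run1.get(qid, {})), minmax(run2.get(qid, {}))
        fused[qid] = {d: w1 * a.get(d, 0.0) + w2 * b.get(d, 0.0)
                      for d in set(a) | set(b)}
    return fused
```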

Finally, evaluate both the re-ranked run and the fused run with `trec_eval`:

```bash
python -m pyserini.eval.trec_eval \
    -c -m ndcg_cut.10 -m recall.100 beir-v1.0.0-${DATASET}-test \
    runs/${DATASET}/run.beir-bm25-${RUN_NAME}.${DATASET}.txt

python -m pyserini.eval.trec_eval \
    -c -m ndcg_cut.10 -m recall.100 beir-v1.0.0-${DATASET}-test \
    runs/${DATASET}/run.beir-bm25-${RUN_NAME}.${DATASET}.txt.fuse
```
- Simply change `HF_MODEL` to the Hugging Face model name listed in Appendix A of the paper to test other LLMs.
- Our code supports multi-GPU inference as well as DeepSpeed ZeRO-3. Add `--deepspeed ds_config_s3.json` if your GPU does not have enough memory.
- The `--in_context` flag controls whether to use GBQ few-shot inference (the `-gbq` runs in the table below); see the illustrative sketch after this list.
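
For intuition, here is a hypothetical illustration of what `--in_context True` changes: the scoring prompt is prefixed with a few passage/question demonstrations before the target document. The helper and template below are assumptions for illustration; the exact prompt construction lives in `run.py`.

```python
def build_prompt(doc: str, examples: list[tuple[str, str]]) -> str:
    """Hypothetical few-shot prompt: demonstrations first, then the target."""
    shots = "".join(
        f"Passage: {p}\nPlease write a question based on this passage.\n"
        f"Question: {q}\n\n"
        for p, q in examples
    )
    return shots + (f"Passage: {doc}\n"
                    "Please write a question based on this passage.\nQuestion:")
```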

## BEIR nDCG@10 results

Fusion runs combine scores with a weighted sum (wsum): 0.2 * BM25 + 0.8 * LLM.

| Model | trec-covid | dbpedia-entity | fiqa | robust04 | avg |
|---|---|---|---|---|---|
| BM25 (pyserini) | 0.5947 | 0.3180 | 0.2361 | 0.4070 | 0.3890 |
| QLM (pyserini) | 0.5081 | 0.2954 | 0.2053 | 0.4067 | 0.3539 |
| Contriever | 0.2732 | 0.2916 | 0.2449 | 0.3155 | 0.2913 |
| Contriever (msmarco tuned) | 0.5964 | 0.4128 | 0.3293 | 0.4729 | 0.4529 |
| HyDE | 0.5824 | 0.3715 | 0.2661 | 0.4183 | 0.4096 |
| SPLADE | 0.7109 | 0.4416 | 0.3514 | 0.4581 | 0.4905 |
| DRAGON+ | 0.7590 | 0.4170 | 0.3560 | 0.4790 | 0.5028 |
| monoT5-3b (zero-shot) | 0.4385 | 0.1405 | 0.0380 | 0.1425 | 0.1900 |
| BM25 + monoT5-3b (zero-shot) | 0.5215 | 0.2712 | 0.1720 | 0.2423 | 0.3018 |
| monoT5-3b (msmarco tuned) | 0.7983 | 0.4480 | 0.4596 | 0.5620 | 0.5670 |
| BM25 + monoT5-3b (msmarco tuned) | 0.6634 | 0.4455 | 0.4152 | 0.5506 | 0.5187 |
| monoT5-3b-InPars-v2 (msmarco tuned) | 0.8375 | 0.4655 | 0.4606 | 0.5851 | 0.5872 |
| BM25 + monoT5-3b-InPars-v2 (msmarco tuned) | 0.8209 | 0.4550 | 0.4353 | 0.5397 | 0.5627 |
| monoFlanT5-3b (zero-shot) | 0.6972 | 0.3236 | 0.3860 | 0.5381 | 0.4862 |
| BM25 + monoFlanT5-3b (zero-shot) | 0.7365 | 0.3826 | 0.3940 | 0.5441 | 0.5143 |
| QLM-T5 | 0.7140 | 0.3803 | 0.3899 | 0.4770 | 0.4903 |
| BM25 + QLM-T5 | 0.7169 | 0.4058 | 0.3839 | 0.4980 | 0.5012 |

| Model | trec-covid | dbpedia-entity | fiqa | robust04 | avg |
|---|---|---|---|---|---|
| T5-3b | 0.4718 | 0.1764 | 0.1319 | 0.3258 | 0.2765 |
| BM25 + T5-3b | 0.4866 | 0.2185 | 0.1617 | 0.3799 | 0.3117 |
| T5-11b | 0.6587 | 0.2843 | 0.3023 | 0.2405 | 0.3715 |
| BM25 + T5-11b | 0.6792 | 0.3372 | 0.3198 | 0.2740 | 0.4026 |
| T0-3b | 0.6999 | 0.3624 | 0.4218 | 0.4774 | 0.4904 |
| BM25 + T0-3b | 0.7162 | 0.3884 | 0.4144 | 0.5009 | 0.5050 |
| T0-11b | 0.7212 | 0.3615 | 0.4530 | 0.4718 | 0.5019 |
| BM25 + T0-11b | 0.7387 | 0.3873 | 0.4375 | 0.4969 | 0.5151 |
| FlanT5-3b | 0.7199 | 0.3702 | 0.4173 | 0.4702 | 0.4944 |
| BM25 + FlanT5-3b | 0.7112 | 0.3966 | 0.4116 | 0.5004 | 0.5050 |
| FlanT5-11b | 0.7505 | 0.3991 | 0.4486 | 0.5080 | 0.5267 |
| BM25 + FlanT5-11b | 0.7493 | 0.4168 | 0.4334 | 0.5244 | 0.5310 |
| FlanT5-11b-yes-no | 0.7251 | 0.3673 | 0.3926 | 0.5265 | 0.5029 |
| BM25 + FlanT5-11b-yes-no | 0.7452 | 0.4016 | 0.3996 | 0.5337 | 0.5200 |
| LLAMA-7b-yes-no | 0.4332 | 0.0798 | 0.0503 | 0.2106 | 0.1935 |
| BM25 + LLAMA-7b-yes-no | 0.5018 | 0.1557 | 0.1009 | 0.2607 | 0.2546 |
| LLAMA-7b | 0.6802 | 0.3748 | 0.4175 | 0.5159 | 0.4971 |
| BM25 + LLAMA-7b | 0.6944 | 0.3988 | 0.4153 | 0.5357 | 0.5111 |
| Alpaca-7b | 0.6380 | 0.3015 | 0.2988 | 0.3967 | 0.4089 |
| BM25 + Alpaca-7b | 0.6705 | 0.3499 | 0.3368 | 0.4456 | 0.4507 |
| LLAMA-13b | 0.6787 | 0.3486 | 0.4217 | 0.5214 | 0.4926 |
| BM25 + LLAMA-13b | 0.6976 | 0.3763 | 0.4177 | 0.5416 | 0.5083 |
| Alpaca-7b-yes-no | 0.5297 | 0.1657 | 0.0562 | 0.2473 | 0.2497 |
| BM25 + Alpaca-7b-yes-no | 0.5991 | 0.2934 | 0.1214 | 0.2963 | 0.3276 |
| StableLM-7b | 0.7285 | 0.3358 | 0.3243 | 0.4558 | 0.4611 |
| BM25 + StableLM-7b | 0.7403 | 0.3718 | 0.3408 | 0.4827 | 0.4839 |
| falcon-7b | 0.7314 | 0.3949 | 0.4170 | 0.4920 | 0.5088 |
| BM25 + falcon-7b | 0.7329 | 0.4170 | 0.4125 | 0.5247 | 0.5218 |
| falcon-7b-yes-no | 0.3818 | 0.0898 | 0.0320 | 0.1068 | 0.1526 |
| BM25 + falcon-7b-yes-no | 0.4163 | 0.1772 | 0.0685 | 0.1533 | 0.2038 |
| falcon-7b-instruct | 0.6300 | 0.3541 | 0.3110 | 0.4793 | 0.4436 |
| BM25 + falcon-7b-instruct | 0.6676 | 0.3823 | 0.3341 | 0.5069 | 0.4727 |
| falcon-40b | 0.7268 | 0.3860 | 0.4353 | 0.5033 | 0.5129 |
| BM25 + falcon-40b | 0.7520 | 0.4098 | 0.4311 | 0.5310 | 0.5310 |
| falcon-40b-instruct | 0.6809 | 0.3789 | 0.4049 | 0.4837 | 0.4871 |
| BM25 + falcon-40b-instruct | 0.7019 | 0.4054 | 0.4085 | 0.5130 | 0.5072 |
| stable-vicuna-13b | 0.7128 | 0.3580 | 0.3816 | 0.4802 | 0.4832 |
| BM25 + stable-vicuna-13b | 0.7177 | 0.3942 | 0.3907 | 0.5127 | 0.5038 |

| Model | trec-covid | dbpedia-entity | fiqa | robust04 | avg |
|---|---|---|---|---|---|
| BM25 + HyDE | 0.6981 | 0.4170 | 0.3094 | 0.4966 | 0.4803 |
| BM25 + HyDE -> FlanT5-11b | 0.7581 | 0.4257 | 0.4938 | 0.5352 | 0.5532 |
| BM25 + HyDE -> FlanT5-11b fusion | 0.7578 | 0.4623 | 0.4947 | 0.5659 | 0.5702 |
| BM25 + HyDE -> FlanT5-11b-gbq | 0.7583 | 0.4173 | 0.5032 | 0.5541 | 0.5582 |
| BM25 + HyDE -> FlanT5-11b-gbq fusion | 0.7721 | 0.4514 | 0.4973 | 0.5824 | 0.5758 |
| BM25 + HyDE -> LLAMA-7b | 0.7024 | 0.4064 | 0.4585 | 0.5385 | 0.5264 |
| BM25 + HyDE -> LLAMA-7b fusion | 0.7238 | 0.4539 | 0.4675 | 0.5744 | 0.5547 |
| BM25 + HyDE -> LLAMA-7b-gbq | 0.7699 | 0.4424 | 0.5078 | 0.5740 | 0.5735 |
| BM25 + HyDE -> LLAMA-7b-gbq fusion | 0.7780 | 0.4768 | 0.5036 | 0.5946 | 0.5883 |
| BM25 + HyDE -> Falcon-7b | 0.7390 | 0.4236 | 0.4500 | 0.5126 | 0.5313 |
| BM25 + HyDE -> Falcon-7b fusion | 0.7664 | 0.4611 | 0.4576 | 0.5510 | 0.5590 |
| BM25 + HyDE -> Falcon-7b-gbq | 0.7838 | 0.4390 | 0.4850 | 0.5631 | 0.5677 |
| BM25 + HyDE -> Falcon-7b-gbq fusion | 0.7856 | 0.4799 | 0.4858 | 0.5897 | 0.5853 |
| BM25 + HyDE -> monoT5-3b-InPars-v2 | 0.8322 | 0.5300 | 0.5131 | 0.6448 | 0.6300 |
| BM25 + HyDE -> monoT5-3b-InPars-v2 fusion | 0.7019 | 0.4190 | 0.3095 | 0.4981 | 0.4821 |

| Model | trec-covid | dbpedia-entity | fiqa | robust04 | avg |
|---|---|---|---|---|---|
| LLAMA-7b-finetuned-size100-10epoch | 0.7836 | 0.3861 | 0.4093 | 0.4742 | 0.5133 |
| BM25 + LLAMA-7b-finetuned-size100-10epoch | 0.7817 | 0.4064 | 0.4052 | 0.4989 | 0.5231 |
| LLAMA-7b-finetuned-size100-3epoch | 0.7675 | 0.3829 | 0.4250 | 0.4935 | 0.5172 |
| BM25 + LLAMA-7b-finetuned-size100-3epoch | 0.7646 | 0.4002 | 0.4165 | 0.5169 | 0.5246 |
| LLAMA-7b-4shots | 0.7397 | 0.3929 | 0.4477 | 0.5218 | 0.5255 |
| BM25 + LLAMA-7b-4shots | 0.7614 | 0.4105 | 0.4355 | 0.5369 | 0.5361 |
| LLAMA-7b-3shots-gbq | 0.7572 | 0.3829 | 0.4314 | 0.5349 | 0.5266 |
| BM25 + LLAMA-7b-3shots-gbq | 0.7471 | 0.4093 | 0.4255 | 0.5538 | 0.5339 |
| LLAMA-13b-4shots | 0.7343 | 0.3819 | 0.4501 | 0.5195 | 0.5215 |
| BM25 + LLAMA-13b-4shots | 0.7417 | 0.3991 | 0.4353 | 0.5455 | 0.5304 |
| LLAMA-13b-3shots-gbq | 0.7460 | 0.3820 | 0.4348 | 0.5428 | 0.5264 |
| BM25 + LLAMA-13b-3shots-gbq | 0.7454 | 0.4059 | 0.4305 | 0.5616 | 0.5359 |
