Download the NQ test data from test data of NQ, the HotpotQA dev data from KILT, and the MS MARCO dev data from msmarco and msm-qa.
We directly use RocketQAv2 as the retriever on the Wikipedia-based NQ and HotpotQA datasets, and ADORE on the web-based MS MARCO dataset.
We use the entity substitution and generation method; a rough sketch of the substitution step is given below.
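The repository does not spell out the substitution procedure in this section, so the following is only a minimal sketch of one plausible entity-substitution step. The inputs `answer_entity` and `candidate_entities` are hypothetical names introduced for illustration; the actual construction of substituted passages may differ.

```python
# entity_substitution_sketch.py -- illustrative only, not the repo's implementation.
# Assumes each example provides the gold answer entity and a pool of candidate
# substitute entities of the same type (both are assumptions, not from the repo).
import random

def substitute_entity(passage: str, answer_entity: str, candidate_entities: list[str]) -> str:
    """Replace mentions of the answer entity with a randomly chosen substitute,
    yielding a counterfactual (noisy) passage."""
    substitutes = [e for e in candidate_entities if e != answer_entity]
    if answer_entity not in passage or not substitutes:
        return passage  # nothing to substitute
    replacement = random.choice(substitutes)
    return passage.replace(answer_entity, replacement)

if __name__ == "__main__":
    passage = "Barack Obama was born in Honolulu, Hawaii."
    print(substitute_entity(passage, "Honolulu", ["Chicago", "Boston"]))
```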
Filter the results returned by the existing retrievers: retrieved passages that do not contain the answer are treated as the noisy passages used for reference.
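A minimal sketch of this answer-containment filter is shown below. The JSON-lines layout (fields `question`, `answers`, `retrieved_passages`) and the file name `retrieval_results.jsonl` are assumptions for illustration; adapt them to the actual retriever output format.

```python
# filter_noisy_sketch.py -- rough sketch of the answer-containment filter described above.
import json

def contains_answer(passage: str, answers: list[str]) -> bool:
    """Simple case-insensitive substring check against all gold answers."""
    text = passage.lower()
    return any(ans.lower() in text for ans in answers)

def split_passages(example: dict) -> tuple[list[str], list[str]]:
    """Split retrieved passages into positives (contain an answer) and noisy ones."""
    positives, noisy = [], []
    for passage in example["retrieved_passages"]:
        (positives if contains_answer(passage, example["answers"]) else noisy).append(passage)
    return positives, noisy

if __name__ == "__main__":
    with open("retrieval_results.jsonl") as f:
        for line in f:
            example = json.loads(line)
            _, noisy = split_passages(example)
            # the noisy passages are kept as the noisy references for the benchmark
```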
We also provide the final GTI benchmark, which you can download from link.
Taking LLaMA 2-13B as an example, we demonstrate four testing methods: pointwise, pairwise, listwise-set, and listwise-rank. To test other models, simply replace the model.
python llama2-point.py
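For orientation, here is a minimal sketch of what a pointwise judging loop looks like; the actual prompts, parsing, and batching live in llama2-point.py. The checkpoint name and prompt wording below are assumptions, not copied from the script.

```python
# pointwise_sketch.py -- illustrative pointwise judging loop; see llama2-point.py
# for the real implementation. Model name and prompt wording are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def pointwise_judge(question: str, passage: str) -> str:
    """Ask the model, for a single (question, passage) pair, whether the passage
    answers the question -- the defining trait of the pointwise setting."""
    prompt = (
        f"Passage: {passage}\n"
        f"Question: {question}\n"
        "Does the passage contain the answer to the question? Answer Yes or No.\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    generated = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()
```

The pairwise and listwise variants differ only in how many passages are placed in the prompt and how the model's preference is parsed; the corresponding scripts follow the same pattern.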