GPT_Ranker

This repo is for the MS MARCO document/passage reranking task, using GPT-2 and T5 models.


Related Links

Dataset

Models

Traditional Models

Tools


To Clone this repo

git clone --recurse-submodules git@github.com:ielab/GPT_Ranker.git

The Recall/MRR@100 evaluation for MS MARCO doc dev (5193 queries)

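| Model Name | Top K | R@5 | R@10 | R@15 | R@20 | R@30 | R@100 | R@200 | R@500 | R@1000 | MRR@100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T5-Base | Top 100 (Tuned BM25) | | | | | | | | | | |
| T5-Base | Top 200 (Tuned BM25) | | | | | | | | | | |
| T5-Base | Top 500 (Tuned BM25) | | | | | | | | | | |
| T5-Base | Top 1000 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 100 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 200 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 500 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 1000 (Tuned BM25) | | | | | | | | | | |
| BM25 Initial Retrieval (Tuned k1=3.44 b=0.87) | N/A | 0.4024 | 0.4946 | 0.5640 | 0.6095 | 0.6649 | 0.7874 | 0.8373 | 0.8850 | 0.9187 | 0.27880910 |
| MS MARCO Top 1000 (Provided Run) | N/A | | | | | | | | | | |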
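
The two metrics in these tables can be sketched as follows. This is a minimal illustration, not the code in `scripts/passage_msmarco_eval.py` or `scripts/doc_msmarco_eval.py`: the function names, the in-memory `run`/`qrels` dictionaries, and the per-query averaging of recall are all assumptions made for the example.

```python
from typing import Dict, List, Set


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean over queries of 1/rank of the first relevant id within the top k (0 if none)."""
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, docid in enumerate(run.get(qid, [])[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels)


def recall_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of the fraction of relevant ids found in the top k.

    Assumes every query in qrels has at least one relevant id.
    """
    total = 0.0
    for qid, relevant in qrels.items():
        retrieved = set(run.get(qid, [])[:k])
        total += len(retrieved & relevant) / len(relevant)
    return total / len(qrels)


# Toy data (hypothetical ids): run maps query id -> ranked ids, qrels maps query id -> relevant ids.
run = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}}
print(mrr_at_k(run, qrels, k=10))    # 0.25
print(recall_at_k(run, qrels, k=2))  # 0.5
```

Passing `k=100` to `mrr_at_k` gives the document-task variant of the metric.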

The Recall/MRR@10 evaluation for msmarco passage dev small (6980 queries)

| Model Name | Top K | R@5 | R@10 | R@15 | R@20 | R@30 | R@100 | R@200 | R@500 | R@1000 | MRR@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T5-Base | Top 100 (Tuned BM25) | 0.4294 | 0.5321 | 0.5778 | 0.6076 | 0.6354 | 0.6701 | 0.6701 | 0.6701 | 0.6701 | 0.28134977 |
| T5-Base | Top 200 (Tuned BM25) | 0.4413 | 0.5496 | 0.6037 | 0.6371 | 0.6719 | 0.7333 | 0.7383 | 0.7383 | 0.7383 | 0.286065970 |
| T5-Base | Top 500 (Tuned BM25) | 0.4502 | 0.5630 | 0.6195 | 0.6561 | 0.6987 | 0.7812 | 0.8040 | 0.8116 | 0.8116 | 0.2904781 |
| T5-Base | Top 1000 (Tuned BM25) | 0.4553 | 0.5708 | 0.6318 | 0.6700 | 0.7148 | 0.8093 | 0.8390 | 0.8553 | 0.8573 | 0.2920570 |
| T5-Base | Top 1000 (Tuned BM25 + Lowercase P) | 0.4546 | 0.5733 | 0.6346 | 0.6679 | 0.7133 | 0.8095 | 0.8384 | 0.8552 | 0.8573 | 0.2945217 |
| T5-Base | Top 1000 (docTTTTTquery init run) | 0.4665 | 0.5857 | 0.6537 | 0.6968 | 0.7485 | 0.8643 | 0.9089 | 0.9404 | 0.9471 | 0.29763735 |
| GPT-2 | Top 100 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 200 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 500 (Tuned BM25) | | | | | | | | | | |
| GPT-2 | Top 1000 (Tuned BM25) | | | | | | | | | | |
| BM25 Initial Retrieval (Tuned k1=0.82 b=0.68) | N/A | 0.2944 | 0.3916 | 0.4459 | 0.4842 | 0.5307 | 0.6701 | 0.7383 | 0.8116 | 0.8573 | 0.187412 |
| MS MARCO Top 1000 (Provided Run) | N/A | 0.0093 | 0.0150 | 0.0196 | 0.0224 | 0.0270 | 0.1026 | 0.1641 | 0.3893 | 0.8140 | 0.00456946 |
| docTTTTTquery Top 1000 (40 Samples) | N/A | 0.4244 | 0.5411 | 0.6033 | 0.6484 | 0.6987 | 0.8190 | 0.8688 | 0.9164 | 0.9471 | 0.2767497 |
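
In the T5-Base rows, recall at large cutoffs flattens at the initial run's recall at K. For example, the Top 100 row reports 0.6701 from R@100 through R@1000, which equals the tuned BM25 R@100. This is what one would expect if reranking only reorders the top K candidates of the initial run. The sketch below illustrates that behaviour under this assumption; `rerank_top_k` and the `score` callable are hypothetical names for the example, and the actual model scoring in `ranker.py` may work differently.

```python
from typing import Callable, Dict, List


def rerank_top_k(
    initial_run: Dict[str, List[str]],
    score: Callable[[str, str], float],
    k: int,
) -> Dict[str, List[str]]:
    """Reorder only the top-k candidates of each query by a relevance score.

    Candidates ranked below k in the initial run are dropped, so a relevant
    document outside the top k can never be recovered by the reranker.
    """
    reranked = {}
    for qid, docids in initial_run.items():
        candidates = docids[:k]
        reranked[qid] = sorted(candidates, key=lambda d: score(qid, d), reverse=True)
    return reranked


# Toy example: a lookup table stands in for the model score (hypothetical values).
toy_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
initial = {"q1": ["a", "b", "c", "d"]}
print(rerank_top_k(initial, lambda qid, docid: toy_scores[docid], k=3))
# {'q1': ['b', 'c', 'a']}  ("d" scores highest but sits outside the top 3)
```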

For example, in the passage task:

For query:

how many tables can sql server join

Our model ranks this at top 1:

id: 7485889
contents: How many tables can I have in 1 Sql Azure Database. I know in Sql Server, Tables per database Limited by number of objects in a database, Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of objects in a database cannot exceed 2,147,483,647..

The actual relevant document is (we rank this at 949, BM25 ranks this at 300):

id: 7485894
contents: SQL JOIN. A JOIN clause is used to combine rows from two or more tables, based on a related column between them. Let's look at a selection from the Orders table: Then, look at a selection from the Customers table: Notice that the CustomerID column in the Orders table refers to the CustomerID in the Customers table. The relationship between the two tables above is the CustomerID column. Then, we can create the following SQL statement (that contains an INNER JOIN), that selects records that have matching values in both tables: Example SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate

For query:

what does it mean when you dream about babies

Our model ranks this at top 1:

id: 6680536
contents: A baby in general. Dreams that include babies are positive signs. Dreaming about interacting with a baby or simply seeing a baby in a dream can mean that pleasant surprises and fortuitous occurrences are about to occur in your life. This dream doesn't specify what, but something unexpectedly good is on your horizon.

The actual relevant document is (we rank this at 2, BM25 ranks this at 993):

id: 7551052
contents: To see a baby in your dream signifies innocence, warmth and new beginnings. Babies symbolize something in your own inner nature that is pure, vulnerable, helpless and/or uncorrupted. If you dream that the baby is smiling at you, then it suggests that you are experiencing pure joy.

BM25 ranks this at top 1 (we rank this at 47):

id: 3347686
contents: Health related question in topics Psychology .We found some answers as below for this question What does it mean when you dream about somebody having a baby,you can compare them. When you dream of someone having a baby, it means new beginnings. It also means hidden potential related to you is being released. [ Source: http://www.chacha.com/question/what-does-it-mean-when-you-dream-about-somebody-having-a-baby ] More Answers to What does it mean when you dream about somebody having a baby.
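
Rank comparisons like the ones above can be produced by looking up a passage id in each ranked list. The helper below is a small sketch; `rank_of` is a hypothetical name, and the lists are truncated to the entries quoted in the second example rather than being full runs.

```python
from typing import List, Optional


def rank_of(docid: str, ranking: List[str]) -> Optional[int]:
    """Return the 1-based rank of docid in ranking, or None if it is absent."""
    try:
        return ranking.index(docid) + 1
    except ValueError:
        return None


# Truncated lists holding only the top entries quoted above for the "babies" query.
our_top = ["6680536", "7551052"]
bm25_top = ["3347686"]

print(rank_of("7551052", our_top))   # 2
print(rank_of("7551052", bm25_top))  # None here; the full BM25 run has it at 993
```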

Comparison with BM25 model:

Rank Comparison


File Structure

The file structure should be the same as this:

GPT_Ranker/
+--- anserini/
+--- data/
|    +--- doc_rerank/
|    |    +--- collection_jsonl/
|    |    |    +--- docs00.json
|    |    |    +--- docs01.json
|    |    |    +--- docs02.json
|    |    |    +--- docs03.json
|    |    |    +--- docs04.json
|    |    |    +--- docs05.json
|    |    |    +--- docs06.json
|    |    +--- index/
|    |    |    +--- lucene-index.msmarco-doc.pos+docvectors+rawdocs/
|    |    |    |    +--- HERE CONTAINS THE ANSERINI MSMARCO DOCUMENT INDEX
|    |    +--- qrels/
|    |    |    +--- qrels.msmarco-doc.dev.txt
|    |    +--- query/
|    |    |    +--- doc-msmarco-dev-queries.json
|    |    |    +--- doc-msmarco-test2020-queries.json
|    |    +--- formatted_run.msmarco-doc.dev.bm25.tuned.txt
|    |    +--- formatted_run.msmarco-doc.test.bm25.tuned.txt
|    +--- pass_rerank/
|    |    +--- collection_jsonl/
|    |    |    +--- docs00.json
|    |    |    +--- docs01.json
|    |    |    +--- docs02.json
|    |    |    +--- docs03.json
|    |    |    +--- docs04.json
|    |    |    +--- docs05.json
|    |    |    +--- docs06.json
|    |    |    +--- docs07.json
|    |    |    +--- docs08.json
|    |    +--- index/
|    |    |    +--- lucene-index-msmarco/
|    |    |    |    +--- HERE CONTAINS THE ANSERINI MSMARCO PASSAGE INDEX
|    |    +--- qrels/
|    |    |    +--- qrels.dev.small.tsv
|    |    |    +--- pass-qrels.msmarco-dev.full.txt
|    |    +--- query/
|    |    |    +--- pass-queries.dev.json
|    |    |    +--- pass-queries.eval.json
|    |    |    +--- pass-query-dev.small.json
|    |    +--- run.msmarco-passage.dev.small.tsv
|    |    +--- run.msmarco-passage.dev.full.tsv
|    |    +--- run.msmarco-passage.dev.eval.tsv
|    +--- pass_train/
|    |    +--- doc_query_pairs.train.tsv
|    +--- doc_train/
|    |    +--- doc_query_pairs.train.tsv
+--- logs/
|    +--- gpt2/
|    |    +--- CONTAIN GPT2 TRAINING LOGS
|    +--- t5/
|    |    +--- CONTAIN T5 TRAINING LOGS
+--- model/
|    +--- gpt-2/
|    |    +--- CONTAIN PRETRAINED GPT-2 (UNTUNED)
|    +--- t5-base/
|    |    +--- CONTAIN T5 FINE TUNED ON MSMARCO PASSAGE
|    +--- t5-base-tuned/
|    |    +--- tuned_on_doc/
|    |    |    +--- CONTAIN T5 FINE TUNED ON MSMARCO DOCUMENT
|    +--- gpt-2-tuned/
|    |    +--- tuned_on_pass/
|    |    |    +--- CONTAIN GPT-2 FINE TUNED ON MSMARCO PASSAGE
|    |    +--- tuned_on_doc/
|    |    |    +--- CONTAIN GPT-2 FINE TUNED ON MSMARCO DOCUMENT
+--- result/
|    +--- doc_rerank/
|    |    +--- gpt2/
|    |    +--- t5/
|    +--- pass_rerank/
|    |    +--- gpt2/
|    |    +--- t5/
+--- notes/
+--- scripts/
|    +--- fine_tuning.py
|    +--- passage_msmarco_eval.py
|    +--- doc_msmarco_eval.py
|    +--- SOME OTHER SCRIPTS
+--- config.json
+--- helper.py
+--- main.py
+--- middleware.py
+--- ranker.py
+--- anserini_retriever.py
+--- README.md
+--- SOME OTHER FILES
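
Before running anything, it can help to check that the layout above is in place. The following is a minimal sketch that tests a handful of the paths from the tree. `missing_paths` and `EXPECTED_PATHS` are hypothetical names, and the selection of paths is only an illustrative subset, not a list the repo itself defines.

```python
from pathlib import Path
from typing import Iterable, List, Union

# A subset of the paths shown in the tree above, relative to the GPT_Ranker/ root.
EXPECTED_PATHS = [
    "anserini",
    "data/doc_rerank/collection_jsonl",
    "data/doc_rerank/index/lucene-index.msmarco-doc.pos+docvectors+rawdocs",
    "data/doc_rerank/qrels/qrels.msmarco-doc.dev.txt",
    "data/pass_rerank/collection_jsonl",
    "data/pass_rerank/index/lucene-index-msmarco",
    "data/pass_rerank/qrels/qrels.dev.small.tsv",
    "config.json",
    "main.py",
]


def missing_paths(root: Union[str, Path], expected: Iterable[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing path.
    for path in missing_paths("."):
        print(f"missing: {path}")
```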