Merge pull request #91 from debby1103/main

Replace invalid links

huybery authored Nov 13, 2023
2 parents 64f4da7 + 33c94b5, commit fe86237
Showing 1 changed file (oltqa/README.md) with 7 additions and 7 deletions.
@@ -6,16 +6,16 @@ The PyTorch implementation of paper [Long-Tailed Question Answering in an Open W
cd LongTailQA
pip install -r r.txt
```
- The raw dataset is available [here](https://drive.google.com/file/d/12-w1bMAevcXmjsl8DKcCrJe1kI-pyZTd/view?usp=share_link) to be placed in data_process/data
+ The raw dataset is available [here](https://drive.google.com/file/d/1yvWuMYKSEoeutA-o_1VviuD_lZKRorBP/view?usp=sharing) to be placed in data_process/data

# Construct Pareto Long-Tail subset of raw data
```bash
python gen_lt.py
```
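The internals of `gen_lt.py` are not shown in this diff; as an illustration of the idea only, a Pareto-style long-tailed subset can be built by giving each class a power-law sample budget. Everything below (`longtail_counts`, `make_longtail_subset`, the `alpha`/`head_size` parameters) is a hypothetical sketch, not the repository's actual logic.

```python
# Hypothetical sketch of a Pareto-style long-tail subsample; the real
# gen_lt.py may use different parameters and sampling logic.
import random
from collections import defaultdict

def longtail_counts(num_classes, head_size, alpha=1.0, min_size=1):
    """Per-class budgets following a power law: n_k ~ head_size * (k+1)^-alpha."""
    return [max(int(head_size * (k + 1) ** -alpha), min_size)
            for k in range(num_classes)]

def make_longtail_subset(examples, label_fn, head_size, alpha=1.0, seed=0):
    """Subsample `examples` so class sizes follow a long-tailed distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_fn(ex)].append(ex)
    # Largest classes form the head; budgets shrink down the tail.
    classes = sorted(by_class, key=lambda c: len(by_class[c]), reverse=True)
    budgets = longtail_counts(len(classes), head_size, alpha)
    subset = []
    for cls, budget in zip(classes, budgets):
        pool = list(by_class[cls])
        rng.shuffle(pool)
        subset.extend(pool[:budget])
    return subset
```

With `head_size=20` and `alpha=1.0` over five classes, the budgets come out as 20, 10, 6, 5, 4: a steep head-to-tail drop-off of the kind a long-tailed QA benchmark needs.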

- # large PLM inference
+ # preprocessing: large PLM inference
## BM25 candidates
- Use [This Repo](https://github.com/OhadRubin/EPR) to select BM25 examples for PLM inference
+ Use [This Repo](https://github.com/OhadRubin/EPR) to select BM25 examples for PLM inference (to construct a candidate pool for further selection)
```bash
python find_bm25.py output_path=$PWD/data/{compute_bm25_outfile} \
dataset_split=train setup_type={bm25_setup_type} task_name={dataset} +ds_size={ds_size} L={finder_L}
```
@@ -26,10 +26,10 @@ Install [GLM-10B](https://github.com/THUDM/GLM) or [GLM-130B](https://github.com
```bash
cd plm
bash ./install_glm.sh
bash ./run.sh ${input_file}
bash ./scripts/generate_block.sh \
config_tasks/model_blocklm_10B_chinese.sh
```
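The BM25 step above delegates to the EPR repo's `find_bm25.py`; conceptually, it ranks the candidate pool by lexical overlap with each query. As a self-contained illustration (not the EPR implementation), a toy Okapi BM25 ranker looks like this:

```python
# Toy BM25 ranker over whitespace-tokenized documents, for intuition only;
# the EPR repo's find_bm25.py does this at scale with its own tokenization.
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return doc indices sorted best-first by BM25 score against `query`."""
    toks = [d.lower().split() for d in docs]
    q = query.lower().split()
    N = len(docs)
    avgdl = sum(len(t) for t in toks) / N
    # Document frequency of each term (count each doc once).
    df = Counter(term for t in toks for term in set(t))
    def score(t):
        tf = Counter(t)
        s = 0.0
        for term in q:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        return s
    return sorted(range(N), key=lambda i: score(toks[i]), reverse=True)
```

The top-ranked candidates per query form the pool that the later selection stages rerank.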


# Two-stage Training

## generate dataset for example selection
@@ -44,7 +44,7 @@ python gen_seltest.py
```bash
bash ./train_stage1.sh ${train batch size}
```
- For a quickstart, pre-trained [bi-encoder](https://drive.google.com/file/d/1RRau7Y7PX2rv3CxVHK5FGM4aJPhsLbiv/view?usp=share_link) and [cross-encoder](https://drive.google.com/file/d/1YNi5TSBvo4eevdw7DPcLApX96LSJlu4p/view?usp=share_link) checkpoints are available.
+ For a quickstart, pre-trained [bi-encoder](https://drive.google.com/file/d/1j_i28_zvBuhcRE--Lr_PkIUPrYUZKB5O/view?usp=sharing) and [cross-encoder](https://drive.google.com/file/d/1S6Aa_8SSShz5EhwjTlsurGkH7gfhlfh5/view?usp=sharing) checkpoints are available.
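The bi-encoder/cross-encoder split follows the usual retrieve-then-rerank pattern: the bi-encoder embeds query and candidates independently for a cheap shortlist, then the cross-encoder scores each query-candidate pair jointly. The sketch below illustrates only that control flow; `toy_bi_encode` and `toy_cross_score` are invented stand-ins, not the paper's trained models.

```python
# Generic retrieve-then-rerank skeleton with toy scoring functions.
import zlib

def toy_bi_encode(text, dim=64):
    """Deterministic hashed bag-of-words vector (stand-in for a trained encoder)."""
    v = [0.0] * dim
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v

def toy_cross_score(query, cand):
    """Joint score: token overlap (stand-in for a trained cross-encoder)."""
    return len(set(query.lower().split()) & set(cand.lower().split()))

def retrieve_then_rerank(query, candidates, bi_encode, cross_score,
                         shortlist=10, top_k=3):
    q_vec = bi_encode(query)
    # Stage 1: cheap dot-product retrieval over all candidates.
    sims = sorted(((sum(a * b for a, b in zip(q_vec, bi_encode(c))), i)
                   for i, c in enumerate(candidates)), reverse=True)
    pool = [candidates[i] for _, i in sims[:shortlist]]
    # Stage 2: expensive joint rescoring of the shortlist only.
    return sorted(pool, key=lambda c: cross_score(query, c),
                  reverse=True)[:top_k]
```

The design point is cost: the bi-encoder touches every candidate but embeds each once, while the cross-encoder runs per pair and is therefore restricted to the shortlist.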

## train and evaluate the framework
```bash
```
@@ -66,4 +66,4 @@ and
```bash
#w/o knowledge mining
bash ./ablationknowledge.sh ${train batch size} ${eval batch size} ${epoch}
```
