We provide the processed dataset in folder dataset/
.
You have to download the checkpoints in hugging face version, and save them in folder model/
with the model name as the sub-folder name.
Here are the main environmental configurations
python==3.9.13
torch==1.12.1
transformers==4.29.1
datasets==2.10.1
sympy==1.11.1
openai
Specially, for running falcon on our code, you need pytorch 2.x version instead of pytorch 1.x.
torch==2.0.1
You can running the scripts in folder scripts/
to evaluate the LLMs and reproduce our experiment:
bash scripts/run_eval_color.sh
bash scripts/run_eval_penguins.sh
bash scripts/run_eval_gsm8k.sh
bash scripts/run_eval_math.sh
To better evaluate the results from prediction of LLMs with the golden labels, we use evaluate.py
to clean the results from LLMs and evaluate the accuracy. You can find the details in the corresponding file.