LLMSurvey/Experiments/ToolManipulation at main · RUCAIBox/LLMSurvey

Datasets

The two datasets are stored at the following paths:

Gorilla: ./Gorilla/eval/eval-data/questions/
HotPotQA: ./HotPotQA/data/

Requirements

torch
transformers
openai
anthropic
tree_sitter

Note that Falcon LLMs require PyTorch 2.0 when used with transformers.
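
Before running anything, it can help to confirm that the packages listed above are installed and that PyTorch is recent enough for Falcon. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and it assumes the installed distribution names match the names in the list above (this may not hold for every package, for example tree_sitter).

```python
import re
from importlib import metadata

# Packages listed under Requirements; distribution names are assumed to match.
REQUIRED = ["torch", "transformers", "openai", "anthropic", "tree_sitter"]


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.0.1+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def torch_ok_for_falcon(version):
    """True if the given torch version string is at least 2.0."""
    return version is not None and parse_major_minor(version) >= (2, 0)


if __name__ == "__main__":
    versions = installed_versions(REQUIRED)
    for name, version in versions.items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    if not torch_ok_for_falcon(versions["torch"]):
        print("Warning: Falcon LLMs need PyTorch 2.0 or newer.")
```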

Run

Gorilla

1. Getting Responses

LLM APIs:

cd ./Gorilla/eval
python get_llm_responses.py --model gpt-3.5-turbo --api_key $API_KEY --output_file gpt-3.5-turbo_torchhub_0_shot.jsonl --question_data eval-data/questions/torchhub/questions_torchhub_0_shot.jsonl --api_name torchhub

Open source models:

cd ./Gorilla/eval
python get_hf_responses.py --model_path $MODEL_PATH --question_file eval-data/questions/torchhub/questions_torchhub_0_shot.jsonl --device cuda --answer_file torchhub_0_shot.jsonl --api_name torchhub
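
Both commands read a question file and write a response file with a .jsonl extension, which conventionally means one JSON object per line. To inspect such a file, a small reader like the one below may be enough. This is a sketch only: `read_jsonl` is a hypothetical helper, and it makes no assumption about which fields the records contain.

```python
import json


def read_jsonl(path):
    """Load a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # File name taken from the command above; run from ./Gorilla/eval.
    responses = read_jsonl("torchhub_0_shot.jsonl")
    print(f"{len(responses)} responses loaded")
    if responses:
        print("first record:", responses[0])
```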

2. Evaluating the Responses

cd ./Gorilla/eval/eval-scripts
python ast_eval_th.py --api_dataset ../../data/api/torchhub_api.jsonl --apibench ../../data/apibench/torchhub_eval.json --llm_responses ../gpt-3.5-turbo_torchhub_0_shot.jsonl

HotPotQA

1. LLM APIs

text-davinci-002 or text-davinci-003:

cd ./HotPotQA
python hotpotqa.py --model text-davinci-002 --api_key $API_KEY

gpt-3.5-turbo:

cd ./HotPotQA
python hotpotqa-chat.py --model gpt-3.5-turbo --api_key $API_KEY

Claude:

cd ./HotPotQA
python hotpotqa-claude.py --model claude-v1.3 --api_key $API_KEY
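
Each API model family uses its own script, so a small lookup can help when running several models in a row. The sketch below mirrors the three commands above. `SCRIPT_BY_MODEL` and `build_command` are hypothetical names, and only the model names that appear in this README are included.

```python
import sys

# Model-to-script mapping, taken from the commands in this README.
SCRIPT_BY_MODEL = {
    "text-davinci-002": "hotpotqa.py",
    "text-davinci-003": "hotpotqa.py",
    "gpt-3.5-turbo": "hotpotqa-chat.py",
    "claude-v1.3": "hotpotqa-claude.py",
}


def build_command(model, api_key):
    """Return the argument list that runs the HotPotQA script for a model."""
    if model not in SCRIPT_BY_MODEL:
        raise ValueError(f"no HotPotQA script known for model {model!r}")
    script = SCRIPT_BY_MODEL[model]
    return [sys.executable, script, "--model", model, "--api_key", api_key]
```

The returned list can be passed to `subprocess.run(..., cwd="./HotPotQA", check=True)`, so that the script runs from the same directory as in the commands above.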

2. Open source models

cd ./HotPotQA
python hotpotqa-hf.py --model-path $MODEL_PATH --device cuda
