Classifying tweets with large language models using zero- and few-shot learning with custom and generic prompts, alongside supervised learning algorithms for comparison.
Figures: F1-scores & Accuracies, Precision-Recall (see the figures/ folder).
Install all requirements for the LLM classification script.
pip install -r requirements.txt
NB: This only installs a minimal set of requirements, for the sake of reproducing the figures with the code below. A more complete requirements file for running the full pipeline can be found in configs.
The repo contains a CLI script, llm_classification.py. You can use it for running arbitrary classification tasks on .tsv or .csv files with large language models from either HuggingFace or OpenAI.
If you intend to use OpenAI models, you will have to specify your API key and ORG as environment variables.
export OPENAI_API_KEY="..."
export OPENAI_ORG="..."
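If you want to check that both variables are actually visible to Python before starting a run, a small optional snippet like this one (not part of the repo's scripts) will do:

import os

# Optional sanity check: fail early if either OpenAI variable is missing.
for var in ("OPENAI_API_KEY", "OPENAI_ORG"):
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set in the environment")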
The script has one command-line argument, namely a config file of the following format:
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
[system]
seed=0
device="cpu"
[model]
name="google/flan-t5-base"
task="few-shot"
[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5
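The config above is valid TOML, so purely for illustration it can be read with Python's built-in tomllib (3.11+); the CLI script handles config loading itself and may use a different parser internally.

import tomllib

# Illustrative only: inspect the config that llm_classification.py will consume.
with open("config.cfg", "rb") as f:
    cfg = tomllib.load(f)

print(cfg["model"]["name"])            # "google/flan-t5-base"
print(cfg["inference"]["n_examples"])  # 5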
If you intend to use a custom prompt for a given model, you can save it in a txt file and add its path to the paths section of the config.
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
prompt_file="custom_prompt.txt"
If you want to use hand-selected examples for few-shot learning, pass along a subset of the original data in the paths section of the config. Examples have to be in the same format as the data.
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
examples="examples.csv"
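A quick way to assemble such a hand-picked subset is with pandas; the row indices below are placeholders, and the file keeps all original columns so the format matches the data:

import pandas as pd

# Build a hand-picked examples file from the original labelled data.
data = pd.read_csv("labelled_data.csv")
examples = data.loc[[3, 17, 42, 56, 101]]  # indices chosen after manual inspection
examples.to_csv("examples.csv", index=False)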
You can run the CLI like this:
python3 llm_classification.py "config.cfg"
- Paths:
  - in_file: str - Path to the input file, either .csv or .tsv.
  - out_dir: str - Output directory. The script creates one if it is not already there.
- System:
  - seed: int - Random seed for selecting few-shot examples. Ignored when task=="zero-shot".
  - device: str - Device to run inference on. Change to cuda:0 if you want to run on GPU.
- Model:
  - name: str - Name of the model from OpenAI or HuggingFace.
  - task: {"few-shot", "zero-shot"} - Indicates whether zero-shot or few-shot inference should be run.
- Inference:
  - x_column: str - Name of the independent variable in the table.
  - y_column: str - Name of the dependent variable in the table.
  - n_examples: int - Number of examples to give to few-shot models. Ignored when task=="zero-shot".
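For reference, here is a fuller example config combining the optional paths entries with the settings documented above (all values are placeholders; leave out prompt_file or examples if you do not need them):

[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
prompt_file="custom_prompt.txt"
examples="examples.csv"

[system]
seed=0
device="cuda:0"

[model]
name="google/flan-t5-base"
task="few-shot"

[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5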
For ease of use, we have developed a script that generates predictions for all OpenAI models in one run. We did this because OpenAI inference can run on low-performance instances, so it is not a problem if it takes a long time. Additionally, since all instances would access the same rate-limited API, we could not start multiple instances and run them in parallel.
Paths in this script are hardcoded and you might need to adjust it for personal use.
python3 run_gpt_inference.py
For supervised models we made a separate script. It runs and evaluates GloVe-200d embeddings with logistic regression and fine-tunes DistilBERT for classification.
This script has its own requirements, which you should install from the appropriate file:
pip install -r supervised_requirements.txt
Paths in this script are hardcoded and you might need to adjust it for personal use.
python3 supervised_classification.py
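For orientation, the GloVe-200d + logistic regression part of the pipeline is conceptually similar to the sketch below (using gensim's pretrained glove-twitter-200 vectors and scikit-learn); the actual supervised_classification.py may differ in preprocessing, splitting and hyperparameters, and additionally covers DistilBERT fine-tuning.

import numpy as np
import pandas as pd
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# 200-dimensional GloVe vectors trained on tweets (downloaded on first use).
glove = api.load("glove-twitter-200")

def embed(text: str) -> np.ndarray:
    """Average the GloVe vectors of all in-vocabulary tokens."""
    tokens = [t for t in str(text).lower().split() if t in glove]
    if not tokens:
        return np.zeros(glove.vector_size)
    return np.mean([glove[t] for t in tokens], axis=0)

# Column names follow the example config (raw_text, exemplar); adjust to your data.
data = pd.read_csv("labelled_data.csv")
X = np.stack([embed(t) for t in data["raw_text"]])
y = data["exemplar"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))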
This will output a table with predictions added to the out_dir folder specified in the config.
The file name format is as follows:
f"predictions/{task}_pred_{column}_{model}.csv"
Each table will have a pred_<y_column> column and also a train_test_set column that is labelled train for all examples included in the prompt for few-shot learning and test everywhere else.
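To illustrate the table layout, one such file could be inspected with pandas like this (the file name is made up but follows the pattern above, and the columns assume y_column="exemplar"):

import pandas as pd

# Hypothetical file name following the f"predictions/{task}_pred_{column}_{model}.csv" pattern.
preds = pd.read_csv("predictions/few-shot_pred_exemplar_flan-t5-base.csv")

test = preds[preds["train_test_set"] == "test"]   # drop the rows used as in-prompt examples
accuracy = (test["exemplar"] == test["pred_exemplar"]).mean()
print(f"Test accuracy: {accuracy:.3f}")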
To evaluate the performance of the model(s), you can run the CLI evaluation.py script. It has two command-line arguments: --in_dir and --out_dir. These refer, respectively, to the folder in which the predictions from the llm_classification.py script have been saved (i.e., your predictions folder), and to the folder where the classification report(s) should be saved.
--in_dir defaults to 'predictions/' and --out_dir defaults to 'output/' (a folder that is created if it does not exist already).
It can be run as follows:
python3 evaluation.py --in_dir "your/data/path" --out_dir "your/out/path"
It expects the output file(s) from llm_classification.py in the specified file name format and placement.
It will output two files to the specified out folder:
- a txt file with the classification report for the test data for each of the files in the --in_dir folder.
- a csv file with the same information as the txt file, but which can be used for plotting the results.
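Conceptually, the per-file evaluation boils down to something like the sketch below; the actual evaluation.py also handles writing the txt and csv outputs described above, so this is only meant to show what the reports contain.

from pathlib import Path

import pandas as pd
from sklearn.metrics import classification_report

# Illustrative only: a classification report on the held-out rows of every predictions file.
for path in sorted(Path("predictions/").glob("*.csv")):
    preds = pd.read_csv(path)
    test = preds[preds["train_test_set"] == "test"]
    # Infer the label column from the pred_<y_column> naming convention.
    pred_col = next(c for c in preds.columns if c.startswith("pred_"))
    y_col = pred_col.removeprefix("pred_")
    print(path.name)
    print(classification_report(test[y_col], test[pred_col]))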
The plotting.py script takes the csv file produced by the evaluation script and makes three plots:
- acc_figure.png: The accuracy for each of the 8 models on each outcome (political, exemplar) in each task (zero-shot, few-shot) with each prompt type (generic, custom). The figure is split into four quadrants, with the left side showing the exemplar column, the right side political, the upper row custom prompts and the lower row generic prompts.
- f1_figure.png: The F1-score for positive labels for each model in each task – again split into political and exemplar, and generic and custom prompts.
- prec_rec_figure.png: Precision plotted against recall for each of the models, split into three rows and four columns. Rows indicate task (zero-shot, few-shot, supervised classification), columns indicate label column (political, exemplar) and prompt type (generic, custom).
python3 plotting.py
These are all saved in a figures/ folder.