This project aims to make DeepMind's Language Perceiver easy to use for multi-label text classification.
- Clone the repo
git clone https://github.com/DvdNss/nlp-perceiver
- Install requirements
pip install -r requirements.txt
Repository structure:
- data/: contains torch data files
- model/: contains models
- resource/: contains readme images
- source/: contains main scripts
  - databuilder.py: loads, transforms and saves datasets
  - train.py: training script
  - mapping.py: mapping functions
  - evaluate.py: evaluation script
  - pipeline.py: model pipeline (inference)
  - inference_example.py: inference use case
  - app.py: streamlit app script
- Set the correct mapping functions in source/mapping.py for a given dataset (a short usage sketch follows the snippet below)
from typing import List


# Map inputs
def map_inputs(row: dict):
    """
    Map inputs with a given format.

    :param row: dataset row
    :return: text used as model input
    """

    return row['text']


# Map targets
def map_targets(labels: List[int]):
    """
    Map targets with a given format.

    :param labels: list of label indices
    :return: dict containing a 28-dim multi-hot target vector
    """

    targets = [0] * 28
    for label in labels:
        targets[label] = 1

    return {'targets': targets}
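For reference, here is what these functions produce on a go_emotions-style row. The example row and label indices are made up, and the import assumes the functions live in source/mapping.py and are called from the source/ directory (as in the inference example further down).

```python
from mapping import map_inputs, map_targets

# Made-up go_emotions-style row: raw text plus a list of label indices
row = {'text': 'Thanks a lot, this made my day!', 'labels': [15, 17]}

print(map_inputs(row))             # 'Thanks a lot, this made my day!'
print(map_targets(row['labels']))  # dict with a 28-dim multi-hot list, 1s at indices 15 and 17
```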
- Build the torch files using the source/databuilder.py script
python source/databuilder.py --dataset go_emotions --split train+validation --output_dir data --max_size max_size
Once the script has finished running, there should be a .pt file in the output_dir for each split you selected (a quick way to inspect them is sketched below).
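If you want to sanity-check the generated files, they can be loaded back with torch. The exact structure of what databuilder.py saves is not documented here, so treat this as a sketch; the data/train.pt path is an assumption based on the command above.

```python
import torch

# Assumed output path for the train split; adjust to whatever databuilder.py actually wrote
train_data = torch.load('data/train.pt')

# Inspect what was saved (the exact type depends on databuilder.py)
print(type(train_data))
print(train_data)
```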
- Train your model using the source/train.py script
python source/train.py --train_data train_data --validation_data validation_data --batch_size batch_size --lr lr --epochs epochs --output_dir output_dir
A model will be saved in output_dir after each epoch, named output_dir/perceiver-e<epoch>-acc<eval_acc>.pt (a sketch for picking the best checkpoint follows).
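Since each checkpoint name embeds its evaluation accuracy, you can select the best one programmatically. This is a small sketch, not part of the repo; the model directory passed to it is just an example.

```python
import re
from pathlib import Path

# Checkpoints are named perceiver-e<epoch>-acc<eval_acc>.pt
PATTERN = re.compile(r'perceiver-e(\d+)-acc([\d.]+)\.pt')


def best_checkpoint(output_dir: str) -> Path:
    """Return the checkpoint path with the highest evaluation accuracy."""
    scored = []
    for path in Path(output_dir).glob('perceiver-e*-acc*.pt'):
        match = PATTERN.fullmatch(path.name)
        if match:
            scored.append((float(match.group(2)), path))
    if not scored:
        raise FileNotFoundError(f'No checkpoints found in {output_dir}')
    return max(scored)[1]


print(best_checkpoint('model'))
```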
- Evaluate your model using the source/evaluate.py script
python source/evaluate.py --model model_path --validation_data validation_data --batch_size batch_size
- Run inference using the source/pipeline.py script (see the use case in inference_example.py; a label-name sketch follows the snippet)
from pipeline import MultiLabelPipeline, inputs_to_dataset

model_path = '../model/perceiver-e2-acc0.pt'

# Load pipeline
pipeline = MultiLabelPipeline(model_path=model_path)

# Build a little dataset
inputs = ['This is a test.', 'Another test.', 'The final test.']

# Make inference
outputs = pipeline(inputs_to_dataset(inputs), batch_size=3)
print(outputs)
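The model was trained on go_emotions, so predictions refer to its 28 emotion classes. The exact output format of MultiLabelPipeline is not shown here, but if you end up with a 28-dimensional multi-hot or score vector per input, the class names can be recovered from the datasets library; a sketch with an illustrative prediction:

```python
from datasets import load_dataset

# go_emotions exposes its 28 class names through the dataset features
label_names = load_dataset('go_emotions', split='train').features['labels'].feature.names

# Illustrative 28-dim multi-hot prediction (not an actual pipeline output)
prediction = [0] * 28
prediction[15] = 1
prediction[17] = 1

print([name for name, flag in zip(label_names, prediction) if flag == 1])
```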
- Finally, run the Streamlit app
streamlit run app.py
David NAISSE - @LinkedIn - [email protected]