DvdNss/multiclass-classification-perceiver

Multiclass Classification using DeepMind's Perceiver

About The Project

This project aims to make DeepMind's Language Perceiver easily usable for Multiclass Classification.

HuggingFace

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contact

Getting Started

Installation

  1. Clone the repo
git clone https://github.com/DvdNss/nlp-perceiver
  2. Install requirements
pip install -r requirements.txt

Usage

Structure

  • data/: contains torch data files
  • model/: contains models
  • resource/: contains readme images
  • source/: contains main scripts
    • databuilder.py: loads, transforms and saves datasets
    • train.py: training script
    • mapping.py: mapping functions
    • evaluate.py: evaluation script
    • pipeline.py: model pipeline (inference)
    • inference_example.py: inference use case
  • app.py: streamlit app script

Example

  1. Set the correct mapping functions in source/mapping.py for your dataset
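from typing import List


# Map inputs
def map_inputs(row: dict):
    """
    Map inputs with a given format.

    :param row: dataset row
    :return: text of the row, used as model input
    """

    return row['text']


# Map targets
def map_targets(labels: List[int]):
    """
    Map targets with a given format.

    :param labels: list of label indices
    :return: dict holding a multi-hot vector with one slot per label
    """

    # 28 matches the number of labels in go_emotions; adjust for other datasets
    targets = [0] * 28
    for label in labels:
        targets[label] = 1

    return {'targets': targets}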
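
As a quick illustration of what map_targets produces, the sketch below re-declares the function so it runs on its own and encodes a row carrying labels 2 and 5. The constant name NUM_LABELS and the example label indices are assumptions made for this illustration; they are not part of the repository.

```python
from typing import Dict, List

NUM_LABELS = 28  # illustrative constant: same vector length as in map_targets


def map_targets(labels: List[int]) -> Dict[str, List[int]]:
    """Turn a list of label indices into a multi-hot vector."""
    targets = [0] * NUM_LABELS
    for label in labels:
        targets[label] = 1
    return {'targets': targets}


# A row tagged with labels 2 and 5 gets ones at positions 2 and 5 only.
example = map_targets([2, 5])
active = [i for i, flag in enumerate(example['targets']) if flag == 1]
print(active)  # [2, 5]
```

Because several positions can be set at once, a single row may carry more than one label.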
  2. Build the torch files using the source/databuilder.py script
python source/databuilder.py --dataset go_emotions --split train+validation --output_dir data --max_size max_size

Once the script finishes, there should be a .pt file in output_dir for each split you selected.
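
One way to confirm the build step worked is to count the .pt files in the output directory. The helper names below (find_pt_files, check_build) are hypothetical and not part of the repository, and since the exact file names written by databuilder.py are not documented here, the sketch only counts files by extension.

```python
from pathlib import Path
from typing import List


def find_pt_files(output_dir: str) -> List[str]:
    """Return the sorted names of all .pt files directly inside output_dir."""
    return sorted(p.name for p in Path(output_dir).glob('*.pt'))


def check_build(output_dir: str, expected_count: int) -> bool:
    """Return True if output_dir holds at least expected_count .pt files."""
    return len(find_pt_files(output_dir)) >= expected_count
```

With --split train+validation and --output_dir data, calling check_build('data', 2) should return True after a successful build.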

  3. Train your model using the source/train.py script
python source/train.py --train_data train_data --validation_data validation_data --batch_size batch_size --lr lr --epochs epochs --output_dir output_dir

A model is saved in output_dir after each epoch, named as follows:
output_dir/perceiver-e<epoch>-acc<eval_acc>.pt
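
Since the epoch and evaluation accuracy are encoded in the file name, the best model can be picked without loading any of them. The sketch below is one possible approach, not part of the repository: parse_checkpoint and best_checkpoint are hypothetical helpers, and the pattern assumes that <eval_acc> is written as an integer or a decimal number.

```python
import re
from typing import List, Optional, Tuple

# Pattern for names such as 'perceiver-e2-acc0.pt'. The accuracy part is
# assumed to be an integer or a decimal number.
CHECKPOINT_RE = re.compile(r'^perceiver-e(\d+)-acc(\d+(?:\.\d+)?)\.pt$')


def parse_checkpoint(name: str) -> Optional[Tuple[int, float]]:
    """Return (epoch, eval_acc) for a checkpoint file name, or None."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names: List[str]) -> Optional[str]:
    """Pick the name with the highest accuracy; later epochs break ties."""
    parsed = [(parse_checkpoint(n), n) for n in names]
    valid = [(info[1], info[0], n) for info, n in parsed if info is not None]
    return max(valid)[2] if valid else None
```

File names that do not match the pattern are ignored, so the helper can be fed a plain directory listing.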

  4. Evaluate your model using the source/evaluate.py script
python source/evaluate.py --model model_path --validation_data validation_data --batch_size batch_size
  5. Run inference using the source/pipeline.py script (see the use case in source/inference_example.py)
from pipeline import MultiLabelPipeline, inputs_to_dataset

model_path = '../model/perceiver-e2-acc0.pt'

# Load pipeline
pipeline = MultiLabelPipeline(model_path=model_path)

# Build a little dataset
inputs = ['This is a test.', 'Another test.', 'The final test.']

# Make inference
outputs = pipeline(inputs_to_dataset(inputs), batch_size=3)
print(outputs)
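
The format of outputs is not documented here. Assuming, purely for illustration, that each output is a list of per-label scores between 0 and 1, a threshold could turn those scores into label indices. The function scores_to_labels and the 0.5 default are hypothetical choices and do not come from the repository.

```python
from typing import List


def scores_to_labels(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return the indices whose score is at or above the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up scores for one input, shortened to 4 labels for readability.
print(scores_to_labels([0.1, 0.8, 0.5, 0.3]))  # [1, 2]
```

If the pipeline already returns label indices or names, this step is unnecessary.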
  6. Finally, run the Streamlit app
streamlit run app.py

Contact

David NAISSE - @LinkedIn - [email protected]

About

DeepMind's Perceiver for Multiclass Emotion Classification
