Link to Hugging Face model: airbnb-reviews-helpfulness-classifier-roberta-base
This model is an AirBnB reviews helpfulness classifier. It predicts the helpfulness of reviews on the AirBnB website, from most helpful (A) to least helpful (C).
Our project fine-tuned FacebookAI/roberta-base for multi-class text (sequence) classification.
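A minimal inference sketch using the transformers pipeline; the repo id below mirrors the link above and may need the full `org/model-name` path on the Hub:

```python
from transformers import pipeline

# Repo id assumed from the link above; prepend the organization name if needed.
classifier = pipeline(
    "text-classification",
    model="airbnb-reviews-helpfulness-classifier-roberta-base",
)

review = "Great location, spotless apartment, and the host shared detailed check-in instructions."
# Output is a list of dicts, e.g. [{'label': 'A', 'score': 0.93}];
# label names depend on the id2label mapping stored in the model config.
print(classifier(review))
```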
5000 samples were scraped from the AirBnB website based on listing_id values from the Kaggle AirBnB Listings & Reviews dataset. Samples were translated from French to English.
Training Set: 4560 samples synthetically labelled by GPT-4 Turbo; labelling cost was approximately $60.
Test/Evaluation Set: 500 samples labelled manually by two groups (each group labelled 250 samples), with majority vote applied. A scoring rubric (shown below) was used for labelling.
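For reference, here is a minimal sketch of how the GPT-4 Turbo synthetic labelling could be scripted with the OpenAI API; the prompt wording, function name, and abbreviated rubric text are illustrative placeholders, not the exact setup used:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompt; the full scoring rubric is shown below.
RUBRIC = (
    "Rate the helpfulness of this AirBnB review for future guests: "
    "A = most helpful, B = somewhat helpful, C = least helpful. "
    "Answer with a single letter."
)

def label_review(review_text: str) -> str:
    """Ask GPT-4 Turbo for a single-letter helpfulness label (A/B/C)."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": review_text},
        ],
        max_tokens=1,
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```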
hyperparameters = {'learning_rate': 3e-05,
'per_device_train_batch_size': 16,
'weight_decay': 1e-04,
'num_train_epochs': 4,
'warmup_steps': 500}
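A sketch of how these hyperparameters map onto the Hugging Face Trainer; the CSV file names, output directory, and column names are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "FacebookAI/roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)  # classes A, B, C

# Hypothetical file names; each CSV is assumed to hold a "text" column with the
# review and an integer "label" column (0 = A, 1 = B, 2 = C).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="airbnb-helpfulness",  # hypothetical output directory
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    weight_decay=1e-4,
    num_train_epochs=4,
    warmup_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```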
We trained our model on Colab Pro, which cost approximately 56 compute units.
This fine-tuned RoBERTa-base model is a text classifier that predicts the helpfulness of AirBnB reviews.
Collaborators: Li Hui Cham, Nicholas Wong, Isaac Sparrow, Christopher Arraya, Lei Zhang, Leonard Yang
Credit to my wonderful teammate Li Hui for organizing our work.