This repository contains the code for the TUM Social Computing team at the GermEval 2022 shared task. Our SVR models were trained with Python 3.9.
- Clone this repository
- Set up a Python 3.9 environment
- Install requirements with
pip install -r utils/requirements.txt
- Download the spacy pipeline with
python -m spacy download de_core_news_sm
- The data used for training and evaluation is in the
data/
directory. You can download it either from the competition homepage or from the original GitHub repository. For the latter, use the ratings.csv
file and adapt the sentence and label columns in the settings.
- Adapt the paths and column names in
utils/settings.py
to your version of the data.
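Before running the models, it can help to confirm that the column names you configured actually appear in your copy of the data. The following is a minimal sketch and not part of this repository: the names `SENTENCE_COLUMN`, `LABEL_COLUMN`, and `missing_columns`, as well as the sample column values, are hypothetical, and the real keys in utils/settings.py may be named differently.

```python
import csv
import io

# Hypothetical names: the real keys in utils/settings.py may differ.
SENTENCE_COLUMN = "Sentence"
LABEL_COLUMN = "MOS"


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Tiny in-memory stand-in for a data file (made-up content).
sample = io.StringIO("ID,Sentence,MOS\n1,Das ist ein Satz.,2.5\n")
print(missing_columns(sample, [SENTENCE_COLUMN, LABEL_COLUMN]))
```

An empty list means every configured column was found. To check a real file, pass an open file handle instead of the in-memory sample.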
Our SVR models are uploaded to this repository in the models
folder. The fine-tuned DistilBERT model is uploaded to HuggingFace and can be found here.
To run the respective models, use these commands from the command line:
python support_vector_regression.py
python support_vector_regression.py --only_statistics
python eval_distilbert.py
This will store .npy
files with the embedding vectors of the training data in the data folder.
python eval_distilbert.py --embedding
To analyze the relevant features in the SVR models, use the feature_relevance_analysis.py
module.
python feature_relevance_analysis.py -s
The -s
flag samples the data to speed up the SHAP value calculation. If you want to evaluate the combined model, add the -c
parameter.
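The idea behind sampling is that SHAP values are expensive to compute, so explaining a random subset of the rows trades some precision for a much shorter run time. The sketch below illustrates that idea with the standard library only. It is not the code in feature_relevance_analysis.py: the function `sample_rows`, the subset size of 100, and the fixed seed are assumptions made for illustration.

```python
import random


def sample_rows(rows, max_rows, seed=0):
    """Return at most max_rows rows, drawn without replacement."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(rows, max_rows)


# Made-up feature rows standing in for the real feature matrix.
rows = [[i, i * 0.5] for i in range(1000)]
subset = sample_rows(rows, max_rows=100)
print(len(subset))
```

The subset would then be passed to the explainer in place of the full data set.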
To recreate the SVR models, run the following command.
python support_vector_regression.py --training_mode
To rerun the DistilBERT fine-tuning, use the finetune_distilbert.py
module.