Text Complexity Assessment of German Text

This repository contains the code for the TUM Social Computing team at the GermEval 2022 shared task. Our SVR models were trained with Python 3.9.

Setup

Clone this repository
Setup Python 3.9 environment
Install requirements with pip install -r utils/requirements.txt
Download the spacy pipeline with python -m spacy download de_core_news_sm
The data used for training and evaluation is in the data/ directory. You can download it either from the competition homepage or the original Github repository. For the later one, use the ratings.csv file and adapt the sentence and label column in the settings.
Adap the paths and column names in utils/settings.py to your version of the data.

Use pretrained models

Our SVR models are uploaded in this reporsitory in the models folder. The fine-tuned DistilBERT model is uploaded to HuggingFace and can be found here. To run the respective models, use these commands from the command line

Combination of neural embedding with text statistics

python support_vector_regression.py

SVR with statistics only

python support_vector_regression.py --only_statistics

Fine-tuned DistilBERT

python eval_distilbert.py

Create neural embeddingt with fine-tuned DistilBERT

This will store .npy files with the embedding vectors of the training data in the data folder.

python eval_distilbert.py --embedding

Generate explanations

To analyze the relevant features in the SVR models, use the feature_relevance_analysis.py module.

python feature_relevance_analysis.py -s

The -s flag samples to data to speed up the SHAP value calculation. If you want to evaluate the combined model, add the -c parameter.

Train models

To recreate the SVR models, run the following command.

python support_vector_regression.py --training_mode

To retrain the DistilBERT fine-tuning, use the finetune_distilbert.py module.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Text Complexity Assessment of German Text

Setup

Use pretrained models

Combination of neural embedding with text statistics

SVR with statistics only

Fine-tuned DistilBERT

Create neural embeddingt with fine-tuned DistilBERT

Generate explanations

Train models

Files

README.md

Latest commit

History

README.md

File metadata and controls

Text Complexity Assessment of German Text

Setup

Use pretrained models

Combination of neural embedding with text statistics

SVR with statistics only

Fine-tuned DistilBERT

Create neural embeddingt with fine-tuned DistilBERT

Generate explanations

Train models