Gender-Neutral Translation Evaluator

We propose a reference-free method to assess gender-neutral translations in GeNTE. In this repository you can find the code, the checkpoint, and instructions for conducting GeNTE evaluations with our reference-free solution. The evaluator is a classifier fine-tuned from UmBERTo, a RoBERTa-based language model trained on large Italian corpora.
The classifier will output a label for each sentence provided – 0 for neutral and 1 for gendered – along with the probability for the label.
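
As a rough illustration of how a label and its probability relate, the sketch below turns a pair of class scores into the (label, probability) pair described above. It assumes the probability is a softmax over two logits; `predict_from_logits` and `LABELS` are hypothetical names used only for this example and are not part of the repository code.

```python
import math

# Label ids as described above: 0 for neutral, 1 for gendered.
LABELS = {0: "neutral", 1: "gendered"}


def predict_from_logits(logits):
    """Return (label_id, probability) for one sentence's class scores.

    Assumption: the reported probability is the softmax value of the
    highest-scoring class. The actual classifier code may differ.
    """
    # Numerically stable softmax: subtract the maximum before exponentiating.
    highest = max(logits)
    exps = [math.exp(score - highest) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Pick the class with the largest probability (first one wins on ties).
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


if __name__ == "__main__":
    label, prob = predict_from_logits([0.2, 1.7])
    print(LABELS[label], round(prob, 3))
```

A probability close to 0.5 means the two classes were scored almost equally, so such sentences may deserve a manual check.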

All data are released under a Creative Commons Attribution 4.0 International license (CC BY 4.0). For data generated with proprietary models, please refer to the respective terms of use (OpenAI, Amazon Translate, DeepL).

How to run

Our classifier v1 ($CLASSIFIER_FOLDER) can be downloaded at the following link. It contains the checkpoint and the config file (for reference, cite piergentili-etal-2023-hi). A new version of the classifier (classifier v2), used for the automatic evaluation in savoldi-etal-2024-prompt, can be downloaded at the following link. The two versions differ in the training data used (see the Training Data section).
To use the classifier to assess whether the translations of GeNTE generated by your system, stored in a TXT file ($DATA), are neutral or gendered, run the following command. The output, containing the sentences, the true labels, the predicted labels, and the label probabilities, will be saved in a TSV file ($OUTPUT_FILE).

python /path/to/GeNTE/src/cli/generate.py \
        --model Musixmatch/umberto-commoncrawl-cased-v1 \
        --checkpoint $CLASSIFIER_FOLDER \
        --num-classes 2 \
        --data-file $DATA \
        --batch-size 64 \
        --max-seq-len 64 \
        --lower-case False \
        --metrics accuracy class_f1 \
        --writer tsv \
        --save-file $OUTPUT_FILE
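
The command above requests the metrics `accuracy` and `class_f1`. The sketch below shows how such scores can be computed from true and predicted labels, assuming that `class_f1` means a per-class F1 score. It is not the repository's implementation, and the function names are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of sentences whose predicted label matches the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def class_f1(true_labels, predicted_labels, cls):
    """F1 score for a single class (0 = neutral, 1 = gendered)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
    fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
    fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
    denominator = 2 * tp + fp + fn
    # Return 0.0 when the class never occurs and is never predicted.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only, not real GeNTE results.
    true = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    print("accuracy:", accuracy(true, pred))
    print("F1 neutral:", class_f1(true, pred, 0))
    print("F1 gendered:", class_f1(true, pred, 1))
```

Reporting F1 per class keeps the neutral and gendered results separate, which matters when one class is harder to detect than the other.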

Reproducibility

To ensure reproducibility of the results reported in our paper, we also provide the training data used to train the final classifier, the training setup, and the translations that were used in our evaluation process.

Training Data

The data used to train our classifiers have been automatically generated by GPT-3.5. For more information, please refer to piergentili-etal-2023-hi. You can download the data for classifier v1 here, and the data for classifier v2 here. The latter has undergone a cleaning procedure to eliminate noise.

Training Setup

To replicate the training of our classifiers using the synthetic data ($TRAIN_DATA and $DEV_DATA, located in $DATA_FOLDER and downloadable above), run the following command. Checkpoints will be saved in $SAVE_FOLDER.

python /path/to/GeNTE/src/cli/train.py \
        --model Musixmatch/umberto-commoncrawl-cased-v1 \
        --num-classes 2 \
        --data-root $DATA_FOLDER \
        --train $DATA_FOLDER/$TRAIN_DATA \
        --validation $DATA_FOLDER/$DEV_DATA \
        --save-dir $SAVE_FOLDER \
        --num-epochs 2 \
        --batch-size 64 \
        --max-seq-len 64 \
        --lower-case False \
        --shuffle True \
        --learning-rate 0.00005 \
        --epsilon 0.00000001

Output Translations

We provide the translations generated by DeepL and Amazon Translate that were used for our evaluations. These translations were generated from COMMON-SET, a portion of GeNTE consisting of 200 source sentences: 100 gendered (COMMON-SET-G) and 100 neutral (COMMON-SET-N).

As the MT systems were unable to produce neutral translations for COMMON-SET-N, three human translators manually edited the 100 COMMON-SET-N translations. They substituted the gendered forms with neutral alternatives while keeping the rest of the sentences unchanged. For each system, we obtained three sets of neutral output sentences, one from each translator (e.g., for Amazon: Amazon-N-PEbyTransl1, Amazon-N-PEbyTransl2, Amazon-N-PEbyTransl3).

The following translations can be downloaded at this link:

Amazon: Amazon-G-original for COMMON-SET-G; Amazon-N-PEbyTransl1, Amazon-N-PEbyTransl2, Amazon-N-PEbyTransl3, for COMMON-SET-N
DeepL: DeepL-G-original for COMMON-SET-G; DeepL-N-PEbyTransl1, DeepL-N-PEbyTransl2, DeepL-N-PEbyTransl3, for COMMON-SET-N
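
The naming scheme of the sets listed above can be enumerated programmatically, for instance to loop over all of them in an evaluation script. This is a minimal sketch: `translation_sets` is a hypothetical helper, and the file extensions and folder layout of the download are not assumed here.

```python
SYSTEMS = ["Amazon", "DeepL"]
NUM_TRANSLATORS = 3  # three human translators post-edited COMMON-SET-N


def translation_sets(system):
    """Map each COMMON-SET portion to the names of the sets for one system."""
    return {
        # Original (unedited) MT output for the gendered portion.
        "COMMON-SET-G": [f"{system}-G-original"],
        # One post-edited neutral set per translator.
        "COMMON-SET-N": [
            f"{system}-N-PEbyTransl{i}" for i in range(1, NUM_TRANSLATORS + 1)
        ],
    }


if __name__ == "__main__":
    for system in SYSTEMS:
        for portion, names in translation_sets(system).items():
            print(system, portion, ", ".join(names))
```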

How to cite

The reference paper for the classifier v1 is: Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus, published at EMNLP 2023.

@inproceedings{piergentili-etal-2023-hi,
    title = "Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus",
    author = "Piergentili, Andrea and
        Savoldi, Beatrice and
        Fucci, Dennis and
        Negri, Matteo and
        Bentivogli, Luisa",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.873",
    doi = "10.18653/v1/2023.emnlp-main.873",
    pages = "14124--14140",
}

The reference paper for the classifier v2 is: A Prompt Response to the Demand for Automatic Gender-Neutral Translation, published at EACL 2024.

@inproceedings{savoldi-etal-2024-prompt,
    title = "A Prompt Response to the Demand for Automatic Gender-Neutral Translation",
    author = "Savoldi, Beatrice  and
      Piergentili, Andrea  and
      Fucci, Dennis  and
      Negri, Matteo  and
      Bentivogli, Luisa",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-short.23",
    pages = "256--267",
}
