We propose a reference-free method to assess gender-neutral translations in GeNTE.
In this repository you can find the code, the checkpoints, and instructions for running GeNTE evaluations with our reference-free solution. It is a classifier fine-tuned from UmBERTo, a RoBERTa-based language model trained on large Italian corpora.
The classifier outputs a label for each input sentence (0 for neutral, 1 for gendered) along with the probability of that label.
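As an illustration of the output format described above, the following minimal sketch maps a pair of class scores to a label and its probability. It assumes that the classifier head emits one raw score (logit) per class and that the reported probability is the softmax value of the predicted class. The function name `predict_from_logits` is hypothetical and is not part of this repository.

```python
import math

# Label convention stated in this README: 0 = neutral, 1 = gendered.
LABELS = {0: "neutral", 1: "gendered"}


def predict_from_logits(logits):
    """Return (label_id, label_name, probability) for one sentence.

    Assumption: `logits` holds two raw scores, one per class, and the
    probability is the softmax probability of the predicted class.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per class")
    # Subtract the maximum for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted label is the class with the highest probability.
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, LABELS[label_id], probs[label_id]
```

For example, `predict_from_logits([2.0, 0.0])` yields label 0 (neutral) with a probability of roughly 0.88.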
All data are released under a Creative Commons Attribution 4.0 International license (CC BY 4.0). For data generated with proprietary models, please refer to the respective terms of use (OpenAI, Amazon Translate, DeepL).
Our classifier v1 ($CLASSIFIER_FOLDER) can be downloaded at the following link.
It contains the checkpoint and the config file (for reference, cite piergentili-etal-2023-hi).
A new version of the classifier (classifier v2), used for the automatic evaluation in savoldi-etal-2024-prompt, can be downloaded at the following link.
The two versions differ in the training data used (see the Training data section).
To use the classifier to assess whether the translations of GeNTE generated by your system, stored in a TXT file ($DATA), are neutral or gendered, run the following command.
The output, containing the sentences, the true labels, the predicted labels, and the label probabilities, will be saved in a TSV file ($OUTPUT_FILE).
python /path/to/GeNTE/src/cli/generate.py \
--model Musixmatch/umberto-commoncrawl-cased-v1 \
--checkpoint $CLASSIFIER_FOLDER \
--num-classes 2 \
--data-file $DATA \
--batch-size 64 \
--max-seq-len 64 \
--lower-case False \
--metrics accuracy class_f1 \
--writer tsv \
--save-file $OUTPUT_FILE
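Once the TSV file has been written, it can be summarised with a few lines of Python. The sketch below computes the share of sentences predicted as neutral. It assumes that the file has a header row and that the predicted labels are stored in a column called `predicted_label`. Both are assumptions, so check the header of your own `$OUTPUT_FILE` and pass the actual column name.

```python
import csv


def neutral_share(tsv_file, label_column="predicted_label"):
    """Return the fraction of rows whose predicted label is 0 (neutral).

    `tsv_file` is an open text file (or any iterable of lines).
    `label_column` is a hypothetical column name; adapt it to the real header.
    """
    # QUOTE_NONE keeps quotation marks inside sentences from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    neutral = 0
    for row in reader:
        total += 1
        if row[label_column].strip() == "0":
            neutral += 1
    return neutral / total if total else 0.0
```

A typical call would be `with open(path, encoding="utf-8", newline="") as f: print(neutral_share(f))`, where `path` is the location of your output file.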
To ensure reproducibility of the results reported in our paper, we also provide the training data used to train the final classifier, the training setup, and the translations that were used in our evaluation process.
The data used to train our classifiers were automatically generated by GPT-3.5.
For more information, please refer to piergentili-etal-2023-hi.
You can download the data for classifier v1 here,
and the data for classifier v2 here.
The latter has undergone a cleaning procedure to eliminate noise.
To replicate the training of our classifiers using the synthetic data ($TRAIN_DATA and $DEV_DATA, located in $DATA_FOLDER and downloadable above), run the following command. Checkpoints will be saved in $SAVE_FOLDER.
python /path/to/GeNTE/src/cli/train.py \
--model Musixmatch/umberto-commoncrawl-cased-v1 \
--num-classes 2 \
--data-root $DATA_FOLDER \
--train $DATA_FOLDER/$TRAIN_DATA \
--validation $DATA_FOLDER/$DEV_DATA \
--save-dir $SAVE_FOLDER \
--num-epochs 2 \
--batch-size 64 \
--max-seq-len 64 \
--lower-case False \
--shuffle True \
--learning-rate 0.00005 \
--epsilon 0.00000001
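If you prefer to launch training from a script, the command above can be assembled programmatically. The sketch below is only a convenience wrapper: it uses the same flags and values as the command above, while the helper name `build_train_command` and its arguments are hypothetical.

```python
def build_train_command(data_folder, train_data, dev_data, save_folder,
                        script="/path/to/GeNTE/src/cli/train.py"):
    """Return the training command from this README as an argument list.

    The arguments mirror $DATA_FOLDER, $TRAIN_DATA, $DEV_DATA and
    $SAVE_FOLDER; `script` keeps the placeholder path used in the README.
    """
    return [
        "python", script,
        "--model", "Musixmatch/umberto-commoncrawl-cased-v1",
        "--num-classes", "2",
        "--data-root", data_folder,
        "--train", f"{data_folder}/{train_data}",
        "--validation", f"{data_folder}/{dev_data}",
        "--save-dir", save_folder,
        "--num-epochs", "2",
        "--batch-size", "64",
        "--max-seq-len", "64",
        "--lower-case", "False",
        "--shuffle", "True",
        "--learning-rate", "0.00005",  # i.e. 5e-5
        "--epsilon", "0.00000001",     # i.e. 1e-8
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` after replacing the placeholder script path with the location of your clone.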
We provide the translations generated by DeepL and Amazon Translate that were used for our evaluations. These translations were generated from COMMON-SET, a portion of GeNTE consisting of 200 source sentences: 100 gendered (COMMON-SET-G) and 100 neutral (COMMON-SET-N).
As the MT systems were unable to produce neutral translations for COMMON-SET-N, three human translators manually edited the 100 COMMON-SET-N translations. They substituted the gendered forms with neutral alternatives while keeping the rest of each sentence unchanged. For each system, we thus obtained three sets of neutral output sentences, one from each translator (for Amazon Translate: Amazon-N-PEbyTransl1, Amazon-N-PEbyTransl2, Amazon-N-PEbyTransl3).
You can therefore download the following translations at this link:
- Amazon: Amazon-G-original for COMMON-SET-G; Amazon-N-PEbyTransl1, Amazon-N-PEbyTransl2, Amazon-N-PEbyTransl3 for COMMON-SET-N
- DeepL: DeepL-G-original for COMMON-SET-G; DeepL-N-PEbyTransl1, DeepL-N-PEbyTransl2, DeepL-N-PEbyTransl3 for COMMON-SET-N
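The naming scheme of the released translation sets is regular, so it can be enumerated in a few lines. The sketch below maps each set name to the COMMON-SET portion it covers. The names are set identifiers taken from the list above; no file extension or directory layout is assumed, and the helper `translation_sets` is hypothetical.

```python
SYSTEMS = ("Amazon", "DeepL")
NUM_TRANSLATORS = 3
# COMMON-SET-G and COMMON-SET-N contain 100 source sentences each.
SENTENCES_PER_SET = 100


def translation_sets():
    """Map each released set name to the COMMON-SET portion it covers."""
    sets = {}
    for system in SYSTEMS:
        # Unedited MT output for the gendered portion.
        sets[f"{system}-G-original"] = "COMMON-SET-G"
        # One manually edited neutral version per translator.
        for k in range(1, NUM_TRANSLATORS + 1):
            sets[f"{system}-N-PEbyTransl{k}"] = "COMMON-SET-N"
    return sets
```

This gives four sets per system, eight in total.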
The reference paper for the classifier v1 is: Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus, published at EMNLP 2023.
@inproceedings{piergentili-etal-2023-hi,
title = "Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus",
author = "Piergentili, Andrea and
Savoldi, Beatrice and
Fucci, Dennis and
Negri, Matteo and
Bentivogli, Luisa",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.873",
doi = "10.18653/v1/2023.emnlp-main.873",
pages = "14124--14140",
}
The reference paper for the classifier v2 is: A Prompt Response to the Demand for Automatic Gender-Neutral Translation, published at EACL 2024.
@inproceedings{savoldi-etal-2024-prompt,
title = "A Prompt Response to the Demand for Automatic Gender-Neutral Translation",
author = "Savoldi, Beatrice and
Piergentili, Andrea and
Fucci, Dennis and
Negri, Matteo and
Bentivogli, Luisa",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-short.23",
pages = "256--267",
}