bicleaner AI score downloads
jorgtied committed Apr 23, 2024
1 parent f691911 commit d34a89a
Showing 5 changed files with 77 additions and 0 deletions.
49 changes: 49 additions & 0 deletions BicleanerScores.md
@@ -0,0 +1,49 @@
# Bicleaner AI scores

Here are [bicleaner AI](https://github.com/bitextor/bicleaner-ai) scores for selected bitexts from release v2023-09-26.
The scores correspond to lines in the training data. The scores apply models from Bicleaner AI v2.3.2.

* [Tatoeba-Challenge-v2023-09-26.eng-ara.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ara.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-bul.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-bul.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-cat.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-cat.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-ces.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ces.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-dan.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-dan.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-deu.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-deu.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-ell.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ell.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-est.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-est.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-eus.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-eus.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-fin.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-fin.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-fra.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-fra.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-gle.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-gle.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-glg.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-glg.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-hbs.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-hbs.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-heb.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-heb.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-hin.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-hin.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-hun.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-hun.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-isl.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-isl.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-ita.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ita.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-jpn.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-jpn.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-lav.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-lav.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-lit.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-lit.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-mkd.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-mkd.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-mlt.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-mlt.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-nld.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-nld.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-nno.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-nno.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-nob.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-nob.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-pol.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-pol.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-por.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-por.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-ron.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ron.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-slk.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-slk.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-slv.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-slv.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-spa.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-spa.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-sqi.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-sqi.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-swa.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-swa.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-swe.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-swe.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-tur.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-tur.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-ukr.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-ukr.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-vie.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-vie.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.eng-zho.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.eng-zho.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.spa-cat.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.spa-cat.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.spa-eus.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.spa-eus.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.spa-glg.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.spa-glg.bicleaner-scores.gz)
* [Tatoeba-Challenge-v2023-09-26.spa-zho.bicleaner-scores.gz](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/Tatoeba-Challenge-v2023-09-26.spa-zho.bicleaner-scores.gz)
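A usage sketch for the score files listed above (not part of this commit): since the scores correspond to lines in the training data, they can be read in parallel with the training files to drop low-scoring pairs. This sketch assumes that each score file holds one floating-point score per line, and that the training data is available as two gzip-compressed, line-aligned files. The function name `filter_by_score`, the file paths, and the `0.5` threshold are hypothetical choices for illustration, not values taken from the release.

```python
import gzip
from typing import Iterator, Tuple


def filter_by_score(
    scores_path: str,
    src_path: str,
    trg_path: str,
    threshold: float = 0.5,  # assumed cut-off, tune for your data
) -> Iterator[Tuple[str, str, float]]:
    """Yield (source, target, score) for pairs scoring at or above threshold.

    Assumes one float per line in the score file and line-aligned
    source/target files; all three are read as gzip-compressed UTF-8 text.
    """
    with gzip.open(scores_path, "rt", encoding="utf-8") as scores, \
            gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(trg_path, "rt", encoding="utf-8") as trg:
        # zip stops at the shortest file, so a length mismatch goes unnoticed.
        for score_line, src_line, trg_line in zip(scores, src, trg):
            score = float(score_line.strip())
            if score >= threshold:
                yield src_line.rstrip("\n"), trg_line.rstrip("\n"), score
```

Calling `list(filter_by_score("scores.gz", "train.src.gz", "train.trg.gz"))` would then return only the sentence pairs that pass the threshold. If the score files turn out to carry extra columns, the `float(...)` parsing line is the only part that needs adjusting.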
23 changes: 23 additions & 0 deletions Makefile
@@ -213,6 +213,8 @@ TESTRELEASEDIR = ${RELEASEHOME}/test/${VERSION}
DEVRELEASEDIR = ${RELEASEHOME}/dev/${VERSION}
INFODIR = ${RELEASEDIR}

BICLEANERDIR = ${RELEASEHOME}/bicleaner/${VERSION}


PREVIOUS_RELEASEDIR = ${RELEASEHOME}/${PREVIOUS_VERSION}

@@ -402,6 +404,22 @@ release-push:
.PHONY: released-data-counts
released-data-counts: ${DATA_COUNT_FILES}

.PHONY: release-bicleaner-scores
release-bicleaner-scores: $(patsubst %.gz,%.done,$(wildcard ${BICLEANERDIR}/*.gz))
	echo ${BICLEANERDIR}

BicleanerScores.md:
	@echo "# Bicleaner AI scores" > $@
	@echo "" >> $@
	@echo "Here are [bicleaner AI](https://github.com/bitextor/bicleaner-ai) scores for selected bitexts from release ${VERSION}." >> $@
	@echo "The scores correspond to lines in the training data. The scores apply models from Bicleaner AI v2.3.2." >> $@
	@echo "" >> $@
	@for f in $(sort $(notdir $(wildcard ${BICLEANERDIR}/*.gz))); do \
	  echo "* [$$f](${TATOEBA_DATAURL}-${VERSION}/$$f)" >> $@; \
	done




## generate readme file

@@ -2058,6 +2076,11 @@ ${RELEASEHOME}/test.done ${RELEASEHOME}/dev.done: %.done: %
	a-put ${APUT_FLAGS} -b ${DEVTEST_CONTAINER} $<
	touch $@

${BICLEANERDIR}/%.done: ${BICLEANERDIR}/%.gz
	a-put ${APUT_FLAGS} -b ${RELEASE_CONTAINER} $<
	touch $@


## released train/dev/test data
${RELEASEDIR}/%.done: ${RELEASEDIR}/%
	${MAKE} $</README.md
2 changes: 2 additions & 0 deletions README-v2023-09-26.md
@@ -30,6 +30,8 @@ This is a challenge set for machine translation that contains 32G translation un
* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](data/devtest))
* [Automatically translated monolingual data](data/Backtranslations.md)
* [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)
* **NEW** [Bicleaner AI scores](BicleanerScores.md)


The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.

2 changes: 2 additions & 0 deletions README.md
@@ -30,6 +30,8 @@ This is a challenge set for machine translation that contains 32G translation un
* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](data/devtest))
* [Automatically translated monolingual data](data/Backtranslations.md)
* [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)
* **NEW** [Bicleaner AI scores](BicleanerScores.md)


The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.

1 change: 1 addition & 0 deletions README.template
@@ -30,6 +30,7 @@ This is a challenge set for machine translation that contains %%TRAIN_SIZE%% tra
* [Incrementally updated development and test data](https://object.pouta.csc.fi/Tatoeba-Challenge-devtest/devtest.tar), ([here for individual language pairs](data/devtest))
* [Automatically translated monolingual data](data/Backtranslations.md)
* [Pre-trained sentence piece models](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/tatoeba/SentencePieceModels.md)
* **NEW** [Bicleaner AI scores](BicleanerScores.md)

The latest release also includes some parallel data sets in the same language in order to test paraphrase models. Note, however, that the support for paraphrasing is really limited in our data sets.
