<HTML>
<HEAD>
<title>IT Translation Task - ACL 2016 First Conference on Machine Translation</title>
<style> h3 { margin-top: 2em; } </style>
</HEAD>
<body>
<center>
<script src="title.js"></script>
<h2>Shared Task: Machine Translation of the IT Domain</h2>
<script src="menu.js"></script>
</center>
<p>The focus of this task is domain adaptation of MT to the <abbr title="information technology">IT</abbr> domain
and the translation of answers in a cross-lingual help-desk service,
where hardware and software troubleshooting answers are translated from
English into the users' languages.</p>
<!--The answers are generally longer than the questions (usually one sentence, but sometimes more).
We refer to questions and answers as segments.-->
<p>You may participate in any or all of the following language pairs:</p>
<ul>
<li>English-to-Bulgarian (EN-BG)</li>
<li>English-to-Czech (EN-CS)</li>
<li>English-to-German (EN-DE)</li>
<li>English-to-Spanish (EN-ES)</li>
<li>English-to-Basque (EN-EU)</li>
<li>English-to-Dutch (EN-NL)</li>
<li>English-to-Portuguese (EN-PT)</li>
</ul>
<h3>DATA FOR DOWNLOAD</h3>
<dl>
<dt>out-of-domain training data</dt>
<dd>any parallel and monolingual data sets from the <a href="translation-task.html">WMT16 News task</a> or previous years, including
<ul>
<li><a href="/europarl/">Europarl corpus</a> (bg, cs, de, es, nl, pt)</li>
<li><a href="http://ufal.mff.cuni.cz/czeng/">CzEng 1.0</a> (cs, avoid sections 98 and 99), or <span style="color:red">new and larger</span> <a href="http://ufal.mff.cuni.cz/czeng/czeng16pre">CzEng 1.6pre</a> (cs)</li>
<li><a href="http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz">Common crawl</a> (cs, de, es)</li>
<li><a href="http://www.statmt.org/wmt13/training-parallel-un.tgz">UN corpus</a> (es), <a href="http://194.117.45.196:7777/public.php?service=files&t=303b43f123ff6c666019d76a676bcf67&download&path=//undoc.en-de.tar.gz">MultiUN</a> (de)</li>
<li><a href="http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz">News commentary v8</a> (cs, de, es), or <a href="http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz">NC v11</a> (cs, de)</li>
<li><a href="http://komunitatea.elhuyar.org/ig/files/2016/01/PaCo_EuEn_corpus.tgz">PaCo2-EuEn</a> (eu)</li>
<li><a href="http://download.webclark.org/QTLeapBGEN/setimescleaned.zip">SE Times</a> (bg), <a href="http://bultreebank.org/clef/">Bultreebank</a> (monolingual bg)</li>
</ul>
</dd>
<dt>in-domain training data</dt>
<dd>
<ul>
<li><a href="http://ufallab.ms.mff.cuni.cz/~popel/batch1and2.zip">Batch1 and Batch2</a> (2000 answers; you may use part of this data as a development test set; note that some answers may contain more than one sentence)</li>
<li>localization PO files, available in <a href="http://194.117.45.196:7777/public.php?service=files&t=0f949b512b7102003e4db334e8026a88&download&path=//indomain_training.zip">text format</a>
(extracted from the following sources:
<a href="http://downloads.videolan.org/pub/videolan/vlc/2.1.5/vlc-2.1.5.tar.xz">VLC</a>,
<a href="http://download.documentfoundation.org/libreoffice/src/4.4.0/libreoffice-translations-4.4.0.3.tar.xz">LibreOffice</a>,
<a href="https://websvn.kde.org/trunk/l10n-kde4/">KDE</a> [<a href="svn://anonsvn.kde.org/home/kde/branches/stable/l10n-kde4/">svn</a>])
</li>
<li><a href="http://ufallab.ms.mff.cuni.cz/tectomt/share/data/models/gazeteer/wiki_all.zip">IT-related terms from Wikipedia</a></li>
<li><a href="http://194.117.45.196:7777/public.php?service=files&t=303b43f123ff6c666019d76a676bcf67">en-de technical documentation</a> (LibreOffice, Chromium, Ubuntu, Drupal)</li>
</ul>
</dd>
<dt>in-domain test data</dt>
<dd>
<a href="http://ufallab.ms.mff.cuni.cz/~popel/batch3.zip">Batch3</a> (1000 answers, published on April 11). The submission deadline is April 24.
</dd>
</dl>
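<p>Since part of Batch1 and Batch2 may be used as a development test set, the following minimal Python sketch shows one way to hold out such a set reproducibly. It is an illustration, not an official procedure: it assumes the English source answers and their reference translations have already been loaded into two parallel lists (one answer per item), and the function name <code>split_dev</code>, the seed and the dev-set size are hypothetical choices.</p>

```python
# Minimal sketch (hypothetical helper): hold out part of the in-domain
# Batch1+Batch2 answers as a dev-test set. Assumes source answers and
# reference answers are already loaded as two parallel lists; reading the
# files from batch1and2.zip is omitted because its layout is not assumed here.
import random


def split_dev(src, ref, dev_size, seed=2016):
    """Return (train_src, train_ref, dev_src, dev_ref) using a reproducible split."""
    if len(src) != len(ref):
        raise ValueError("source and reference lists must be parallel")
    if not 0 < dev_size < len(src):
        raise ValueError("dev_size must leave some data for training")
    indices = list(range(len(src)))
    # A fixed seed yields the same split on every run.
    random.Random(seed).shuffle(indices)
    dev_ids = set(indices[:dev_size])
    # Whole answers are kept together, so a multi-sentence answer never
    # ends up partly in training and partly in the dev-test set.
    train_src = [s for i, s in enumerate(src) if i not in dev_ids]
    train_ref = [r for i, r in enumerate(ref) if i not in dev_ids]
    dev_src = [s for i, s in enumerate(src) if i in dev_ids]
    dev_ref = [r for i, r in enumerate(ref) if i in dev_ids]
    return train_src, train_ref, dev_src, dev_ref
```

<p>Splitting at the level of whole answers rather than individual sentences matters here because some answers contain more than one sentence; the original order is preserved within each part.</p>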
<p>If you use additional training data (or existing translation systems that use additional training data), you must flag that your system uses additional data.
We will distinguish system submissions that used only the provided training data (constrained) from submissions that used significant additional data resources.
Linguistic tools such as morphological analyzers, taggers, parsers, word-sense disambiguators or named entity recognizers are allowed in the constrained condition.</p>
<h3><a name="submission">TEST SET SUBMISSION</a></h3>
<p>Unlike in the News translation task, punctuation in the official test sets will not be altered.</p>
<p>To submit your results, first convert them into the SGML format
required by the NIST BLEU scorer, and then upload the file to the
website <a href="http://matrix.statmt.org/">matrix.statmt.org</a>.
To convert plain-text translations (one sentence per line) into SGML, you can use
<a href="http://ufallab.ms.mff.cuni.cz/~popel/txt2sgm.pl">txt2sgm.pl</a>
(or, the "old" way, download <a href="http://ufallab.ms.mff.cuni.cz/~popel/it-test2016-src.en.sgm">it-test2016-src.en.sgm</a>
and follow the <a href="translation-task.html#sgml">News task guidelines</a>).
</p>
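<p>The conversion step can be sketched in Python as follows. This only illustrates the general shape of a NIST-style test-set file, under assumptions: the element layout (<code>tstset</code>, <code>doc</code>, <code>seg</code>) follows the usual NIST mteval convention, while the function name <code>txt_to_sgm</code> and the <code>setid</code>, <code>sysid</code> and <code>docid</code> values are hypothetical. For an actual submission, use txt2sgm.pl, the converter recommended above.</p>

```python
# Minimal sketch: wrap plain-text MT output (one segment per line) in a
# NIST-style SGML test-set file. The element layout is the usual mteval one;
# setid/sysid/docid values are hypothetical placeholders. Using a single <doc>
# is a simplification: the official source file may group segments into
# several documents, which txt2sgm.pl handles for real submissions.
from xml.sax.saxutils import escape, quoteattr


def txt_to_sgm(lines, setid, srclang, trglang, sysid, docid):
    """Return a NIST-style tstset document (assumed layout) as a string."""
    out = [
        f"<tstset setid={quoteattr(setid)} srclang={quoteattr(srclang)} "
        f"trglang={quoteattr(trglang)}>",
        f"<doc sysid={quoteattr(sysid)} docid={quoteattr(docid)}>",
    ]
    for i, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Escape &, < and > so the translation cannot break the markup.
        out.append(f'<seg id="{i}">{escape(text)}</seg>')
    out.append("</doc>")
    out.append("</tstset>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    hypotheses = ["Klikněte na tlačítko Start.", "Otevřete Nastavení > Síť."]
    print(txt_to_sgm(hypotheses, "it-test2016", "en", "cs", "my-system", "batch3"), end="")
```

<p>Escaping <code>&amp;</code>, <code>&lt;</code> and <code>&gt;</code> matters because troubleshooting answers may contain such characters, for example in menu paths or command lines.</p>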
<p>Translation quality will be measured by manual evaluation and by various automatic evaluation metrics.
Participants agree to contribute about four hours of manual evaluation work per submitted system.
</p>
<h3>ACKNOWLEDGEMENTS</h3>
<img src="http://ufal.mff.cuni.cz/sites/default/files/styles/drupal_projects_logo_style/public/qtleap_logo.png" alt="QTLeap logo" style="float:right"/>
<p>The in-domain data were created within the <a href="http://qtleap.eu">QTLeap</a> project, which sponsors this task.
If you have any questions, contact <a href="mailto:[email protected]">Martin Popel</a>
or the <a href="http://groups.google.com/group/wmt-tasks">WMT mailing list</a>.</p>
</body>
</HTML>