This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
I don't understand the point of this experiment.
There are too many errors in the gold set.
For example, in europarl-v7.de-en.en.sentences.test.gold:
line 73: "I am happy to try and answer, Mr Wijsenbeek. As you will certainly know,……". Here "I am happy to try and answer, Mr Wijsenbeek." is clearly a complete sentence, but the gold file does not mark it as one.
Similar cases:
lines 130, 175, ... far too many.
So I don't understand what "sentence boundary detection" means on this dataset.
We use this kind of dataset because no 100% gold-labeled dataset is available for this task. That's why we refer to them as "quasi-segmented" datasets.
However, in preliminary experiments we used Universal Dependencies (normally used for e.g. PoS tagging). Those datasets have a cleaner sentence-segmented structure, but they contain far fewer sentences than e.g. the Europarl corpora.
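For anyone who wants to gauge how noisy a quasi-segmented gold file is before using it, a rough heuristic like the one below can flag lines that likely contain more than one sentence (as in line 73 above). This is only a sketch, not part of the repository's code: the filename is whatever gold file you pass in, and the regex is a simple assumption (sentence-final punctuation followed by whitespace and an uppercase letter), so it will produce false positives on abbreviations like "Mr.".

```python
import re

# Heuristic: a '.', '!' or '?' (optionally followed by a closing quote or
# bracket), then whitespace, then an uppercase letter suggests an unmarked
# sentence boundary inside a single gold line.
BOUNDARY = re.compile(r'[.!?]["\')\]]?\s+[A-Z]')

def flag_multi_sentence_lines(path):
    """Return the 1-based line numbers that may hold more than one sentence."""
    flagged = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if BOUNDARY.search(line):
                flagged.append(lineno)
    return flagged
```

Running this over a file like europarl-v7.de-en.en.sentences.test.gold gives a quick (if imperfect) estimate of how many gold lines are affected, which puts "lines 130, 175, ..." into perspective.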