The crash only happens if the ngram order is higher than 1, and only if the # occurs at the start of a token.
I'm guessing this is because the counts reader treats a # at the beginning of a line in a text counts file as the start of a comment and skips the line, so a unigram beginning with a # is missing from the term dictionary by the time it's encountered in a later bigram.
What steps will reproduce the problem?
$ estimate-ngram -wc counts -text <(echo 'a #hashtag')
0.001 Loading corpus /dev/fd/63...
0.002 Smoothing[1] = ModKN
0.002 Smoothing[2] = ModKN
0.002 Smoothing[3] = ModKN
0.002 Set smoothing algorithms...
0.002 Saving counts to counts...
$ cat counts
<s> 1
a 1
#hashtag 1
<s> a 1
a #hashtag 1
#hashtag </s> 1
<s> a #hashtag 1
a #hashtag </s> 1
$ estimate-ngram -counts counts -wl lm.arpa
0.001 Loading counts counts...
estimate-ngram: src/NgramModel.cpp:800: void mitlm::NgramModel::_ComputeBackoffs(): Assertion `allTrue(backoffs != NgramVector::Invalid)' failed.
Aborted (core dumped)
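To illustrate the guess above, here is a minimal toy model in Python. It is not mitlm's code: `load_counts` and `missing_backoffs` are hypothetical helpers, the "backoff" of an n-gram is simply taken to be the n-gram with its first word dropped, and pre-seeding `</s>` as a known unigram is an assumption made so that the toy check passes on an intact file. It only shows that dropping lines that start with # leaves higher-order n-grams pointing at entries that were never loaded.

```python
# Toy model of the suspected failure; NOT mitlm's actual implementation.
COUNTS = """\
<s> 1
a 1
#hashtag 1
<s> a 1
a #hashtag 1
#hashtag </s> 1
<s> a #hashtag 1
a #hashtag </s> 1
"""


def load_counts(text, skip_hash_lines):
    """Read 'w1 ... wn count' lines into a set of n-gram tuples.

    skip_hash_lines=True mimics the suspected behaviour of treating
    any line that starts with '#' as a comment.
    """
    # Assumption: the end-of-sentence token is always known as a unigram.
    ngrams = {("</s>",)}
    for line in text.splitlines():
        if not line.strip():
            continue
        if skip_hash_lines and line.startswith("#"):
            continue  # the line is silently dropped
        *words, _count = line.split()
        ngrams.add(tuple(words))
    return ngrams


def missing_backoffs(ngrams):
    """Return n-grams (n > 1) whose suffix (first word dropped) was not loaded."""
    return sorted(g for g in ngrams if len(g) > 1 and g[1:] not in ngrams)


if __name__ == "__main__":
    print(missing_backoffs(load_counts(COUNTS, skip_hash_lines=True)))
    print(missing_backoffs(load_counts(COUNTS, skip_hash_lines=False)))
```

In this toy model, skipping the # lines leaves `a #hashtag` and `a #hashtag </s>` without a backoff entry, while reading every line leaves none missing. With only unigrams there is nothing to back off to, which would fit the crash appearing only for orders above 1.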
What version of the product are you using? On what operating system?
Built from latest master on github. Ubuntu 14.04.1
(Reporting this here as well as at https://code.google.com/p/mitlm/issues/detail?id=44, in case GitHub gets more attention these days.)