What steps will reproduce the problem?
1. Try to detect the language of the attached input file.
2. Observe that the output is "unknown".
What is the expected output? What do you see instead?
I would expect either 'Persian' or 'Arabic'.
What version of the product are you using? On what operating system?
rev195 on CentOS 7
Please provide any additional information below.
CLD2 returns "unknown" because the reliability of the detected language is lower than
kMinReliableKeepPercent (in compact_lang_det_impl.cc):
static const int kMinReliableKeepPercent = 41; // Remove lang if reli < this
Would it be acceptable to add an additional parameter to DetectLanguageXXX(...) so that
callers can set this threshold?
Regards
Reported by William.Tambellini on 2015-06-11 17:07:38
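To make the request above concrete, here is a minimal sketch of a reliability filter whose cutoff is a parameter instead of a compile-time constant. It is not CLD2 code: `LangGuess` and `KeepReliable` are hypothetical names invented for illustration, and only the default value 41 comes from the constant quoted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one candidate language and its reliability score.
struct LangGuess {
  std::string name;
  int reliable_percent;  // 0..100
};

// Default cutoff, as quoted from compact_lang_det_impl.cc.
static const int kMinReliableKeepPercent = 41;

// Hypothetical helper: keep only guesses whose reliability reaches the cutoff.
// The caller may override the cutoff; otherwise the built-in default applies.
std::vector<LangGuess> KeepReliable(
    const std::vector<LangGuess>& guesses,
    int min_keep_percent = kMinReliableKeepPercent) {
  assert(min_keep_percent >= 0 && min_keep_percent <= 100);
  std::vector<LangGuess> kept;
  for (const LangGuess& g : guesses) {
    if (g.reliable_percent >= min_keep_percent) kept.push_back(g);
  }
  return kept;
}
```

With the default cutoff, a guess scored at, say, 35 percent would be dropped and the caller would see no language; passing a lower cutoff such as 30 would let it through. Those scores are made up for the example and are not measurements from the attached file.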
That's a good suggestion. I'd really like to see us consider an alternative scheme where
we use the builder pattern to construct a settings/config object, so that we can keep
the API as stable as possible while accommodating reasonable requests for behavioral
changes like this.
Jason/Dick, what do you think?
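As a rough illustration of the builder idea, a settings object could look something like the sketch below. Everything here is an assumption: `DetectionOptions`, its `Builder`, and `SetMinReliableKeepPercent` are hypothetical names, not part of the CLD2 API, and the default of 41 simply mirrors the current constant.

```cpp
#include <cassert>

// Hypothetical immutable settings object; not part of CLD2.
class DetectionOptions {
 public:
  class Builder;
  int min_reliable_keep_percent() const { return min_reliable_keep_percent_; }

 private:
  // Only the Builder constructs options, so new fields never change call sites.
  explicit DetectionOptions(int percent)
      : min_reliable_keep_percent_(percent) {}
  int min_reliable_keep_percent_;
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class DetectionOptions::Builder {
 public:
  Builder& SetMinReliableKeepPercent(int percent) {
    assert(percent >= 0 && percent <= 100);
    min_reliable_keep_percent_ = percent;
    return *this;
  }
  DetectionOptions Build() const {
    return DetectionOptions(min_reliable_keep_percent_);
  }

 private:
  int min_reliable_keep_percent_ = 41;  // mirrors kMinReliableKeepPercent
};
```

A caller would write `DetectionOptions::Builder().SetMinReliableKeepPercent(25).Build()` and pass the result to a detection entry point. Adding further settings later would only add setters to the builder, which is how this approach keeps existing signatures stable.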
We are revising CLD2 internally to have a single entry point that takes an options proto. I see no reason why kMinReliableKeepPercent cannot be included as a configurable option. Once that is done and tested thoroughly, we will migrate those changes to the open source version of CLD2 here.
Originally reported on Google Code with ID 36
- _Attachment: [input_ara_only.txt](https://storage.googleapis.com/google-code-attachments/cld2/issue-36/comment-0/input_ara_only.txt)_