Add possibility to set MinReliableKeepPercent #36

Open
ghost opened this issue Jul 27, 2015 · 2 comments


ghost commented Jul 27, 2015

Originally reported on Google Code with ID 36

What steps will reproduce the problem?
1. Try to detect the language of the attached input file.
2. See that the output is "unknown".

What is the expected output? What do you see instead?
I would expect either 'Persian' or 'Arabic'.

What version of the product are you using? On what operating system?
rev195 on CentOS 7

Please provide any additional information below.

CLD2 returns "unknown" because the reliability is lower than kMinReliableKeepPercent
(in compact_lang_det_impl.cc):
static const int kMinReliableKeepPercent = 41;  // Remove lang if reli < this

Would it be acceptable to add an additional parameter to the DetectLanguageXXX(...)
functions in order to set this threshold?
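
To make the request concrete, here is a minimal sketch of a reliability cutoff exposed as a parameter. It is not CLD2 code: `Candidate` and `PickLanguage` are hypothetical names made up for illustration, and the only thing taken from the source is the default value 41 and the "remove if reliability < threshold" rule from the comment above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one detection result; NOT a CLD2 type.
struct Candidate {
  std::string language;
  int reliability_percent;  // expected range 0..100
};

// Default mirrors kMinReliableKeepPercent in compact_lang_det_impl.cc.
const int kDefaultMinReliableKeepPercent = 41;

// Hypothetical helper: returns the first candidate whose reliability is at
// least the threshold, or "unknown" if every candidate falls below it.
std::string PickLanguage(
    const std::vector<Candidate>& candidates,
    int min_reliable_keep_percent = kDefaultMinReliableKeepPercent) {
  for (const Candidate& c : candidates) {
    if (c.reliability_percent >= min_reliable_keep_percent) {
      return c.language;
    }
  }
  return "unknown";
}
```

With the default threshold, a candidate at an assumed 30 percent reliability is dropped and the result is "unknown"; a caller passing a lower threshold such as 25 would get the language back. Existing callers that omit the argument keep the current behavior.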

Regards

Reported by William.Tambellini on 2015-06-11 17:07:38


- _Attachment: [input_ara_only.txt](https://storage.googleapis.com/google-code-attachments/cld2/issue-36/comment-0/input_ara_only.txt)_
@ghost ghost self-assigned this Jul 27, 2015

ghost commented Jul 27, 2015

That's a good suggestion. I'd really like to see us consider an alternative scheme where
we use the builder pattern to construct a settings/config object, so that we can keep
the API as stable as possible while accommodating reasonable requests for behavioral
changes like this.
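
A rough sketch of what such a builder-constructed settings object could look like follows. Every name here (`DetectionSettings`, `Builder`, `SetMinReliableKeepPercent`, `SetPlainText`) is hypothetical and does not exist in CLD2; the default of 41 is the only value taken from the current source.

```cpp
// Hypothetical settings object; none of these names exist in CLD2.
class DetectionSettings {
 public:
  class Builder {
   public:
    // Each setter returns the builder so calls can be chained.
    Builder& SetMinReliableKeepPercent(int percent) {
      min_reliable_keep_percent_ = percent;
      return *this;
    }
    Builder& SetPlainText(bool is_plain_text) {
      is_plain_text_ = is_plain_text;
      return *this;
    }
    // Produces an immutable settings value from the current builder state.
    DetectionSettings Build() const {
      return DetectionSettings(min_reliable_keep_percent_, is_plain_text_);
    }

   private:
    int min_reliable_keep_percent_ = 41;  // current hard-coded default
    bool is_plain_text_ = true;           // assumed default for illustration
  };

  int min_reliable_keep_percent() const { return min_reliable_keep_percent_; }
  bool is_plain_text() const { return is_plain_text_; }

 private:
  // Only the nested Builder constructs settings objects.
  DetectionSettings(int min_reliable_keep_percent, bool is_plain_text)
      : min_reliable_keep_percent_(min_reliable_keep_percent),
        is_plain_text_(is_plain_text) {}

  int min_reliable_keep_percent_;
  bool is_plain_text_;
};
```

A caller would write something like `DetectionSettings::Builder().SetMinReliableKeepPercent(20).Build()` and pass the result to a detection function. New options can then be added as further setters without changing any existing function signature.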

Jason/Dick, what do you think?

Reported by [email protected] on 2015-06-11 20:17:48

@jasonriesa
Member

We are revising CLD2 internally to have a single entry point that takes an options proto. I see no reason why kMinReliableKeepPercent cannot be included as a configurable option. Once that is done and tested thoroughly, we will migrate those changes to the open source version of CLD2 here.
