Detect source language with langdetect package #37
Comments
Hey, langdetect is cool! However, it seems there are many options for language detection, including fasttext and langid.py. Each option has its own accuracy (none of them are 100%) and speed, so I feel it might be difficult for the end user to choose. Also since we are now using I think a good option would be to start with a section in the user guide showing how to use any (or all) of the language detection libraries. Then from there, we could build a util function along the lines of:

    src = dlt.lang.detect(source_text, backend="fasttext")  # or backend="langdetect" or backend="langid"
    mt.translate(source_text, source=src,...)

This would throw an error asking the user to install the corresponding library if they want to use a specific backend.
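For anyone following along, here is a rough sketch of what such a util function could look like. It is not part of the library: `detect`, `BACKENDS`, and the `_detect_*` helpers are hypothetical names, and the fasttext model path `lid.176.bin` assumes you have downloaded the language identification model from the fasttext site yourself. Each backend is imported lazily, so a missing package only fails when that backend is requested.

```python
import importlib


def _detect_langdetect(text):
    # langdetect.detect returns a language code string such as "en".
    langdetect = importlib.import_module("langdetect")
    return langdetect.detect(text)


def _detect_langid(text):
    # langid.classify returns a (language_code, score) tuple.
    langid = importlib.import_module("langid")
    lang, _score = langid.classify(text)
    return lang


def _detect_fasttext(text, model_path="lid.176.bin"):
    # Assumption: the model file was downloaded beforehand to model_path.
    # Loading on every call is slow; a real helper would cache the model.
    fasttext = importlib.import_module("fasttext")
    model = fasttext.load_model(model_path)
    # fasttext's predict rejects newlines, so flatten the text first.
    labels, _probs = model.predict(text.replace("\n", " "))
    return labels[0].replace("__label__", "")


# Hypothetical registry mapping backend names to detector callables.
BACKENDS = {
    "langdetect": _detect_langdetect,
    "langid": _detect_langid,
    "fasttext": _detect_fasttext,
}


def detect(text, backend="langdetect"):
    """Return a language code for `text` using the chosen backend."""
    try:
        detector = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; choose one of {sorted(BACKENDS)}"
        ) from None
    try:
        return detector(text)
    except ImportError as exc:
        # Turn the missing optional dependency into an actionable message.
        raise ImportError(
            f"Backend {backend!r} needs an optional package that is not "
            f"installed. Try: pip install {backend}"
        ) from exc
```

Keeping the backends in a plain dict means other detectors can be registered without touching `detect` itself. Note that the backends return their own language codes, which may still need to be mapped to whatever `mt.translate` expects for `source`.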
Those are some good points. I agree it would be confusing to have the library detect a language but not translate it. I'll look into writing something that could potentially be put into the user guide.
Thank you. Once we have something in the user guide I'd welcome another PR that'd update
Hi, are there any updates on this issue? Is there any hint for making the source language auto-detected?
@banyous Feel free to contribute a section in the user guide about using language detection, and from there, if we feel a wrapper around fasttext would make life easier, then I'm happy to welcome a PR to add language detection to I think this is a decent starting point: https://fasttext.cc/docs/en/language-identification.html
The langdetect package has worked well for me in the past for language detection problems. How would you feel about allowing users to pass 'auto' as an option for source? I could see some pros and cons:

Pros

Cons
- langdetect detects these 55 languages only

I'm a little new to open source but I would love to contribute 🙂 Of course, if you feel this doesn't fit this package's mission that's totally understandable.
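To make the 'auto' idea concrete, here is a minimal sketch of how the option could be resolved before translating. Everything in it is an assumption for illustration: `resolve_source` is a hypothetical helper, `detector` is any callable that maps text to a language code (for example `langdetect.detect`), and `supported` stands in for whatever set of languages the translation model actually accepts.

```python
def resolve_source(text, source, detector, supported=None):
    """Return the source language, detecting it when source == "auto".

    detector:  callable taking text and returning a language code.
    supported: optional collection of codes the translator can handle.
    """
    if source != "auto":
        # An explicit language was given, so no detection is needed.
        return source

    detected = detector(text)

    # Guard against detecting a language the translator cannot handle.
    if supported is not None and detected not in supported:
        raise ValueError(
            f"Detected language {detected!r} is not supported for translation"
        )
    return detected
```

With langdetect installed, this could be called as `resolve_source(text, "auto", langdetect.detect)`. The `supported` check is there because a detector and a translation model will not necessarily cover the same set of languages.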