More Consistent GET variables? #156
As to that, it's much harder than you think. It means we would have to annotate which language family each ISO 639-3 code belongs to, plus what quality level each of our implementations is at, and then work out a complicated weighting of paths. It may be that... So it's doable, but someone needs to annotate families (easy, but boring), figure out the relative quality levels of all pairs (extremely hard), and then work out an algorithm for making best-fit paths.
I know that if N is the total number of languages, then it would take a table of N*(N-1) entries, one per ordered pair. Also, I noticed you used jpn as an example. Was that recently added to the database?
The way to do this right now is basically to use vocabulary coverage over a corpus. This is the best indicator of the quality of a pair. This is something that could be automated on a rolling basis... download the latest Wikipedia (or Wikinews) dump, calculate the coverage, and store the number in a stats file in the pair.
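To make the coverage idea concrete, here is a minimal sketch, not the project's actual tooling. It assumes coverage means the fraction of word tokens in a corpus that the pair recognises. The `is_known` callback and the toy vocabulary are hypothetical stand-ins for whatever a real pair's analyser would report.

```python
import re
from typing import Callable, Iterable


def coverage(lines: Iterable[str], is_known: Callable[[str], bool]) -> float:
    """Return the fraction of word tokens in `lines` accepted by `is_known`."""
    total = 0
    known = 0
    for line in lines:
        # Naive tokenisation: lowercase, then take runs of word characters.
        for token in re.findall(r"\w+", line.lower()):
            total += 1
            if is_known(token):
                known += 1
    # An empty corpus has no tokens, so report zero coverage.
    return known / total if total else 0.0


# Hypothetical stand-in for a pair's analyser: a fixed vocabulary.
vocabulary = {"the", "cat", "sat", "on", "mat"}
corpus = ["The cat sat on the mat.", "The dog sat on the rug."]

print(f"coverage: {coverage(corpus, vocabulary.__contains__):.2%}")
```

In a rolling setup, `corpus` would be the lines of the latest dump and the printed number would be what gets written to the stats file.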
Is there any way to calculate this now, before something like this gets implemented? I believe /calcCoverage only does it for the sentence you enter.
@Ryu945 I can't do it, but you could! :) I don't expect that such a script should take longer than an hour or two to write.
I noticed that when using /translate, the GET variable is "langpair" and it takes a value like "eng | spa".
When using /translateChain, the GET variable is "langpairs" and it takes a value like "eng | spa | fra".
I know the inputs are technically different, but shouldn't the variable names be more consistent for usability's sake?
Also, shouldn't there be some way for the API to pick appropriate intermediate languages on its own instead of requiring them to be specified? For example, depending on the quality of a language pair, sending "eng | fra" might mean using "eng | spa | fra" automatically, since spa and fra are both Romance languages and spa might score highly for accuracy with both eng and fra.
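One way this could work is sketched below, assuming every directed pair had a quality score in (0, 1] and that the quality of a chain is the product of its pair scores. Neither assumption comes from the project. Maximising that product is the same as running Dijkstra's algorithm on edge weights of -log(score). The function name `best_chain` and the scores are hypothetical placeholders, not measured values.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple


def best_chain(
    quality: Dict[Tuple[str, str], float], src: str, dst: str
) -> Optional[List[str]]:
    """Return the chain from src to dst with the highest product of scores."""
    # Build an adjacency list with -log(score) as the edge cost.
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), score in quality.items():
        if 0.0 < score <= 1.0:
            graph.setdefault(a, []).append((b, -math.log(score)))

    heap: List[Tuple[float, str, List[str]]] = [(0.0, src, [src])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None  # No chain of installed pairs connects src to dst.


# Placeholder scores for illustration only.
scores = {("eng", "fra"): 0.60, ("eng", "spa"): 0.90, ("spa", "fra"): 0.85}
print(best_chain(scores, "eng", "fra"))  # ['eng', 'spa', 'fra']
```

With these placeholder numbers the two-step chain wins because 0.90 * 0.85 = 0.765 exceeds the direct score of 0.60. With real data, the hard part remains obtaining trustworthy scores for every pair.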
Edit: While a cleaned-up interface is on my mind, what if /translate and /translateChain were both handled by /translate? It would run the appropriate code based on whether it is fed "eng | spa | fra" or "eng | spa".
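A rough sketch of how a single endpoint might tell the two cases apart is below. It assumes the value is a "|"-separated list of language codes. The helper names `parse_langs` and `route` are hypothetical and are not part of the existing API.

```python
from typing import List


def parse_langs(value: str) -> List[str]:
    """Split a value such as "eng | spa | fra" into a list of language codes."""
    codes = [part.strip() for part in value.split("|")]
    if len(codes) < 2 or any(not code for code in codes):
        raise ValueError(f"expected at least two language codes, got {value!r}")
    return codes


def route(value: str) -> str:
    """Decide which code path a unified /translate would take."""
    codes = parse_langs(value)
    # Two codes is a direct pair; three or more is an explicit chain.
    return "pair" if len(codes) == 2 else "chain"


print(route("eng | spa"))        # pair
print(route("eng | spa | fra"))  # chain
```

Under this design a single parameter name would cover both cases, which also addresses the "langpair" versus "langpairs" inconsistency.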