Translation processing problem #52
Comments
So it seems that the problem is in the language you are translating into, but this should not happen, since the tool counts the source words and compares them to the target words. It may be a bug that needs to be fixed.
I don't seem to be able to reproduce this issue, at least with the opus+bt-2021-04-13 English to Arabic model. Do you have more information about the contexts in which this issue occurs?
That looks like a fine-tuned model, so it's possible that this is caused by the fine-tuning process. Since the data used for fine-tuning is generally very domain-specific, it may degrade performance on source texts that don't belong to the fine-tuning domain (such as these kinds of texts, where a series of periods is used as a placeholder). How much data did you use to fine-tune the model, and what sort of data was it? Another complicating factor is that the Arabic models are multilingual models, i.e. they support multiple variants of Arabic, which might affect fine-tuning.
Over one million segments.
Almost all of the data is in the main domain.
OK, that's a lot of data. It does sound like the problem with the repeated periods is caused by the fine-tuning. If there are other errors in the translations besides the problem with the repeated periods, I would advise fine-tuning with a smaller, more targeted set of segments. If the model translates OK otherwise, it's also possible to use a pre-edit rule to edit those problematic sentences automatically before they are translated. For instance, you could use a rule like this: This rule would truncate all series of repeated periods to five periods, which might be easier for an MT model to handle.
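As a rough illustration of the truncation idea described above, here is a minimal Python sketch of such a pre-edit step. It assumes a plain regular-expression substitution applied to the source text before translation; the function name `truncate_period_runs` and the `max_dots` parameter are hypothetical, and this is not the tool's own rule syntax.

```python
import re


def truncate_period_runs(text: str, max_dots: int = 5) -> str:
    """Collapse any run of more than `max_dots` periods to exactly `max_dots`.

    Hypothetical helper for illustration only. Runs of `max_dots` periods
    or fewer (for example an ordinary "..." ellipsis) are left unchanged.
    """
    # Match max_dots + 1 or more consecutive literal periods.
    pattern = re.compile(r"\.{" + str(max_dots + 1) + r",}")
    return pattern.sub("." * max_dots, text)


source = "nationals of the Kingdom of ................... shall be exempt"
print(truncate_period_runs(source))
```

Applied to the example sentence from this issue, the long placeholder run would be shortened to five periods while the rest of the sentence is passed through untouched.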
When a sentence contains a run of dots in the middle, the translation stops there: only the part before the dots is translated, and everything after them is ignored. For example:
The officers and employees of the Bank, who are not local nationals of the Kingdom of ................... shall be exempt from customs duties and other levies, prohibitions and restrictions on the importation of motor vehicles and spare parts thereof, and household effects, equipment and furniture.
The output covers only the first part, up to "Kingdom of".