Translation processing problem #52

Khalid-kamal · 2022-10-24T18:11:00Z

When a sentence contains a run of dots in the middle, the translation stops there: only the first part is translated, and everything after the dots is ignored. For example:
The officers and employees of the Bank, who are not local nationals of the Kingdom of ................... shall be exempt from customs duties and other levies, prohibitions and restrictions on the importation of motor vehicles and spare parts thereof, and household effects, equipment and furniture.
The result covers only the first part, up to "Kingdom of".

SafeTex · 2022-10-26T18:25:07Z

Hello Khalid

Are you translating into Arabic by any chance?

I wouldn't be surprised if this has something to do with right-to-left languages, but I'm only guessing, of course.

The thing is that when I tested what you described in one of my language pairs (Swedish to English), Opus CAT translated everything (see attached file).

Khalid-kamal · 2022-10-27T07:15:01Z

So it seems that the problem lies in the language you are translating into, but this should not happen, since the tool counts the source words and compares them to the target words. It may be a bug that needs to be fixed.
Thanks for your guess.

TommiNieminen · 2022-10-28T12:37:58Z

I don't seem to be able to reproduce this issue, at least with the opus+bt-2021-04-13 English to Arabic model. Do you have more information on the contexts in which this issue occurs?

Khalid-kamal · 2022-10-30T05:07:21Z

Tommi,
Would you try this sentence and see the result:
1996 ................... among certain African states and international organizations;

Khalid-kamal · 2022-10-30T05:09:45Z

Here is the database

TommiNieminen · 2022-11-03T09:20:48Z

That looks like a fine-tuned model, so it's possible that this is caused by the fine-tuning process. Since the data used for fine-tuning is generally very domain-specific, it may cause performance to degrade with source texts that don't belong to the fine-tuning domain (such as these kinds of texts, where a series of periods is used as a placeholder).

How much data did you use to fine-tune the model, and what sort of data was it? Another complicating factor is that the Arabic models are multilingual models, i.e. they support multiple variants of Arabic, which might affect fine-tuning.

Khalid-kamal · 2022-11-03T09:26:02Z

Over one million segments

Khalid-kamal · 2022-11-03T09:27:03Z

Almost all of the data is in the main domain

TommiNieminen · 2022-11-03T09:38:16Z

Ok, that's a lot of data. It does sound like the problem with the repeated periods is caused by the fine-tuning. If there are other errors in the translations besides the problem with the repeated periods, I would advise fine-tuning with a smaller, more targeted set of segments.

If the model translates OK otherwise, it's also possible to use a pre-edit rule to edit those problematic sentences automatically before they are translated. For instance, you could use a rule like this:

This rule would truncate all series of repeated periods to five periods, which might be easier for an MT model to handle.
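
The idea behind such a rule can be sketched in Python. This is only an illustration of the kind of regular expression involved, not the OPUS-CAT pre-edit rule itself; the function name `truncate_dot_runs` and its `keep` parameter are made up for the example.

```python
import re


def truncate_dot_runs(text: str, keep: int = 5) -> str:
    """Collapse every run of more than `keep` periods down to exactly `keep`.

    Hypothetical helper for illustration; shorter runs (e.g. "...") and
    single periods are left untouched.
    """
    # Match keep+1 or more consecutive literal periods.
    pattern = r"\.{" + str(keep + 1) + r",}"
    return re.sub(pattern, "." * keep, text)


if __name__ == "__main__":
    source = (
        "1996 ................... among certain African states "
        "and international organizations;"
    )
    print(truncate_dot_runs(source))
```

Applied before translation, this would shorten the placeholder in the sentence from earlier in the thread while leaving ordinary punctuation alone. Whether five periods is short enough for a given fine-tuned model would need to be checked by testing.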
