DITA XML format #111

Open
tristanmaccana opened this issue Mar 3, 2023 · 9 comments

Comments

@tristanmaccana

Are there any plans to add DITA as a translatable format?

@chriswendt1
Member

Before you sent this request: No. Do you have a hint on any libraries or existing processing logic? The key is to extract exactly the translatable elements, escaping any sentence-internal tags, reinserting the internal tag values at the right places, and then replacing the translated segments into the original markup. This is best done by proper DOM parsing.
Do you have a set of DITA documents you want to translate and can share a link to?
If there were an existing DITA-to-HTML and HTML-to-DITA converter, that could work without writing any new code.
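The extract, translate, and reinsert cycle described above can be sketched with DOM parsing as follows. This is only a minimal illustration, not Document Translator's actual code: the `BLOCK_TAGS` set, the helper names, and the `translate` callback are assumptions made for the example, and nested block elements are not handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: these DITA elements each hold one translatable segment.
BLOCK_TAGS = {"title", "shortdesc", "p", "li"}

def inner_xml(elem):
    """Serialize an element's content, keeping inline tags as markup."""
    parts = [escape(elem.text or "")]
    for child in elem:
        # ET.tostring() includes the child's tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

def set_inner_xml(elem, xml_text):
    """Replace an element's content with the parsed translated segment."""
    wrapper = ET.fromstring(f"<wrapper>{xml_text}</wrapper>")
    for child in list(elem):
        elem.remove(child)
    elem.text = wrapper.text
    elem.extend(list(wrapper))

def translate_topic(dita_xml, translate):
    """Translate each block element in place; `translate` is a hypothetical
    callback that must return well-formed inline markup."""
    root = ET.fromstring(dita_xml)
    for elem in list(root.iter()):
        if elem.tag in BLOCK_TAGS:
            set_inner_xml(elem, translate(inner_xml(elem)))
    return ET.tostring(root, encoding="unicode")
```

The translation engine sees the inline tags as markup rather than as text, so it can move them with the words they wrap. The surrounding document structure never leaves the client.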

@chriswendt1
Member

chriswendt1 commented Mar 6, 2023

Hi Tristan,
I think the appropriate process will be to use Fluenta (https://github.com/rmraya/Fluenta) to extract the translatable elements from DITA files referenced in a DITA map into XLIFF. You then use Document Translator to translate the XLIFF. Then use Fluenta again to re-insert the translated elements into the DITA files.
Let us know if you used this successfully.
For convenience, this could be arranged in a workflow controlled by Document Translator. Document Translator could invoke Fluenta as an external process before and after translating the XLIFF.
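A workflow of this shape could be driven by a small wrapper that runs each stage as an external process and stops at the first failure. The sketch below is an assumption about how such a wrapper might look. It deliberately contains no real Fluenta or Document Translator command lines: the caller supplies them, and the names in the usage comment are placeholders.

```python
import subprocess

def run_workflow(steps):
    """Run (name, command) pairs in order; stop on the first failing step.

    Each command is an argument list for subprocess.run(). A non-zero exit
    code raises subprocess.CalledProcessError because check=True.
    """
    completed = []
    for name, command in steps:
        subprocess.run(command, check=True)
        completed.append(name)
    return completed

# Hypothetical usage; every command below is a placeholder, not a real CLI:
# run_workflow([
#     ("extract DITA to XLIFF", ["<fluenta extract command>"]),
#     ("translate XLIFF", ["<document translator command>"]),
#     ("merge XLIFF into DITA", ["<fluenta import command>"]),
# ])
```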

@tristanmaccana
Author

Hi Chris,
Thanks very much for getting back to me. Yes, we have a very large set of manuals, installer guides, etc. that we would like to translate from DITA using Microsoft Translator, so we would like to set up an efficient workflow. Using Fluenta, we had an issue with inline elements in the target: see the image, where the ph ids are in random order and some ph elements are empty. Can you confirm whether you are using XLIFF 1.2 only?
[Image: inline errors]
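A problem like this can be detected automatically by comparing the `ph` elements of each source and target pair. The sketch below assumes an XLIFF 1.2 file with the standard namespace; the function name and the shape of the report are made up for illustration. A changed order of ids is not reported, since a translation may legitimately reorder placeholders.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # assumption: XLIFF 1.2
NS = {"x": XLIFF_NS}
PH = f"{{{XLIFF_NS}}}ph"

def check_placeholders(xliff_text):
    """Return (unit id, missing ids, extra ids, changed ids) per bad unit."""
    problems = []
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        if source is None or target is None:
            continue
        src = {ph.get("id", ""): ph.text or "" for ph in source.iter(PH)}
        tgt = {ph.get("id", ""): ph.text or "" for ph in target.iter(PH)}
        missing = sorted(set(src) - set(tgt))
        extra = sorted(set(tgt) - set(src))
        # A ph whose content was emptied or altered shows up here.
        changed = sorted(i for i in src.keys() & tgt.keys() if src[i] != tgt[i])
        if missing or extra or changed:
            problems.append((unit.get("id"), missing, extra, changed))
    return problems
```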

@chriswendt1
Member

Hi Tristan,
It seems to me as if Fluenta should not have processed the ph element as translatable. Can you teach Fluenta to exclude it from translation?
The content of the ph element is not translatable. You may get fairly random output in translation.

@tristanmaccana
Author

Hi Chris,
We nearly always have to translate inline elements such as uicontrol elements, which are localized. Perhaps we have to translate them in a separate process?
Here is our XLIFF file after Translator: https://drive.google.com/file/d/1ZUqIoGAnOceNCN_Cubmxo5sPF_zrG2tS/view?usp=share_link
I will try to share the source file with you later.

@chriswendt1
Member

chriswendt1 commented Mar 6, 2023

Thanks for sharing the sample. Looking at the first segment with <ph> elements inside, the segment to translate is this:
You can configure images and text for each space level in the <ph ctype="x-other" id="0">&lt;uicontrol class="+ topic/ph ui-d/uicontrol "&gt;</ph>Green Hub<ph id="1">&lt;/uicontrol&gt;</ph> section in the <ph ctype="x-other" id="2">&lt;ph keyref="brand" status="removeContent" class="- topic/ph "&gt;</ph><ph ctype="x-other" id="3">&lt;keyword class="- topic/keyword "&gt;</ph>OpenBlue<ph id="4">&lt;/keyword&gt;</ph><ph id="5">&lt;/ph&gt;</ph> Enterprise Manager <ph ctype="x-other" id="6">&lt;uicontrol class="+ topic/ph ui-d/uicontrol "&gt;</ph>Setup<ph id="7">&lt;/uicontrol&gt;</ph> page.
It's a horrible mess that won't make much sense to Translator. There is too much markup inside the translatable string, which is escaped via &lt; and &gt;. Have you tried unescaping this as if it was proper markup? Alternatively you could compress the untranslatable spans to something like #markup1 and see if that makes it better.
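The second suggestion, compressing each untranslatable span to a short token such as `#markup1`, could be sketched like this. The function names are made up for the example, and whether the translation engine leaves `#markupN` tokens intact is an assumption that would need to be tested.

```python
import re

# Matches one whole <ph ...>...</ph> span; the content is escaped text,
# so there is no nested <ph> markup to worry about.
PH_PATTERN = re.compile(r"<ph\b[^>]*>.*?</ph>", re.DOTALL)
TOKEN_PATTERN = re.compile(r"#markup(\d+)")

def compress_markup(segment):
    """Replace each <ph> span with #markupN; return the text and the spans."""
    spans = []

    def stash(match):
        spans.append(match.group(0))
        return f"#markup{len(spans)}"

    return PH_PATTERN.sub(stash, segment), spans

def restore_markup(translated, spans):
    """Put the original <ph> spans back, wherever the tokens ended up."""
    def unstash(match):
        index = int(match.group(1))
        # Leave unknown tokens untouched rather than guessing.
        return spans[index - 1] if 1 <= index <= len(spans) else match.group(0)

    return TOKEN_PATTERN.sub(unstash, translated)
```

Because the tokens are looked up by number, the translation may reorder them and the right span is still restored.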

@chriswendt1
Member

chriswendt1 commented Mar 7, 2023

I see what you mean. The <ph> is a legal XLIFF element and should have passed through as is. I tried unescaping the internal markup and it fails. No surprise, it wouldn't be legal XLIFF.
I'll try a few more things.

@tristanmaccana
Author

tristanmaccana commented Mar 7, 2023

Hi Chris, I'll share a link to the ditamap with you. I would appreciate your expert eye on this, and any insights on how to convert it correctly, if you get a chance: https://drive.google.com/file/d/1PDgIXgGq1qUjKbnMzovpC8flIiIYdkaT/view?usp=share_link
Thanks. (P.S. I just saw your earlier replies, thank you!)

@chriswendt1
Member

Hi @tristanmaccana, I am currently working on adding the SRT and VTT file formats to Document Translation. The strategy is to transform the file client-side to a supported document format, packing the additional information into a comment, then translate the supported format as usual, and then unpack it at the client. MD is a candidate, because it is so simple to parse. HTML could work as well. Maybe something like this could help here as well, packing additional information into the untranslatable XLIFF markup.
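As an illustration of that pack and unpack strategy, here is a minimal sketch that carries SRT cue numbers and timings through HTML comments. It is not the implementation being worked on: the comment layout and function names are invented for the example, and it assumes the translation service returns HTML comments unchanged. The SRT `-->` arrow is not copied into the comment, because it would close the comment early.

```python
import html
import re

def srt_to_html(srt_text):
    """Pack each SRT cue as an HTML comment (metadata) plus a <p> (text)."""
    parts = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        index = lines[0].strip()
        # Store start and end separately; "-->" would end the comment.
        start, end = (t.strip() for t in lines[1].split("-->"))
        text = "<br>".join(html.escape(line) for line in lines[2:])
        parts.append(f"<!-- cue {index} {start} {end} -->")
        parts.append(f"<p>{text}</p>")
    return "\n".join(parts)

CUE = re.compile(r"<!-- cue (\d+) (\S+) (\S+) -->\s*<p>(.*?)</p>", re.DOTALL)

def html_to_srt(html_text):
    """Unpack the translated HTML back into SRT cues."""
    cues = []
    for index, start, end, body in CUE.findall(html_text):
        lines = [html.unescape(line) for line in body.split("<br>")]
        cues.append("\n".join([index, f"{start} --> {end}", *lines]))
    return "\n\n".join(cues) + "\n"
```

Only the `<p>` content is translatable; the timing data rides along in the comments and is reattached after translation.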
