Create alignment.md #459

liashahnazaryan · 2023-03-28T21:23:47Z

Description

Fixes #71

Type of PR

Creates the article [Alignment]

Checklist:

I have read the contributing guidelines.
I have followed the style guide.

liashahnazaryan · 2023-03-29T17:45:21Z

customisation/parallel-data.md

@@ -21,8 +21,7 @@ Parallel data sets can be created manually, automatically, or created synthetica
 - Human [post-editing](../workflows/post-editing.md)
 - [Crawling](crawling.md)
 - [Alignment](alignment.md)
-
-Parallel data can be created by crawling and aligned monolingual test, and by [back-translation](back-translation.md) or [back-copying](back-translation.md).


I suggest removing the whole sentence here, as it mostly repeats the previously mentioned points. As for creating parallel data by aligned monolingual text, I'm not sure if it's relevant here, as monolingual data alignment is used to create comparable corpora in a single language.

Yes, you are right!

cefoo

Thank you so much for your PR, @liashahnazaryan!

I've added some comments, especially to try to avoid repetitions. Let me know what you think!

cefoo · 2023-03-30T12:40:24Z

customisation/alignment.md

+
+Alignment can be used to create [parallel data](/customisation/parallel-data.md).
+The aligned parallel corpora are then used to train machine translation models.
+The goal is to help the machine translation system accurately translate text from one language to another by recognising patterns and regularities in the data. 


This sentence may be too long. Perhaps we can rephrase it to something like this (it doesn't have to be exactly like this):

The goal of this task is to allow the machine translation system to recognize patterns and regularities, and their equivalents.

cefoo · 2023-03-30T12:42:39Z

customisation/alignment.md

+The aligned parallel corpora are then used to train machine translation models.
+The goal is to help the machine translation system accurately translate text from one language to another by recognising patterns and regularities in the data. 
+
+#### Example


We tend not to use `####`.
Perhaps it's better to introduce titles with `##` and examples with `###`?

cefoo · 2023-03-30T12:49:21Z

customisation/alignment.md

+
+German: `Das` `Buch` `liegt` `auf` `dem` `Tisch` `.`
+
+By identifying the corresponding words, such as `book` and `Buch` or `table` and `Tisch`, the two example sentences are aligned and used as [training data](/customisation/training-data.md) for the machine translation system.


This may be a repetition of what we have said in the previous part, when defining alignment.

Perhaps we can rephrase it so that it just introduces new info:

In word-level alignment, the corresponding words, such as book and Buch, or table and Tisch, are identified, aligned, and used as training data.

Should we explain phrase- and sentence-level alignment with this sentence too?

cefoo · 2023-03-30T12:50:47Z

customisation/alignment.md

 ---
+
+**Alignment** is the process of identifying and linking the corresponding text units in the source and target languages.
+Data sets can be aligned at the word, phrase, or sentence level.


I think the hyphen is necessary:

"... at the word-, phrase-, or sentence-level"

I didn't use a hyphen in that sentence, as I wanted "word", "phrase" and "sentence" to modify the word "level", while in other cases where I use, e.g., "word-, phrase-, and sentence-level alignment", "word-, phrase-, and sentence-level" are used as adjectives to modify "alignment".

Yes, I think no hyphen is correct in this case.

cefoo · 2023-03-30T12:53:18Z

customisation/alignment.md

+
+### Approaches
+
+Machine translation systems use various alignment approaches to link two data sets at different granularity levels.


Perhaps just "Alignment approaches are based on different granularity levels."? Or does it delete important information?

"Granularity levels" here are used to describe the two data sets that should be aligned. But now that I think about this, the whole sentence sounds redundant, as we wouldn't need an alignment if we knew that the data sets were identical. So I think we can delete the sentence and just pass to enumerating the approaches. What do you think?

Less is more, haha :)
Sure, go ahead.

cefoo · 2023-03-30T12:53:55Z

customisation/alignment.md

+
+Machine translation systems use various alignment approaches to link two data sets at different granularity levels.
+
+- In manual alignment, bilingual human translators align corresponding text [segments](/concepts/segment.md) in the source and target languages.


I'd avoid the "bilingual" in "bilingual human translators", as it is implied.

In other articles, we tried to avoid "source" and "target", although they are correct, and used "input" and "output" languages instead.
Do you think it would be a good idea to add these term preferences to the Style Guide?

I think it's a good idea as it can help to avoid confusion around such cases where the terminology varies depending on the contributor's preferences.

cefoo · 2023-03-30T13:07:42Z

customisation/alignment.md

+Machine translation systems use various alignment approaches to link two data sets at different granularity levels.
+
+- In manual alignment, bilingual human translators align corresponding text [segments](/concepts/segment.md) in the source and target languages.
+- [Rule-based machine translation](/approaches/rule-based-machine-translation.md) uses linguistic rules and patterns to align words and phrases in two languages.


To avoid repetition:

Rule-based machine translation uses linguistic rules and patterns.

liashahnazaryan · 2023-03-30T15:10:52Z

> Thank you so much for your PR, @liashahnazaryan!
>
> I've added some comments, especially to try to avoid repetitions. Let me know what you think!

Thank you for the comments, @cefoo!
I've made several changes and responded to your comments where relevant. Hope I haven't missed anything.

bittlingmayer · 2023-03-31T16:01:00Z

I think this article should be only about aligning sentences between a pair of documents, not about aligning words within a pair of sentences.

Or, we should have 2 separate articles, Sentence alignment and Word alignment.

Made several changes to eliminate word and phrase-level alignment from the article, as sentence alignment is more relevant to machine translation.

liashahnazaryan

Deleted parts about the word and phrase-level alignment from the article to be more relevant to machine translation.

liashahnazaryan · 2024-02-23T12:37:17Z

Hi, @cefoo! I've made several minor changes to the article. Please let me know what you think :)

cefoo

Hi @liashahnazaryan!
Thank you so much for this update! The article is looking good!!
Tagging @bittlingmayer for his review as well.

cefoo · 2024-02-27T13:07:51Z

customisation/alignment.md

+**Alignment** is the process of identifying and linking the corresponding sentences in the input and output languages.
+
+Alignment can be used to create [parallel data](/parallel-data).
+The aligned parallel corpora are then used to train machine translation models.


Could we link the term "train" to training, even though it doesn't exist yet?

cefoo · 2024-02-27T13:10:50Z

customisation/alignment.md

+
+Alignment can be used to create [parallel data](/parallel-data).
+The aligned parallel corpora are then used to train machine translation models.
+The goal is to improve machine translation accuracy through pattern and regularity recognition in data.


Maybe to make it simpler:

The goal is to improve machine translation accuracy by recognizing patterns and their frequency in data.

It may be a silly update, but the term "regularity", although accurate, made me think of academic/research speech.

cefoo · 2024-02-27T13:13:30Z

customisation/alignment.md

+The statistical relationships are based on the likelihood of observing alignments in a training corpus.
+- With neural approaches, alignment is predicted automatically through [neural networks](/neural-machine-translation#neural-networks) by mapping the input and output sentences into [vectors](/vector).
+
+## Challenges


Do you think examples would be helpful? I am thinking specifically of the second and, especially, the last item in this list.
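
For the last item, a minimal sketch of what such an example could show, assuming each sentence has already been mapped into a vector. The toy vectors and the `cosine` and `align` helpers below are hypothetical and for illustration only; they are not taken from the article or from any real model. Each input sentence is paired with the output sentence whose vector is most similar.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(input_vectors, output_vectors):
    """Pair each input sentence with its most similar output sentence.

    Returns a list of (input_index, output_index) tuples.
    """
    pairs = []
    for i, u in enumerate(input_vectors):
        best = max(
            range(len(output_vectors)),
            key=lambda k: cosine(u, output_vectors[k]),
        )
        pairs.append((i, best))
    return pairs


# Hypothetical toy vectors; a real system would obtain them from a
# multilingual sentence encoder.
input_vectors = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
output_vectors = [[0.2, 0.7, 0.4], [0.8, 0.2, 0.1]]

alignment = align(input_vectors, output_vectors)
print(alignment)
```

This greedy nearest-neighbour pairing is only one possible choice; it does not enforce sentence order or one-to-one matches, which a real aligner would likely need to handle.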

cefoo · 2024-02-27T13:13:51Z

customisation/alignment.md

+- Aligning sentences with varying lengths, punctuation, and complex structures can be challenging for alignment algorithms.
+- Many words and phrases can have multiple meanings or form idiomatic expressions.
+Semantic ambiguity can trigger inaccurate sentence alignments. 
+- Typological similarities of languages can result in sentence pairs that share highly similar linguistic properties but have different meanings and translations.


Maybe a comma before "but"?

I think there's no need for it as the subject doesn't change.

liashahnazaryan · 2024-02-29T16:34:07Z

Hey, @cefoo! Thanks for the comments. I've made several changes. Please let me know what you think about the examples. Do they need more explanation, or are they good to go as they are?

Create alignment.md

b1fbd37

liashahnazaryan mentioned this pull request Mar 28, 2023

Article: Alignment (and term extraction) #71

Open

liashahnazaryan added 3 commits March 29, 2023 10:23

Update alignment.md

17d9b3c

Minor edits

0e9e61e

Merge branch 'patch-73' into patch-72

92a8aff

liashahnazaryan commented Mar 29, 2023


Fixes

c41f547

cefoo reviewed Mar 30, 2023


New edits

219b038

liashahnazaryan marked this pull request as draft April 7, 2023 19:14

Update alignment.md

20a259b

Made several changes to eliminate word and phrase-level alignment from the article, as sentence alignment is more relevant to machine translation.

liashahnazaryan marked this pull request as ready for review June 22, 2023 17:35

liashahnazaryan commented Jun 22, 2023


Minor changes

8dd0b40

cefoo reviewed Feb 27, 2024


Adding examples.md

7157343
