Skip to content

Commit

Permalink
feat: mark invalid entries as fuzzy
Browse files Browse the repository at this point in the history
  • Loading branch information
OmarIthawi committed Oct 23, 2023
1 parent 39d2ed1 commit 851ba5e
Show file tree
Hide file tree
Showing 6 changed files with 318 additions and 24 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/validate-translation-files.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,10 @@ jobs:
message: |
:white_check_mark: All translation files are valid.
```
${{ steps.validate_translation_files.outputs.VALIDATION_ERRORS }}
```
This comment has been posted by the `validate-translation-files.yml` GitHub Actions workflow.
message-failure: |
Expand Down
7 changes: 6 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,14 @@ test_requirements: ## Installs test.txt requirements
test: ## Run scripts tests
pytest -v -s --cov=. --cov-report=term-missing --cov-report=html scripts/tests

validate_translation_files: ## Run basic validation to ensure files are compilable
validate_translation_files: ## Run basic validation to ensure translation files are valid
python scripts/validate_translation_files.py


validate_translation_files_mark_fuzzy: ## Run basic validation and mark invalid entries as fuzzy
python scripts/validate_translation_files.py --mark-fuzzy


sync_translations: ## Syncs from the old projects to the new openedx-translations project
python scripts/sync_translations.py $(SYNC_ARGS)

Expand Down
82 changes: 82 additions & 0 deletions docs/decisions/0002-mark-fuzzy-entries.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
Mark Transifex invalid entries as fuzzy
#######################################

Problem
*******

As part of the `OEP-58`_ Transifex GitHub App bot has now syncs Translations.
Bot creates pull requests with invalid entries in the
po-files. These entries are validated by `LexiQA`_, Transifex-builtin quality
checks as well as human reviewers. However, some invalid translations still
make it into the sync translation pull requests which breaks
``django compilemessages`` command and is catchable by ``msgfmt``.

If Transifex had marked the invalid entries as fuzzy, then the
``django compilemessages`` command would have ignored them. However, that's
not the case. Therefore, we need to perform this task ourselves.


Design Decision
***************

Implement a GitHub Actions workflow that will mark the invalid entries as
``fuzzy``. Then create a commit -- editing the pull request -- to branch
of the bot pull request.

To ensure a safe and reliable workflow, the following workflows will be
combined into one single workflow with multiple jobs:

#. Run ``python-tests.yml`` to validate the python code
#. Then, run ``validate-translation-files.yml`` which performs the following:

#. Validate the po-files using ``msgfmt``
#. Mark the invalid entries as fuzzy (writes to the pofile)
#. Validate the files again to ensure the files are became valid
#. Commit the fixes if any

#. Mark invalid entires as fuzzy therefore excluded from .mo files as well as msgfmt
#. Validate the files again to ensure the files are still valid
#. Commit the fixes
#. Run the lint commits
#. Push to the branch
#. Trigger auto merge

**Pros**

* There's no need for manual intervention, therefor it's fast and won't
create a backlog of pull requests.


**Cons**

* The workflow script runs and edits the pull request, which can be
confusing for the reviewers.
* The workflow lacks a notification mechanism to notify the reviewers
that their translations are invalid.

Rejected Alternatives
*********************

- **Create a Transifex issue:** It would be possible to identify the faulty
entries and create a Transifex issue via API to the faulty entries.
However, this would require attention from the translators
team, which can take a rather long time therefore creating a backlog of
invalid entries. Additionally, Transifex issues are not understood or used
by most of the Open edX community.

- **Post errors on Slack:** Post the errors on Slack and ask the translators
to fix them. Same as the previous option, not all translators are on Slack.
Additionally, this option would have a slow feedback loop causing the pull
requests backlog to build-up.

- **Mark faulty entries as unreviewed:** Use the Transifex API to mark the
the invalid entires as unreviewed, then pull only reviewed entries into this
repository.
This option would require an extensive use the of the Transifex API,
and pull again which can be complex to implement. Additionally, this option
would have a slow feedback loop causing the pull requests backlog to
build-up.


.. _OEP-58: https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0058-arch-translations-management.html
.. _LexiQA: https://help.transifex.com/en/articles/6219179-lexiqa
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,51 @@ msgstr ""
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Generated-By: Babel 2.8.0\n"


#: default_data.py:5
msgid ""
"An isosceles triangle with three layers of similar height. It is shown "
"upright, so the widest layer is located at the bottom, and the narrowest "
"layer is located at the top."
msgstr ""

#, python-brace-format
msgid "{action} invalid actions."
msgstr "{something_else} an invalid action."

#: default_data.py:20
msgid ""
"Use this zone to associate an item with the bottom layer of the triangle."
msgstr ""

#: default_data.py:22
msgid "Correct! This one belongs to The Top Zone."
msgstr ""

#, python-brace-format
msgid "{action} is not a valid action."
msgstr "{कार्रवाई} एक वैध कार्रवाई नहीं है।" # Invalid brace-format with encoding errors
msgid ""
"{action2} another invalid action."
msgstr "{कार्रवाई2} एक वैध कार्रवाई नहीं है।"

#: default_data.py:23
msgid "Correct! This one belongs to The Middle Zone."
msgstr ""

#: default_data.py:24
msgid "Correct! This one belongs to The Bottom Zone."
msgstr ""

#: default_data.py:25
msgid "No, this item does not belong here. Try again."
msgstr "नहीं, यह आइटम यहाँ नहीं है। पुनः प्रयास करें।"

#: default_data.py:26
msgid "You silly, there are no zones for this one."
msgstr "आप मूर्खतापूर्ण हैं, इसके लिए कोई क्षेत्र नहीं है।"

#, python-brace-format
msgid "{action3} one more time!"
msgstr ""
"{कार्रवाई3} एक वैध कार्रवाई नहीं है।"
""

58 changes: 53 additions & 5 deletions scripts/tests/test_validate_translation_files.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,12 @@
"""

import os.path
import re
import shutil

from ..validate_translation_files import (
get_translation_files,
main,
validate_translation_files,
)

SCRIPT_DIR = os.path.dirname(__file__)
Expand Down Expand Up @@ -35,7 +37,7 @@ def test_main_on_invalid_files(capsys):
Integration test for the `main` function on some invalid files.
"""
mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
exit_code = main(mock_translations_dir)
exit_code = validate_translation_files(mock_translations_dir)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
Expand All @@ -44,8 +46,8 @@ def test_main_on_invalid_files(capsys):
assert 'hi/LC_MESSAGES/django.po' not in out, 'Invalid file should be printed in stderr'
assert 'en/LC_MESSAGES/django.po' not in out, 'Source file should not be validated'

assert 'INVALID:' in err
assert 'hi/LC_MESSAGES/django.po' in err
assert re.match(r'INVALID: .*hi/LC_MESSAGES/django.po', err)
assert 'Errors:' in err
assert '\'msgstr\' is not a valid Python brace format string, unlike \'msgid\'' in err
assert 'FAILURE: Some translations are invalid.' in err

Expand All @@ -57,11 +59,57 @@ def test_main_on_valid_files(capsys):
Integration test for the `main` function but only for the Arabic translations which is valid.
"""
mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir/demo-xblock/conf/locale/ar')
exit_code = main(mock_translations_dir)
exit_code = validate_translation_files(mock_translations_dir)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
assert 'ar/LC_MESSAGES/django.po' in out, 'Arabic translation file is valid'
assert 'SUCCESS: All translation files are valid.' in out
assert 'Errors:' not in out, 'There should be no errors'
assert not err.strip(), 'The stderr should be empty'
assert exit_code == 0, 'Should succeed due in validating the ar/LC_MESSAGES/django.po file'


def test_main_on_valid_files_with_mark_fuzzy(capsys, tmp_path):
"""
Test the `main` function with the --mark-fuzzy option.
"""
original_mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
mock_translations_dir = tmp_path / 'translations'
shutil.copytree(original_mock_translations_dir, mock_translations_dir)

exit_code = validate_translation_files(translations_dir=mock_translations_dir, mark_fuzzy=True)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout'
assert re.match(r'FIXED: .*hi/LC_MESSAGES/django.po', err), 'Should print the FIXED files in stderr'
assert 'ar/LC_MESSAGES/django.po' in out, 'Arabic translation file is valid regardless'
assert 'NOTICE: Some translations were fixed.' in err, 'Should print NOTICE in stderr'
assert 'SUCCESS: All translation files are now valid.' in err, 'Should print SUCCESS in stderr'
assert 'Previous errors:' in err, 'Original errors should be printed'
assert 'New output:' in err, 'New output should be printed after the fix'
assert exit_code == 0, (
'Should succeed even though the in/LC_MESSAGES/django.po is invalid, because it was marked as fuzzy'
)


def test_main_on_valid_files_with_mark_fuzzy_unfixable_files(capsys, tmp_path):
"""
Test the `main` function with the --mark-fuzzy option but the file is broken in an unknown way.
"""
original_mock_translations_dir = os.path.join(SCRIPT_DIR, 'mock_translations_dir')
mock_translations_dir = tmp_path / 'translations'
shutil.copytree(original_mock_translations_dir, mock_translations_dir)
hi_language_file = mock_translations_dir / 'demo-xblock/conf/locale/hi/LC_MESSAGES/django.po'

with open(hi_language_file, 'a') as f:
f.write('<<<<Some random string to break the file>>>>')

exit_code = validate_translation_files(translations_dir=mock_translations_dir, mark_fuzzy=True)
out, err = capsys.readouterr()

assert 'VALID:' in out, 'Valid files should be printed in stdout regardless'
assert re.match(r'INVALID: .*hi/LC_MESSAGES/django.po', err), 'Should print the Error files in stderr'
assert 'Previous errors:' in err, 'Original errors should be printed'
assert 'New errors:' in err, 'New errors should be printed after the fix'
assert exit_code == 1, 'Should fail due to unfixable hi/LC_MESSAGES/django.po file'
Loading

0 comments on commit 851ba5e

Please sign in to comment.