Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure all strings are valid and don't break builds #549

Open
2 of 3 tasks
brian-smith-tcril opened this issue Aug 17, 2023 · 10 comments
Open
2 of 3 tasks

Ensure all strings are valid and don't break builds #549

brian-smith-tcril opened this issue Aug 17, 2023 · 10 comments
Assignees

Comments

@brian-smith-tcril
Copy link
Contributor

brian-smith-tcril commented Aug 17, 2023

@OmarIthawi mentioned an example where invalid strings could break a build here.

what do you think should happen if a po file fails to compile due to Translator writing invalid strings Hi {user} --> Merhaba {kullanci}?

When the strings were checked into the repos directly, this would fail the individual repo CI. With the switch to openedx-translations/openedx-atlas, strings updated in transifex and synced to openedx-translations will not trigger a CI build on any given repo.

We'll want to ensure we don't have strings that break builds checked into openedx-translations, so we should add validation somewhere.

In the same comment, @OmarIthawi mentioned

I'm thinking we should add validation to the https://github.com/openedx/openedx-translations

That seems like a good plan to me

Tasks

Preview Give feedback
@OmarIthawi
Copy link
Member

@brian-smith-tcril I looked up for tools to validate JSON translation files, but found none online.

Does it make sense to convert json to po-files and validate them? I'm not sure if there's a two-way conversion between the two format.

I know that Apache superset has a po2json utility, but I haven't tested it.

@brian-smith-tcril
Copy link
Contributor Author

@OmarIthawi it looks like translate-toolkit could work, it supports converting to and from json.

I was hoping there would be a way to validate without converting out of json by just using the formatjs cli directly, but that doesn't seem to be possible.

I'm not sure if there are any strange edge cases that will trip up the validation with the conversion back and forth, but I think trying out validation with translate-toolkit's json2po and po2json is a reasonable path forward.

@timmc-edx
Copy link
Contributor

It seems that there has been a regression in validation. For example, see this translation where the slot/param name was translated:

https://github.com/openedx/openedx-translations/blob/171142ac6ca8eaabcfea7940d2be935f2264d074/translations/edx-platform/conf/locale/ar/LC_MESSAGES/django.po#L23603-L23605

We have tooling in https://github.com/openedx/i18n-tools that's supposed to catch this sort of thing. It currently picks up about 300 errors in edx-platform alone. Is there some reason this is no longer in use?

timmc-edx added a commit to openedx/edx-platform that referenced this issue Jul 31, 2024
We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. By catching all errors
raised during error-page render and substituting in a hardcoded string, we
can reduce server resources, avoid logging massive sequences of recursive
stack traces, and still give the user *some* indication that yes, there was
a problem.

This should help address #35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

Without the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
timmc-edx added a commit to openedx/edx-platform that referenced this issue Jul 31, 2024
We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. I'm not sure _exactly_
how the loop is occurring, but it looks something like this:

1. An error is raised in a view or middleware and is not caught by
   application code
2. Django catches the error and calls the registered uncaught error
   handler
3. Our handler tries to render an error page
4. The rendering code raises an error
5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in
a hardcoded string, we can reduce server resources, avoid logging massive
sequences of recursive stack traces, and still give the user *some*
indication that yes, there was a problem.

This should help address #35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

Without the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
timmc-edx added a commit to openedx/edx-platform that referenced this issue Jul 31, 2024
We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. I'm not sure _exactly_
how the loop is occurring, but it looks something like this:

1. An error is raised in a view or middleware and is not caught by
   application code
2. Django catches the error and calls the registered uncaught error
   handler
3. Our handler tries to render an error page
4. The rendering code raises an error
5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in
a hardcoded string, we can reduce server resources, avoid logging massive
sequences of recursive stack traces, and still give the user *some*
indication that yes, there was a problem.

This should help address #35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

With the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
@OmarIthawi
Copy link
Member

@timmc-edx We have tests in place that is supposed to match i18n-tools, however it sounds like it needs more work.

completed_process = subprocess.run(
['msgfmt', '-v', '--strict', '--check', po_file],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)

I couldn't make i18n-tools work with non-standard file structure such as translations/edx-platform/... therefore I used msgfmt directly.

What do you suggest?

@timmc-edx
Copy link
Contributor

I don't know exactly what msgfmt checks, but I think it's just basic file structure. i18n_tool checks a variety of other things, including whether there are the same parameter slots in the inputs and outputs. You can see all the things it checks here: https://github.com/openedx/i18n-tools/blob/master/i18n/validate.py

I think it would be appropriate to modify that tool to allow different directory structures (or even just expect a different one).

@OmarIthawi
Copy link
Member

OmarIthawi commented Aug 1, 2024

Yes, it seems like msgfmt is only a part of i18n-tools/blob/master/i18n/validate.py.

I'll keep this issue open for now. Unless someone works on it, I plan to fit within my Open Source contributions late August / early September.

@timmc-edx
Copy link
Contributor

For reference, here are the strings that i18n_tool currently picks out as broken in edx-platform: https://gist.github.com/timmc-edx/eeaa4c1a40ee07871049cba53b30b4fa

@OmarIthawi
Copy link
Member

Thanks @timmc-edx. That's mostly machine translation issues. I recall the Arabic strings to be of high quality and haven't had such issues in the past.

The solution I have in mind is to:

  • add the i18n_tool checks into the validate-translations GitHub action
  • delete the faulty translation files from the openedx-translations repositories
  • let Transifex automatically add the strings again and the CI usual workflow would kick-in and keep the pull requests open until they're all valid

Fixing them by hand all at once might be prohibitively costly.

What do you think?

@timmc-edx
Copy link
Contributor

I think we could do it by hand -- I bet there's a way to bulk find-and-unapprove translations in Transifex, and we could use search strings like &lt; and notranslate to find them. I'd be fine doing that work in between other tasks, and I'd really like to avoid having the site untranslated in those languages for the duration.

But yeah, I think the first step would be adding i18n_tool to the CI step so that translation updates are blocked until they get cleaned up.

@OmarIthawi
Copy link
Member

OmarIthawi commented Aug 7, 2024

@timmc-edx there's a way to bulk review/unreview/delete transaltions. But it'll be one language at a time.

I have capacity issues at the moment regarding my contributions time i.e. I've done much more than planned this month and I need to focus on other projects at hand.

But yeah, I think the first step would be adding i18n_tool to the CI step so that translation updates are blocked until they get cleaned up.

Yes.

If help is there, I'd really appreciate it. Otherwise I'd have to clear my plate a bit.

cmltaWt0 pushed a commit to raccoongang/edx-platform that referenced this issue Aug 8, 2024
We sometimes see rendering errors in the error page itself, which then
cause another attempt at rendering the error page. I'm not sure _exactly_
how the loop is occurring, but it looks something like this:

1. An error is raised in a view or middleware and is not caught by
   application code
2. Django catches the error and calls the registered uncaught error
   handler
3. Our handler tries to render an error page
4. The rendering code raises an error
5. GOTO 2 (until some sort of server limit is reached)

By catching all errors raised during error-page render and substituting in
a hardcoded string, we can reduce server resources, avoid logging massive
sequences of recursive stack traces, and still give the user *some*
indication that yes, there was a problem.

This should help address openedx#35151

At least one of these rendering errors is known to be due to a translation
error. There's a separate issue for restoring translation quality so that
we avoid those issues in the future (openedx/openedx-translations#549)
but in general we should catch all rendering errors, including unknown
ones.

Testing:

- In `lms/envs/devstack.py` change `DEBUG` to `False` to ensure that the
  usual error page is displayed (rather than the debug error page).
- Add line `1/0` to the top of the `student_dashboard` function in
 `common/djangoapps/student/views/dashboard.py` to make that view error.
- In `lms/templates/static_templates/server-error.html` replace
  `static.get_platform_name()` with `None * 7` to make the error template
  itself produce an error.
- Visit <http://localhost:18000/dashboard>.

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k
line set of stack traces and the page displays "A server error occurred.
Please contact the administrator."

With the fix, the response takes less than a second and produces three
stack traces (one of which contains the error page's rendering error).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants