Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Start adding multi-language support #38

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zarch
Copy link

@zarch zarch commented Oct 10, 2024

PR that tried to fix #7.

The main changes are:

  • Start introducing 'gettext' on dashboard and variables to traslate the name expose on the streamlit interface,
  • Add also files to translate from english to german,
  • Remove line on '.gitignore' file to not ignore translation files

The current state is not fully functional, in particular I had issues with st.subheader like for instance:

st.subheader("Log(p)-h Diagram")

If I change them to:

st.subheader(_("Log(p)-h Diagram"))

When I execute the code I get:

Traceback (most recent call last):
  File "/data/src/zheatpumps/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
  File "/data/src/zheatpumps/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/data/src/zheatpumps/src/heatpumps/hp_dashboard.py", line 1025, in <module>
    st.subheader(_("Log(p)-h Diagram"))
                 ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'DeltaGenerator' object is not callable

That I don't know exactly how to fix.

I tried to build a minimal example:

import pathlib
import gettext
import streamlit as st

PRJDIR = pathlib.Path(__file__).parent.absolute()
LOCDIR = PRJDIR / "locales"
STADIR = PRJDIR / "src" / "heatpumps" / "static"
IMGDIR = STADIR / "img"

print(f"{LOCDIR=}")
print(f"{STADIR=}")
print(f"{IMGDIR=}")

# Initialize gettext
lang = gettext.translation("dashboard", localedir=LOCDIR, languages=["de"])
lang.install()
_ = lang.gettext

print(f"Translate `T-s Diagram`: {_("T-s Diagram")}")
print(f"Translate `Log(p)-h Diagram`: {_("Log(p)-h Diagram")}")

# Streamlit code
st.set_page_config(
    layout="wide",
    page_title=_("Heat Pump Dashboard - test translation"),
    page_icon=(IMGDIR / "page_icon_ZNES.png").as_posix(),
)

with st.sidebar:
    st.subheader(_("T-s Diagram"))
    st.subheader(_("Log(p)-h Diagram"))

But if I run the code it works:

❯ uv run streamlit run loc.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.1.35:8501

LOCDIR=PosixPath('/data/src/syn/tespy_test/heatpumps/locales')
STADIR=PosixPath('/data/src/syn/tespy_test/heatpumps/src/heatpumps/static')
IMGDIR=PosixPath('/data/src/syn/tespy_test/heatpumps/src/heatpumps/static/img')
Translate `T-s Diagram`: T-s Diagramm
Translate `Log(p)-h Diagram`: Log(p)-h Diagramm
^[^C  Stopping...

I'm not a streamlit expert do you have any idea suggestion on how should I fix this?

Start introducing 'gettext' on dashboard and variables to traslate the name expose on the streamlit interface, add also files to translate from english to german, remove line on '.gitignore' file to not ignore translation files

fix jfreissmann#7
@zarch zarch marked this pull request as draft October 10, 2024 08:36
@fwitte
Copy link
Contributor

fwitte commented Oct 15, 2024

Hi @zarch,

interesting approach you suggest (I have just seen this as I opened my PR, so I thought I'd jump on it).

I have read a little bit about the streamlit and gettext approaches (https://readmedium.com/how-to-build-a-multi-language-dashboard-with-streamlit-9bc087dd4243). What I find a little bit strange, is the fact that apparently you have to regenerate the files in case the code changes? Or does that happen automatically?

Best

Francesco

@zarch
Copy link
Author

zarch commented Oct 16, 2024

Hi @fwitte ,

What I find a little bit strange, is the fact that apparently you have to regenerate the files
in case the code changes? Or does that happen automatically?

I'm not an expert on translation, but from other projects that I've seen, every time that a message msgid is changed the translation should be changed too, otherwise it will display the original msgid string. The msgid is the string contains in _("<msgid>") within the python code.

So to answer to your question, it is possible to automatically update/generate the template using a tool and command like:

$ pygettext3.12 -d base -o locales/dashboard.pot src/heatpumps/hp_dashboard.py

However, for each msgid that has been changed in the code you also need to manually update the files in locales/{lang-code}/LC_MESSAGES/*.po and then build the binary used by gettext with:

$ msgfmt -o locales/de/LC_MESSAGES/dashboard.mo locales/de/LC_MESSAGES/dashboard.po

To me this process/check should be performed before a release, I do not know if there are any github actions that perform these kind of checks automatically.
Having a short look preparing this answer I see potranslator that automatically translate using google translator the msgid but I do not know if the output is good enough and I do not see github actions ready to use that can automatically triggered and generate / update all the related files.

A project that I follow grass use weblate that, in case of grass is self-hosted on OSGeo.
From their website I see that for opensource project it is possible to apply for free.

@zarch zarch changed the title feat: feat: Start adding multi-language support feat: Start adding multi-language support Oct 16, 2024
@jfreissmann
Copy link
Owner

jfreissmann commented Oct 16, 2024

Hey @zarch and @fwitte,

thank you for your proposals regarding this issue. We are very glad to welcome anyone who wants to help build great software. My first instinct to tackle this issue were dictionaries, but you are right that gettext seems to be the recommended tool. To me, the necessary workflow seems a bit more unintuitive and therefore possibly a hurdle for external contributors, if it is not documented thoroughly and straight forward. This holds especially true, if there is no automation like GitHub Actions to automate the building of the binaries and so forth. Furthermore, I think we should refrain from using automatic translations and rather do those by hand, as we deal with technical language, which for now seems to be a mixed bag in the quality of the translations. If that reliably improves at some point, I would not object to using automation any more.

Two hints regarding the issues with your minimal example and the PR:

  1. A quick search of the error you are getting (TypeError: 'DeltaGenerator' object is not callable) seems to lead to a streamlit object being called, i.e. is followed by a pair of paratheses, that is not intended to be called. This example from the official forum has a streamlit.column object being called like a function, when it should not.
  2. The gettext module can not and will not handle f-Strings, as those can not be evaluated before runtime. This issue I found talks about that and gives the alternative to first let gettext translate the part without the formatted string and fill it in afterwards (e.g. _("my string {}").format(variable)). This is a bit annoying, as we like using f-strings and they are therefore found quite often inside the dashboard codebase.

Best regards,
Jonas

@zarch
Copy link
Author

zarch commented Oct 16, 2024

Hi @jfreissmann ,

Thank you for your feedback.

My first instinct to tackle this issue were dictionaries, but you are right that gettext seems to be the recommended tool.

I'm afraid that using a custom dictionary mechanism will not improve a lot.
You still need to:

  • extract from the code or specify from the code the string that can be translated, similar on what gettext is doing with _();
  • make the translation persistent in a python or json file;
  • make a mechanism that update the keys;
  • and you still have to keep the translation file update once on a while.

More or less is what gettext is doing already.
Using a "standard" has a plus that contributors can used other tools to support the translation e.g. poedit or the previous mentioned weblate.

To me, the necessary workflow seems a bit more unintuitive and therefore possibly a
hurdle for external contributors, if it is not documented thoroughly and straight forward.
This holds especially true, if there is no automation like GitHub Actions to automate
the building of the binaries and so forth.

I agree, the full workflow is not clear to me too 🤣
From my quick research I did not found a Github Actions that is ready to be used for this use case, but I believe that once the workflow is clear it should be possible to implement an Action to support on this task.

I think we should refrain from using automatic translations and rather do those by
hand, as we deal with technical language, which for now seems to be a mixed bag
in the quality of the translations.

It is fine for me, I just found the project and I bring it into the discussion I have no strong opinion on that.

A quick search of the error you are getting (TypeError: 'DeltaGenerator' object is not callable)
seems to lead to a streamlit object being called, i.e. is followed by a pair of paratheses,
that is not intended to be called. This example from the official forum
has a streamlit.column object being called like a function, when it should not.

I saw the discussion but I was not able to understand how to apply on our case. Especially because I tried to build a minimum reproducible examples and it works... So I'm not sure that I properly understood where the problem is.

The gettext module can not and will not handle f-Strings, as those can not be evaluated before runtime.
QubesOS/qubes-issues#7824 talks about that and gives the alternative
to first let gettext translate the part without the formatted string and fill it in afterwards
(e.g. _("my string {}").format(variable)).
This is a bit annoying, as we like using f-strings and they are therefore found quite
often inside the dashboard codebase.

I'm aware of this and the current version is already dealing / removing / splitting the text to avoid this kind of issues.
😄

@jfreissmann
Copy link
Owner

I'm afraid that using a custom dictionary mechanism will not improve a lot.
You still need to:

extract from the code or specify from the code the string that can be translated, similar on what gettext is doing with _();
make the translation persistent in a python or json file;
make a mechanism that update the keys;
and you still have to keep the translation file update once on a while.

More or less is what gettext is doing already.

Full agreeance.

I agree, the full workflow is not clear to me too 🤣
From my quick research I did not found a Github Actions that is ready to be used for this use case, but I believe that once the workflow is clear it should be possible to implement an Action to support on this task.

I think, a well documented workflow is necessary, but also sufficient to make the decision to move on and use gettext.

I'm aware of this and the current version is already dealing / removing / splitting the text to avoid this kind of issues.
😄

Sorry, i must have missed that between all the changes. My bad!

@jfreissmann
Copy link
Owner

Meanwhile, I think, I found the cause of the problem your PR is facing. The error occured in line 1025, which is preceeded by the creation of multiple columns (your line 1015), some of which are only used as padding and were therefore not needed in the further code. In accordance with best practices, we used the underscore _ character to signal to others, that these columns are indeed unused. This usage overwrites the gettext underscore method and the error occurs, as the column object is not callable.

# %% State Diagrams
col_left, _, col_right = st.columns([0.495, 0.01, 0.495])
_, slider_left, _, slider_right, _ = st.columns([0.5, 8, 1, 8, 0.5])

if is_dark:
    state_diagram_style = "dark"
else:
    state_diagram_style = "light"

with col_left:
    # %% Log(p)-h-Diagram
    st.subheader(_("Log(p)-h Diagram"))

We could either rename these unused columns. On the other hand, streamlit now allows to pass a keyword argument to chose between a small, medium or large gap between columns. Maybe this option makes the use of padding columns obsolete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add multi language support
3 participants