Fragmented sentences in translatable strings #6482

Closed
aitap opened this issue Sep 6, 2024 · 4 comments
Labels
translation (issues/PRs related to message translation projects)

Comments

@aitap
Contributor

aitap commented Sep 6, 2024

Splitting a sentence into fragments may make it harder to translate because the parts and the whole have different grammatical cases.

For example, here "target vector" on its own would be "целевой вектор" (nominative), but later, inside "Assigning factor numbers to %s", it has to be inflected as "целевому вектору" (dative), as in "Присваиваю фактор целевому вектору". I think this is limited to targetDesc() in assign.c. All such cases there are in the dative case and so can be translated into Russian pre-inflected, but I'm not sure this would remain the case for Arabic and Hindi.
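To make the difference concrete, here is a minimal sketch contrasting a fragmented message with whole-sentence messages. It is not the data.table code: the function names msg_fragmented() and msg_whole() are hypothetical, the column wording is simplified, and _() is stubbed as the identity where a real build would presumably do a gettext-style lookup.

```c
#include <stdio.h>

/* Stub (assumption): a real build would route this through a gettext-style lookup. */
#define _(msgid) (msgid)

/* Fragmented (hypothetical): the noun phrase is translated on its own and then
   spliced into a sentence, so the translator cannot inflect it for that sentence. */
int msg_fragmented(char *buf, size_t n, int is_column)
{
  const char *target = is_column ? _("column") : _("target vector");
  return snprintf(buf, n, _("Assigning factor numbers to %s."), target);
}

/* Whole sentences (hypothetical): one msgid per variant, so the translator sees
   the complete sentence and can pick the right grammatical case for each. */
int msg_whole(char *buf, size_t n, int is_column)
{
  return snprintf(buf, n, "%s", is_column
    ? _("Assigning factor numbers to column.")
    : _("Assigning factor numbers to target vector."));
}
```

In English both functions produce the same text. The difference only shows up in the message catalogue, where the second form gives translators two complete sentences instead of one sentence plus two bare noun phrases. The cost is one msgid per combination, which grows quickly when the same fragment is shared by many messages.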

The concatenation of messages in fread.c [1, 2], used here, might also be a problem for languages with strict word order. That one, on the other hand, is already slightly broken because msg is a full sentence.

Originally posted by @aitap in #6194 (comment)

aitap added the translation label Sep 6, 2024
@rikivillalba
Contributor

Can those be enumerated somewhere, e.g. in a branch where they are flagged with a comment such as // [fragmented] or similar?

@MichaelChirico
Member

MichaelChirico commented Sep 6, 2024

The targetDesc() case in assign.c is tough. Basically, there is a series of operations that can affect either a simple vector or a "special" vector, namely one where we have the extra info that it's a certain column in a data.frame. When we have that extra info, it's nice to present it to the user; otherwise the generic "target vector" is the description. It's used for 7 messages now (perma-linked in details, there's too much noise from the elements there compared to plain copy-paste):

"Assigning factor numbers to %s. But %d is outside the level range [1,%d]"
                             ^^
"Assigning factor numbers to %s. But %f is outside the level range [1,%d], or is not a whole number."
                             ^^
"Coercing 'character' RHS to '%s' to match the type of %s."
                                                       ^^
"Cannot coerce 'list' RHS to 'integer64' to match the type of %s."
                                                              ^^
"Coercing 'list' RHS to '%s' to match the type of %s."
                                                  ^^
"Zero-copy coerce when assigning '%s' to '%s' %s.\n"
                                              ^^
"%"FMT" (type '%s') at RHS position %d "TO" when assigning to type '%s' (%s)"
                                                                         ^^

I don't really see a simple way forward that balances code readability with general translatability -- any suggestions?

@aitap, do I understand that the key is the grammatical case? Maybe we can use a 3rd argument int case in targetDesc() to distinguish "to {target vector,column %d}" from "of {target vector, column %d}", is that sufficient?
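A rough sketch of what that could look like, under stated assumptions: this is not the actual targetDesc() from assign.c, the enum TargetRole and the signature of target_desc() are hypothetical, and _() is stubbed as the identity in place of a real translation lookup. The extra argument is not named case here because case is a reserved word in C.

```c
#include <stdio.h>

/* Stub (assumption): a real build would route this through a gettext-style lookup. */
#define _(msgid) (msgid)

/* Hypothetical: which preposition, and hence which grammatical case, the caller needs. */
typedef enum { TARGET_TO, TARGET_OF } TargetRole;

/* Hypothetical sketch: colnum == 0 means a plain vector, anything else is a
   column number. Each preposition + noun phrase is its own translatable
   string, so translators can inflect the phrase as a unit. */
const char *target_desc(int colnum, TargetRole role, char *buf, size_t buflen)
{
  if (colnum == 0)
    return role == TARGET_TO ? _("to target vector") : _("of target vector");
  if (role == TARGET_TO)
    snprintf(buf, buflen, _("to column %d"), colnum);
  else
    snprintf(buf, buflen, _("of column %d"), colnum);
  return buf;
}
```

The calling messages would then drop their own preposition, e.g. "Assigning factor numbers %s." instead of "Assigning factor numbers to %s.", so that the preposition and the noun phrase always travel together through translation.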

https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L789
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L798
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L890
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L897
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L902
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L910
https://github.com/Rdatatable/data.table/blob/c616cb9c9a37ad7d3adfcd1e3fedda2c3b1e527a/src/assign.c#L924

@aitap
Contributor Author

aitap commented Sep 8, 2024

@rikivillalba So far I've only found targetDesc() and copyFile() to pose problems like this. We should be able to find the rest of the cases (if they exist) as we go through the translatable strings and their context.

@MichaelChirico Very good idea to combine with a preposition, thank you!

I've experimented a bit with automatic translation to Arabic and Hindi, and I can't see any differences in the inflection of the rest of the sentence (Assigning factor numbers to %s) depending on whether the target is a vector or a column. So solving the grammatical case problem might be enough.

@MichaelChirico
Member

Fixed by #6489 and #6483, though there are probably other cases buried in the catalogue somewhere -- please feel free to file more issues & thanks for calling these out!
