Should we expect all strings to have direction metadata? #66
Comments
I mostly agree with what you said above, but I want to call out that machines are limited not just in their ability to identify which strings need individual metadata but also in their ability to decide when that data is needed.
In designing (for example) a resource file format, a good optimization is to have a file-level default direction so that only specifically directional strings need to include or override the value. In that case, metadata can be omitted within that file wherever it does not differ from the file-level base direction.
However, when the file is sent for translation, most translation systems break out each individual string resource into a "segment". The segments are sent to translators individually, and thus each segment needs to include a slot for the base direction (so that machines aren't guessing what the direction is at the end). When the slot is unfilled "because it is using the default value", you have to know what the default value is.
You're expecting humans to make these decisions ("this item has unusual directionality") and then have machines decide whether the metadata was meaningful. This can work in the example of a resource file I give above: the machine might only keep per-string direction metadata when assembling the Arabic translated file that does not match the file-level base direction (which is presumably
Notice that a lot of formats work differently. We see a lot of specs that use "language maps" as a form of localization support. So we see stuff like:
And the need to set the direction in this case does not depend on the first-strong requirement. The consumer (let's say it's in an HTML context) needs to bidi isolate and set the direction on the
So what I guess I'm saying is that the situation is more complex. We will tend to recommend that specifications:
If the Arabic translation in your comment doesn't have a file-level base direction, then, yeah, every string would have
String-meta as it currently sits, I think, allows for reasonable omission of direction metadata when it is all the same or when it can be reasonably defaulted, and it would be crazy to expect every string to actually populate language and direction (so long as the language and direction are identified somewhere). But it should be possible to compute the language and direction for any string as if the string had local-to-the-string metadata, right?
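To make the "as if the string had local metadata" idea concrete, here is a minimal sketch of how a file-level default might be copied onto each string when a resource is broken out into segments. The resource layout, the `Segment` class, and the `to_segments` function are all hypothetical names invented for illustration; they are not taken from string-meta or from any real translation system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One string sent out on its own, so its metadata slots are always filled."""
    key: str
    text: str
    lang: str
    dir: str  # "ltr" or "rtl"


def to_segments(resource: dict) -> List[Segment]:
    """Copy the resource-level defaults onto every string lacking its own metadata."""
    default_lang = resource["lang"]
    default_dir = resource["dir"]
    segments = []
    for key, entry in resource["strings"].items():
        segments.append(
            Segment(
                key=key,
                text=entry["text"],
                # A per-string value, where present, overrides the file default.
                lang=entry.get("lang", default_lang),
                dir=entry.get("dir", default_dir),
            )
        )
    return segments


# Hypothetical Arabic resource: one ordinary string, one LTR override.
resource = {
    "lang": "ar",
    "dir": "rtl",
    "strings": {
        "greeting": {"text": "مرحبا"},
        "mac": {"text": "00:1A:2B:3C:4D:5E", "dir": "ltr"},
    },
}
segments = to_segments(resource)
```

In this sketch the stored file stays compact, while every segment that leaves it carries an explicit language and direction, so nothing downstream has to guess what the default was.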
My summary of this conversation is that we are not disagreeing with each other. It should always be possible to associate metadata with each string. It may not be necessary to actually store a language/direction value for every string if resource-wide default metadata values can be associated with each string when the consumer tries to use it. That may mean filling in the property values for each string during transmission, or it may, I suppose, mean sending some information about the expected default direction, which is then applied to each string as the consumer renders it (unless, of course, it already carries its own metadata).
So much for strings for which metadata has been set. Speaking of direction, some sets of strings may either:
In both cases we have a rule that says that consumers should use first-strong heuristics to determine the direction of each string if there is no metadata available. This should allow correct rendering for the majority of strings. As we discussed in last week's telecon, passing or storing an
Let's consider a practical scenario where we have a message file containing all 2,000 natural language strings needed for an Arabic translation of an application's UI and error messages. Let's imagine that the message set contains 10 strings which would produce the wrong direction if first-strong heuristics were applied, or if all strings were expected to have an RTL base direction (e.g. Arabic strings that begin with Latin characters, or MAC addresses, or untranslated strings, etc.).
My understanding of what we say in string-meta is that it should be possible to associate direction metadata with all strings in a string set. However, we don't require or expect every string to have direction metadata. We do, however, expect every string that differs from the default to have direction metadata explicitly assigned to it.
This applies if the message set as a whole has a way to set a default direction for all strings. In this case, this would probably be a file-wide field near the top of the file setting the default base direction for all strings to RTL. Strings that shouldn't have an RTL base direction must each be labelled for direction (LTR) to override the default setting.
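As a rough illustration of storing direction only where it differs from the file-wide default, the sketch below drops per-string `dir` values that merely repeat the default. The dictionary layout and the `compact` function are assumptions made for this example, not a format defined by string-meta.

```python
def compact(resource: dict) -> dict:
    """Return a copy in which strings keep `dir` only when it overrides the default."""
    default_dir = resource["dir"]
    strings = {}
    for key, entry in resource["strings"].items():
        kept = dict(entry)  # shallow copy so the input is left untouched
        if kept.get("dir") == default_dir:
            del kept["dir"]  # redundant: identical to the file-wide default
        strings[key] = kept
    return {**resource, "strings": strings}


# Hypothetical fully labelled resource with an RTL default.
labelled = {
    "dir": "rtl",
    "strings": {
        "greeting": {"text": "مرحبا", "dir": "rtl"},
        "mac": {"text": "00:1A:2B:3C:4D:5E", "dir": "ltr"},
    },
}
stored = compact(labelled)
overrides = [k for k, v in stored["strings"].items() if "dir" in v]
```

Here only the `mac` entry keeps its label, mirroring the scenario where 10 strings out of 2,000 carry an explicit LTR override.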
I think that we also say this for strings when there is no default declared at the top of the file. This is on the assumption that, in the absence of direction information, the consumer will use first-strong heuristics to determine the direction. Again, any string that won't produce the correct result via those heuristics will need to have direction metadata associated with it.
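The fallback order described here can be sketched as follows: explicit per-string metadata first, then any declared default, then first-strong detection. This is a simplified sketch, not a full implementation of the Unicode Bidirectional Algorithm: it ignores isolates and embeddings, and the function names are hypothetical. Falling back to LTR when no strong character is found follows the bidi algorithm's behaviour for paragraphs with no strong characters.

```python
import unicodedata
from typing import Optional


def first_strong(text: str) -> Optional[str]:
    """Return the direction of the first strongly directional character, if any."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # only digits, punctuation, whitespace, etc.


def resolve_direction(
    text: str,
    explicit: Optional[str] = None,
    default: Optional[str] = None,
) -> str:
    """Explicit metadata wins, then a declared default, then first-strong."""
    if explicit is not None:
        return explicit
    if default is not None:
        return default
    return first_strong(text) or "ltr"


# An Arabic string that begins with Latin characters: first-strong alone
# yields LTR, so this string needs an explicit label.
guessed = resolve_direction("PDF ملف")
labelled = resolve_direction("PDF ملف", explicit="rtl")
```

The `guessed` value shows why such strings are among the few that must be labelled by a human: the heuristic sees the Latin "P" first and picks the wrong base direction.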
While it is not a problem if every string is labelled for direction, I think that the reason for not requiring that is as follows: if every string in the resource has to be labelled, that has to be done by human intervention. A machine is not capable of identifying all 10 strings that should have an LTR base direction. (If a machine could do that, we wouldn't need direction metadata anyway, because the consumer would be able to simply apply the appropriate heuristics.) Therefore, correct labelling requires human intervention. It seems to me that requiring 1,990 strings in a set of 2,000 to be explicitly labelled by hand is too much to ask. Labelling just the 10 strings that would produce incorrect results, however, is achievable and essential.
Note also that requiring every string to have direction metadata explicitly assigned would make a resource-wide rule or field that sets the default direction redundant, and therefore useless.