Reformat semicolons in names #4755
Comments
First of all, I am glad there is a renewed discussion about how OSM should treat multilingual names. Back in 2017/2018, when several of the maintainers of this style tried to suggest changes in mapping practice to address some of the fundamental difficulties in properly labeling names (see here, here and here), the overall consensus among mappers seemed to be that people were fairly content with the status quo. The relevant issue regarding multilingual name rendering in general, by the way, is #4404, where I already commented (including contemplating the idea of explicitly removing support for the free-form compound label painting in the name tag, like with …).

This issue is about the specific suggestion to interpret name tags containing a semicolon-separated list of names. Whether to support this would depend on how widespread this tagging is in the database. I have never seen it used in the wild so far. If anyone could run through a planet file to see how many cases of this we have so far, that would be helpful. For comparison: having anything other than a single name in a name tag is exotic in general, but the free-form compound labels mentioned are moderately widespread in some cases:
To be clear: since we never explicitly supported that kind of compound labeling string in the name tag, support for the semicolon-separated list would IMO not be competing with it for support in OSM-Carto. It would still need sufficient (and sufficiently widespread) use to be considered a tagging method with consensus support from the global mapper community. In other words: back in 2014, @matkoniecz concluded in #1086 that storing anything other than a single name in the name tag is not correct tagging. This might have changed since then, but IMO that would still require evidence. Also, consensus support for the semicolon-separated list would IMO mandate explicitly removing support for the free-form compound labeling strings, as discussed in #4404.

Independent of whether semicolon-separated lists in the name tag have wide support from the worldwide mapper community, there is also the other big problem of multilingual names, which unfortunately would not be solved by adopting semicolon-separated lists in the name tag: the Han unification problem (see #2208). Despite the name, this is not specific to CJK but also occurs with Arabic and Cyrillic scripts (see the issues referencing #2208).
This is largely why, as mentioned at the beginning, several maintainers of this style suggested different approaches to the problem of multilingual names. These would address both the problem of specifying more than one name as the locally used name (and potentially their order) and the problem of providing information on which languages these names are actually in, so that the correct typefaces can be used to render them (without resorting to error-prone double tagging and name-matching heuristics across multiple tags).

As said, this matter is independent of the specific suggestion of this issue, but I would find it unfortunate if mappers formed an opinion on whether and how to store multiple names in the name tag without being aware of, and having had a chance to consider, this other big problem of multilingual names. I would even go as far as saying that the Han unification problem is the larger of the two, because it affects a much larger number of potential map users (the number of people living in countries where CJK, Arabic or Cyrillic scripts are used is probably much larger than the number of people living in multilingual areas).
Regarding the specific question of the prevalence of semicolon separators, I ran the following Overpass query today:
This returned the following output:
I have not done any further analysis to characterize precisely how those 32,000 objects are distributed in the database.
It is also possible that I was wrong in 2014. At the very least, it is much more complex in areas where multiple languages are in active use, sometimes without any single language dominating over the others (add to that the political complexity of declaring one language dominant over another).
Links to additional discussions of this topic in the community forums can be found here:
32k occurrences of …

Based on a few quick looks around Central Europe (including some actual multilingual regions), my impression is that the three main cases of semicolons in name tags in those areas are:
In the US there also seem to be cases of this coming from TIGER imports (e.g. https://www.openstreetmap.org/way/5353554). At a quick look, I could not find any occurrence of multilingual names tagged this way. Can anyone point to an area where this is common locally?
I think there is no need to remove any support; it would be totally fine just to add support for … Of course you won't find many …
https://community.openstreetmap.org/t/multiple-delimited-names-in-the-name-tag/6803 has some examples that convinced me otherwise. Yes, we have almost no … Given that …
Because I am not sure I have made this clear enough: the question we need to discuss here is not primarily whether the semicolon-separated list is a more suitable form for recording multiple names in a single name tag than the various free-form compound labeling strings entered into the name tags of certain features (see #4755 (comment)). That is (a) a discussion for a different venue and (b) not really relevant here at all, because we never decided to support the free-form compound labeling strings and would probably never have done so if we had had the opportunity to choose.

What I would like to know is whether the semicolon-separated list of names has been adopted for tagging multiple names in cases where there is consensus among mappers that having multiple names is a suitable use of the name tag, and what these names then actually mean. I listed a few cases where that might be so in #4755 (comment), but also mentioned that it is not clear whether these represent consensus tagging. In particular, I have so far not seen any evidence of consensus among mappers in multilingual regions, especially those subject to the Han unification problem, that the semicolon-separated list is a suitable, let alone the desirable, form for recording multilingual names. I consider this a relevant question because, as explained above, we know there are more elegant (in the sense of less error-prone and more flexible) ways to handle these cases. If mappers in regions where this is an important matter decide to go with semicolon-separated lists despite the known disadvantages, I would consider it our obligation to support that. But so far I see the discussion dominated by people exclusively using Latin script and predominantly not from multilingual regions, and I am hesitant to consider their view representative on this matter.

The other, more formal technical question is whether a semicolon-separated list in the name tag is an ordered or an unordered list.
If it is ordered, it would compete with the … In any case, if we decide to support this, it would be essential that QA tools support checking whether this tagging is applied consistently. For multilingual names, that means checking whether all the components of the semicolon-separated list in the name tag are also found in individual name:* tags. Only if there are practically usable QA tools that support this check can mappers successfully implement and maintain such a tagging scheme in their area. Does anyone know whether any of the commonly used QA tools or editor validators include this kind of consistency check on name tagging?

To summarize my current understanding of the use and potential use of semicolon-separated lists in the name tag, there seem to be the following subtypes:
TL;DR: The main questions I would appreciate input on are:
In multilingual areas, it seems to me to be the consensus that … Based on my knowledge of areas in Germany, multiple values in …
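The consistency check asked about earlier in the thread (whether every component of a semicolon-separated name also appears in an individual name:* tag) is simple to express. The following is a minimal Python sketch, not the logic of any existing QA tool or editor validator. The function names split_name and missing_localized_names are hypothetical, tags are assumed to be a plain key-value dict, the ;; escape is ignored for brevity, and every name:<suffix> key is treated as a localized name (a real check would exclude keys such as name:etymology).

```python
def split_name(value: str) -> list[str]:
    """Split a raw name value on ';', trimming whitespace and dropping empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def missing_localized_names(tags: dict[str, str]) -> list[str]:
    """Return the components of the name tag that do not appear as the
    value of any name:* tag on the same feature.

    Hypothetical helper: ignores the ';;' escape and treats every
    'name:<suffix>' key as a localized name.
    """
    name = tags.get("name")
    if not name:
        return []
    # Collect the values of all name:* tags on the feature.
    localized = {value for key, value in tags.items() if key.startswith("name:")}
    return [part for part in split_name(name) if part not in localized]


if __name__ == "__main__":
    # Placeholder values, not real OSM data.
    complete = {"name": "Name A;Name B", "name:en": "Name A", "name:yi": "Name B"}
    partial = {"name": "Name A;Name B", "name:en": "Name A"}
    print(missing_localized_names(complete))  # []
    print(missing_localized_names(partial))   # ['Name B']
```

A check like this only verifies membership; whether the order of the list carries meaning (the ordered vs. unordered question above) would need a separate rule.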
Expected behavior
Some features have name tags that contain multiple values separated by a semicolon. Some examples from the United States:

- The Kaser place=town node’s name tag contains an English name and a Yiddish name separated by a semicolon. It is also tagged with name:en and name:yi, but the dual name is appropriate because of the widespread use of a language within the town that would be considered a minority language elsewhere.
- An amenity=place_of_worship area’s name tag contains an Amharic name and an English name separated by a semicolon. Both names are signposted equally prominently and used interchangeably. No default_language tag applies in this case, because that key is intended for administrative boundaries, whereas this is a one-off feature.
- A road’s name tag contains two English names separated by a semicolon. The road is maintained jointly by two highway departments that disagree on the name for political reasons, going as far as posting competing street name signs up and down the road. As a result, local residents also disagree on the name.

Unlike in some countries, there hasn’t historically been a consensus to separate dual names with an ad hoc delimiter such as a hyphen or slash. Instead, it’s not uncommon for mappers to use a standard semicolon value separator as they would with any other key. Apart from consistency with other keys, a semicolon is much less likely to occur within a name in reality.
A mapper who uses the semicolon delimiter would expect a renderer to reformat the semicolon in some fashion. For example, Mapbox-based maps replace each semicolon with a fancy em dash. But perhaps a more language-agnostic treatment would be to replace each semicolon with a newline, just as with refs in #750. A newline would be less ambiguous because it isn’t possible for a raw tag value to contain a newline.

Actual behavior
Unfortunately, openstreetmap-carto renders the raw name tag verbatim, including the semicolon:

Without support for a semicolon delimiter, openstreetmap-carto encourages mappers to choose unpredictable delimiters instead. A previous version of the Kaser node used a slash, indistinguishable from an individual place name or POI name that contains a slash in reality. This is problematic for other data consumers, such as the router GraphHopper, that reasonably expect a semicolon delimiter.
Implementation notes
#750 splits ref on semicolons (;) and recombines it with newlines (\n), primarily to choose a shield image based on the length of the longest individual ref. However, a simple replace() could suffice for name on a point-placed label such as a place or POI.

openstreetmap-carto/project.mml, lines 1775 to 1781 (at commit 62e8d54)
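The difference between the two approaches can be illustrated outside the database. The sketch below is Python rather than the SQL in project.mml, and the names split_and_recombine and replace_only are hypothetical; it mimics the idea described for #750 (split on ';', rejoin with a newline, and measure the longest component) without reproducing that code. For a point-placed name label, where no length metric is needed, a single replace yields the same label text.

```python
def split_and_recombine(value: str) -> tuple[str, int]:
    """Split a semicolon-separated value, rejoin the parts with newlines,
    and return the label together with the length of its longest part.

    Hypothetical illustration of the approach described for ref in #750;
    the length is what a renderer could use to pick a shield width.
    """
    parts = value.split(";")
    label = "\n".join(parts)
    longest = max(len(part) for part in parts)
    return label, longest


def replace_only(value: str) -> str:
    """Plain replacement, enough when only the label text is needed."""
    return value.replace(";", "\n")


if __name__ == "__main__":
    label, longest = split_and_recombine("I 95;US 1")
    print(repr(label), longest)  # 'I 95\nUS 1' 4
    # For a name label, both approaches produce the same text.
    assert replace_only("Name A;Name B") == split_and_recombine("Name A;Name B")[0]
```

Splitting only pays off when something per component (such as shield width) has to be computed; otherwise the single replace is the simpler choice.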
There’s also a very rare ;; escape sequence for cases where a single name legitimately contains a semicolon. To handle this case, the replace() call can be nested inside another replace() call that turns the resulting \n\n (two newlines) back into a single ;, or regexp_replace() can be called instead.

A newline may not be suitable within line-placed labels (roads, rivers, etc.). In these cases, perhaps an em dash could be used. Though slightly less language-agnostic than a newline, it’s still independent of the writing direction and no less ambiguous to the viewer than a hyphen or slash that’s hardcoded in the database.
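The nested replacement and the em dash fallback for line-placed labels described above can be sketched in Python as follows. This is an illustration under assumptions, not openstreetmap-carto code: format_name_label, its line_placed flag, and format_name_label_regex are hypothetical, and the sketch puts no spaces around the em dash (padding is a separate design choice). It relies on the point made in the issue that a raw tag value cannot contain a newline, so a newline is a safe intermediate character.

```python
import re

EM_DASH = "\u2014"  # separator suggested for line-placed labels


def format_name_label(name: str, line_placed: bool = False) -> str:
    """Turn a semicolon-separated name into label text.

    Mirrors the nested replace() idea: every ';' first becomes a newline,
    so an escaped ';;' becomes two newlines, which are then turned back
    into a single literal ';'.
    """
    text = name.replace(";", "\n").replace("\n\n", ";")
    if line_placed:
        # Roads, rivers, etc.: a newline may not suit, so use an em dash.
        text = text.replace("\n", EM_DASH)
    return text


def format_name_label_regex(name: str) -> str:
    """Regex variant, analogous in spirit to regexp_replace(): replace only
    a ';' that is not part of ';;', then unescape ';;' to ';'."""
    text = re.sub(r"(?<!;);(?!;)", "\n", name)
    return text.replace(";;", ";")


if __name__ == "__main__":
    print(repr(format_name_label("Name A;Name B")))  # 'Name A\nName B'
    print(repr(format_name_label("A;;B;C")))         # 'A;B\nC'
    print(ascii(format_name_label("Name A;Name B", line_placed=True)))  # 'Name A\u2014Name B'
    print(repr(format_name_label_regex("A;;B;C")))   # 'A;B\nC'
```

The two variants agree on ordinary inputs like those above but can differ on runs of three or more semicolons, so whichever is chosen should be applied consistently.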
/ref #1086 #4404