Expand the list of place=[value] values that are included in search results #646

Open
jeffreyameyer opened this issue Dec 6, 2023 · 18 comments

Comments

@jeffreyameyer
Member

What's your idea for a cool feature that would help you use OHM better.
The current list of place types used by Nominatim in search results is very US/UK-centric and modern, and does not capture the richness of place types known throughout the world.

If we could expand this list to something with more place values, we might be able to support local place types, and improve search results. See: #243

An updated address-levels.json file is included as an example of this. I haven't put together a PR, as I don't have local testing set up, but perhaps it could be reviewed for consideration.

cc: @1ec5

@jeffreyameyer
Member Author

Related to #640, perhaps the search results should include only the links to chronology relations, when they are available? Users could then refine/choose the relevant time for their search target from the chronology relation?

@1ec5
Member

1ec5 commented Dec 9, 2023

I’m skeptical that such regionally specific values as rancho should really be place=* values. Nominatim is capable of indexing features based on secondary tags, so there’s no need to muddle the global place classification ontology by making it more specific than what data consumers would generally work with. (In California, I’ve been using boundary=parcel border_type=rancho on the rancho boundaries.) There’s certainly room for coining new place=* values for types of places that were once relevant in history but are no longer as relevant. But it’s not as though place=* was ever trying to be strongly tied to local customs or government structures anyhow.

@jeffreyameyer
Member Author

jeffreyameyer commented Dec 9, 2023

Understood on the skepticism & that expanding place may cause some problems, so I'm trying to understand them & also the role of the current place=* tag in the OSM-world taxonomy. Also understood that these tags may be hidden and representative of other things, but, given that metadata travels with the object & may be used outside of OSM, I think we should be careful to avoid factually misleading tags.

For example, Puerto Rico, Guam (ty!), and the US Virgin Islands are not US states, but all are tagged with place=state and show up in search results as "states".

Guam, PR, and USVI are all also tagged as admin_level=4, which is used for rendering. Wikidata, interestingly, neither calls Guam a state nor assigns it any sort of admin level.

Many prefectures in Japan avoid the use of place=* altogether, which means the only taxonomic categorization is in the text of the name, which seems like a missed opportunity for cultural support and accuracy.

"Place" is such a culturally specific term, it seems like a place where we would want to be expansive in our use of the term, especially if it is a locally-understood definition.

Also, if you look at something like Russian oblasts, which show up as "states" in search results, you might miss the fact that there are several other types of Russian places at the same administrative level that are not oblasts (such as republics). And you'd have no way of knowing that, in the case of Russia, there is a level in between some (not all...) of those places and place=country: in this case, districts, which also have no place=* tag, so the place type is found only in the name.

One possible (not 100% sure) benefit of making this change in the search results is that I don't believe (🤞) that it impacts much else in the stack, although it might be good to update the editors with some additional values, but it wouldn't have to be.

Maybe we should start a forum thread?

@jeffreyameyer
Member Author

Also - another nice thing about expanding the place=* taxonomy is that it is orthogonal (afaict) to the boundary=* and admin_level=* tag functions, which also don't feel right to me (tight coupling), but that's a topic for a different ticket / forum post.

@Rub21

Rub21 commented Apr 9, 2024

@jeffreyameyer Can you drop some examples of places that should be included in Nominatim with the new configuration (OpenHistoricalMap/nominatim-ui#2)?

@Rub21

Rub21 commented Apr 9, 2024

I did a re-import in Nominatim (development) and the address-levels.json file works fine. We can test this later in staging.

@jeffreyameyer
Member Author

Hi @Rub21 - can you search on "Oregon" and show me what pops up in your search results?

@jeffreyameyer
Member Author

@Rub21 - I know you've been cranking on other stuff - ty! Any eta for an update here?

@1ec5
Member

1ec5 commented Apr 24, 2024

Also - another nice thing about expanding the place=* taxonomy is that it is orthogonal (afaict) to the boundary=* and admin_level=* tag functions

In my view, place=* doesn’t need to be expanded much at all, precisely because it is orthogonal to the boundary=*, admin_level=*, and border_type=* keys. For place=* points that represent populated places (human settlements), values like town and city give us an opportunity to take a more holistic measure of a place’s importance irrespective of its governance model. This allows us to avoid impossible hypotheticals like: Would Londinium’s political status in the 4th century qualify as a city in modern-day British English nomenclature? Whereas even with the current set of place=* values, we can determine that, with a population of up to 20,000, it was functionally a city at the time.

Also, if you look at something like Russian oblasts, which show up as "states" in search results, you might miss the fact that there are several other types of Russian places at the same administrative level that are not oblasts (such as republics). And you'd have no way of knowing that, in the case of Russia, there is a level in between some (not all...) of those places and place=country: in this case, districts, which also have no place=* tag, so the place type is found only in the name.

The search results currently give the place=* values verbatim, at least in the website’s English localization, but it doesn’t have to be that way. Some localizations already hedge these labels, and English could follow suit if necessary:

https://github.com/OpenHistoricalMap/ohm-website/blob/6b61d2b1c37276d0ad3df6f1231a42a84f4f54a9/config/locales/en.yml#L1196

Hedging is probably necessary, because the label in search results isn’t intended to give the full measure of a place. If we want the website to show something more colloquial or official in that part of the interface, then this code should consider some of the raw tags in the extratags field of Nominatim’s response:

https://github.com/OpenHistoricalMap/ohm-website/blob/6b61d2b1c37276d0ad3df6f1231a42a84f4f54a9/app/controllers/geocoder_controller.rb#L109-L121

If we are going to expand the set of place=* values supported by Nominatim, then I think we should do so based on trends on the database or a discussion on the forum. At least that way we can be sure that we aren’t biasing the ontology toward a particular perspective while simultaneously coupling it to local terminology.

@jeffreyameyer
Member Author

jeffreyameyer commented Apr 24, 2024

@1ec5 - can you help me better understand the costs of expanding the range of place=* values supported in Nominatim? @Rub21 has already implemented this & afaict, it shouldn't actually break anything while giving improved support for local placenames. Also, I'm not sure how this ticket would affect tagging for Londinium, as there's no hard and fast rule that population is the only indicator of settlement status, unless I've missed something. Clearly, modern population limits only exist relativistically. I'm not sure what hedging means, so I cannot speak to that - is it just a string assignment based on a table value?

Here are the types of search results I think @Rub21's dev demo will help address.

From OSM:

I don't think anyone calls provinces "states," except when explaining what role provinces play:
[Screenshot: OpenStreetMap search results for "british columbia"]

This is just incorrect:
[Screenshot: OpenStreetMap search results for "puerto rico"]

I'm also not sold on the use of border_type=* as a place designator, as you get weird behavior and oddities like this in Italy:

Regions are called states and provinces are referred to... by their boundary? Why the modal shift? No one searches for a boundary, they search for a place, even if what is returned is a boundary and a label, admin centre, etc.

[Screenshots: OpenStreetMap search results for "lombardy" and "pordenone"]

From OHM:

This is unsatisfying (no Oregon Territory or Oregon Country):
[Screenshot: OpenHistoricalMap search results for "oregon"]

And this shows "Administrative Boundary" instead of "Territory" (not passing place=territory through verbatim):
[Screenshot: OpenHistoricalMap search results for "oregon territory"]

As for waiting for trends in the database, I think this is a chicken & egg problem that we should take the lead on. Per @ZeLonewolf's related comment, a test or demo might be what is required to help encourage the trend. I believe there's plenty of data in the Newberry territories and with other place values to start using these other values.

This ticket won't break anything and still supports old workarounds (see the place=* history for this Bulgaria relation, which was changed as a workaround for the existing limitation of OHM's Nominatim config). Also, it's unclear that there are any downstream costs of expanding this list of supported place names or that we're prevented from doing anything in the future with this support.

the label in search results isn’t intended to give the full measure of a place.

Then why is it included at all? What is the "full measure" of a place? My view is that it should at least be reflective of common consensus and not be US-centric. And, why don't we take a look at what @Rub21 has already done? Or, should we remove it?

@1ec5
Member

1ec5 commented Apr 24, 2024

Also, it's unclear that there are any downstream costs of expanding this list of supported place names or that we're prevented from doing anything in the future with this support.

There are plenty of downstream costs. Any tags the software supports should be documented so they don’t linger as unused cruft or, worse, come to mean multiple things in the database depending on who mapped it. We will need to maintain the expanded list in multiple forks. If we merge OpenHistoricalMap/nominatim-ui#2 without corresponding changes to ohm-website, then a Singapore tagged as place=city-state will result in a label of “City-state” in English and “City-state” in Chinese, which is not as usable as “国家”. Our stylesheets will need to account for each of the new values in filters and label sizes, even though they reflect governance structures rather than functional importance. Overpass and QLever queries that work well for OSM data will become much more verbose for OHM data.

If there are kinds of places that have no analogue or near-analogue in the modern era, then we have no choice but to expand the list to include them. But to the extent that OSM has been able to get away with stretching certain keywords – that’s all these are, just keywords – I would rather make do with OSM’s compromises and rely on other keys such as border_type to express finer-grained distinctions.

I'm not sure what hedging means, so I cannot speak to that - is it just a string assignment based on a table value?

Yes. We could change one string today and it would result in “State” becoming “State or Province” everywhere, no other changes needed to the database or software. The question is whether that would be a good change to make. It already says “State or Province” in other languages like French, German, Japanese, and Spanish. In Chinese, it only says “Province”. This indicates that we have some leeway, but it’s also a good sign that the bug should be fixed upstream.

I don't think anyone calls provinces "states," except when explaining what role provinces play:

This is just as true in OSM as it is in OHM. place=province is well-established in the OSM database, and it was already present in Nominatim’s address-levels.json. Why are Canada’s provinces tagged place=state instead of place=province in OSM? I’m guessing it’s because Canada has federal provinces, more like a U.S. state than a French province (for example). But this is something to ask the Canadians.

I'm also not sold on the use of border_type=* as a place designator, as you get weird behavior and oddities like this in Italy:

designation=* might be another option, for places that are represented only by a central point rather than a boundary.

Then why is it included at all? What is the "full measure" of a place? My view is that it should at least be reflective of common consensus and not be US-centric.

place=* is for making generalizations that affect software behavior, not primarily for end-user display. The fact that openstreetmap-website regurgitates this value as the descriptor of a Nominatim search result is a programmer’s convenient shortcut, nothing more. If we align place=* to local colloquial or official notions of place classification, then there still needs to be some other key for those generalizations. But then we’d just be confusing everyone with two keys that are backwards from how OSM generally defines them.

@1ec5
Member

1ec5 commented Apr 24, 2024

And, why don't we take a look at what @Rub21 has already done? Or, should we remove it?

Nominatim uses address-levels.json to arrange places in a hierarchy. You see this hierarchy in the fully qualified addresses that it comes up with. This is a fool’s errand in countries like the U.S. and UK that don’t assign addresses based on a strict hierarchy, but OSM mappers put up with it because they don’t expect the site’s search engine to be anything more polished than a raw querying or QA tool. (That’s the job of external geocoders like Pelias and Photon.)
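
To make the hierarchy idea concrete, here is a minimal Python sketch of how a rank table in the spirit of address-levels.json could order the parts of an address. The rank numbers, the `ADDRESS_RANKS` table, and `format_address` are assumptions made up for illustration; they are not Nominatim's actual values or API.

```python
# Hypothetical rank table: a lower rank means a broader area.
# The numbers are illustrative only, not Nominatim's real configuration.
ADDRESS_RANKS = {
    "country": 4,
    "state": 8,
    "county": 12,
    "city": 16,
}


def format_address(parts):
    """Order (name, place_value) pairs from most specific to least specific.

    Pairs whose place value has no rank are skipped in this sketch.
    """
    ranked = [
        (ADDRESS_RANKS[place], name)
        for name, place in parts
        if place in ADDRESS_RANKS
    ]
    ranked.sort(reverse=True)  # highest rank (most specific) first
    return ", ".join(name for _, name in ranked)


example_parts = [
    ("United States", "country"),
    ("Santa Fe", "city"),
    ("New Mexico", "state"),
    ("New Mexico Territory", "territory"),  # unranked in this sketch
]
print(format_address(example_parts))
```

The point of the sketch is that the ordering depends entirely on every place value having one agreed rank, which is where locally specific values become awkward.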

Unfortunately, the hierarchy functionality is completely broken in OHM because of our overlapping boundaries representing different time periods: #693. This is why you can’t search for “Santa Fe, New Mexico”, which would rely on the hierarchy Nominatim comes up with.

Assuming that can be fixed, if we align place=* values to colloquial or official definitions, we have a different problem on our hands: some countries divide counties into districts; others divide districts into counties. Nominatim can no longer assign an accurate set of values across countries.

@jeffreyameyer
Member Author

jeffreyameyer commented Apr 24, 2024

Ok - this is super helpful, as it's a little clearer what the obstacles are (and helps me explain why I haven't been valuing them as highly as perhaps I should be):

Any tags the software supports should be documented so they don’t linger as unused cruft or, worse, come to mean multiple things in the database depending on who mapped it.

Seems like this could be solved with documentation

We will need to maintain the expanded list in multiple forks.

Isn't there a single fork for our Nominatim?

If we merge OpenHistoricalMap/nominatim-ui#2 without corresponding changes to ohm-website, then a Singapore tagged as place=city-state will result in a label of “City-state” in English and “City-state” in Chinese, which is not as usable as “国家”.

I'm not sure what 国家 means, or whether the English version should be translated from that, instead of vice versa. But, couldn't we hedge the English "City-state" to be 国家 in Chinese? Also, I'm assuming we're using the English terminology as the de facto language of reference for place names? This is an interesting example, because both place=country and place=city-state would be valid, so neither is incorrect, but I don't think we support this dual designation right now.

Our stylesheets will need to account for each of the new values in filters and label sizes, even though they reflect governance structures rather than functional importance.

I don't believe our stylesheets currently use place=* - the layers are built from admin_level=* in config.toml (this is probably the wrong pointer, as I believe this has been refactored). I don't think the SQL has been rebuilt around place, but I could be wrong.

Overpass and QLever queries that work well for OSM data will become much more verbose for OHM data.

Couldn't these be easily rebuilt around admin_level=* which might be a more meaningful distinction, more aligned with Wikidata's nth-level administrative division designation?

I would rather make do with OSM’s compromises and rely on other keys such as border_type to express finer-grained distinctions.

If this is the best path, I'd rather figure out an alternate approach than this, as it seems to be stuffing unintended values into a key that doesn't reflect (imo) clarity of place designation.

I’m guessing it’s because Canada has federal provinces, more like a U.S. state than a French province (for example). But this is something to ask the Canadians.

Maybe, but the Italian example is also called place=county instead of place=province. To me, this is an example of an OSM oddity being extended, rather than improved, resulting in a false binding of place=state to admin_level=4 (Canadian Provinces) and place=county to admin_level=6 (Italian Provinces) that strips us of an opportunity for richer localization in our metadata.

designation=* might be another option, for places that are represented only by a central point rather than a boundary.

A separate key might be the best answer for this, although I'd suggest place:type=* or place:local=*

place=* is for making generalizations that affect software behavior, not primarily for end-user display.

But who knows this besides coders? Even if not "primarily", it is used for end-user display, both in search results and in the inspector. Pretty confusing, imo, esp. as it doesn't match the description in the OSM wiki:

Used to indicate that a particular location is known by a particular name, to indicate what sort of "place" it is. A place tag should exist for every significant human settlement (city, town, suburb, etc.) regardless of administrative status, and also for notable unpopulated, named places.

If we align place=* to local colloquial or official notions of place classification, then there still needs to be some other key for those generalizations.

Doesn't admin_level already generalize them?

But then we’d just be confusing everyone with two keys that are backwards from how OSM generally defines them.

If we use admin_level, there would be one key that is consistent and one that is expanded and overlapping, no?

Assuming that can be fixed, if we align place=* values to colloquial or official definitions, we have a different problem on our hands: some countries divide counties into districts; others divide districts into counties. Nominatim can no longer assign an accurate set of values across countries.

Agreed that hierarchy is a problem assuming there's no workaround for Nominatim to handle per-country distinctions. The time-based thing seems more daunting, but separate, assuming we could solve per country custom hierarchies.

One other thought: should we just add to our wikimedia-querying inspector modification and have it pull place type from Wikidata? That way, couldn't we tune the query to have Singapore show up both as a country and a city-state?

@1ec5
Member

1ec5 commented Apr 24, 2024

If we merge OpenHistoricalMap/nominatim-ui#2 without corresponding changes to ohm-website, then a Singapore tagged as place=city-state will result in a label of “City-state” in English and “City-state” in Chinese, which is not as usable as “国家”.

I'm not sure what 国家 means, or whether the English version should be translated from that, instead of vice versa. But, couldn't we hedge the English "City-state" to be 国家 in Chinese? Also, I'm assuming we're using the English terminology as the de facto language of reference for place names? This is an interesting example, because both place=country and place=city-state would be valid, so neither is incorrect, but I don't think we support this dual designation right now.

国家 means “country”. My point is that the website gets the Nominatim result’s type and looks it up in the interface localization (the YAML file). If the key isn’t present, it falls back to the raw keyword, which will be in Snake_case_english regardless of the user’s language. This impairs the website’s usability more than any imprecision around place classification.
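
The fallback behavior described above can be sketched roughly as follows, in Python rather than the website's Ruby. The `LOCALES` table, its labels, and `label_for` are hypothetical stand-ins for the YAML localization files and the lookup code; none of the strings are copied from ohm-website, and the sketch returns the raw keyword unchanged rather than reproducing whatever formatting the website applies.

```python
# Hypothetical stand-in for the per-language YAML localization files.
LOCALES = {
    "en": {"country": "Country", "state": "State"},
    "zh": {"country": "国家", "state": "省"},
}


def label_for(place_type, language):
    """Return the localized label, or the raw keyword when no translation exists."""
    translations = LOCALES.get(language, {})
    # The fallback is the untranslated tag value, whatever the user's language.
    return translations.get(place_type, place_type)


print(label_for("country", "zh"))     # a translated label
print(label_for("city_state", "zh"))  # no key, so the raw keyword comes back
```

Every new place value that lacks an entry in each language's table takes the second path, which is why the list of keys and the list of supported values would need to stay in sync.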

There’s a straightforward fix, which is to define more keys in the website localization and get them translated in Translatewiki.net. However, that means we’ll be maintaining a custom list of place types in multiple files across two different forks, which we’ll need to keep in sync with each other and with upstream changes from OSM. As I said, we have no choice but to do this to some extent, but I’d rather focus our attention on place types that are relevant only to historical geography.

Our stylesheets will need to account for each of the new values in filters and label sizes, even though they reflect governance structures rather than functional importance.

I don't believe our stylesheets currently use place=* - the layers are built from admin_level=* in config.toml (this is probably the wrong pointer, as I believe this has been refactored). I don't think the SQL has been rebuilt around place, but I could be wrong.

Our stylesheets’ place labels rely exclusively on place=* tags in the database, via the class property in the vector tiles: #579. Even after we implement #543 for political subdivisions (administrative areas), place=* will still be important for labeling populated places (human settlements).

I’m guessing it’s because Canada has federal provinces, more like a U.S. state than a French province (for example). But this is something to ask the Canadians.

Maybe, but the Italian example is also called place=county instead of place=province. To me, this is an example of an OSM oddity being extended, rather than improved, resulting in a false binding of place=state to admin_level=4 (Canadian Provinces) and place=county to admin_level=6 (Italian Provinces) that strips us of an opportunity for richer localization in our metadata.

Italy uses admin_level=4 for regional boundaries and admin_level=6 for provincial boundaries. Regions also have centroid points tagged place=state, but few provinces have been mapped as points. In the absence of a place=* tag on either the boundary relation or the place point, Nominatim reports only the admin_level=*, which ohm-website maps to a set of generic boundary descriptions that probably isn’t correct in any modern-day country:

https://github.com/OpenHistoricalMap/ohm-website/blob/a753c1350699b3b27e23643ad87f2e64d2829641/config/locales/en.yml#L1358-L1368

openstreetmap/openstreetmap-website#1683 was closed because it would be impractical for the website to maintain a lookup table of admin_level descriptions by country and get all those descriptions translated into every language. I see two possible paths forward:

  • ohm-website can consult the border_type property of the extratags property in Nominatim’s response, corresponding to the border_type=* tag. ohm-website could make these border_type=* values translatable, but there will be quite a bit of awkwardness in some languages, particularly with overloaded terms like “county” and “district”.
  • ohm-website’s generic boundary descriptions could be less descriptive, such as “Sixth-Level Administrative Division”, borrowing Wikidata terminology. However, we customarily skip levels to allow for exceptions, such as New York City containing counties, so don’t expect the numbers to line up with Wikidata.
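
As a rough illustration of the two paths above, the Python sketch below prefers a `border_type` value from the `extratags` part of a result and otherwise falls back to a generic ordinal description. The shape of the result dictionary, `describe_boundary`, and the `ORDINALS` table are assumptions for illustration, not ohm-website code, and a real implementation would still need translatable strings.

```python
# Hypothetical ordinal names for admin_level values; not a complete table.
ORDINALS = {2: "Second", 4: "Fourth", 6: "Sixth", 8: "Eighth"}


def describe_boundary(result):
    """Describe a boundary from a Nominatim-like result dictionary (assumed shape)."""
    extratags = result.get("extratags") or {}
    border_type = extratags.get("border_type")
    if border_type:
        # Path 1: surface the mapper-supplied border_type=* value.
        return border_type.replace("_", " ").title()
    level = result.get("admin_level")
    if level in ORDINALS:
        # Path 2: a deliberately generic description.
        return f"{ORDINALS[level]}-Level Administrative Division"
    return "Administrative Boundary"


print(describe_boundary({"admin_level": 6, "extratags": {"border_type": "rancho"}}))
print(describe_boundary({"admin_level": 6, "extratags": None}))
```

The first path inherits whatever free-form values mappers enter, while the second stays predictable at the cost of being less descriptive.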

In any case, Pordenone is not a province officially: the province was abolished in 2017; in 2020 it was replaced by a “regional decentralization entity” at the provincial level. I don’t think we should bother building in support for place=regional_decentralisation_entity throughout the software stack. Similarly, in the U.S., Connecticut abolished its counties in favor of planning regions – place=planning_region? There are countless edge cases like this waiting to ambush any attempt at replacing the place=* classification scheme with machine-readable freeform text.

Assuming that can be fixed, if we align place=* values to colloquial or official definitions, we have a different problem on our hands: some countries divide counties into districts; others divide districts into counties. Nominatim can no longer assign an accurate set of values across countries.

Agreed that hierarchy is a problem assuming there's no workaround for Nominatim to handle per-country distinctions.

Even within a country, we cannot necessarily shoehorn customary or official place designations into a neat hierarchy. In the PRC, a “district” can be located in another “district” and a “city” inside another “city”:

Structural hierarchy of the administrative divisions and basic level autonomies of the People's Republic of China

In Vietnam, a “town” can be equal to or part of a “district”:

Administrative subdivisions of Vietnam

One other thought: should we just add to our wikimedia-querying inspector modification and have it pull place type from Wikidata? That way, couldn't we tune the query to have Singapore show up both as a country and a city-state?

This is feasible. The Wikidata API would allow us to request statements for multiple items at a time. However, Wikidata lacks a consistent naming convention for place types, and a place can be classified as multiple concepts by design. (The statements are ordered, but the order is arbitrary and nondeterministic.) How do these sound as labels? 😎

I love how Wikidata never shies away from nuanced, multifaceted classification, but I don’t think it was ever meant for this kind of interface element.

@jeffreyameyer
Member Author

jeffreyameyer commented Apr 25, 2024

@Rub21 - let's hold off on this for now, pending some further discussion. Appreciate your testing this locally.

@1ec5 - replies below!

国家 means “country”.

Sorry - I should have been more clear. I wasn't sure if there was any subtlety beyond "country" that might be Singapore-specific.

Our stylesheets’ place labels rely exclusively on place=*

Again, this could be expanded to include admin_level fairly easily for the higher-level political divisions in each country without impacting use of place names for settlements. This would also make the label tagging consistent with the boundary area tagging for these entities. My expectation is that our .toml and styles are always going to vary enough from OSM without these differences, so including them will be a small incremental cost.

config/locales/en.yaml

Ugh! Again... unnecessary binding of admin_level and place when people say they're not the same thing and not redundant...

In any case, Pordenone is not a province officially

Yes, I'm familiar, as I lived there from 78-81, which is why I chose it. But there are other Italian Provinces that still exist today & it was a province when I lived there, so how to tag its history?

Even more interesting is how to tag over time Friuli, a historical region that's now part of an autonomous region, but also a region with a modern/lingering identity distinct from the autonomous region?

Even within a country, we cannot necessarily shoehorn customary or official place designations into a neat hierarchy.

No arguments from me - requiring any static hierarchies sucks. Hierarchies are fluid in schema over time and entities are fluid in where they belong in the schema over time. I'd love it if we could abandon any necessity of belonging to a hierarchy outside of "part of" for a particular time range in history.

In Vietnam, a “town” can be equal to or part of a “district”:

Understood - my take is that this sort of flexible hierarchy is fairly common, which is another reason why I think giving place some entity-local specificity is worth some extra work.

How do these sound as labels? 😎

Point well taken, but that's why I like some sort of SPARQL query that we could change / control fairly discretely. A simple WHERE * IN ([args]) could pare that list down pretty quickly to:

  • San Francisco - City
  • Saint Petersburg - City
  • Kyoto - City

and also

  • Singapore - Country, City-State
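
A sketch of that filtering idea, written in Python instead of SPARQL: keep only the classifications that appear on a short allowlist. The `ALLOWED_LABELS` table, `pare_down`, and the input list are made up for illustration and are not real Wikidata query results.

```python
# Hypothetical allowlist of classifications we would be willing to show.
# Dictionary order doubles as the display order.
ALLOWED_LABELS = {"city": "City", "country": "Country", "city-state": "City-State"}


def pare_down(classifications):
    """Keep allowlisted classifications, in the order the allowlist defines."""
    wanted = {c.lower() for c in classifications}
    return [label for key, label in ALLOWED_LABELS.items() if key in wanted]


# Illustrative input only; a real place could carry many more classifications.
singapore = ["sovereign state", "city-state", "island country", "country"]
print("Singapore - " + ", ".join(pare_down(singapore)))
```

Ordering the output by the allowlist rather than by the input keeps the label stable even if the incoming classifications arrive in a different order each time.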

Seems like a meeting / discussion / checkin with @lonvia might be in order?

@1ec5
Member

1ec5 commented Apr 25, 2024

Our stylesheets’ place labels rely exclusively on place=*

Again, this could be expanded to include admin_level fairly easily for the higher-level political divisions in each country without impacting use of place names for settlements. This would also make the label tagging consistent with the boundary area tagging for these entities. My expectation is that our .toml and styles are always going to vary enough from OSM without these differences, so including them will be a small incremental cost.

Would a place point for a city be tagged with admin_level? This further conflates the concepts of populated places and their governments and administrative boundaries. I think we would both agree that it’s important to conflate these concepts in a geocoder but just as important not to conflate them in a renderer or in some other analysis use cases – and therefore mappers should not be conflating them.

Even more interesting is how to tag over time Friuli, a historical region that's now part of an autonomous region, but also a region with a modern/lingering identity distinct from the autonomous region?

Two features, one representing the autonomous region as an administrative entity, and another representing the region as a cultural entity.

Even within a country, we cannot necessarily shoehorn customary or official place designations into a neat hierarchy.

No arguments from me - requiring any static hierarchies sucks. Hierarchies are fluid in schema over time and entities are fluid in where they belong in the schema over time. I'd love it if we could abandon any necessity of belonging to a hierarchy outside of "part of" for a particular time range in history.

We will have a hierarchy, whether we like it or not. Users will enter “City, Province”, or “City, Country”, and expect it to be interpreted hierarchically. If a geocoder can’t infer this hierarchy via predictable place=* values or predictable admin_level=* values, then the only tool left from OSM is subarea members of boundary relations, which would be impractical in OHM due to the time dimension.

The immediate problem is not the existence of a hierarchy but rather the notion of tightly coupling this hierarchy to official designations when tagging places in OHM. I think it’s OK, generally speaking, that Nominatim is configured to treat place=city as something that sits inside an admin_level=4 boundary, but I don’t think this is the same as expecting mappers to use place=city if and only if the place is/was known as a city in everyday speech.

By the way, this discussion is skirting past many problems that I’d consider more serious than ohm-website’s labeling but that only affect languages besides English. If you’re a French-speaking mapper, your editor presets for place=town and place=city currently look like this:

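place=town: Ville (de 10 000 à 100 000 habitants), i.e. "Town (10,000 to 100,000 inhabitants)"
place=city: Grande ville (plus de 100 000 habitants), i.e. "Big city (more than 100,000 inhabitants)"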

I hope mappers are roundly ignoring that guidance as they map France’s past.

Point well taken, but that's why I like some sort of SPARQL query that we could change / control fairly discretely.

This is true, but I’m a bit wary of making such a basic part of the website hit the Wikidata Query Service or QLever without a caching layer. And it seems perverse to use either service to implement geocoding functionality to annotate another geocoder’s results. If we really need this kind of functionality, better to build it into Nominatim, which already consults Wikidata to some extent.
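For what it's worth, the kind of caching layer meant here could be as small as the following Python sketch. Everything in it is hypothetical: `TTLCache` is not an existing ohm-website component, and `fake_designation_lookup` stands in for a real SPARQL request so that the example runs without network access.

```python
import time


class TTLCache:
    """Tiny in-memory cache that reuses a fetched value for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # function called on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no upstream request
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def fake_designation_lookup(item_id):
    """Hypothetical stand-in for a query service call; counts its calls."""
    fake_designation_lookup.calls += 1
    return {"Q_EXAMPLE": "example designation"}.get(item_id)


fake_designation_lookup.calls = 0
```

Repeated `get("Q_EXAMPLE")` calls within the time-to-live reach the lookup function only once. A real deployment would also need to bound the cache size and handle upstream errors, which this sketch leaves out.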

To sum things up, my position is currently that we should take one or both of the following steps:

Neither step would require throwing out OSM’s place=* classification scheme, for all its warts, by coupling the values to official designations in English.

Then, to the extent that any of the keywords in OpenHistoricalMap/nominatim-ui#2 have no counterpart in modern geography, we can add them to both Nominatim and ohm-website. But I think this is probably just for structural needs like empire (assuming an empire contains country), not values like republic that are distinguished only by form of government or regional terms like major divisions.

@jeffreyameyer
Member Author

jeffreyameyer commented May 29, 2024

Ok... revisiting this, given our labeling updates. :)

Is it safe to say:

  • admin_level is hierarchical, local to country, and decoupled from place?
  • place is not strictly hierarchical, and also local to country?

Also, where are boundary labels and place labels used in the app / exposed to users? Isn't it pretty minimal? My untrustworthy code review indicates as much.

I'm also not sold on border_type as a place for storing place-related attributes, as the attributes are about the place itself, not the border. Plus, it would block actual information about the border such as border_type=postulated or border_type=indeterminate, etc.

What about place:localname or placename or some such variant and then employ the search result logic to use extratags.placename, etc.?

But... even if we use this extra field, won't that create unnecessary redundancy? e.g. would place=state also require placename=state or some extra logic in case there is no placename?

@1ec5
Member

1ec5 commented May 30, 2024

Is it safe to say:

  • admin_level is hierarchical, local to country, and decoupled from place?
  • place is not strictly hierarchical, and also local to country?

Yes on the first point, but I’m not so sure about the second point. The place values form a functional hierarchy rather than a topological one – actually three hierarchies. If you’re morbidly curious, the debate about place classification in the U.S. has led me down a very deep rabbit hole.

I didn’t get into the historical aspects, but for what it’s worth, the approach I explored in that post does generalize reasonably well to the beginning of car-driven suburban development in the early-to-mid 20th century. Before that, there are analogues in place classification, such as the concept of a market town. OSM’s place values were inspired by these more traditional notions, so we can apply them further back in history based on intuitive definitions rather than relying on anything scientific.

I'm also not sold on border_type as a place for storing place-related attributes, as the attributes are about the place itself, not the border. Plus, it would block actual information about the border such as border_type=postulated or border_type=indeterminate, etc.

I agree that it isn’t a well-chosen name for a key. “Type” is just a bad name to use for anything because everyone projects their own hopes and desires onto it. For indeterminate boundaries, we already have some features tagged either indeterminate=yes or indefinite=yes. We should keep going down that direction because a boundary can be simultaneously indeterminate, indefinite, disputed, maritime, and weird. 😉

What about place:localname or placename or some such variant and then employ the search result logic to use extratags.placename, etc.?

Yes, this is the idea behind designation, which generalizes beyond places and boundaries. The OSM documentation is all about roads and paths, but it’s also very commonly used on boundaries outside the U.S. The key is often misunderstood by new mappers to mean something like an official name, but this is easily solved with better presets.

But... even if we use this extra field, won't that create unnecessary redundancy? e.g. would place=state also require placename=state or some extra logic in case there is no placename?

There is necessarily some redundancy between, “This is what it’s legally designated as,” and, “This is practically speaking what it is in a general sense.” Even in Wikidata, there’s been a general trend away from hyper-specific classes to use as “instance of” values, in favor of additional properties. Similarly, we should limit place=* to as few values as practically possible, so that people don’t get the impression that one key can describe all the nuances at once.
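The "extra logic in case there is no placename" can stay small. Here is a minimal Python sketch of a search result label that prefers designation and falls back to the general place=* value. The shape of `result` (a dict with an optional `extratags` dict and a generic `type`) is an assumption for illustration, and `display_type` is a hypothetical helper, not existing ohm-website or Nominatim code.

```python
def display_type(result):
    """Pick the type label to show for one search result.

    Assumes `result` is a dict with an optional "extratags" dict (possibly
    None) and a generic "type" holding the place=* value.
    """
    extratags = result.get("extratags") or {}
    designation = extratags.get("designation")
    if designation:
        # Specific, possibly local or historical designation wins.
        return designation
    # Otherwise fall back to the general classification.
    return result.get("type", "place")
```

Under these assumptions, a feature tagged place=state needs no second tag at all: the label is simply "state" unless a mapper has recorded a more specific designation.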

Projects
Status: Todo - Known Path
Development

No branches or pull requests

4 participants