Score for each location? #21

Closed
0AlphaZero0 opened this issue Oct 9, 2020 · 4 comments

@0AlphaZero0

Would it be possible to get a score for each location found?
It would sometimes be useful to know why certain locations appear in the results, for example:
```python
>>> import geograpy
>>> places = geograpy.get_geoPlace_context(text="This sentence mention UK as country and London as city.")
>>> places.countries
['United Kingdom', 'United States', 'Canada']
>>> places.cities
['London']
>>> places = geograpy.get_geoPlace_context(text="Jin Yin-tan Hospital, Wuhan, China.")
>>> places.countries
['China', 'Mexico', 'United States']
>>> places.cities
['China']
```

Something like the following scores would be useful:
[('United Kingdom',0.99), ('United States',0.56), ('Canada',0.45)]

A confidence score would make it possible to filter out such spurious results.
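
To make the request concrete, here is a minimal sketch of how scored results could be consumed. The `filter_by_confidence` helper and the 0.6 threshold are hypothetical and not part of geograpy; the scores are just the illustrative values from above.

```python
from typing import List, Tuple


def filter_by_confidence(scored: List[Tuple[str, float]], threshold: float = 0.6) -> List[str]:
    """Return the names whose score is at least `threshold`, best first.

    Hypothetical helper for illustration only; geograpy does not provide it.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Illustrative scores taken from the example in this issue.
scored_countries = [("United Kingdom", 0.99), ("United States", 0.56), ("Canada", 0.45)]
print(filter_by_confidence(scored_countries))  # ['United Kingdom']
```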

@WolfgangFahl WolfgangFahl self-assigned this Oct 9, 2020
@WolfgangFahl WolfgangFahl added the enhancement New feature or request label Oct 9, 2020
@WolfgangFahl
Collaborator

How would you like the score to be calculated? There are currently a few possible strategies:

  • use the population as an indicator, e.g. Vienna, Austria has a larger population than Vienna, Illinois
  • use the combination of fields as an indicator, e.g. "Wuhan, China" should limit the location lookup to a grid search within just that area

Please note that disambiguation is currently only possible with the Locator API.
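
The two strategies could be combined roughly as follows. This is only a sketch under stated assumptions: `Candidate`, `score_candidates` and `context_weight` are hypothetical names, nothing here is the Locator API, and the population figures are rough, illustrative values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one possible match of an ambiguous place name."""
    name: str
    country: str
    population: int


def score_candidates(
    candidates: List[Candidate],
    country_hint: Optional[str] = None,
    context_weight: float = 0.7,
) -> List[Tuple[Candidate, float]]:
    """Blend population share with a bonus for matching the country in the text."""
    total = sum(c.population for c in candidates) or 1  # avoid division by zero
    scored = []
    for c in candidates:
        population_share = c.population / total
        matches_context = 1.0 if country_hint and c.country == country_hint else 0.0
        score = (1 - context_weight) * population_share + context_weight * matches_context
        scored.append((c, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Rough, illustrative population figures (assumptions, not looked-up data).
viennas = [
    Candidate("Vienna", "Austria", 1_900_000),
    Candidate("Vienna", "United States", 1_400),
]

by_population = score_candidates(viennas)
with_context = score_candidates(viennas, country_hint="United States")
print(by_population[0][0].country)  # Austria
print(with_context[0][0].country)   # United States
```

Without a hint the ranking falls back to population alone; when the text also names a country, the context term dominates.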

@0AlphaZero0
Author

I think a combination of fields would be the best approach.

@WolfgangFahl
Collaborator

#52 now addresses this: the default is to order by population. For our own use case we'll use a more sophisticated version and estimate the likelihood of a location in our context from how often it already appears in our corpus.
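
A corpus-based likelihood of this kind might look like the sketch below. It is not the implementation from #52; `corpus_likelihoods`, the additive smoothing and the counts are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Mapping


def corpus_likelihoods(
    candidates: List[str],
    corpus_counts: Mapping[str, int],
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Turn corpus frequencies into relative likelihoods that sum to 1.

    Additive smoothing keeps locations not yet seen in the corpus from
    getting a likelihood of exactly zero.
    """
    weights = {c: corpus_counts.get(c, 0) + smoothing for c in candidates}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical counts of how often each location already occurs in a corpus.
corpus_counts = Counter({"Vienna, Austria": 8, "Vienna, Illinois": 0})
probs = corpus_likelihoods(["Vienna, Austria", "Vienna, Illinois"], corpus_counts)
print(max(probs, key=probs.get))  # Vienna, Austria
```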
