Search algorithms for Elasticsearch index and PostgreSQL database #70

Open
kgeographer opened this issue Aug 6, 2022 · 0 comments
Labels: Elasticsearch (relates to Elasticsearch functionality), enhancement (New feature or request), help wanted (Extra attention is needed), SQL/PostgreSQL (enhancements involving SQL)

kgeographer commented Aug 6, 2022

The WHG search function offers two options: a search of the "union index" and a search of the public records in the relational database. Pre- and post-search filters allow search results to be narrowed by area or region, by broad place category and/or narrow type, and by timespan.

The most glaring shortcoming in WHG search concerns the matching of place name search terms. Currently, the name lookup attempts to match the exact string entered against any name variant found in the WHG index (or in the database, for a database search). Because existing records may not include a name variant with the exact spelling entered, good potential matches are often missed. The name search needs to find similar names, restricted to any bounds entered in the "SPATIAL" filter.
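To illustrate the index side of this requirement, here is a minimal sketch of an Elasticsearch query body that combines a fuzzy `match` on a name field with a `geo_bounding_box` filter. The field names `names.toponym` and `location`, the helper name `build_name_query`, and the assumption that the geometry field is mapped as `geo_point` are all hypothetical; the actual WHG index mapping may differ.

```python
def build_name_query(name, bbox=None, name_field="names.toponym", geo_field="location"):
    """Build an Elasticsearch query body for a fuzzy name search.

    bbox is an optional (west, south, east, north) tuple in decimal degrees.
    Field names are placeholders, not the real WHG mapping.
    """
    # "fuzziness": "AUTO" lets Elasticsearch pick an edit distance
    # based on the length of each search term.
    must = [{"match": {name_field: {"query": name, "fuzziness": "AUTO"}}}]

    filters = []
    if bbox is not None:
        west, south, east, north = bbox
        # Restrict hits to the bounds taken from the spatial filter.
        filters.append({
            "geo_bounding_box": {
                geo_field: {
                    "top_left": {"lat": north, "lon": west},
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        })

    return {"query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_name_query("Glasgo", (-5.0, 55.0, -3.0, 56.5)), indent=2))
```

The resulting dictionary could be passed to the search call of the Elasticsearch Python client. Putting the spatial clause in `filter` rather than `must` keeps it out of relevance scoring, so ranking depends only on name similarity.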

This requirement overlaps with Issue #68, which deals with name matching in the Wikidata reconciliation process using Python-wrapped Elasticsearch query language. However, it also requires a similar solution for searches against the relational database, which are currently performed with a simple Django filter function. More options become possible by using SQL directly, together with PostgreSQL's 'fuzzy string matching' functionality and spatial filters.
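For the database side, one possible direction is sketched below: a helper that builds a parameterized SQL statement using the `similarity()` function from PostgreSQL's `pg_trgm` extension and, optionally, a PostGIS bounding-box test with `ST_MakeEnvelope`. This assumes both extensions are installed. The table and column names (`place_name.toponym`, `place_geom.geom`), the helper name, and the default threshold of 0.3 are hypothetical and do not reflect the actual WHG schema.

```python
def build_fuzzy_name_sql(name, bbox=None, threshold=0.3, limit=20):
    """Return (sql, params) for a trigram-similarity name search.

    bbox is an optional (west, south, east, north) tuple in EPSG:4326.
    Table and column names are placeholders, not the real WHG schema.
    """
    # similarity() comes from the pg_trgm extension and returns 0.0 to 1.0.
    sql = (
        "SELECT n.place_id, n.toponym, similarity(n.toponym, %s) AS score "
        "FROM place_name AS n "
    )
    where = "WHERE similarity(n.toponym, %s) >= %s "
    params = [name, name, threshold]

    if bbox is not None:
        west, south, east, north = bbox
        sql += "JOIN place_geom AS g ON g.place_id = n.place_id "
        # && is the PostGIS bounding-box overlap operator.
        where += "AND g.geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326) "
        params += [west, south, east, north]

    sql += where + "ORDER BY score DESC LIMIT %s"
    params.append(limit)
    return sql, params


if __name__ == "__main__":
    statement, values = build_fuzzy_name_sql("Glasgo", (-5.0, 55.0, -3.0, 56.5))
    print(statement)
    print(values)
```

In Django, the returned pair could be passed to `connection.cursor().execute(sql, params)`. The sketch calls `similarity()` explicitly instead of using the `%` operator so that no percent-sign escaping is needed in the parameterized string; note that the operator form is the one that can use a trigram index, so it may be preferable for large tables.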

A sandbox environment for the WHG Elasticsearch index instance is available, so knowledge of Python/Django or of the WHG codebase generally is not essential. That said, WHG wraps ES queries in Python using the "official" Elasticsearch Python client, so familiarity with that client would be a further help.

For the database requirement, knowledge of Django and/or PostgreSQL is needed. A sandbox environment could be set up in fairly short order.

@kgeographer kgeographer changed the title Search algorithms for Search page Search algorithms Aug 6, 2022
@kgeographer kgeographer changed the title Search algorithms Search algorithms for Elasticsearch index and PostgreSQL database Aug 6, 2022
@kgeographer kgeographer added enhancement New feature or request help wanted Extra attention is needed Elasticsearch relates to Elasticsearch functionality SQL/PostgreSQL enhancements involving SQL labels Aug 6, 2022