Search algorithms for Elasticsearch index and PostgreSQL database #70
Labels
Elasticsearch
relates to Elasticsearch functionality
enhancement
New feature or request
help wanted
Extra attention is needed
SQL/PostgreSQL
enhancements involving SQL
The WHG search function offers two options: a search of the "union index" and a search of the public records in the relational database. Pre- and post-search filters allow for narrowing search results by area or region, by broad place category and/or narrow type, and by timespan.
The most glaring shortcoming in WHG search concerns the matching of place name search terms. Currently, the name lookup attempts to match the exact string entered against any name variant found in the WHG index (or in the database, for database searches). Because existing records may not include a name variant with the exact spelling entered, good potential matches are often missed. The name search needs to find similar names that fall within the bounds entered into the "SPATIAL" filter.
This requirement overlaps with Issue #68, which deals with name matching in the Wikidata reconciliation process using Python-wrapped Elasticsearch query language. However, it also requires a similar solution for searches against the relational database, which are currently performed with a simple Django filter function. More options become available by using SQL directly, together with PostgreSQL's 'fuzzy string matching' functionality and spatial filters.
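As a starting point for discussion, here is a minimal sketch of what a direct SQL query for the database search might look like. It assumes the PostgreSQL `fuzzystrmatch` extension (for `levenshtein`) and PostGIS (for `ST_Intersects` and `ST_MakeEnvelope`) are installed. The table and column names (`place_name`, `place_geom`, `toponym`, `geom`, `place_id`), the SRID 4326, and the helper `fuzzy_name_params` are hypothetical and not taken from the WHG schema.

```python
# Sketch only: table/column names below are hypothetical, not the WHG schema.
# Requires the fuzzystrmatch extension (levenshtein) and PostGIS.
FUZZY_NAME_SQL = """
SELECT DISTINCT n.place_id, n.toponym,
       levenshtein(lower(n.toponym), lower(%(name)s)) AS distance
FROM place_name AS n
JOIN place_geom AS g ON g.place_id = n.place_id
WHERE levenshtein(lower(n.toponym), lower(%(name)s)) <= %(max_distance)s
  AND ST_Intersects(
        g.geom,
        ST_MakeEnvelope(%(west)s, %(south)s, %(east)s, %(north)s, 4326)
      )
ORDER BY distance, n.toponym
"""


def fuzzy_name_params(name, bounds, max_distance=2):
    """Build the named parameters for FUZZY_NAME_SQL.

    bounds is (west, south, east, north) in degrees, matching the
    argument order of ST_MakeEnvelope (xmin, ymin, xmax, ymax).
    """
    west, south, east, north = bounds
    return {
        "name": name,
        "max_distance": max_distance,
        "west": west,
        "south": south,
        "east": east,
        "north": north,
    }
```

The placeholders use the named "pyformat" style, so the query and parameters could be passed to a database cursor's `execute` call rather than interpolated into the string. An edit-distance threshold is only one option; a trigram or phonetic comparison may suit historical name variants better.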
A sandbox environment for the WHG Elasticsearch index instance is available, so knowledge of Python/Django or the WHG codebase generally is not essential. That said, WHG wraps ES queries in Python using the "official" Elasticsearch Python client, so solutions written against that client would be the most directly usable.
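For the index search, one possible shape for such a query is sketched below: a `match` query with `fuzziness` for the name, combined with a `geo_bounding_box` filter for the spatial bounds. The function only builds the query as a Python dict, so it can be tried without a running cluster. The field names `names.toponym` and `location` are assumptions, not the actual WHG index mapping, and the filter as written assumes the geometry field is mapped as a `geo_point`.

```python
def build_fuzzy_place_query(name, bounds,
                            name_field="names.toponym",
                            geo_field="location"):
    """Build an Elasticsearch bool query: fuzzy name match inside a bounding box.

    bounds is (west, south, east, north) in degrees.
    Field names are hypothetical defaults, not the WHG mapping.
    """
    west, south, east, north = bounds
    return {
        "bool": {
            # Tolerate small spelling differences in the name variant.
            "must": [
                {"match": {name_field: {"query": name, "fuzziness": "AUTO"}}}
            ],
            # Restrict hits to the box from the spatial filter (not scored).
            "filter": [
                {
                    "geo_bounding_box": {
                        geo_field: {
                            "top_left": {"lat": north, "lon": west},
                            "bottom_right": {"lat": south, "lon": east},
                        }
                    }
                }
            ],
        }
    }
```

The returned dict could then be supplied as the query of a search request made through the Python client. Placing the spatial constraint in `filter` rather than `must` keeps it from affecting relevance scores.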
For the database requirement, knowledge of Django and/or PostgreSQL is needed. A sandbox environment could be set up in fairly short order.