Use explain API for term frequency analysis #1718
Labels
backend: changes to the django backend
code quality: code & performance improvements that do not affect user functionality
visualisation: changes to visualisation features
To calculate results in the term frequency graph, the backend uses the search API to get the documents matching a query, and then runs a Python function to count the number of matches within each document. (That function uses the `termvectors` and `analyze` APIs from Elasticsearch.)

As an alternative, we might use the explain API, which gives more detailed information about how a document's relevance score was computed, including the absolute number of matches for each term. (You can also set `"explain": true` in the search request to get the same data.)

I expect that this would be a lot more efficient (and thus faster) than our current method. However, I should note that while the `_explanation` output is readable, it's not trivial to write a function that will extract the absolute number of matches from it. (Definitely doable, though.)

Example
Here is the query I made:
And here is a document in the results:
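
As a rough sketch of the extraction function mentioned above (not taken from the query or response in this issue), something like the following might work. It assumes the `_explanation` object is a nested dict with `value`, `description`, and `details` keys, that each term's subtree is rooted at a node whose description starts with `weight(field:term in N)`, and that the raw count sits in a leaf whose description starts with `freq,`. That matches the BM25 explanations I have seen from recent Elasticsearch versions, but the wording differs between versions and similarity settings, so treat the patterns as assumptions. The function name `term_frequencies` and the sample data are hypothetical.

```python
import re
from collections import Counter

# Assumed description format on a per-term node, e.g.
# "weight(content:whale in 0) [PerFieldSimilarity], result of:"
WEIGHT_PATTERN = re.compile(r"weight\((?P<field>[^:]+):(?P<term>.+?) in \d+\)")


def term_frequencies(explanation):
    """Sum the 'freq' leaves found under each weight(field:term ...) node."""
    counts = Counter()

    def walk(node, term):
        description = node.get("description", "")
        match = WEIGHT_PATTERN.match(description)
        if match:
            # Entering the subtree for a specific term.
            term = match.group("term")
        if term is not None and description.startswith("freq,"):
            # Assumed leaf format: "freq, occurrences of term within document"
            counts[term] += int(node["value"])
        for child in node.get("details", []):
            walk(child, term)

    walk(explanation, None)
    return counts


def _term_node(term, freq):
    """Build a hypothetical, heavily trimmed explanation subtree for one term."""
    return {
        "value": 1.0,
        "description": f"weight(content:{term} in 0) [PerFieldSimilarity], result of:",
        "details": [{
            "value": 1.0,
            "description": f"score(freq={freq}), computed as boost * idf * tf from:",
            "details": [{
                "value": 0.5,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [{
                    "value": freq,
                    "description": "freq, occurrences of term within document",
                    "details": [],
                }],
            }],
        }],
    }


# Made-up explanation for a two-term query; not real output.
sample_explanation = {
    "value": 2.0,
    "description": "sum of:",
    "details": [_term_node("whale", 3.0), _term_node("ship", 1.0)],
}

frequencies = term_frequencies(sample_explanation)
```

With `"explain": true` in the search request, each hit should carry its own `_explanation`, so the per-document counts could be summed over the hits without the extra `termvectors` and `analyze` calls. Phrase and wildcard queries may produce differently shaped explanations and would need their own handling.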