This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
forked from HHS/simpler-grants-gov
[Issue #16] Connect the API to use the search index #63
Merged
Commits (20 total; changes shown from 18 commits)
62ba7f1 [Issue #9] Setup opensearch locally (chouinar)
1922340 Some rearranging of files (chouinar)
649339c Dependency fixes (chouinar)
2126171 Trying something else for the network setup? (chouinar)
8f80852 Simplify the networking/docker setup (chouinar)
f02f3d3 [Issue #10] Populate the search index from the opportunity tables (chouinar)
49c2a2b Slightly tidying up (chouinar)
25edfab [Issue #14] Setup utils for creating requests and parsing responses f… (chouinar)
1058287 Merge branch 'main' into chouinar/14-req-resp-tools (chouinar)
327f242 A lot of tests / comments / cleanup (chouinar)
eaba30d Add an example (chouinar)
641ebd1 [Issue #16] Connect the API to use the search index (chouinar)
bba9a52 Docs and logging (chouinar)
3b9fec9 Update OpenAPI spec (nava-platform-bot)
01a5bc0 Adjust the allow_none logic (chouinar)
3d933e8 Update OpenAPI spec (nava-platform-bot)
28e106b Merge branch 'main' into chouinar/14-req-resp-tools (chouinar)
354654c Merge branch 'chouinar/14-req-resp-tools' into chouinar/16-actual-impl (chouinar)
2edec64 Merge branch 'main' into chouinar/16-actual-impl (chouinar)
7dfe55b Merge branch 'main' into chouinar/16-actual-impl (chouinar)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
  from src.adapters.search.opensearch_client import SearchClient | ||
  from src.adapters.search.opensearch_config import get_opensearch_config | ||
+ from src.adapters.search.opensearch_query_builder import SearchQueryBuilder | ||
|
||
- __all__ = ["SearchClient", "get_opensearch_config"] | ||
+ __all__ = ["SearchClient", "get_opensearch_config", "SearchQueryBuilder"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
from functools import wraps | ||
from typing import Callable, Concatenate, ParamSpec, TypeVar | ||
|
||
from flask import Flask, current_app | ||
|
||
from src.adapters.search import SearchClient | ||
|
||
_SEARCH_CLIENT_KEY = "search-client" | ||
|
||
|
||
def register_search_client(search_client: SearchClient, app: Flask) -> None: | ||
app.extensions[_SEARCH_CLIENT_KEY] = search_client | ||
|
||
|
||
def get_search_client(app: Flask) -> SearchClient: | ||
return app.extensions[_SEARCH_CLIENT_KEY] | ||
|
||
|
||
P = ParamSpec("P") | ||
T = TypeVar("T") | ||
|
||
|
||
def with_search_client() -> Callable[[Callable[Concatenate[SearchClient, P], T]], Callable[P, T]]: | ||
""" | ||
Decorator for functions that need a search client. | ||
|
||
This decorator will return the shared search client object which | ||
has an internal connection pool that is shared. | ||
|
||
Usage: | ||
@with_search_client() | ||
def foo(search_client: search.SearchClient): | ||
... | ||
|
||
@with_search_client() | ||
def bar(search_client: search.SearchClient, x: int, y: int): | ||
... | ||
""" | ||
|
||
def decorator(f: Callable[Concatenate[SearchClient, P], T]) -> Callable[P, T]: | ||
@wraps(f) | ||
def wrapper(*args: P.args, **kwargs: P.kwargs) -> T: | ||
return f(get_search_client(current_app), *args, **kwargs) | ||
|
||
return wrapper | ||
|
||
return decorator |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,217 @@ | ||
import typing | ||
|
||
from src.pagination.pagination_models import SortDirection | ||
|
||
|
||
class SearchQueryBuilder: | ||
""" | ||
Utility to help build queries to OpenSearch | ||
|
||
This helps with making sure everything we want in a search query goes | ||
to the right spot in the large JSON object we're building. Note that | ||
it still requires some understanding of OpenSearch (e.g. when to add ".keyword" to a field name) | ||
|
||
For example, if you wanted to build a query against a search index containing | ||
books with the following: | ||
* Page size of 5, page number 1 | ||
* Sorted by relevancy score descending | ||
* Scored on titles containing "king" | ||
* Where the author is one of Brandon Sanderson or J R.R. Tolkien | ||
* Returning aggregate counts of books by those authors in the full results | ||
|
||
This query could either be built manually and look like: | ||
|
||
{ | ||
"size": 5, | ||
"from": 0, | ||
"track_scores": true, | ||
"sort": [ | ||
{ | ||
"_score": { | ||
"order": "desc" | ||
} | ||
} | ||
], | ||
"query": { | ||
"bool": { | ||
"must": [ | ||
{ | ||
"simple_query_string": { | ||
"query": "king", | ||
"fields": [ | ||
"title.keyword" | ||
], | ||
"default_operator": "AND" | ||
} | ||
} | ||
], | ||
"filter": [ | ||
{ | ||
"terms": { | ||
"author.keyword": [ | ||
"Brandon Sanderson", | ||
"J R.R. Tolkien" | ||
] | ||
} | ||
} | ||
] | ||
} | ||
}, | ||
"aggs": { | ||
"author": { | ||
"terms": { | ||
"field": "author.keyword", | ||
"size": 25, | ||
"min_doc_count": 0 | ||
} | ||
} | ||
} | ||
} | ||
|
||
Or you could use the builder and produce the same result: | ||
|
||
search_query = ( | ||
SearchQueryBuilder() | ||
.pagination(page_size=5, page_number=1) | ||
.sort_by([("relevancy", SortDirection.DESCENDING)]) | ||
.simple_query("king", fields=["title.keyword"]) | ||
.filter_terms("author.keyword", terms=["Brandon Sanderson", "J R.R. Tolkien"]) | ||
.aggregation_terms(aggregation_name="author", field_name="author.keyword", minimum_count=0) | ||
.build() | ||
) | ||
""" | ||
|
||
def __init__(self) -> None: | ||
self.page_size = 25 | ||
self.page_number = 1 | ||
|
||
self.sort_values: list[dict[str, dict[str, str]]] = [] | ||
|
||
self.must: list[dict] = [] | ||
self.filters: list[dict] = [] | ||
|
||
self.aggregations: dict[str, dict] = {} | ||
|
||
def pagination(self, page_size: int, page_number: int) -> typing.Self: | ||
""" | ||
Set the pagination for the search request. | ||
|
||
Note that page number should be the human-readable page number | ||
and start counting from 1. | ||
""" | ||
self.page_size = page_size | ||
self.page_number = page_number | ||
return self | ||
|
||
def sort_by(self, sort_values: list[typing.Tuple[str, SortDirection]]) -> typing.Self: | ||
""" | ||
List of tuples of field name + sort direction to sort by. If you wish to sort by the relevancy | ||
score provide a field name of "relevancy". | ||
|
||
The order of the tuples matters, and the earlier values will take precedence - or put another way | ||
the first tuple is the "primary sort", the second is the "secondary sort", and so on. If | ||
all of the primary sort values are unique, then the secondary sorts won't be relevant. | ||
|
||
If this method is not called, no sort info will be added to the request, and OpenSearch | ||
will internally default to sorting by relevancy score. If no scores are calculated, | ||
then the order is likely that of the document IDs in the index. | ||
|
||
Note that multiple calls to this method will erase any info provided in a prior call. | ||
""" | ||
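# Reset first so that a repeat call replaces, rather than extends, the prior sort | ||
self.sort_values = [] | ||
for field, sort_direction in sort_values: | ||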
if field == "relevancy": | ||
field = "_score" | ||
|
||
self.sort_values.append({field: {"order": sort_direction.short_form()}}) | ||
|
||
return self | ||
|
||
def simple_query(self, query: str, fields: list[str]) -> typing.Self: | ||
""" | ||
Adds a simple_query_string which queries against the provided fields. | ||
|
||
The fields must include the full path to the object, and can include optional suffixes | ||
to adjust the weighting. For example "opportunity_title^4" would increase any scores | ||
derived from that field by 4x. | ||
|
||
See: https://opensearch.org/docs/latest/query-dsl/full-text/simple-query-string/ | ||
""" | ||
self.must.append( | ||
{"simple_query_string": {"query": query, "fields": fields, "default_operator": "AND"}} | ||
) | ||
|
||
return self | ||
|
||
def filter_terms(self, field: str, terms: list) -> typing.Self: | ||
""" | ||
For a given field, filter to a set of values. | ||
|
||
These filters do not affect the relevancy score, they are purely | ||
a binary filter on the overall results. | ||
""" | ||
self.filters.append({"terms": {field: terms}}) | ||
return self | ||
|
||
def aggregation_terms( | ||
self, aggregation_name: str, field_name: str, size: int = 25, minimum_count: int = 1 | ||
) -> typing.Self: | ||
""" | ||
Add a term aggregation to the request. Aggregations are counts of the values of a particular field | ||
across the full result set and are often displayed next to filters in a search UI. | ||
|
||
Size determines how many different values can be returned. | ||
Minimum count determines how many times a value must occur to be included in the response. | ||
If you pass in 0 for this, then values that don't occur at all in the full result set will be returned. | ||
|
||
see: https://opensearch.org/docs/latest/aggregations/bucket/terms/ | ||
""" | ||
self.aggregations[aggregation_name] = { | ||
"terms": {"field": field_name, "size": size, "min_doc_count": minimum_count} | ||
} | ||
return self | ||
|
||
def build(self) -> dict: | ||
""" | ||
Build the search request | ||
""" | ||
|
||
# Base request | ||
page_offset = self.page_size * (self.page_number - 1) | ||
request: dict[str, typing.Any] = { | ||
"size": self.page_size, | ||
"from": page_offset, | ||
# Always include the scores in the response objects | ||
# even if we're sorting by non-relevancy | ||
"track_scores": True, | ||
} | ||
|
||
# Add sorting if any was provided | ||
if len(self.sort_values) > 0: | ||
request["sort"] = self.sort_values | ||
|
||
# Add a bool query | ||
# | ||
# The "must" block contains anything relevant to scoring | ||
# The "filter" block contains filters that don't affect scoring and act | ||
# as just binary filters | ||
# | ||
# See: https://opensearch.org/docs/latest/query-dsl/compound/bool/ | ||
bool_query = {} | ||
if len(self.must) > 0: | ||
bool_query["must"] = self.must | ||
|
||
if len(self.filters) > 0: | ||
bool_query["filter"] = self.filters | ||
|
||
# Add the query object which wraps the bool query | ||
query_obj = {} | ||
if len(bool_query) > 0: | ||
query_obj["bool"] = bool_query | ||
|
||
if len(query_obj) > 0: | ||
request["query"] = query_obj | ||
|
||
# Add any aggregations | ||
# see: https://opensearch.org/docs/latest/aggregations/ | ||
if len(self.aggregations) > 0: | ||
request["aggs"] = self.aggregations | ||
|
||
return request |
Does this need to match the stemmer chosen in the utils?
This is the utils. I found some issues with what I had configured in the prior PR when setting it up with our actual data, and fixed them here.