[Issue #179] Incrementally load search data #180
Looks good to me so far. I'd love to go over it in more detail when you return.
While it's not strictly ideal, I've seen good results with Elastic (and I think it's safe to assume the same for OpenSearch) in turning updates with no new data into "no-ops" at the search layer. Obviously, in higher-volume data situations we might still want to use coarse methods to limit what we send to search, but I've always taken a better-safe-than-sorry approach and let the search code figure out when something might have been "updated" but not "changed" in terms of what is indexed.
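A minimal sketch of the "send everything and let search decide" approach, assuming the Python `opensearch-py` client, whose `helpers.bulk` accepts action dicts of this shape. The `opportunity_id` field and the index name are hypothetical. Each record becomes a bulk `update` with `doc_as_upsert`, so missing documents are created and existing ones are partially updated; whether an unchanged document turns into a no-op is left to the engine (Elasticsearch documents this as `detect_noop`; the sketch does not assume OpenSearch behaves the same way).

```python
from typing import Any, Dict, Iterable, Iterator


def build_upsert_actions(
    index_name: str,
    records: Iterable[Dict[str, Any]],
    id_field: str = "opportunity_id",  # hypothetical ID field name
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "update" action per record, whether or not it changed."""
    for record in records:
        yield {
            "_op_type": "update",   # partially update the existing document...
            "_index": index_name,
            "_id": record[id_field],
            "doc": record,
            "doc_as_upsert": True,  # ...or create it if it doesn't exist yet
        }


# Usage (assumes an opensearchpy.OpenSearch client named `client`):
#   from opensearchpy import helpers
#   helpers.bulk(client, build_upsert_actions("opportunity-index", records))
```

The trade-off is exactly the one described above: the client stays simple and never skips a real change, at the cost of sending every record on every run.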
I do see `noop` mentioned in the Elasticsearch docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html but not in the OpenSearch docs: https://opensearch.org/docs/latest/api-reference/document-apis/update-document/ I wonder if you could reindex and merge two indices into one (let's say, an opportunity index and an opportunity attachment index) and use that?
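On the reindex idea, a rough sketch of the request body for the Reindex API (`POST _reindex`), which accepts a list of source indices. The index names are hypothetical. This copies documents from both indices side by side into the destination; it does not join an opportunity with its attachments into one combined document, and if the two indices share document IDs, one copy can overwrite the other.

```python
from typing import Any, Dict, List


def build_merge_reindex_body(source_indices: List[str], dest_index: str) -> Dict[str, Any]:
    """Build a _reindex body that copies every document from several source
    indices into a single destination index (no joining of documents)."""
    if not source_indices:
        raise ValueError("at least one source index is required")
    return {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
    }


# Usage (assumes an opensearchpy.OpenSearch client named `client`):
#   body = build_merge_reindex_body(
#       ["opportunity-index", "opportunity-attachment-index"], "opportunity-combined"
#   )
#   client.reindex(body=body)
```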
You can definitely assign the same alias to multiple indexes (again, at least in Elastic), and queries against it will somehow run across both (I'm not sure how this works in practice). In the past I've used aliases to swap out the index under a running system. On a monthly data update cycle, we would push data out to the DB, build a full new index named for the month (e.g. "search-sept"), and once the new index was fully built, flip the alias from "search-aug" to "search-sept". This let us make an effort to always sync data changes to the index while also having a regular checkpoint where, no matter what, we knew search was up to date. Our data drove the monthly timeline; you could do this weekly, daily, or even hourly, depending on how expensive it is to fully index the data pull.
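A sketch of the alias flip, assuming the `_aliases` endpoint (present in both Elasticsearch and OpenSearch), which applies all of its actions in one atomic request, so queries never see the alias pointing at nothing mid-swap. The alias name `search` is hypothetical; the index names follow the monthly example.

```python
from typing import Any, Dict, List, Optional


def build_alias_swap_body(alias: str, old_index: Optional[str], new_index: str) -> Dict[str, Any]:
    """Build an _aliases body that moves `alias` from old_index to new_index.

    Pass old_index=None the first time, when the alias doesn't exist yet.
    """
    actions: List[Dict[str, Any]] = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Usage (assumes an opensearchpy.OpenSearch client named `client`):
#   client.indices.update_aliases(
#       body=build_alias_swap_body("search", "search-aug", "search-sept")
#   )
```

Because readers only ever query the alias, the old index can be kept around briefly for rollback and deleted once the new one looks healthy.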
Fixes HHS#2038. Updated the load search data task to partially support incrementally loading and deleting records in the search index rather than just fully remaking it, with various changes to the search utilities to support this work. Technically, this doesn't fully support a true incremental load, as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.
Summary
Fixes #179
Time to review: 10 mins
Changes proposed
Updated the load search data task to partially support incrementally loading + deleting records in the search index rather than just fully remaking it.
Various changes to the search utilities to support this work
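A minimal sketch of what "incrementally loading + deleting" can look like at the bulk-action level. This is not the code in this PR: `db_records` (ID to document, from the DB) and `indexed_ids` (IDs currently in the index, e.g. gathered with a scan) are hypothetical inputs, and the actions use the shape `opensearch-py`'s `helpers.bulk` accepts. Every DB record is re-indexed, matching the caveat below that this isn't yet a true incremental load, and IDs that exist only in the index are deleted.

```python
from typing import Any, Dict, Iterator, Set


def build_incremental_actions(
    index_name: str,
    db_records: Dict[int, Dict[str, Any]],  # hypothetical: record ID -> document
    indexed_ids: Set[int],                  # hypothetical: IDs already in the index
) -> Iterator[Dict[str, Any]]:
    # Re-index every record still in the DB, changed or not.
    for record_id, doc in db_records.items():
        yield {"_op_type": "index", "_index": index_name, "_id": record_id, "_source": doc}

    # Anything in the index that the DB no longer has gets deleted.
    for stale_id in sorted(indexed_ids - set(db_records)):
        yield {"_op_type": "delete", "_index": index_name, "_id": stale_id}
```

Unlike a full rebuild, the index keeps serving queries the whole time; the price is that the stale-ID set has to be computed on each run.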
Context for reviewers
Technically, this doesn't fully support a true incremental load, as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.
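For the follow-up ticket, one possible approach (an assumption, not something this PR implements) is to store a hash of each document's indexed content and only send records whose hash differs. Where `previous_hashes` lives (a DB column, a field on the indexed document, etc.) is left open, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def content_hash(doc: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, compact separators) so equal content hashes equally.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(
    db_records: Dict[int, Dict[str, Any]],
    previous_hashes: Dict[int, str],  # hypothetical: record ID -> last indexed hash
) -> Dict[int, Dict[str, Any]]:
    """Return only records that are new or whose content hash changed."""
    return {
        record_id: doc
        for record_id, doc in db_records.items()
        if previous_hashes.get(record_id) != content_hash(doc)
    }
```

Sorting keys means dict ordering never causes a false "change"; `default=str` lets dates and decimals serialize, at the cost of comparing them by their string form.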