[Issue #179] Incrementally load search data #180

chouinar · 2024-08-16T19:49:52Z

Summary

Fixes #179

Time to review: 10 mins

Changes proposed

Updated the load search data task to partially support incrementally loading and deleting records in the search index, rather than fully rebuilding it on every run.

Made various changes to the search utilities to support this work.

Context for reviewers

Technically, this doesn't fully support a true incremental load, as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.
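
As a rough illustration of the behavior described above (a minimal sketch, not the code in this PR): every current record is upserted, and any ID that is in the index but no longer in the source data is deleted. `InMemoryIndex` and `load_incrementally` are hypothetical names used only for this example; the real task talks to OpenSearch through the search utilities.

```python
from typing import Dict, Iterable, Set, Tuple


class InMemoryIndex:
    """Hypothetical stand-in for a search index, keyed by document ID."""

    def __init__(self) -> None:
        self.docs: Dict[int, dict] = {}

    def upsert(self, doc_id: int, body: dict) -> None:
        # Insert the document, or overwrite it if the ID already exists.
        self.docs[doc_id] = body

    def delete(self, doc_id: int) -> None:
        self.docs.pop(doc_id, None)

    def ids(self) -> Set[int]:
        return set(self.docs)


def load_incrementally(index: InMemoryIndex, records: Iterable[dict]) -> Tuple[int, int]:
    """Upsert every record, then delete indexed IDs missing from the source.

    Returns (number of upserts, number of deletes). No change detection is
    attempted: unchanged records are still re-sent to the index.
    """
    records = list(records)
    for record in records:
        index.upsert(record["id"], record)

    current_ids = {record["id"] for record in records}
    stale_ids = index.ids() - current_ids
    for doc_id in stale_ids:
        index.delete(doc_id)

    return len(records), len(stale_ids)
```

Detecting which records actually changed would replace the unconditional upsert loop; that part is intentionally left out here, matching the scope described above.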

Rwolfe-Nava

Looks good to me so far. Would love to go over it in some more detail when you return.

mdragon · 2024-09-13T15:47:10Z

> Technically, this doesn't fully support a true incremental load, as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.

While it's not strictly ideal, I've seen good results with Elastic (and so, I think it's safe to assume, with OpenSearch) turning updates that carry no new data into "no-ops" at the search layer. Obviously, in higher-volume data situations we might still want to use coarse methods to limit what we send to search, but I've always taken a better-safe-than-sorry approach and let the search code figure out when something might have been "updated" but not "changed" in terms of what is indexed.
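
To make the "updated but not changed" idea concrete, here is a minimal sketch in plain Python. Elasticsearch's update API documents a `detect_noop` option that reports a `noop` result when a partial update changes nothing; whether OpenSearch behaves identically is an assumption here, not something confirmed in this thread. The function `apply_partial_update` and the dictionary-backed store are hypothetical and only mimic the idea.

```python
from typing import Dict


def apply_partial_update(store: Dict[str, dict], doc_id: str, partial: dict) -> str:
    """Merge `partial` into the stored document and report what happened.

    Returns "created", "updated", or "noop". Creating a missing document is a
    simplification for this sketch, not a claim about the real update API.
    """
    existing = store.get(doc_id)
    if existing is None:
        store[doc_id] = dict(partial)
        return "created"

    merged = {**existing, **partial}
    if merged == existing:
        # Nothing in the indexed content would change, so skip the write.
        return "noop"

    store[doc_id] = merged
    return "updated"
```

With this shape, the caller can send every record on every run and rely on the comparison to skip writes that would not change what is indexed.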

chouinar · 2024-09-13T16:19:50Z

I do see noop mentioned in the Elasticsearch docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html) but not in the OpenSearch docs (https://opensearch.org/docs/latest/api-reference/document-apis/update-document/).

I wonder if you could reindex and merge two indices together into one (let's say, an opportunity index and an opportunity attachment index), and use that?

mdragon · 2024-09-13T17:33:12Z

You can definitely assign the same alias to multiple indexes (again, at least in Elastic), and it will somehow query across both (I'm not sure how this works in practice).

In the past, I used aliases to allow the index to be swapped out under a running system. On a monthly data update cycle, we would push data out to the DB, build a full new index named for the month ("search-sept"), and then, once the new index was fully built, flip the alias from "search-aug" to "search-sept". This was a good way to make a best effort at syncing data changes to the index while still having a regular checkpoint where, no matter what, we'd know the search was up to date. Our data drove the monthly timeline; you could do this weekly, daily, or even hourly, depending on how expensive the data pull is to fully index.
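
The alias flip described above can be sketched as a request body for the `_aliases` endpoint, which both Elasticsearch and OpenSearch expose and which accepts a list of `add` and `remove` actions. This is a minimal sketch under stated assumptions: the alias name `search` and the helper `build_alias_swap` are hypothetical, and sending the body to a cluster is left to whatever client is in use.

```python
from typing import Optional


def build_alias_swap(alias: str, new_index: str, old_index: Optional[str] = None) -> dict:
    """Build the body for a POST to `_aliases` that repoints `alias`.

    The remove and add actions travel in one request, so readers of the alias
    should not observe a moment where it points at no index.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

For the monthly cycle above, `build_alias_swap("search", "search-sept", "search-aug")` would produce the body that moves the alias once the September index is fully built. Passing two `add` actions for the same alias is also how one alias ends up spanning multiple indexes.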

Fixes HHS#2038

Updated the load search data task to partially support incrementally loading + deleting records in the search index rather than just fully remaking it.

Various changes to the search utilities to support this work

Technically this doesn't fully support a true incremental load as it updates every record rather than just the ones with changes. I think the logic necessary to detect changes both deserves its own ticket, and may evolve when we later support indexing files to OpenSearch, so I think it makes sense to hold off on that for now.

[Issue #179] Incrementally load search data

903145d

chouinar requested a review from Rwolfe-Nava August 16, 2024 19:49

github-actions bot added api python labels Aug 16, 2024

Rwolfe-Nava approved these changes Aug 20, 2024

chouinar added 2 commits August 26, 2024 14:15

CLeanup, tests, more impl

0f25373

More docs

898e85d

chouinar marked this pull request as ready for review August 27, 2024 16:21

chouinar requested a review from jamesbursa as a code owner August 27, 2024 16:21

chouinar requested a review from Rwolfe-Nava August 27, 2024 16:21

acouch approved these changes Sep 13, 2024

chouinar merged commit 2854d43 into main Sep 13, 2024
8 checks passed

chouinar deleted the chouinar/179-incremental-search-load branch September 13, 2024 17:10
