
Support adaptive refresh in Searcher Managers. #14443


Open
wants to merge 19 commits into
base: main

Conversation

vigyasharma
Contributor

In segment based replication systems, a large replication payload (checkpoint) can induce heavy page faults, cause thrashing for in-flight search requests, and affect overall search performance.

A potential way to handle these bursts is to leverage multiple commit points in the Lucene index. Instead of refreshing to the latest commit for a large replication payload, searchers can select a commit point whose delta they can safely absorb. By stepping through several such points, searchers eventually reach the latest commit without incurring too many page faults.

This change lets users define a commit selection strategy, controlling which commit the searcher manager refreshes on. Addresses #14219

Usage:
To incrementally refresh through multiple commit points until the searcher is current with its directory:

  • Define a commit selection strategy using the RefreshCommitSupplier interface.
  • Update searcher managers with this strategy via setRefreshCommitSupplier().
  • Invoke maybeRefresh() or maybeRefreshBlocking() in a loop until isSearcherCurrent() returns true.
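
The last step above could look roughly like the loop below. This is only a sketch: `Refreshable` and `SteppingFake` are hypothetical stand-ins written for this example, not Lucene classes, and they leave out the `IOException` that the real `SearcherManager` methods declare. The attempt cap is also an assumption added here so that a stalled refresh cannot spin forever.

```java
/** Hypothetical stand-in for the two manager calls used by the loop; not a Lucene type. */
interface Refreshable {
  boolean maybeRefresh();

  boolean isSearcherCurrent();
}

final class CatchUpLoopSketch {
  /**
   * Calls maybeRefresh() until the searcher reports it is current, or until
   * maxAttempts refreshes have been made. Returns the number of refresh calls.
   */
  static int refreshUntilCurrent(Refreshable manager, int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts && !manager.isSearcherCurrent()) {
      manager.maybeRefresh();
      attempts++;
    }
    return attempts;
  }
}

/** In-memory fake for illustration: every refresh advances exactly one commit. */
final class SteppingFake implements Refreshable {
  private int currentCommit;
  private final int latestCommit;

  SteppingFake(int currentCommit, int latestCommit) {
    this.currentCommit = currentCommit;
    this.latestCommit = latestCommit;
  }

  @Override
  public boolean maybeRefresh() {
    if (currentCommit < latestCommit) {
      currentCommit++; // the commit selection strategy would decide how far to move
    }
    return true;
  }

  @Override
  public boolean isSearcherCurrent() {
    return currentCommit >= latestCommit;
  }
}
```

With the fake, `CatchUpLoopSketch.refreshUntilCurrent(new SteppingFake(0, 3), 10)` makes three refresh calls before the searcher is current. In a real setup the number of hops would depend on what the configured commit selection strategy returns on each refresh.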

@jpountz
Contributor

jpountz commented Apr 7, 2025

Thanks for tackling this!

To incrementally refresh through multiple commit points until searcher is current with its directory:

[...]
Invoke maybeRefresh() or maybeRefreshBlocking() in a loop until isSearcherCurrent() returns true.

Is this the way we anticipate this to be used? I had imagined that the application would not change the way it refreshes and still call it on a schedule, but commit more frequently and retain multiple commits. E.g. commit every 30 seconds, retain commits for 300 seconds and refresh every 120 seconds (these numbers are just for the sake of the example). So every 120 seconds, SearcherManager would pick the most recent commit that differs by less than X GB (configurable based on the amount of thrashing that the app can sustain between consecutive point-in-time views of the index) from the current point-in-time reader, or the commit that differs by the least amount of data if there is no such commit (typically the oldest commit). Most of the time, SearcherManager would pick the newest commit point, but under heavy merging it may decide to lag behind the latest commit point a bit for the sake of smoothing out page cache thrashing.
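
The selection rule described above can be sketched as follows. This is an illustration, not the PR's implementation: `CommitDelta` and `CommitSelectionSketch` are hypothetical names, and the delta of each commit relative to the current reader is assumed to be already known.

```java
import java.util.List;

/** Hypothetical value type: a commit generation and its delta from the current reader. */
final class CommitDelta {
  final long generation;
  final long deltaBytes;

  CommitDelta(long generation, long deltaBytes) {
    this.generation = generation;
    this.deltaBytes = deltaBytes;
  }
}

final class CommitSelectionSketch {
  /**
   * Picks the newest commit whose delta is below maxDeltaBytes. If no commit
   * qualifies, falls back to the commit with the smallest delta.
   *
   * @param commits candidate commits, ordered oldest to newest
   */
  static CommitDelta select(List<CommitDelta> commits, long maxDeltaBytes) {
    if (commits.isEmpty()) {
      throw new IllegalArgumentException("no commits to choose from");
    }
    CommitDelta newestWithinBudget = null;
    CommitDelta smallest = commits.get(0);
    for (CommitDelta commit : commits) {
      if (commit.deltaBytes < maxDeltaBytes) {
        newestWithinBudget = commit; // later entries are newer, so keep overwriting
      }
      if (commit.deltaBytes < smallest.deltaBytes) {
        smallest = commit; // on ties the older commit is kept
      }
    }
    return newestWithinBudget != null ? newestWithinBudget : smallest;
  }
}
```

With a generous budget this returns the newest commit, which matches the "most of the time" case; the fallback only matters when every retained commit exceeds the budget.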

@vigyasharma
Contributor Author

every 120 seconds, SearcherManager would pick the most recent commit that differs by less than X GB

This is indeed how we anticipate it being used. In NRT-style segment-replicated setups, if searchers refresh more often than replication frequency, they will eventually catch up to the latest commit. I mentioned the while loop for cases where users want to wait and verify that their searchers are current.

The PR of course supports both patterns; I'll update the description to reflect that as well.

@jpountz
Contributor

jpountz commented Apr 8, 2025

if searchers refresh more often than replication frequency

OK, I think I misunderstood how it would be used. I had assumed that commits would always get replicated immediately, but you are suggesting that replications are infrequent and bring several commits at once, giving replica nodes time to absorb the delta smoothly.

@vigyasharma changed the title from "Support incremental refresh in Searcher Managers." to "Support adaptive refresh in Searcher Managers." on Apr 8, 2025
@jpountz
Contributor

jpountz commented Apr 10, 2025

Sorry I'm still a bit confused: how is this approach better than just committing more frequently, replicating commits as soon as they are created, and refreshing searchers as soon as commits are replicated?

@msokolov
Contributor

Sorry I'm still a bit confused: how is this approach better than just committing more frequently, replicating commits as soon as they are created, and refreshing searchers as soon as commits are replicated?

One scenario of interest is when replication becomes delayed, which is expected with cross-datacenter replication, for example. In that case commit points may pile up, even to the extent of completely replacing the entire index. In such a case we'd like to be able to recover without undue impact to searchers.

@vigyasharma
Contributor Author

just committing more frequently, replicating commits as soon as they are created, and refreshing searchers as soon as commits are replicated?

This is more or less the setup we have today at Amazon Product Search. We have separate indexing and search fleets that use S3 as a sink. Some fleets replicate across AWS data centers. I believe this is a common architecture; e.g., DoorDash seems to have a similar search architecture.

However, as Mike mentioned, these commits go over network hops and are vulnerable to network lag. Our searchers periodically pull the latest commit from S3 and refresh. If replication is delayed, searchers can skip a few commits to pull the latest one available. This latest commit can have a very high delta relative to what searchers are currently on.

With adaptive refresh, we are experimenting with making searchers pull the last N commits and refresh on the newest commit that they can safely absorb. At the extreme, if the entire index has changed, it will be no different than refreshing on the latest commit. But for moderate delay windows, we could find "bite-sized" hops for searchers to catch up safely.
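
To make the "bite-sized hops" idea concrete, here is a small simulation. It rests on assumptions that are not stated in this thread: each commit is modelled as a map from file name to size, the delta between two commits is taken to be the bytes of files present in the target but absent from the current commit, and a commit is treated as absorbable when that delta is at most the budget. `HopPlannerSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HopPlannerSketch {
  /** Assumed delta model: bytes of files in target that are not in current. */
  static long deltaBytes(Map<String, Long> current, Map<String, Long> target) {
    long total = 0;
    for (Map.Entry<String, Long> file : target.entrySet()) {
      if (!current.containsKey(file.getKey())) {
        total += file.getValue();
      }
    }
    return total;
  }

  /**
   * Plans the sequence of commits a searcher would refresh on to reach the
   * latest commit. Each hop goes to the newest commit within budget, or to
   * the commit with the smallest delta when none fits.
   *
   * @param commits commits ordered oldest to newest
   * @param startIndex index of the commit the searcher is currently on
   * @return indices of the commits visited, ending at the latest commit
   */
  static List<Integer> planHops(List<Map<String, Long>> commits, int startIndex, long budgetBytes) {
    List<Integer> hops = new ArrayList<>();
    int current = startIndex;
    int latest = commits.size() - 1;
    while (current < latest) {
      int newestWithinBudget = -1;
      int smallest = current + 1;
      long smallestDelta = Long.MAX_VALUE;
      for (int i = current + 1; i <= latest; i++) {
        long delta = deltaBytes(commits.get(current), commits.get(i));
        if (delta <= budgetBytes) {
          newestWithinBudget = i;
        }
        if (delta < smallestDelta) {
          smallestDelta = delta;
          smallest = i;
        }
      }
      // Always moves forward by at least one commit, so the loop terminates.
      current = newestWithinBudget != -1 ? newestWithinBudget : smallest;
      hops.add(current);
    }
    return hops;
  }
}
```

A large budget yields a single hop straight to the latest commit, while a small budget yields several hops. When the index has been fully replaced, the final hop costs the full delta regardless of budget, which is the extreme case noted above.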

@vigyasharma
Contributor Author

Another scenario where adaptive refresh might be useful is with heterogeneous search fleets. Searchers with less memory would benefit from stepping through smaller commit deltas, while high-memory searchers can jump ahead.


3 participants