
Writable warm replica replication/recovery #17390

Draft · wants to merge 1 commit into base: main
Conversation


@skumawat2025 skumawat2025 commented Feb 19, 2025

Description

This PR aims to implement the Replica Replication and Recovery flow for indices with partial store locality (also known as writable warm indices), which were introduced with the composite directory in PR #12782. It builds upon and addresses comments received on the pending PR #14670.
For writable warm indices, whose primary shards now consist of both a localDirectory and a remoteDirectory, we upload files to remote storage and maintain an LRU fileCache on the node instead of storing all files locally.
We've made several changes to the replication and recovery processes:

  1. During replication events on replicas, we now only update the NRTReplicationReaderManager with the latest CheckpointInfoResponse, avoiding unnecessary downloads of the actual file diffs on the replica.
  2. On index close, we skip flushing on replica shards, since on re-open the shard will sync from the remote store anyway.
  3. During remote store sync in recovery scenarios, we skip copying the actual segment files from remote storage to local storage, and instead create a new commit with the latest commit info from the remote segment metadata file.
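The warm-replica behavior in change (1) can be sketched as a small decision helper. This is a hypothetical illustration, not the PR's actual code; the method name and parameters are invented for clarity:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of change (1): on a replication checkpoint, a warm
// (partial store locality) replica applies only the checkpoint metadata and
// downloads no segment files; a hot replica still fetches the file diff.
class WarmReplicaReplicationSketch {
    static List<String> filesToDownload(boolean isStoreLocalityPartial, List<String> diffFiles) {
        if (isStoreLocalityPartial) {
            // Warm index: the NRTReplicationReaderManager is updated from the
            // CheckpointInfoResponse; segment files are served on demand via
            // the remote store and the node's file cache.
            return Collections.emptyList();
        }
        return diffFiles; // hot index: download the diff as before
    }
}
```

Under this sketch, the replication target for a warm shard simply has no files to transfer, which is why the flow reduces to a reader-manager update.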

Related Issues

Resolves #13647

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Roadmap:Search Project-wide roadmap label Storage:Remote v2.16.0 Issues and PRs related to version 2.16.0 labels Feb 19, 2025
@skumawat2025 skumawat2025 changed the title Writable warm replica relocation/recovery Writable warm replica replication/recovery Feb 19, 2025

❌ Gradle check result for 6fde641: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -368,6 +368,11 @@ public boolean shouldPeriodicallyFlush() {
@Override
public void flush(boolean force, boolean waitIfOngoing) throws EngineException {
ensureOpen();
// Skip flushing for indices with partial locality (warm indices)
// For these indices, we don't need to commit as we will sync from the remote store on re-open
if (engineConfig.getIndexSettings().isStoreLocalityPartial() == false) {
Collaborator


shouldn't this be engineConfig.getIndexSettings().isStoreLocalityPartial()?

Contributor Author

@skumawat2025 skumawat2025 Feb 19, 2025


Correct. This is the right check. Thanks.
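The guard under discussion can be condensed into a boolean predicate. This is a hypothetical sketch of the agreed-upon check, not the actual engine code:

```java
// Hypothetical sketch of the flush guard above: commit locally only for hot
// indices; warm (partial-locality) indices skip the local commit because the
// shard re-syncs from the remote store on re-open.
class FlushGuardSketch {
    static boolean shouldCommitLocally(boolean isStoreLocalityPartial) {
        // isStoreLocalityPartial == false means a fully local (hot) index.
        return isStoreLocalityPartial == false;
    }
}
```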

@@ -238,7 +238,8 @@ private IndexMetadata.Builder updateInSyncAllocations(
allocationId = RecoverySource.ExistingStoreRecoverySource.FORCED_ALLOCATION_ID;
} else {
             assert (recoverySource instanceof RecoverySource.SnapshotRecoverySource
-                || recoverySource instanceof RecoverySource.RemoteStoreRecoverySource) : recoverySource;
+                || recoverySource instanceof RecoverySource.RemoteStoreRecoverySource
+                || recoverySource instanceof RecoverySource.ExistingStoreRecoverySource) : recoverySource;
Collaborator


Why is this required, and what will the implications be?

Comment on lines +204 to +205
// Return an empty list for warm indices. In this case, replica shards don't require downloading files from remote storage,
// as they will sync all files from the remote store in case of failure.
Collaborator


Should we even update the replica's checkpoint in the case of warm shards? Also make sure to handle the replication lag metric.

@@ -190,8 +190,10 @@ public CacheStats stats() {
public void logCurrentState() {
int i = 0;
for (RefCountedCache<K, V> cache : table) {
logger.trace("SegmentedCache " + i);
((LRUCache<K, V>) cache).logCurrentState();
if (cache.size() > 0) {
Collaborator


Can we handle the size() == 0 case inside logCurrentState()?
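The reviewer's suggestion, moving the emptiness check into logCurrentState() itself, could look roughly like this. This is a hypothetical minimal cache for illustration, not the real SegmentedCache/LRUCache classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: logCurrentState() no-ops on an empty cache, so the
// caller's size() > 0 guard becomes unnecessary.
class CacheLoggingSketch<K, V> {
    private final Map<K, V> entries = new LinkedHashMap<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    // Returns the trace line it would log, or null for an empty cache.
    String logCurrentState(String label) {
        if (entries.isEmpty()) {
            return null; // empty case handled inside, per the review comment
        }
        return label + " " + entries;
    }
}
```

With this shape, the caller's loop over segments can call logCurrentState() unconditionally.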

         logger.trace("Composite Directory[{}]: Local Directory files - {}", this::toString, () -> Arrays.toString(localFiles));
-        logger.trace("Composite Directory[{}]: Remote Directory files - {}", this::toString, () -> Arrays.toString(remoteFiles));
+        // logger.trace("Composite Directory[{}]: Remote Directory files - {}", this::toString, () -> Arrays.toString(remoteFiles));
Set<String> nonBlockLuceneFiles = allFiles.stream()
Collaborator


Why are we returning just the nonBlockLuceneFiles here? Won't the list be incomplete, since some files can get evicted?
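For context, the filtering in question might look roughly like this. This is a hypothetical sketch; the actual block-file naming and filtering logic in the composite directory may differ, and the "_block" marker here is an assumption:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: partially cached files are stored as "block" files
// (assumed here to carry a "_block" marker in the name); listing only the
// non-block names approximates the logical Lucene file set.
class NonBlockFilesSketch {
    static Set<String> nonBlockLuceneFiles(List<String> allFiles) {
        return allFiles.stream()
            .filter(name -> !name.contains("_block"))
            .collect(Collectors.toSet());
    }
}
```

The reviewer's concern is that a name-based filter over locally present files can miss logical files whose cached blocks were evicted, so the returned set may undercount the true Lucene file listing.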

Development

Successfully merging this pull request may close these issues.

[Writable Warm] Recovery/Replication flow for Composite Directory