-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(storage-manager): replication initial alignment #1468
Merged
J-Loudet
merged 5 commits into
main
from
refactor/storage-manager/initial-alignment-replica
Sep 26, 2024
Merged
feat(storage-manager): replication initial alignment #1468
J-Loudet
merged 5 commits into
main
from
refactor/storage-manager/initial-alignment-replica
Sep 26, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
PR missing one of the required labels: {'internal', 'dependencies', 'breaking-change', 'new feature', 'enhancement', 'documentation', 'bug'} |
J-Loudet
force-pushed
the
refactor/storage-manager/initial-alignment-replica
branch
from
September 25, 2024 12:00
2d9fc3a
to
12a9186
Compare
The `StorageService` was splitting the `StorageConfig` that was used to create it. In addition to adding noise, this prevented separating the creation of the structure from spawning the subscriber and queryable associated with a Storage. This commit changes the fields of the `StorageService` structure to keep the entire `StorageConfig` -- thus allowing separating the creation from spawning the queryable and subscriber. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs: - Removed the following fields from the `StorageService` structure: - `key_expr`, - `complete`, - `strip_prefix`. - Added the field `configuration` to keep track of the associated `StorageConfig`. - Changed the signature of the `start_storage_queryable_subscriber` removing the `GarbageConfig` as it is now contained in `&self`. - Updated the calls to access `key_expr`, `complete` and `strip_prefix`. - Removed an `unwrap()` and instead log an error. Signed-off-by: Julien Loudet <[email protected]>
This commit separates creating the `StorageService` from starting it. This change is motivated by the Replication feature: when performing the initial alignment we want to delay the Storage from answering queries until after the initial alignment has been performed. In order to have this functionality we need to be able to dissociate creating the `StorageService` from starting it. As the method `start_storage_queryable_subscriber` takes ownership of the `StorageService`, it became mandatory to first create the `StorageService`, then start the Replication and lastly start the Storage. Because of this, as the Replication code was inside a task, the code to create and start the Storage was also moved inside the task. * plugins/zenoh-plugin-storage-manager/src/replication/service.rs: - Take a reference over the `StorageService` structure as it is only needed before spawning the different Replication tasks. The StorageService is still needed to call `process_sample`. - Clone the `Arc` of the underlying Storage before spawning the Replication tasks. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Move the logging until starting the Storage. - Move the code starting the Storage inside the task. - Start the `StorageService` after having started the `ReplicationService`. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs: - Renamed the function `start` to `new` as it now only creates an instance of the `StorageService`. - Removed the parameter `rx` from the call to `new` as it no longer also starts it. - Removed the call to `start_storage_queryable_subscriber` from `new`. - Changed the visibility of the method `start_storage_queryable_subscriber` to `pub(crate)` as it is called from outside the `service` module. - Added logging information before the Storage "loop" is started (to help confirm, with the logs, the order in which the different elements are started). Signed-off-by: Julien Loudet <[email protected]>
J-Loudet
force-pushed
the
refactor/storage-manager/initial-alignment-replica
branch
from
September 25, 2024 12:00
12a9186
to
9410194
Compare
As we were using the key expression of the Storage to generate the key expressions used in the Replication, it was possible to receive Digest emitted by Replicas that were operating on a subset of the key space of the Storage. This commit changes the way the key expressions for the Replication are generated by using the hash of the configuration of the Replication: this renders these key expressions unique, hence avoiding the issue just described. This property is interesting for the initial Alignment: had we not made that change, we would have had to ensure that we perform that Alignment on a Replica operating on exactly the same key space (and not a subset) and the same configuration (in particular, the `strip_prefix`). NOTE: This does not solve the initial alignment step that will still contact Storage that are operating on a subset (if there is no better match on the network). * plugins/zenoh-plugin-storage-manager/src/replication/core.rs: - Renamed `storage_ke` to `hash_configuration` in the key expression formatters of the Digest and the Aligner. - Removed the unnecessary clones when spawning the Digest Publisher + fixed the different call sites. - Removed the scope to access the configuration as we clone it earlier in the code + fixed the different call sites. - Used the hash of the configuration to generate the key expression for the Digest. - Removed the unnecessary clones when spawning the Digest Subscriber + fixed the different call sites. - Used the hash of the configuration to generate the key expression for the Digest. - Removed the unnecessary clones when spawning the Digest Publisher + fixed the different call sites. - Used the hash of the configuration to generate the key expression for the Digest. - Removed the unnecessary clones when spawning the Aligner + fixed the different call sites. - Used the hash of the configuration to generate the key expression for the Aligner Queryable. Signed-off-by: Julien Loudet <[email protected]>
It does not bring anything to wait and retry on error when attempting to declare a Queryable or a Subscriber: either the Session is established and these operations will succeed or the Session is no longer existing in which case we should terminate. * plugins/zenoh-plugin-storage-manager/src/replication/core.rs: - The `MAX_RETRY` and `WAIT_PERIODS_SECS` constants are no longer needed. - Removed the wait/retry loops when creating the Digest Subscriber and Aligner Queryable. Signed-off-by: Julien Loudet <[email protected]>
J-Loudet
force-pushed
the
refactor/storage-manager/initial-alignment-replica
branch
from
September 25, 2024 15:04
9410194
to
45eddd6
Compare
J-Loudet
force-pushed
the
refactor/storage-manager/initial-alignment-replica
branch
2 times, most recently
from
September 26, 2024 14:07
a73ce15
to
b00835b
Compare
This commit changes the way a replicated Storage starts: if it is empty and configured to be replicated, it will attempt to align with an active and compatible (i.e. same configuration) Replica before anything. The previous behaviour made a query on the key expression of the Storage. Although, it could, in some cases, actually perform the same initial alignment, it could not guarantee to only query a Storage that was configured to be replicated. To perform this controlled initial alignment, new variants to the `AlignmentQuery` and `AlignmentReply` enumerations were added: `Discovery` to discover an active Replica and reply with its `ZenohId`, `All` to request the data from the discovered Replica. To avoid contention, this transfer is performed by batch, one `Interval` at a time. * plugins/zenoh-plugin-storage-manager/src/replication/aligner.rs: - Added new variants `AlignmentQuery::All` and `AlignmentQuery::Discovery`. - Added new variant `AlignmentReply::Discovery`. - Updated the `aligner` method to: - Send the `ZenohId` as a reply to an `AlignmentQuery::Discovery`. - Send all the data of the Storage as a reply to an `AlignmentQuery::All`. This leverages the already existing `reply_events` method. - Updated the `reply_events` method to not attempt to fetch the content of the Storage if the action is set to `delete`. Before this commit, the only time this method was called was during an alignment which filters out the deleted events (hence not requiring this code). - Updated the `spawn_query_replica_aligner` method: - It now returns the handle of the newly created task as we want to wait for it to finish when performing the initial alignment. - Changed the consolidation to `ConsolidationMode::Monotonic` when sending an `AlignmentQuery::Discovery`: we want to contact the fastest answering Replica (hopefully the closest). - Stopped processing replies when processing an `AlignmentQuery::Discovery` as we only want to perform the initial alignment once. - Updated the `process_alignment_reply`: - Process an `AlignmentReply::Discovery` by sending a follow-up `AlignmentQuery::All` to retrieve the content of the Storage of the discovered Replica. - It does not attempt to delete an entry in the Storage when processing an `AlignmentReply::Retrieval`. This could only happen when performing an initial alignment in which case the receiving Storage is empty. We basically only need to record the fact that a delete was performed in the Replication Log. * plugins/zenoh-plugin-storage-manager/src/replication/core.rs: implemented the `initial_alignment` method that attempts to discover a Replica by sending out an `AlignmentQuery::Discovery` on the Aligner Queryable for all Replicas. Before making this query we wait a small delay to give enough time for Zenoh to propagate the routing tables. * plugins/zenoh-plugin-storage-manager/src/replication/service.rs: - Removed the constants `MAX_RETRY` and `WAIT_PERIOD_SECS` as they were no longer needed. - Updated the documentation of the `spawn_start` function. - Removed the previous implementation of the initial alignment that made a query on the key expression of the Storage. - Added a check after creating the `Replication` structure: if the Replication Log is empty, which indicates an empty Storage, then perform an initial alignment. Signed-off-by: Julien Loudet <[email protected]>
J-Loudet
force-pushed
the
refactor/storage-manager/initial-alignment-replica
branch
from
September 26, 2024 14:08
b00835b
to
b21bb1e
Compare
Charles-Schleich
approved these changes
Sep 26, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
These commits change the way a replicated Storage starts: if it is empty
and configured to be replicated, it will attempt to align with an active
and compatible (i.e. same configuration) Replica before anything.
The previous behaviour made a query on the key expression of the
Storage. Although, it could, in some cases, actually perform the same
initial alignment, it could not guarantee to only query a Storage that
was configured to be replicated.