-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(storage-manager): enforce History::Latest based on Timestamp #1367
Conversation
PR missing one of the required labels: {'documentation', 'breaking-change', 'dependencies', 'internal', 'bug', 'enhancement', 'new feature'} |
@Mallets @JEnoch |
aa79fcb
to
1ac5ccb
Compare
This commit forces all Storage that have the capability `History::Latest` to only keep the Latest value based on their Timestamp. This is achieved by (i) keeping track of the Timestamp associated to all key expressions (for which a publication was received) and (ii) discarding received publications that have an older Timestamp (compared to what is tracked for the same key expression). The main motivation is the Replication (or Storage Alignment) feature: the algorithm we use rely on the assumption that a Storage with the capability `History::Latest` only keeps the latest publication based on the Timestamp. If that property is not upheld, no guarantee can be made on the Replication algorithm. Hence, to avoid leaving this responsibility on Storage developers, it has been decided to enforce that behaviour at the Storage Manager level. This is what this commit does. Additionally, to avoid having the `latest_updates` structure grow infinitely, it was included in the `GarbageCollectionEvent` structure to be garbage collected at regular interval. This garbage collection is controlled by the `GarbageCollectionConfig` which defaults to removing entries that are more than 24h old. * plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: - Add a new field `latest_updates` to the `StorageService` structure, which is used to keep track of the latest publication timestamp for all stripped key expression for which the Storage received a publication. - In the `process_sample` function, add a check to ensure that the Sample being processed is more recent than what is currently stored in the Storage. If the Sample is less recent, it is discarded. - If the `process_sample` function, after the Sample was successfully inserted, update the `latest_updates` structure with the more recent Timestamp. Note that the `StorageInsertionResult::Outdated` could still be used as an escape hatch for a Storage to filter Samples on something else than the Timestamp. - In the `GarbageCollectionEvent`, add the `last_updates` structure such that it is garbage collected. - Updated the `run` method of the `GarbageCollectionEvent` to retain only entries that are less than `GarbageCollectionConfig::lifespan` old. Signed-off-by: Julien Loudet <[email protected]>
1ac5ccb
to
87f4791
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I checked relevant backends: none was enforcing the timestamps.
Commit 87f4791 (#1367) introduced a cache to keep track of the timestamps of the publication for each key expression reaching a Storage. The purpose of this cache is to only allow more recent publication to reach the Storage. For lack of a better understanding / reading of the code base, this verification was already performed through the `is_latest` function. Hence, only one of the two verification should remain. This commit removes the `is_latest` function in favour of the more efficient cache: `is_latest` was calling the Storage for every publication to retrieve the latest value associated to the key expression. Note that, in its current state, the cache strategy is not enough: in the unlikely scenario where no publication was made for the `lifespan` of a cached entry (i.e. the entry was garbage collected) and a new publication with an older timestamp is received, then this older publication will reach the Storage. * plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: remove the `is_latest` function. Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches: the method `is_latest` will check if there is an entry in the cache and, if not, it will query the Storage for the latest value. In addition, the cache is now filled at initialisation. This approach has one main advantage: the Storage will only be queried for missing key expressions (because of garbage collection for old keys), hence reducing its load. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches: the method `is_latest` will check if there is an entry in the cache and, if not, it will query the Storage for the latest value. In addition, the cache is now filled at initialisation. This approach has one main advantage: the Storage will only be queried for missing key expressions (because of garbage collection for old keys), hence reducing its load. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
The changes introduced in #1367 were actually redundant (and incomplete): the method `is_latest` was checking that a received Sample was more recent than what is currently stored. This commit removes the redundant checks and implements a correct caching strategy by leveraging both approaches. The changes consist mostly in: 1. Changing the outer test by continuing early if the Sample is detected as deleted. This allows lowering the overall indentation of the `process_sample` method by one level. 2. Modifying the behaviour of the `is_latest` method to check the cache first and only query the Storage if there is a cache miss. To help make potential concurrency issues visible, the method `is_latest` was renamed to `guard_cache_if_latest` as the lock on the Cache should only be released when the Sample has been processed *and* the Cache updated. Additionally, the cache is filled at initialisation. The `read_cost` configuration parameter was removed as it was not used and made obsolete with this change. * plugins/zenoh-backend-example/src/lib.rs: * plugins/zenoh-backend-traits/src/lib.rs: * plugins/zenoh-plugin-storage-manager/src/memory_backend/mod.rs: removed the obsolete `read_cost` parameter. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs: - Populate the Cache at initialisation. - Updated the call to `StorageService::start` to also pass the Cache. * plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs - Updated the call to `StorageService::start` to also pass the Cache. - "Reversed" the logic when checking if a Sample is deleted: if it is continue to the next one. This allows reducing the indentation level. - Keep the guard over the cache if a Sample is more recent and only release it once it has been updated (after the Sample has been successfully processed). Signed-off-by: Julien Loudet <[email protected]>
) This commit forces all Storage that have the capability `History::Latest` to only keep the Latest value based on their Timestamp. This is achieved by (i) keeping track of the Timestamp associated to all key expressions (for which a publication was received) and (ii) discarding received publications that have an older Timestamp (compared to what is tracked for the same key expression). The main motivation is the Replication (or Storage Alignment) feature: the algorithm we use rely on the assumption that a Storage with the capability `History::Latest` only keeps the latest publication based on the Timestamp. If that property is not upheld, no guarantee can be made on the Replication algorithm. Hence, to avoid leaving this responsibility on Storage developers, it has been decided to enforce that behaviour at the Storage Manager level. This is what this commit does. Additionally, to avoid having the `latest_updates` structure grow infinitely, it was included in the `GarbageCollectionEvent` structure to be garbage collected at regular interval. This garbage collection is controlled by the `GarbageCollectionConfig` which defaults to removing entries that are more than 24h old. * plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: - Add a new field `latest_updates` to the `StorageService` structure, which is used to keep track of the latest publication timestamp for all stripped key expression for which the Storage received a publication. - In the `process_sample` function, add a check to ensure that the Sample being processed is more recent than what is currently stored in the Storage. If the Sample is less recent, it is discarded. - If the `process_sample` function, after the Sample was successfully inserted, update the `latest_updates` structure with the more recent Timestamp. Note that the `StorageInsertionResult::Outdated` could still be used as an escape hatch for a Storage to filter Samples on something else than the Timestamp. - In the `GarbageCollectionEvent`, add the `last_updates` structure such that it is garbage collected. - Updated the `run` method of the `GarbageCollectionEvent` to retain only entries that are less than `GarbageCollectionConfig::lifespan` old. Signed-off-by: Julien Loudet <[email protected]>
) This commit forces all Storage that have the capability `History::Latest` to only keep the Latest value based on their Timestamp. This is achieved by (i) keeping track of the Timestamp associated to all key expressions (for which a publication was received) and (ii) discarding received publications that have an older Timestamp (compared to what is tracked for the same key expression). The main motivation is the Replication (or Storage Alignment) feature: the algorithm we use rely on the assumption that a Storage with the capability `History::Latest` only keeps the latest publication based on the Timestamp. If that property is not upheld, no guarantee can be made on the Replication algorithm. Hence, to avoid leaving this responsibility on Storage developers, it has been decided to enforce that behaviour at the Storage Manager level. This is what this commit does. Additionally, to avoid having the `latest_updates` structure grow infinitely, it was included in the `GarbageCollectionEvent` structure to be garbage collected at regular interval. This garbage collection is controlled by the `GarbageCollectionConfig` which defaults to removing entries that are more than 24h old. * plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: - Add a new field `latest_updates` to the `StorageService` structure, which is used to keep track of the latest publication timestamp for all stripped key expression for which the Storage received a publication. - In the `process_sample` function, add a check to ensure that the Sample being processed is more recent than what is currently stored in the Storage. If the Sample is less recent, it is discarded. - If the `process_sample` function, after the Sample was successfully inserted, update the `latest_updates` structure with the more recent Timestamp. Note that the `StorageInsertionResult::Outdated` could still be used as an escape hatch for a Storage to filter Samples on something else than the Timestamp. - In the `GarbageCollectionEvent`, add the `last_updates` structure such that it is garbage collected. - Updated the `run` method of the `GarbageCollectionEvent` to retain only entries that are less than `GarbageCollectionConfig::lifespan` old. Signed-off-by: Julien Loudet <[email protected]>
This commit forces all Storage that have the capability
History::Latest
to only keep the Latest value based on their Timestamp. This is achieved by (i) keeping track of the Timestamp associated to all key expressions (for which a publication was received) and (ii) discarding received publications that have an older Timestamp (compared to what is tracked for the same key expression).The main motivation is the Replication (or Storage Alignment) feature: the algorithm we use rely on the assumption that a Storage with the capability
History::Latest
only keeps the latest publication based on the Timestamp. If that property is not upheld, no guarantee can be made on the Replication algorithm.Hence, to avoid leaving this responsibility on Storage developers, it has been decided to enforce that behaviour at the Storage Manager level. This is what this commit does.
Additionally, to avoid having the
latest_updates
structure grow infinitely, it was included in theGarbageCollectionEvent
structure to be garbage collected at regular interval. This garbage collection is controlled by theGarbageCollectionConfig
which defaults to removing entries that are more than 24h old.latest_updates
to theStorageService
structure, which is used to keep track of the latest publication timestamp for all stripped key expression for which the Storage received a publication.process_sample
function, add a check to ensure that the Sample being processed is more recent than what is currently stored in the Storage. If the Sample is less recent, it is discarded.process_sample
function, after the Sample was successfully inserted, update thelatest_updates
structure with the more recent Timestamp. Note that theStorageInsertionResult::Outdated
could still be used as an escape hatch for a Storage to filter Samples on something else than the Timestamp.GarbageCollectionEvent
, add thelast_updates
structure such that it is garbage collected.run
method of theGarbageCollectionEvent
to retain only entries that are less thanGarbageCollectionConfig::lifespan
old.