Storage Replication Align Queryable improvements #886

Closed

Conversation

@chachi commented Mar 30, 2024

2 commits:

  1. Reply to align queries incrementally with digest replies, so that queries do not time out without sending any data, especially for very outdated storages.
  2. Replace the unbounded digest queue with a bounded queue, so the queue cannot grow indefinitely when processing digest updates takes a long time.

#1 is definitely valuable so that long queries make incremental progress. #2 is less obviously what y'all want upstream. I'd strongly suggest something that isn't unbounded, as that's a recipe for disaster, but I don't know what sizing y'all have in mind.
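For illustration, a minimal standalone sketch of the bounded-queue pattern from point 2, using flume directly. The (String, String) payload, the peer name, and the capacity of 10 are stand-ins for this sketch, not the storage backend's actual types or tuning.

fn main() {
    // Bounded channel: at most 10 digests can be queued for the aligner;
    // further sends fail instead of letting the queue grow without limit.
    let (tx_digest, rx_digest) = flume::bounded::<(String, String)>(10);

    // Subscriber side: drop the digest (and log) rather than block when the
    // aligner is falling behind.
    if let Err(e) = tx_digest.try_send(("peer/a".to_string(), "digest".to_string())) {
        eprintln!("[DIGEST_SUB] Error sending digest to aligner: {}", e);
    }

    // Aligner side: drain whatever is currently queued.
    while let Ok((from, digest)) = rx_digest.try_recv() {
        println!("aligning with {from}: {digest}");
    }
}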

@eclipse-zenoh-bot
Contributor

@chachi If this pull request contains a bugfix or a new feature, then please consider using Closes #ISSUE-NUMBER syntax to link it to an issue.

@@ -119,7 +119,7 @@ impl Replica {

// Create channels for communication between components
// channel to queue digests to be aligned
-let (tx_digest, rx_digest) = flume::unbounded();
+let (tx_digest, rx_digest) = flume::bounded(10);
Contributor

@chachi Could you give us more insight into this value of 10? Did it provide a good trade-off when you tried the replication for your use case?

In any case, hard-coding a value is not something we are particularly keen on doing ("one size fits all" rarely applies). It should instead be part of the replication configuration.

If you want to add that to your PR I will happily review it; otherwise I will create a dedicated issue so that I can address this when I rework the replication.
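To make the suggestion concrete, a hedged sketch of what a configurable bound could look like; ReplicaConfig and aligner_queue_size are hypothetical names invented for this sketch, not the existing replication configuration.

struct ReplicaConfig {
    // Hypothetical knob: maximum number of digests queued for the aligner.
    aligner_queue_size: usize,
}

impl Default for ReplicaConfig {
    fn default() -> Self {
        // Default to the value proposed in this PR.
        Self { aligner_queue_size: 10 }
    }
}

fn main() {
    let config = ReplicaConfig::default();
    // Instead of a hard-coded `flume::bounded(10)`:
    let (_tx_digest, _rx_digest) = flume::bounded::<(String, String)>(config.aligner_queue_size);
}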

Author

No, this is entirely a speculative first stab at an improvement. I certainly agree that a configurable value would be better than hardcoding anything.

Truthfully, it feels like this buffer management would be better done entirely at the Zenoh level, so that backpressure happens when a Publisher tries to send its digest and the digest gets dropped because there is no space to receive it.

Even better, frankly, would be to remove this middle queue entirely and have all the digest processing happen on receipt from the digest-sub. I'm not entirely sure what the value is of doing just the JSON parsing and is_processed checking separately from the rest of the handling.
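A speculative sketch of that "no middle queue" shape: parse, check is_processed, and align in one place when a digest sample arrives. The Digest type and the helpers here are stand-ins for the existing digest_sub/aligner code, not the actual zenoh-backend API.

struct Digest {
    checksum: u64,
}

fn parse_digest(payload: &str) -> Result<Digest, String> {
    // Stand-in for the JSON deserialization currently done in digest_sub.
    payload
        .trim()
        .parse::<u64>()
        .map(|checksum| Digest { checksum })
        .map_err(|e| e.to_string())
}

fn is_processed(digest: &Digest) -> bool {
    // Stand-in for the existing "already aligned against this digest?" check.
    digest.checksum == 0
}

fn align_with(from: &str, digest: Digest) {
    // Stand-in for the alignment work that currently runs in a separate task.
    println!("aligning with {from} (checksum {})", digest.checksum);
}

// Everything happens on receipt of a digest sample; there is no intermediate queue.
fn on_digest_sample(from: &str, payload: &str) {
    match parse_digest(payload) {
        Ok(digest) if !is_processed(&digest) => align_with(from, digest),
        Ok(_) => {} // already processed, nothing to do
        Err(e) => eprintln!("[DIGEST_SUB] Failed to parse digest from {from}: {e}"),
    }
}

fn main() {
    on_digest_sample("peer/a", "42");
}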

Comment on lines +250 to +255
match tx.try_send((from.to_string(), digest)) {
    Ok(()) => {}
-   Err(e) => log::error!("[DIGEST_SUB] Error sending digest to aligner: {}", e),
+   Err(e) => {
+       // Trace because this can happen _a lot_ on busy channels.
+       log::trace!("[DIGEST_SUB] Error sending digest to aligner: {}", e)
+   }
Contributor

I need to further investigate the implications of this change. As of today, I do not know whether skipping digests could have unforeseen consequences for the replication (my first guess is that it doesn't, but I want to make sure).

Author

Yup, it's not a small change. Ultimately this system needs some form of backpressure and dropping: as a network of storages grows, if anything is out of sync it becomes impossible to parse and process every digest without dropping some.

@chachi commented Apr 2, 2024

Overall, this is more of a proof-of-concept PR than anything else. @J-Loudet if you'd like to just close it and make an issue, feel free to.

@J-Loudet
Contributor

See #937

@J-Loudet closed this Apr 16, 2024