NBFT reliability #669
Comments
New requirement added: allow using an unbounded channel in PublicationCache.

lomac, Yesterday at 8:57 AM
Basically, I am calling session.put(xxx) asynchronously, but it blocks nevertheless (because of a local subscriber: Zenoh tries to put the sample into its local flume channel).

Pierre 'Fox' Avital, Yesterday at 9:30 AM
Your analysis is correct. Specifically, the core of the issue is that Zenoh doesn't have a way to run asynchronous callbacks, so pushing onto the channel can only be done synchronously. With a bounded channel, if the channel is full, the synchronous implementation has no choice but to block the thread (either parking it, or looping on some condition), since it can't return before either the operation succeeds or the receiver is dropped. In both cases another task has to act first (reading from the channel, or destroying its receiver end).

While we are working on supporting asynchronous callbacks, I think using unbounded channels is the correct approach here regardless: since we run callbacks on the read task to minimize latency, the trade-off is that a callback (even an asynchronous one) that doesn't return will block the reading task, preventing other messages on the same link from being read until the callback returns. Channels let you make the inverse trade-off, allowing parallel processing of messages at the cost of a bit of latency, so we do expect them to be used often by default.

Conclusion: yes, unbounded channels will solve your issue, and if you expect bursty traffic, they are probably the best-suited channels :)

My personal opinion on bounded channels is that they tend to be overused. Mostly because, rather than being used to apply backpressure (the reason why they're Zenoh's default, and the only valid use for them IMO), they're often used as a premature optimization, or in pursuit of the delusion some people have that this will let them control more finely how much memory their programs use (usually while pushing into an ever-growing hashmap somewhere else to use as a cache :p)

lomac, Today at 10:09 AM
It seems I have no control over the internal flume channel, and the publication cache uses the bounded version from the DefaultHandler.
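To make the bounded/unbounded difference concrete, here is a minimal sketch that uses the standard library's `std::sync::mpsc` channels as a stand-in for flume (an assumption made only to keep the example dependency-free; it does not touch Zenoh's API). The helper names `fill_bounded` and `fill_unbounded` are hypothetical. The bounded version uses `try_send` so that it reports a full channel instead of parking the thread, which is what a plain synchronous `send` would do.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Pushes up to `n` samples into a bounded channel that nobody drains and
/// returns how many were accepted before the channel filled up.
pub fn fill_bounded(capacity: usize, n: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for sample in 0..n {
        match tx.try_send(sample) {
            Ok(()) => accepted += 1,
            // A blocking `send` would park the calling thread at this point
            // until a reader frees a slot or the receiver is dropped.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Pushes `n` samples into an unbounded channel that nobody drains while
/// sending, then counts what was queued. `send` never waits for a reader.
pub fn fill_unbounded(n: usize) -> usize {
    let (tx, rx) = channel::<usize>();
    for sample in 0..n {
        tx.send(sample).expect("receiver is still alive");
    }
    // Dropping the sender lets the receiving iterator terminate.
    drop(tx);
    rx.iter().count()
}
```

With a capacity of 4 and 10 samples, the bounded helper accepts only the first 4, while the unbounded helper queues all 10 at the cost of unbounded memory growth if the reader never catches up.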
Regarding the last comment, I think it would be even better to implement the publication cache with a callback subscriber rather than a flume subscriber. It would require a mutex, but could be better overall.
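A rough sketch of what a callback-based cache could look like, using only the standard library. `History` and `make_callback` are hypothetical names, samples are modelled as plain `String`s, and the closure merely stands in for whatever a callback subscriber would invoke; none of this is the actual PublicationCache code.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical history buffer that keeps the most recent `depth` samples.
pub struct History {
    depth: usize,
    samples: VecDeque<String>,
}

impl History {
    pub fn new(depth: usize) -> Self {
        History { depth, samples: VecDeque::with_capacity(depth) }
    }

    /// Stores a sample, evicting the oldest one when the buffer is full.
    pub fn push(&mut self, sample: String) {
        if self.depth == 0 {
            return;
        }
        if self.samples.len() == self.depth {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    /// Copies the cached samples, oldest first (e.g. to answer a query).
    pub fn snapshot(&self) -> Vec<String> {
        self.samples.iter().cloned().collect()
    }
}

/// Builds the closure a callback subscriber would run for each sample.
/// The lock is held only for the duration of the push, so the caller
/// (the read task, in the discussion above) is not held up by a slow consumer.
pub fn make_callback(history: Arc<Mutex<History>>) -> impl Fn(String) + Send + Sync + 'static {
    move |sample| {
        history.lock().unwrap().push(sample);
    }
}
```

The query side would take the same lock briefly to call `snapshot`, so no channel (bounded or unbounded) sits between the publisher and the cache.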
Describe the feature
NBFT means Non-Blocking Fault-Tolerance
Based on PublicationCache and QueryingSubscriber. A newer implementation is in branch #401 (@OlivierHecart, is it the correct one?)
Goals:
When started, the subscriber pulls the history of packets from the publisher
When the subscriber detects a missing packet, it queries the publisher to restore it (by sequence number)
The subscriber preserves packet order when delivering to the user (it holds incoming packets until the query is done)
The subscriber detects new publishers and pulls history from them (through liveliness)
The subscriber makes sure that packets are not repeated (the existing implementation has this problem: the same packet can arrive from both the query and the subscription)
History pulling, missing-packet detection, and late-joining publisher detection are orthogonal and can each be enabled or disabled at construction time
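The ordering, gap-detection, and de-duplication goals above can be sketched as a small reorder buffer keyed by sequence number. This is an illustration only: `Reorderer`, `accept`, and `missing` are hypothetical names, payloads are plain `String`s, and a single publisher with `u64` sequence numbers is assumed.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-publisher reorder buffer.
pub struct Reorderer {
    /// Next sequence number that may be delivered to the user.
    next: u64,
    /// Samples received ahead of `next`, held until the gap is filled.
    pending: BTreeMap<u64, String>,
}

impl Reorderer {
    pub fn new(first: u64) -> Self {
        Reorderer { next: first, pending: BTreeMap::new() }
    }

    /// Accepts a sample from either the live subscription or a query reply
    /// and returns the samples that can now be delivered, in order.
    pub fn accept(&mut self, seq: u64, payload: String) -> Vec<String> {
        // Already delivered or already held: drop the duplicate.
        if seq < self.next || self.pending.contains_key(&seq) {
            return Vec::new();
        }
        self.pending.insert(seq, payload);
        // Release the contiguous run starting at `next`.
        let mut ready = Vec::new();
        while let Some(p) = self.pending.remove(&self.next) {
            ready.push(p);
            self.next += 1;
        }
        ready
    }

    /// Sequence numbers to request from the publisher, if a gap exists.
    pub fn missing(&self) -> Option<std::ops::Range<u64>> {
        let first_held = *self.pending.keys().next()?;
        Some(self.next..first_held)
    }
}
```

Feeding sequence numbers 0, 2, 1 delivers 0 immediately, holds 2 while `missing()` reports `1..2`, and releases 1 and 2 together once 1 arrives; a repeated 1 or 2 is silently dropped.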
Additional requirements:
allow using an unbounded channel in PublicationCache
allow using liveliness support without caching, just for monitoring