Support for (optional) ack timeout and nack delay for consumers #67

dionjansen · 2020-12-21T15:37:39Z

Implements: #45 , #46

General comments

I've added a reference to Microsoft.Bcl.AsyncInterfaces to support the IAsyncDisposable interface in net5.0. I only had this issue in VSCode, for the test and sample projects. It does not affect running the tests or samples in VSCode (on Mac).
This is still a draft, mostly to get the discussion going.
I would like to add some more unit tests, though this is a bit hard without a proper mocking framework. Would you be open to introducing something like Moq to allow testing classes in a shallow way? Or would you rather I follow a different unit test strategy?

Discussion

I think we can split the discussion into two parts:

The integration in the rest of the lib: where things are created and started, what is optional, and the general structure/logic of the lib
The implementation of the tracker: performance concerns, disposing

@blankensteiner First I would like to get some thoughts on the current implementation. So far I have mostly focussed on setting things up so that I don't break existing processes:

Integration

Right now I create an IMessageAcksTracker instance in the PulsarClient, though when clients do not have the consumer configured to use the tracking, a dummy InactiveMessageAcksTracker is passed. This instance is passed to the ConsumerChannelFactory and used when a channel is created. Does this setup make sense to you?
The ConsumerChannel now expects a MessageQueue which wraps the tracker and the AsyncQueue. The channel now informs the tracker through this queue when acking (and nacking, which I can add later), and the dequeue method automatically starts tracking received messages.
Once the tracker is started it runs indefinitely. I find it hard to find a good place to start this thread, also since I need the consumer to call RedeliverUnacknowledgedMessages. Any ideas on how to improve this pattern? Perhaps a static method like some of the StateMonitor methods.
I wonder how batched messages fit into all this, does a batch have a single message id? And should I be able to just redeliver that message id to release the batch back to the broker?
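
For readers following along, here is a minimal sketch of the "dummy tracker" setup described above (a null-object pattern). Only the names IMessageAcksTracker and InactiveMessageAcksTracker come from this PR. The members, the ActiveMessageAcksTracker class, the TrackerFactory helper and the use of a plain long as message id are assumptions made for illustration, not the PR's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal surface; the interface in the PR may look different.
public interface IMessageAcksTracker : IDisposable
{
    void Track(long messageId);        // called when a message is dequeued
    void Acknowledge(long messageId);  // called when the consumer acks
    int PendingCount { get; }
}

// Passed when tracking is not configured: every call is a no-op,
// so the channel never needs a null check or a feature flag.
public sealed class InactiveMessageAcksTracker : IMessageAcksTracker
{
    public void Track(long messageId) { }
    public void Acknowledge(long messageId) { }
    public int PendingCount => 0;
    public void Dispose() { }
}

// Hypothetical active counterpart that remembers unacked ids.
public sealed class ActiveMessageAcksTracker : IMessageAcksTracker
{
    private readonly object _lock = new object();
    private readonly HashSet<long> _pending = new HashSet<long>();

    public void Track(long messageId) { lock (_lock) _pending.Add(messageId); }
    public void Acknowledge(long messageId) { lock (_lock) _pending.Remove(messageId); }
    public int PendingCount { get { lock (_lock) return _pending.Count; } }
    public void Dispose() { }
}

// Hypothetical helper mirroring the conditional creation in the client.
public static class TrackerFactory
{
    public static IMessageAcksTracker Create(TimeSpan ackTimeout)
    {
        if (ackTimeout > TimeSpan.Zero)
            return new ActiveMessageAcksTracker();

        return new InactiveMessageAcksTracker();
    }
}
```

The benefit of this shape is that the channel and queue code paths stay identical whether or not tracking is enabled.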

Implementation

The MessageAcksTracker uses a polling mechanism to re-check for timed out messages (either because they have been unacked for too long or because the nack delay has been exceeded). Is this (generally) what you were thinking about too? Alternatively I could think of an approach where polling is done through a Timer. Or we could create individual Tasks for each message added to the tracker, but I'm concerned about the overhead this would create.
Atm I haven't yet considered the scenario where only a nack delay is configured and not an ack timeout, in which case we would not have to track all dequeued messages.
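
To make the polling option concrete, here is a rough sketch of a single background loop that wakes up at a fixed interval and stops when disposed. The PollingLoop class and its onTick callback are hypothetical names for illustration. The PR's tracker is not necessarily structured this way.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// One loop per consumer/queue, instead of one Task or Timer per message.
public sealed class PollingLoop : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Task _loop;

    public PollingLoop(TimeSpan interval, Func<CancellationToken, Task> onTick)
    {
        // Runs synchronously up to the first await, then continues in the background.
        _loop = RunAsync(interval, onTick, _cts.Token);
    }

    private static async Task RunAsync(
        TimeSpan interval, Func<CancellationToken, Task> onTick, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                await Task.Delay(interval, token).ConfigureAwait(false);
                await onTick(token).ConfigureAwait(false); // e.g. check for timed out messages
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the loop is disposed while waiting.
        }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        await _loop.ConfigureAwait(false); // surfaces any non-cancellation failure from onTick
        _cts.Dispose();
    }
}
```

Tying the loop's lifetime to a disposable owner is one way to avoid a tracker that "runs indefinitely". Whether that owner should be the consumer, the channel or the queue is exactly the open question above.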

blankensteiner · 2020-12-22T13:55:39Z

Hi @dionjansen
Thanks for the PR!
I'll try and answer the best I can :-)

I've added a reference to Microsoft.Bcl.AsyncInterfaces to support the IAsyncDisposable interface in net5.0. I only had this issue in VSCode, for the test and sample projects. It does not affect running the tests or samples in VSCode (on Mac).

We have tried adding support for Visual Studio Code before, but sadly it's just not a pleasant road to go down. I have no idea why Microsoft has created two IDEs for C# and doesn't ensure that they behave the same.
Visual Studio Code will create nonsense warnings and require unnecessary changes to the code-base, and therefore it's not supported for developing DotPulsar. Visual Studio Community Edition, the commercial offerings, and Rider are supported, so you have to use one of those.

I would like to add some more unit tests, though this is a bit hard without a proper mocking framework. Would you be open to introducing something like Moq to allow testing classes in a shallow way? Or would you rather I follow a different unit test strategy?

Feel free to add one or more of these (if you need them):

AutoFixture
AutoFixture.AutoNSubstitute
AutoFixture.Xunit2
NSubstitute

Right now I create an IMessageAcksTracker instance in the PulsarClient, though when clients do not have the consumer configured to use the tracking, a dummy InactiveMessageAcksTracker is passed. This instance is passed to the ConsumerChannelFactory and used when a channel is created. Does this setup make sense to you?

Yes.

Once the tracker is started it runs indefinitely. I find it hard to find a good place to start this thread, also since I need the consumer to call RedeliverUnacknowledgedMessages. Any ideas on how to improve this pattern? Perhaps a static method like some of the StateMonitor methods.

The question is whether you actually need the consumer or just the consumer channel. The tracking should start and end together with the MessageQueue/ConsumerChannel.

I wonder how batched messages fit into all this, does a batch have a single message id? And should I be able to just redeliver that message id to release the batch back to the broker?

That's a really good question. Bookkeeper stores batched messages as one, so if you have a batched message consisting of 5 messages and you ack 4 of them, but the last times out and you ask the broker to redeliver, you will get the entire batch again. You could keep track of this, but it will hurt performance. I don't know what the other clients do here, but maybe you could test that?

The MessageAcksTracker uses a polling mechanism to re-check for timed out messages (either because they have been unacked for too long or because the nack delay has been exceeded). Is this (generally) what you were thinking about too? Alternatively I could think of an approach where polling is done through a Timer. Or we could create individual Tasks for each message added to the tracker, but I'm concerned about the overhead this would create.

I agree that a timer/task per message will hurt performance too much. Having one task for the entire Consumer/MessageQueue is the right solution: it wakes up and looks at what needs to be redelivered. Here we need to find a thread-safe and performant way of storing and accessing this information.

Atm I haven't yet considered the scenario where only a nack delay is configured and not an ack timeout, in which case we would not have to track all dequeued messages.

Just a boolean check on dequeue and ack? We also need to handle cumulative acknowledgment.

Consider this ackTimeout implementation. First, ackTimeout should be given as a TimeSpan (instead of an int or long as milli- or microseconds).

When a message is dequeued, and if we have an ackTimeout, then we store the MessageId and Stopwatch.GetTimestamp() in an "AwaitingAck" struct in a ConcurrentQueue, let's call it "AwaitingAcks". Other suggestions for concurrent collections with fast insertion are welcome.

When a message is acknowledged, and if we have an ackTimeout, then we store the acks instead of removing them from "AwaitingAcks". If we want to remove them right away, then we need the "AwaitingAcks" collection to support both iteration and random deletion. We only need to store the highest cumulative ack we see (if there is one) and the MessageIds not included by that cumulative ack.

When the ackTracker wakes up, it first calculates what ackTimeout is in Stopwatch ticks (those are not the same as TimeSpan ticks) and then calls Stopwatch.GetTimestamp(). We then TryPeek and TryDequeue from "AwaitingAcks" for as long as the tracker timestamp minus AwaitingAck.Timestamp is larger than the calculated timeout.
If the MessageId is not acknowledged, it's added to a CommandRedeliverUnacknowledgedMessages (that we are reusing), which is then sent (if MessageIds were added).
If the MessageId was acknowledged, then we can remove that MessageId from the AckedMessageIds.
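
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that ended up in the PR: message ids are plain long values rather than DotPulsar's MessageId, the class name AckTimeoutSketch is hypothetical, timestamps are passed in explicitly so the logic can be exercised without waiting, and a single tracker task is assumed to be the only caller of CollectTimedOut.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;

public readonly struct AwaitingAck
{
    public AwaitingAck(long messageId, long timestamp)
    {
        MessageId = messageId;
        Timestamp = timestamp;
    }

    public long MessageId { get; }
    public long Timestamp { get; } // Stopwatch ticks, not TimeSpan ticks
}

public sealed class AckTimeoutSketch
{
    private readonly ConcurrentQueue<AwaitingAck> _awaitingAcks = new ConcurrentQueue<AwaitingAck>();
    private readonly ConcurrentDictionary<long, byte> _acked = new ConcurrentDictionary<long, byte>();
    private readonly long _timeoutInStopwatchTicks;

    public AckTimeoutSketch(TimeSpan ackTimeout)
    {
        // Convert once: Stopwatch.Frequency is the number of Stopwatch ticks per second.
        _timeoutInStopwatchTicks = (long) (ackTimeout.TotalSeconds * Stopwatch.Frequency);
    }

    // 'now' would normally be Stopwatch.GetTimestamp().
    public void OnDequeued(long messageId, long now) =>
        _awaitingAcks.Enqueue(new AwaitingAck(messageId, now));

    // Acks are recorded rather than removed from the queue.
    public void OnAcknowledged(long messageId) => _acked.TryAdd(messageId, 0);

    // Called when the tracker wakes up; returns the ids to redeliver.
    public List<long> CollectTimedOut(long now)
    {
        var toRedeliver = new List<long>();

        while (_awaitingAcks.TryPeek(out var head)
            && now - head.Timestamp > _timeoutInStopwatchTicks)
        {
            if (!_awaitingAcks.TryDequeue(out var expired))
                break;

            // Acked in time: forget the ack. Otherwise: schedule for redelivery.
            if (!_acked.TryRemove(expired.MessageId, out _))
                toRedeliver.Add(expired.MessageId);
        }

        return toRedeliver;
    }
}
```

One known gap in this sketch: an ack that arrives after its message has already timed out stays in the acked set forever, so a real implementation would need some cleanup for late acks. Cumulative acknowledgment is also left out here.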

Consider this nackTimeout implementation.

When a message is nacked, we add the MessageId(s) to a CommandRedeliverUnacknowledgedMessages that we are reusing (should our "RedeliverUnacknowledgedMessages" taking an enumerable of MessageIds actually have been called "NegativeAcknowledge"?).

When the nackTracker wakes up it will check if the CommandRedeliverUnacknowledgedMessages has MessageIds and if yes, then send it.
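
A small sketch of that accumulate-then-flush idea, for illustration only. NackBufferSketch and its sendRedeliver callback are hypothetical stand-ins for the reused CommandRedeliverUnacknowledgedMessages and the code that sends it, and message ids are again plain long values. Unlike the description above, this sketch swaps in a fresh list on each flush instead of reusing one command object.

```csharp
using System;
using System.Collections.Generic;

public sealed class NackBufferSketch
{
    private readonly object _lock = new object();
    private readonly Action<IReadOnlyList<long>> _sendRedeliver;
    private List<long> _pending = new List<long>();

    public NackBufferSketch(Action<IReadOnlyList<long>> sendRedeliver) =>
        _sendRedeliver = sendRedeliver;

    // Called by the consumer when messages are nacked: only buffer the ids.
    public void NegativeAcknowledge(IEnumerable<long> messageIds)
    {
        lock (_lock)
            _pending.AddRange(messageIds);
    }

    // Called when the nack tracker wakes up: send only if something was added.
    public void OnWakeUp()
    {
        List<long> batch;

        lock (_lock)
        {
            if (_pending.Count == 0)
                return;

            batch = _pending;
            _pending = new List<long>();
        }

        _sendRedeliver(batch); // sent outside the lock so nacking is never blocked by I/O
    }
}
```

With this shape the nack delay is effectively the tracker's wake-up interval, which keeps the nack path cheap at the cost of per-message precision.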

Writing such a detailed implementation description was not what I intended, but when I first get going.... :-D
Anyway, if it is unclear, then I can try and make those classes/structs and push them to master.

dionjansen · 2021-01-03T18:25:49Z

@blankensteiner I've tried to follow your suggestions and simplify the implementation a bit: I focussed just on the unacked message tracker. Perhaps it is anyway a good idea to separate these two mechanisms (nack tracking and unacked tracking) since they are configured independently of each other.

Feel free to add one or more of these (if you need them):

AutoFixture

AutoFixture.AutoNSubstitute

AutoFixture.Xunit2

NSubstitute

Added NSubstitute and AutoFixture.AutoNSubstitute. I started testing the unacked tracker with this, but I'm not sure I'm using the correct pattern to verify whether messages are being redelivered under different test conditions. Since I don't see any other unit tests that test internal classes in this way, I'm not sure if this strategy agrees with the rest of the lib, so let me know what you think.

The question is whether you actually need the consumer or just the consumer channel. The tracking should start and end together with the MessageQueue/ConsumerChannel.

I am starting the thread now in the Pulsar client

pulsar-dotpulsar/src/DotPulsar/PulsarClient.cs

Lines 81 to 84 in 26cd957

    
    IUnackedMessageTracker unackedTracker = options.AckTimeoutMillis > 0
        ? new UnackedMessageTracker(TimeSpan.FromMilliseconds(options.AckTimeoutMillis), TimeSpan.FromSeconds(1))
        : new InactiveUnackedMessageTracker();
    unackedTracker.Start(consumer);

The tracker is then passed to the channel factory, so it can be passed to a message queue that is passed to the channel when created. The tracker loop is stopped when the tracker is disposed, which occurs when the message queue is disposed, which in turn occurs when the channel is disposed. I'm still using IConsumer for Start in order to avoid a duplicate implementation of RedeliverUnacknowledgedMessages in the consumer.

That's a really good question. Bookkeeper stores batched messages as one, so if you have a batched message consisting of 5 messages and you ack 4 of them, but the last times out and you ask the broker to redeliver, you will get the entire batch again. You could keep track of this, but it will hurt performance. I don't know what the other clients do here, but maybe you could test that?

From what I can see in the ConsumerImpl of the Java client, in the case of batch messages only one item is put in the tracker per batch, using (ledgerId, entryId, partitionIndex). Then when acking, it looks like the batch is again treated as a single message, but only if markAckForBatchMessage returns false, which indicates not all messages in the batch have been acked yet. I don't really see from this implementation how they "keep track" of what to ack within the batch in this way, but there is a lot going on here that I can't make sense of. Does the add/remove mechanism for the tracker that I point out here make any sense to you?

Consider this ackTimeout implementation. First, ackTimeout should be given as a TimeSpan (instead of an int or long as milli- or microseconds).

Done. I kept the configuration options of the consumer in milliseconds though.

When a message is dequeued, and if we have an ackTimeout, then we store the MessageId and Stopwatch.GetTimestamp() in an "AwaitingAck" struct in a ConcurrentQueue, let's call it "AwaitingAcks". Other suggestions for concurrent collections with fast insertion are welcome.

I was struggling a bit to create a comparable TimeSpan (since as you point out stopwatch ticks != timespan ticks). I followed this article and concluded:

pulsar-dotpulsar/src/DotPulsar/Internal/UnackedMessageTracker.cs

Lines 24 to 25 in f4725f5

    
    public TimeSpan Elapsed => TimeSpan.FromTicks(
        (long) ((Stopwatch.GetTimestamp() - Timestamp) / (double) Stopwatch.Frequency * TimeSpan.TicksPerSecond));

Not casting frequency explicitly to double results in considerable loss of accuracy.
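
A small worked example of why the cast matters. The ElapsedMath class and the numbers are made up for illustration: a frequency of 1,000,000,000 Stopwatch ticks per second is assumed (the real value is platform dependent), and the delta corresponds to 1.5 seconds.

```csharp
using System;

public static class ElapsedMath
{
    // long / long is integer division: the fractional seconds are dropped
    // before the result is scaled to TimeSpan ticks.
    public static TimeSpan Truncating(long deltaStopwatchTicks, long frequency) =>
        TimeSpan.FromTicks(deltaStopwatchTicks / frequency * TimeSpan.TicksPerSecond);

    // Dividing by a double keeps the fraction, as in the Elapsed property above.
    public static TimeSpan Precise(long deltaStopwatchTicks, long frequency) =>
        TimeSpan.FromTicks((long) (deltaStopwatchTicks / (double) frequency * TimeSpan.TicksPerSecond));
}

public static class ElapsedMathDemo
{
    public static void Run()
    {
        const long frequency = 1_000_000_000; // assumed ticks per second
        const long delta = 1_500_000_000;     // 1.5 seconds worth of Stopwatch ticks

        Console.WriteLine(ElapsedMath.Truncating(delta, frequency).TotalSeconds); // 1
        Console.WriteLine(ElapsedMath.Precise(delta, frequency).TotalSeconds);    // 1.5
    }
}
```

With integer division every elapsed value is rounded down to whole seconds, so a sub-second ack timeout could never be measured accurately.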

When a message is acknowledged, and if we have an ackTimeout, then we store the acks instead of removing them from "AwaitingAcks". If we want to remove them right away, then we need the "AwaitingAcks" collection to support both iteration and random deletion. We only need to store the highest cumulative ack we see (if there is one) and the MessageIds not included by that cumulative ack.

How can I determine if there is a highest cumulative ack from a MessageId instance? And if we do this wouldn't we also need some kind of removal from the unacked list that removes until this highest value, like: removeMessagesTill?

When the ackTracker wakes up, it first calculates what ackTimeout is in Stopwatch ticks (those are not the same as TimeSpan ticks) and then calls Stopwatch.GetTimestamp(). We then TryPeek and TryDequeue from "AwaitingAcks" for as long as the tracker timestamp minus AwaitingAck.Timestamp is larger than the calculated timeout.
If the MessageId is not acknowledged, it's added to a CommandRedeliverUnacknowledgedMessages (that we are reusing), which is then sent (if MessageIds were added).
If the MessageId was acknowledged, then we can remove that MessageId from the AckedMessageIds.

I tried to capture this in

pulsar-dotpulsar/src/DotPulsar/Internal/UnackedMessageTracker.cs

Lines 81 to 91 in 51f4a98

    
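    // Entries are queued in dequeue order, so stop at the first one that has not timed out yet.
    while (_awaitingAcks.TryPeek(out AwaitingAck awaiting)
        && awaiting.Elapsed > _ackTimeout)
    {
        if (_awaitingAcks.TryDequeue(out awaiting))
        {
            // Not acked in time: schedule for redelivery. Otherwise forget the stored ack.
            if (!_acked.Contains(awaiting.MessageId))
                result.Add(awaiting.MessageId);
            else
                _acked.Remove(awaiting.MessageId);
        }
    }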

The only thing I'm still unsure about is the cumulative acking.
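
One possible way to handle cumulative acking, sketched under assumptions rather than taken from the PR: ids are reduced to a hypothetical (ledgerId, entryId) pair that is ordered lexicographically, partition and batch index are ignored, and the IdSketch and AckedSetSketch types are made up for illustration. The idea is to keep only the highest cumulative ack plus the individual acks above it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a comparable message id.
public readonly struct IdSketch : IComparable<IdSketch>, IEquatable<IdSketch>
{
    public IdSketch(ulong ledgerId, ulong entryId)
    {
        LedgerId = ledgerId;
        EntryId = entryId;
    }

    public ulong LedgerId { get; }
    public ulong EntryId { get; }

    public int CompareTo(IdSketch other)
    {
        var byLedger = LedgerId.CompareTo(other.LedgerId);
        return byLedger != 0 ? byLedger : EntryId.CompareTo(other.EntryId);
    }

    public bool Equals(IdSketch other) => CompareTo(other) == 0;
    public override bool Equals(object obj) => obj is IdSketch other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(LedgerId, EntryId);
}

// Not thread-safe; a real tracker would need synchronization.
public sealed class AckedSetSketch
{
    private readonly HashSet<IdSketch> _individual = new HashSet<IdSketch>();
    private IdSketch? _highestCumulative;

    public void Acknowledge(IdSketch id)
    {
        if (!IsAcked(id))
            _individual.Add(id);
    }

    public void AcknowledgeCumulative(IdSketch id)
    {
        if (_highestCumulative is null || id.CompareTo(_highestCumulative.Value) > 0)
        {
            _highestCumulative = id;
            // Individual acks at or below the new high-water mark are now redundant.
            _individual.RemoveWhere(x => x.CompareTo(id) <= 0);
        }
    }

    // Acked if covered by the cumulative ack or acked individually.
    public bool IsAcked(IdSketch id) =>
        (_highestCumulative is IdSketch high && id.CompareTo(high) <= 0) || _individual.Contains(id);
}
```

With something like this, the sweep over the awaiting queue would ask IsAcked instead of checking a plain set, so no separate "remove messages till" pass over the queue is needed.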

Consider this nackTimeout implementation.

When a message is nacked, we add the MessageId(s) to a CommandRedeliverUnacknowledgedMessages that we are reusing (should our "RedeliverUnacknowledgedMessages" taking an enumerable of MessageIds actually have been called "NegativeAcknowledge"?).

When the nackTracker wakes up it will check if the CommandRedeliverUnacknowledgedMessages has MessageIds and if yes, then send it.

I will implement the nack tracker if you are happy with the unacked tracker as it stands (which might also be refactored into one single tracker if two trackers are a performance concern to you).

Writing such a detailed implementation description was not what I intended, but when I first get going.... :-D
Anyway, if it is unclear, then I can try and make those classes/structs and push them to master.

Let me know what you think, this is still a bit of a learning process for me both on the internals of this lib as well as C# / Pulsar details. Thanks in advance 👍 .

src/DotPulsar/ConsumerOptions.cs

src/DotPulsar/Internal/Abstractions/IMessageAcksTracker.cs

tests/DotPulsar.Tests/Internal/MessageAcksTrackerTests.cs

blankensteiner · 2021-01-06T10:04:20Z

Hi @dionjansen
Before doing a deep dive into the implementation, I have some comments on little things that can quickly be fixed :-)

dionjansen · 2021-01-06T18:45:14Z

@blankensteiner thanks for the comments. I managed to fix most of it, with a few side comments/questions (see above). Let me know what you think.

src/DotPulsar/Internal/ConsumerChannel.cs

src/DotPulsar/Internal/InactiveUnackedMessageTracker.cs

tests/DotPulsar.Tests/Internal/MessageAcksTrackerTests.cs

src/DotPulsar/Internal/UnackedMessageTracker.cs

src/DotPulsar/Internal/Abstractions/IUnackedMessageTracker.cs

src/DotPulsar/Internal/MessageQueue.cs

dionjansen · 2021-01-15T17:04:52Z

@blankensteiner sorry for the delay, I addressed all your remarks, let me know what you think!

jbvanzuylen · 2021-07-09T14:37:54Z

@blankensteiner @dionjansen any idea when this work will be finished and merged? Looks like a lot of work has been done and reviewed but wondering what is missing to cross the finish line.

blankensteiner · 2021-07-10T20:51:27Z

Hi @jbvanzuylen
Good question :-)
@dionjansen when you feel the PR is ready, poke me and I'll review it again :-)

dionjansen · 2021-07-11T16:16:44Z

@jbvanzuylen @blankensteiner yes this dropped very far off my radar, unfortunately. I see quite a lot has changed since this implementation, I'll try to open a new PR from the latest version.

dionjansen · 2021-07-23T16:56:59Z

@blankensteiner I reworked the implementation from scratch based on version 1.1.2 and opened a new PR #83. I've also added support for negative acknowledgement delays next to the unacked tracking.

Closing this PR since it's no longer needed.

CC: @jbvanzuylen

dionjansen added 10 commits December 19, 2020 16:22

Added missing reference to Microsoft.Bcl.AsyncInterfaces in samples and tests

4eb2b06

nack delay option

dfb864a

ack timeout config option

370e1c8

naive implementation tracker and message queue

278005f

use message package for queue

79bc906

use int for ms values

546d9e9

Non generic interface

5e3cea6

Added basic implementation for tracker, integrated messagequeue in consumer channel and factory

19931bc

Create conditional tracker in client

295eab8

Inform queue when acking

2be5114

Cleanup references for VSCode

163ea45

blankensteiner added the enhancement New feature or request label Dec 22, 2020

dionjansen added 6 commits December 23, 2020 12:17

Merge branch 'master' into nack

6fc4550

Unacked tracker only

7da197e

Cleanup

d759c06

Added test suite for unacked message tracker

26cd957

Refactored elapsed calc

f4725f5

cleanup

51f4a98

blankensteiner reviewed Jan 6, 2021

View reviewed changes

dionjansen added 8 commits January 6, 2021 12:42

Cleanup old tracker

0119ca9

Apache header for new files

7ad89a2

Cleanup inactive message tracker

a7d5f02

Cleanup stress tests deps

98850bb

Cleanup message queue and interface

2edfa35

Formatting tracker

b78ac54

Fixed tests naming, separated tests for struct

b57668b

Renamed options, use timespans instad of micros

871dce6

blankensteiner reviewed Jan 7, 2021

View reviewed changes

dionjansen added 4 commits January 15, 2021 17:13

Moved IEquatable and IComparable into MessageIdData partial class

8841930

Renamed ack to full acknowledge, cleaned up tests

ddfd374

Refactor use for MessageId to MessageIdData, added RedeliverUnacknowledgedMessages using MessageIdData

ad58271

Refactored Start method to not use unnecessary Task.run

f02166b

Merge branch 'master' into nack

5771533

dionjansen closed this Jul 23, 2021
