Batching not working as expected #551
Hi,

The batching logic is not designed to wait for batches to fill before processing; it is an optimization for the case where there is more data to process than can be handled immediately (i.e. processing is slower than consuming from Kafka) and the processing function supports batching, so records are sent in batches. It looks like your processing is potentially faster than polling from Kafka; in that case you won't get full batches, because not enough records have accumulated in the queues yet.

One thing to check is the consumer options for fetch size, max poll records, etc., to make sure you are feeding enough records into Parallel Consumer per poll. Even with fast processing, if the topic partitions already have enough data on them and the consumer is configured to return 10+ records per poll, you should get the batch filled.

You could test the polling parameters using a plain Kafka Consumer and checking how many records each poll actually returns, for example in a test stub or simple application.
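For reference, a minimal sketch of such a test stub, assuming a plain Java KafkaConsumer; the bootstrap servers, group id, and topic name are placeholders rather than values from this thread:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PollSizeCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "poll-size-check");          // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            for (int i = 0; i < 20; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                // How many records did this single poll() actually return?
                System.out.println("poll #" + i + " returned " + records.count() + " records");
            }
        }
    }
}
```

If each poll here already returns well under 10 records, tuning the consumer's fetch settings is the first thing to look at before expecting full batches from Parallel Consumer.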
Hi,
In our case, we dropped the idea of using the batching mechanism in Parallel Consumer, but while doing the analysis I have a few observations, given below.
max.poll.records
HTH!
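Since max.poll.records comes up above, here is a hedged sketch of the standard Kafka consumer settings that govern how many records a single poll can return; the values are illustrative only and would need tuning for the actual workload:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

class FetchTuning {
    // Illustrative values only; tune to the actual message sizes and throughput.
    static Properties withFetchTuning(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Upper bound on records returned by a single poll() (Kafka's default is 500).
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        // Let the broker wait until at least this many bytes are available for a fetch...
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);
        // ...or until this much time has passed, whichever comes first.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        return props;
    }
}
```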
The problem appears to be that "work" (i.e. polled records) is queued based on shards (see lines 250 to 263 in 7231f62).

For processing order … For both … (see lines 133 to 138 in 7231f62).

This means that, for all modes except unordered processing, a shard hands out only a single record at a time, so a batch cannot be filled with multiple records from the same key or partition.

As @doppelrittberger mentioned, this is counter-intuitive, as batching is most useful for multiple records in the same shard. Batching currently does not help me deal with many records of the same key, or with many records in the same partition. @rkolesnev, are there any major downsides to lifting the above restriction? I think even ordered shards should be able to return multiple records, but I can't fully grasp the impact this may have on the system (e.g. offset tracking).
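For anyone affected while this restriction stands, a hedged sketch of the workaround implied above: with unordered processing there is no per-key or per-partition shard restriction, so batches have a much better chance of filling. The builder calls follow the Parallel Consumer Java API; the consumer instance is assumed to be configured elsewhere:

```java
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import org.apache.kafka.clients.consumer.Consumer;

class UnorderedBatchingOptions {
    // With UNORDERED processing, batches can be filled from any queued records,
    // at the cost of losing per-key / per-partition ordering guarantees.
    static ParallelConsumerOptions<String, String> build(Consumer<String, String> kafkaConsumer) {
        return ParallelConsumerOptions.<String, String>builder()
                .consumer(kafkaConsumer)
                .ordering(ProcessingOrder.UNORDERED)
                .batchSize(10)
                .build();
    }
}
```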
Hi Team,
I mainly wanted to use the batching feature of Parallel Consumer, so I started doing a POC around it.
Currently the Kafka topic has 6 partitions and each partition has around 15k messages.
I wanted to consume the data in batches of 10 messages each, and I wanted the data to be consumed in an ordered way. Below are the code snippets of the current Parallel Consumer configuration.
ParallelConsumerOptions
appProperties
Consumer Poll
preparePayload
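For context, a minimal sketch of what a configuration along these lines can look like, assuming the Parallel Consumer Java API with ordered processing and a batch size of 10; the bootstrap servers, group id, topic name, and the body of preparePayload are placeholders rather than the original snippets:

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BatchingPoc {

    public static void main(String[] args) {
        // appProperties: plain Kafka consumer configuration (placeholder values)
        Properties appProperties = new Properties();
        appProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        appProperties.put(ConsumerConfig.GROUP_ID_CONFIG, "batching-poc");
        appProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        appProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(appProperties);

        // ParallelConsumerOptions: ordered processing with a target batch size of 10
        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .consumer(kafkaConsumer)
                .ordering(ProcessingOrder.KEY) // ordered consumption, as described in the issue
                .batchSize(10)                 // at most 10 records per handler invocation
                .maxConcurrency(6)
                .build();

        // Consumer Poll: subscribe and hand each batch to the processing function
        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(Collections.singletonList("my-topic")); // placeholder topic

        processor.poll(context -> {
            List<ConsumerRecord<String, String>> batch = context.getConsumerRecordsFlattened();
            preparePayload(batch); // placeholder for the original preparePayload logic
        });
    }

    // preparePayload: stand-in for the application's batch handler
    static void preparePayload(List<ConsumerRecord<String, String>> batch) {
        System.out.println("processing a batch of " + batch.size() + " records");
    }
}
```

Note that, as discussed above, batchSize is an upper bound on how many records are handed to the processing function, not a guarantee that each batch will contain exactly that many.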
Now, despite setting the batch size to 10, the data is being consumed in batches of random sizes (1, 2, 3, all less than 10). Could someone please help me out?
Thanks in advance.
Regards,
Dixit