batching policy confusion #2967
I have a simple pipeline that pulls messages from Kafka in batches and persists them to S3. The input batch size is 50 messages and the output batch size is 20. I expected the output batches to contain 20 messages, but they contain 50. When I remove the input batching policy, the output batch size is 20 as expected. I hope I have understood the concept of batching in Redpanda Connect correctly.
Many thanks! My connect.yaml is as follows:
Replies: 1 comment 2 replies
Hey @pmak-852 👋
This is by design. Connect never shrinks batches; it only merges them. For example, if you have small batches that you want to combine into larger ones, they get concatenated. There's usually no reason to configure batching at both the input and the output level.
If you wish to shrink batches, you can use a processor such as `group_by`, `group_by_value`, or `split`.
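As a rough sketch of the `split` approach, something like the config below should break each 50-message input batch into smaller ones before they reach the output. The original `connect.yaml` isn't shown in this thread, so the broker address, topic, consumer group, bucket name, and batching period are all placeholders, not the poster's actual values.

```yaml
# Hypothetical config: every address and name below is a placeholder.
input:
  kafka:
    addresses: [ "localhost:9092" ]
    topics: [ "my_topic" ]
    consumer_group: "my_group"
    batching:
      count: 50      # input batches hold up to 50 messages
      period: 5s     # flush a partial batch after 5 seconds

pipeline:
  processors:
    # Break each incoming batch into batches of at most 20 messages.
    - split:
        size: 20

output:
  aws_s3:
    bucket: "my-bucket"
    # No output batching policy here: batching at the output can only
    # merge batches, it never shrinks them.
```

With this layout, a full 50-message input batch would come out of `split` as batches of 20, 20, and 10 messages. If exact batches of 20 matter more than matching the input size, dropping the input batching policy and keeping only an output one, as in the original question, is the simpler option.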