respect 500~1000 linger.ms for high throughput but medium latency use cases - fire-and-forget #863

Open
ericsun2 opened this issue Nov 27, 2024 · 2 comments

ericsun2 commented Nov 27, 2024

 ProducerLinger sets how long individual topic partitions will linger waiting
 for more records before triggering a request to be built.

 Note that this option should only be used in low volume producers. The only
 benefit of lingering is to potentially build a larger batch to reduce cpu
 usage on the brokers if you have many producers all producing small amounts.

 If a produce request is triggered by any topic partition, all partitions
 with a possible batch to be sent are used and all lingers are reset.

 As mentioned, the linger is specific to topic partition. A high volume
 producer will likely be producing to many partitions; it is both unnecessary
 to linger in this case and inefficient because the client will have many
 timers running (and stopping and restarting) unnecessarily.

Let's say we have a high-volume topic with 60 partitions and 700 MiB/sec peak ingress throughput.
We want to optimize broker efficiency with a bigger batch size. In theory, we expect roughly 11 MiB/sec per partition, with 1 MiB per chunk or batch. But franz-go typically sends chunks of only 1–2 KB, even if we set linger.ms to 1000.

Is there any way we can tweak franz-go to better batch events into 4–6 MB chunks before compression (and ~1 MB after compression)?
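
For concreteness, the relevant kgo options look roughly like the sketch below (the broker address, codec choice, and exact values are illustrative placeholders, not a known-good configuration):

```go
package main

import (
	"log"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

func main() {
	// All values below are illustrative, not a recommendation.
	client, err := kgo.NewClient(
		kgo.SeedBrokers("broker-1:9092"),                    // placeholder broker address
		kgo.ProducerLinger(time.Second),                     // linger.ms ~ 1000
		kgo.ProducerBatchMaxBytes(6<<20),                    // allow up to ~6 MiB uncompressed per batch
		kgo.ProducerBatchCompression(kgo.ZstdCompression()), // assumed codec choice
		kgo.MaxBufferedRecords(1_000_000),                   // keep enough buffered to fill large batches
	)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	// ... produce with client.Produce / client.ProduceSync ...
}
```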

twmb (Owner) commented Nov 28, 2024

Trying to estimate batch size after compression can only be done via a heuristic and can be fraught with problems. Worst case, the client estimates poorly and creates a compressed batch that is larger than the max batch bytes. Instead, the client buffers by uncompressed size and once linger or max batch size is hit, creates a batch -- compressing in the process.

If you want to try working around this from a user perspective, you could try increasing the max batch bytes -- e.g. if you know you have a pretty consistent 50% compression ratio, you could double the max batch bytes.
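
A sketch of that workaround, assuming a consistent ~50% compression ratio (the exact values are illustrative, and the broker's max.message.bytes must still allow the resulting compressed batch):

```go
package producer

import (
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

// newClient doubles the uncompressed batch limit on the assumption that
// payloads compress to roughly 50%, so compressed batches land near 1 MiB.
func newClient(brokers ...string) (*kgo.Client, error) {
	return kgo.NewClient(
		kgo.SeedBrokers(brokers...),
		kgo.ProducerLinger(time.Second),
		kgo.ProducerBatchMaxBytes(2<<20), // 2 MiB uncompressed -> ~1 MiB after the assumed 50% compression
	)
}
```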

twmb added the waiting label Nov 28, 2024
baganokodo2022 commented Dec 1, 2024

Hi @twmb,

@ericsun2 and I have been using rand.Read(data) to generate random binary payloads for our tests, resulting in a limited compression ratio. With a publishing throughput of 600K messages per second, each 1KiB in size, the Kafka broker reports an ingestion rate of approximately 600 MiB per second for a 64-partition topic. On each partition, the ingestion speed is around 9 MiB or 9K messages per second.

To optimize batching, we adjusted the following producer configurations:

ProducerLinger: Increased to 1s, 2s, 5s, and 10s.
ProducerBatchMaxBytes: Set to 1 MiB.
MaxBufferedRecords: Set to 1 million.

Our goal was to achieve a batched message size of approximately 1 MiB or 1K messages per batch. However, we observed that the NumRecords in a batch is capped at 76, significantly below the expected size.
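
For reference, a sketch of the configuration described above together with one way to observe per-batch sizes (this assumes the kgo.HookProduceBatchWritten hook and the kgo.ProduceBatchMetrics fields; adjust to the client version in use):

```go
package producer

import (
	"log"
	"time"

	"github.com/twmb/franz-go/pkg/kgo"
)

// batchLogger logs every record batch the client writes, which is one way to
// see NumRecords plateauing far below the configured maximums.
type batchLogger struct{}

func (batchLogger) OnProduceBatchWritten(_ kgo.BrokerMetadata, topic string, partition int32, m kgo.ProduceBatchMetrics) {
	log.Printf("topic=%s partition=%d records=%d uncompressed=%dB compressed=%dB",
		topic, partition, m.NumRecords, m.UncompressedBytes, m.CompressedBytes)
}

func newObservedClient(brokers ...string) (*kgo.Client, error) {
	return kgo.NewClient(
		kgo.SeedBrokers(brokers...),
		kgo.WithHooks(batchLogger{}),
		kgo.ProducerLinger(time.Second),   // also tried 2s, 5s, 10s
		kgo.ProducerBatchMaxBytes(1<<20),  // 1 MiB
		kgo.MaxBufferedRecords(1_000_000), // 1 million
	)
}
```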

While reviewing the Franz-go source code, I noticed that whenever sink.maybeDrain() triggers a call to createReq(), all recBufs are drained simultaneously. Even if only one recBuf reaches the maxRecordBatchBytes, the remaining recBufs are prematurely added to the request, leading to an early drain.

Is this behavior intentional, perhaps to improve throughput or reduce latency? I’m curious if my interpretation is correct.

Thanks!
