From 07621a93c068c3a523adf063141292a9aa2bb726 Mon Sep 17 00:00:00 2001
From: "mergify[bot]" <37929162+mergify[bot]@users.noreply.github.com>
Date: Mon, 6 Nov 2023 10:35:51 -0500
Subject: [PATCH] [7.17](backport #37022) Clarify the role of flush.min_events
 and bulk_max_size in docs (#37044)

* Clarify the role of flush.min_events and bulk_max_size in docs (#37022)

* Clarify the role of flush.min_events and bulk_max_size

* Clarify that min_events is only max batch when > 1.

* Fix another typo.

* Improve flush.min_events parameter documentation.

(cherry picked from commit 0f8fc26ca766d44e9bf8adb2a7e299917867b03e)

# Conflicts:
#	libbeat/outputs/elasticsearch/docs/elasticsearch.asciidoc

* Resolve conflict

---------

Co-authored-by: Craig MacKenzie
---
 libbeat/docs/queueconfig.asciidoc             | 41 +++++++++++--------
 .../elasticsearch/docs/elasticsearch.asciidoc |  6 ++-
 .../outputs/logstash/docs/logstash.asciidoc   |  7 ++--
 libbeat/outputs/redis/docs/redis.asciidoc     |  7 ++--
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/libbeat/docs/queueconfig.asciidoc b/libbeat/docs/queueconfig.asciidoc
index 054379e1b20..52d962a0cf2 100644
--- a/libbeat/docs/queueconfig.asciidoc
+++ b/libbeat/docs/queueconfig.asciidoc
@@ -28,20 +28,24 @@ queue.mem:
 
 The memory queue keeps all events in memory.
 
-If no flush interval and no number of events to flush is configured,
-all events published to this queue will be directly consumed by the outputs.
-To enforce spooling in the queue, set the `flush.min_events` and `flush.timeout` options.
-
-By default `flush.min_events` is set to 2048 and `flush.timeout` is set to 1s.
-
-The output's `bulk_max_size` setting limits the number of events being processed at once.
-
 The memory queue waits for the output to acknowledge or drop events. If
 the queue is full, no new events can be inserted into the memory queue. Only
 after the signal from the output will the queue free up space for more events to be accepted.
 
-This sample configuration forwards events to the output if 512 events are
-available or the oldest available event has been waiting for 5s in the queue:
+The memory queue is controlled by the parameters `flush.min_events` and `flush.timeout`. If
+`flush.timeout` is `0s` or `flush.min_events` is `0` or `1`, then events can be sent by the output as
+soon as they are available. If the output supports a `bulk_max_size` parameter, it controls the
+maximum batch size that can be sent.
+
+If `flush.min_events` is greater than `1` and `flush.timeout` is greater than `0s`, events will only
+be sent to the output when the queue contains at least `flush.min_events` events or the
+`flush.timeout` period has expired. In this mode, the maximum batch size that can be sent by the
+output is `flush.min_events`. If the output supports a `bulk_max_size` parameter, values of
+`bulk_max_size` greater than `flush.min_events` have no effect. The value of `flush.min_events`
+should be evenly divisible by `bulk_max_size` to avoid sending partial batches to the output.
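+
+For example, here is a minimal sketch of a configuration that keeps the two settings aligned
+(shown with the Elasticsearch output; the host address is a placeholder). Each flush hands the
+output at most 2048 events, which it splits into exactly four full batches of 512:
+
+[source,yaml]
+------------------------------------------------------------------------------
+queue.mem:
+  events: 4096            # evenly divisible by flush.min_events
+  flush.min_events: 2048  # also the largest batch the output can receive
+  flush.timeout: 1s
+
+output.elasticsearch:
+  hosts: ["localhost:9200"]
+  bulk_max_size: 512      # 2048 / 512 = 4 full batches per flush
+------------------------------------------------------------------------------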
+
+This sample configuration forwards events to the output if 512 events are available or the oldest
+available event has been waiting for 5s in the queue:
 
 [source,yaml]
 ------------------------------------------------------------------------------
@@ -52,31 +56,34 @@ queue.mem:
 ------------------------------------------------------------------------------
 
 [float]
-==== Configuration options
+=== Configuration options
 
 You can specify the following options in the `queue.mem` section of the
 +{beatname_lc}.yml+ config file:
 
 [float]
 ===== `events`
 
-Number of events the queue can store.
+Number of events the queue can store. This value should be evenly divisible by `flush.min_events` to
+avoid sending partial batches to the output.
 
 The default value is 4096 events.
 
 [float]
 ===== `flush.min_events`
 
-Minimum number of events required for publishing. If this value is set to 0, the
-output can start publishing events without additional waiting times. Otherwise
-the output has to wait for more events to become available.
+Minimum number of events required for publishing. If this value is set to 0 or 1, events are
+available to the output immediately. If this value is greater than 1, the output must wait for the
+queue to accumulate this minimum number of events or for `flush.timeout` to expire before
+publishing. When greater than `1`, this value also defines the maximum possible batch size that
+can be sent by the output.
 
 The default value is 2048.
 
 [float]
 ===== `flush.timeout`
 
-Maximum wait time for `flush.min_events` to be fulfilled. If set to 0s, events
-will be immediately available for consumption.
+Maximum wait time for `flush.min_events` to be fulfilled. If set to 0s, events are available to the
+output immediately.
 
 The default value is 1s.

diff --git a/libbeat/outputs/elasticsearch/docs/elasticsearch.asciidoc b/libbeat/outputs/elasticsearch/docs/elasticsearch.asciidoc
index 9ecc9972917..15a923b6f44 100644
--- a/libbeat/outputs/elasticsearch/docs/elasticsearch.asciidoc
+++ b/libbeat/outputs/elasticsearch/docs/elasticsearch.asciidoc
@@ -637,8 +637,10 @@ endif::[]
 
 The maximum number of events to bulk in a single Elasticsearch bulk API index
 request. The default is 50.
 
-Events can be collected into batches. {beatname_uc} will split batches larger than `bulk_max_size`
-into multiple batches.
+Events can be collected into batches. When using the memory queue with `queue.mem.flush.min_events`
+set to a value greater than `1`, the maximum batch size is the value of `queue.mem.flush.min_events`.
+{beatname_uc} will split batches read from the queue that are larger than `bulk_max_size` into
+multiple batches.
 
 Specifying a larger batch size can improve performance by lowering the overhead of sending events.
 However big batch sizes can also increase processing times, which might result in

diff --git a/libbeat/outputs/logstash/docs/logstash.asciidoc b/libbeat/outputs/logstash/docs/logstash.asciidoc
index 054c74fd69d..3d0151d0762 100644
--- a/libbeat/outputs/logstash/docs/logstash.asciidoc
+++ b/libbeat/outputs/logstash/docs/logstash.asciidoc
@@ -354,9 +354,10 @@ endif::[]
 
 The maximum number of events to bulk in a single {ls} request. The default is 2048.
 
-If the Beat sends single events, the events are collected into batches. If the Beat publishes
-a large batch of events (larger than the value specified by `bulk_max_size`), the batch is
-split.
+Events can be collected into batches. When using the memory queue with `queue.mem.flush.min_events`
+set to a value greater than `1`, the maximum batch size is the value of `queue.mem.flush.min_events`.
+{beatname_uc} will split batches read from the queue that are larger than `bulk_max_size` into
+multiple batches.
 
 Specifying a larger batch size can improve performance by lowering the overhead of sending events.
 However big batch sizes can also increase processing times, which might result in

diff --git a/libbeat/outputs/redis/docs/redis.asciidoc b/libbeat/outputs/redis/docs/redis.asciidoc
index 8483067f3ed..090430cc779 100644
--- a/libbeat/outputs/redis/docs/redis.asciidoc
+++ b/libbeat/outputs/redis/docs/redis.asciidoc
@@ -214,9 +214,10 @@ endif::[]
 
 The maximum number of events to bulk in a single Redis request or pipeline. The
 default is 2048.
 
-If the Beat sends single events, the events are collected into batches. If the
-Beat publishes a large batch of events (larger than the value specified by
-`bulk_max_size`), the batch is split.
+Events can be collected into batches. When using the memory queue with `queue.mem.flush.min_events`
+set to a value greater than `1`, the maximum batch size is the value of `queue.mem.flush.min_events`.
+{beatname_uc} will split batches read from the queue that are larger than `bulk_max_size` into
+multiple batches.
 
 Specifying a larger batch size can improve performance by lowering the overhead of sending events.
 However big batch sizes can also increase processing times,