Commit
address review comments
1pkg committed Jul 31, 2024
1 parent e8de32d commit d76bb44
Showing 1 changed file with 18 additions and 11 deletions.
29 changes: 18 additions & 11 deletions docs/en/observability/apm/apm-performance-diagnostic.asciidoc
@@ -6,7 +6,7 @@
==== Diagnosing backpressure from {es}

When {es} is under excessive load or indexing pressure, APM Server can experience downstream backpressure when indexing new documents into {es}.
Most commonly, backpressure from {es} will manifest itself in the form of higher indexing latency and/or rejected requests, which in turn can lead APM Server to deny incoming requests.
As a result, APM agents connected to the affected APM Server will suffer from throttling and/or request timeouts when shipping APM events.

To quickly identify possible issues, look for similar error log lines in the APM Server logs:
@@ -19,17 +19,24 @@

----
...
----

To gain better insight into APM Server health and performance, consider enabling the monitoring feature by following the steps in <<apm-monitor-apm-self-install,Monitor a Fleet-managed APM Server>>.
When enabled, APM Server will additionally report a set of vital metrics to help you identify any performance degradation.
Pay careful attention to the following metric fields:

* `beats_stats.metrics.libbeat.output.events.active` that represents the number of buffered pending documents waiting for indexing; (_if this value is increasing rapidly, it indicates {es} backpressure_)
* `beats_stats.metrics.libbeat.output.events.acked` that represents the number of indexing operations that have completed successfully;
* `beats_stats.metrics.libbeat.output.events.failed` that represents the number of indexing operations that failed, including all failures; (_if this value is increasing rapidly, it indicates {es} backpressure_)
* `beats_stats.metrics.libbeat.output.events.toomany` that represents the number of indexing operations that failed because {es} responded with 429 Too Many Requests; (_if this value is increasing rapidly, it indicates {es} backpressure_)
* `beats_stats.output.elasticsearch.bulk_requests.available` that represents the number of bulk indexers available for making bulk index requests; (_if this value is equal to 0, it indicates {es} backpressure_)
* `beats_stats.output.elasticsearch.bulk_requests.completed` that represents the number of already completed bulk requests;
* `beats_stats.metrics.output.elasticsearch.indexers.active` that represents the number of active bulk indexers that are concurrently processing batches.

See https://www.elastic.co/guide/en/beats/metricbeat/current/exported-fields-beat.html[{metricbeat} documentation] for the full list of exported metric fields.
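To take a quick look at the most recent values of these counters, you can query the monitoring data in {es} directly. The following is a minimal sketch that assumes monitoring documents are stored in `.monitoring-beats-*` indices with the legacy `beats_stats` layout; the exact index pattern and field layout depend on how monitoring is collected in your deployment.

[source,console]
----
GET .monitoring-beats-*/_search
{
  "size": 1,
  "sort": [{ "timestamp": { "order": "desc" } }],
  "_source": [
    "beats_stats.metrics.libbeat.output.events.active",
    "beats_stats.metrics.libbeat.output.events.acked",
    "beats_stats.metrics.libbeat.output.events.failed",
    "beats_stats.metrics.libbeat.output.events.toomany"
  ]
}
----

If `events.active` keeps growing while `events.acked` stalls, or if `events.failed` and `events.toomany` keep climbing, {es} is most likely applying backpressure.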

One likely cause of excessive indexing pressure or rejected requests is undersized {es}. To mitigate this, follow the guidance in {ref}/rejected-requests.html[Rejected requests].
If scaling up {es} resources is not an option, you can try to work around the issue by adjusting the `flush_bytes`, `flush_interval`, `max_retries`, and `timeout` settings described in <<apm-elasticsearch-output,Configure the Elasticsearch output>> to reduce APM Server indexing pressure.
However, consider that increasing the number of buffered documents and/or reducing retries may lead to a higher rate of dropped APM events.
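As an illustration only, the relevant part of the APM Server {es} output configuration could look like the sketch below; the host and all values shown are placeholders rather than recommendations, and the right numbers depend on your event volume and {es} capacity.

[source,yaml]
----
output.elasticsearch:
  # Placeholder host, replace with your own Elasticsearch endpoint.
  hosts: ["https://elasticsearch.example.com:9200"]
  # Buffer more data per bulk request and flush less often to lower the request rate.
  flush_bytes: "2MB"
  flush_interval: "5s"
  # Fewer retries reduce indexing pressure but increase the risk of dropped APM events.
  max_retries: 1
  # Allow slower bulk responses before timing out the request (in seconds).
  timeout: 120
----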
