elastic · bmorelli25 · Aug 6, 2024 · Jul 28, 2024 · Jul 30, 2024 · Jul 31, 2024
@@ -0,0 +1,43 @@
+[[apm-performance-diagnostic]]
+=== APM Server performance diagnostic
+
+[[apm-es-backpressure]]
+[float]
+==== Diagnosing backpressure from {es}
+
+When {es} is under excessive load or indexing pressure, APM Server may experience downstream backpressure when indexing new documents into {es}.
+Most commonly, backpressure from {es} manifests as higher indexing latency and/or rejected requests, which in turn can cause APM Server to deny incoming requests.
+As a result, APM agents connected to the affected APM Server will experience throttling and/or request timeouts when shipping APM events.
+
+To quickly identify possible issues, look for error log lines similar to the following in the APM Server logs:
+
+[source,json]
+----
+...
+{"log.level":"error","@timestamp":"2024-07-27T23:46:28.529Z","log.origin":{"function":"github.com/elastic/go-docappender/v2.(*Appender).flush","file.name":"...","file.line":370},"message":"bulk indexing request failed","service.name":"apm-server","error":{"message":"flush failed (429): [429 Too Many Requests]"},"ecs.version":"1.6.0"}
+{"log.level":"error","@timestamp":"2024-07-27T23:55:38.612Z","log.origin":{"function":"github.com/elastic/go-docappender/v2.(*Appender).flush","file.name":"...","file.line":370},"message":"bulk indexing request failed","service.name":"apm-server","error":{"message":"flush failed (503): [503 Service Unavailable]"},"ecs.version":"1.6.0"}
+...
+----
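
As a rough aid for spotting these errors in a large log file, the following Go program is a minimal sketch (not an Elastic tool) that reads APM Server JSON log lines from standard input and tallies `bulk indexing request failed` errors by the HTTP status code found in `error.message`. It assumes one JSON object per line and that the status appears in the `flush failed (NNN)` form shown in the log lines above; `bulkFailureStatus` and `countBulkFailures` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"regexp"
	"strconv"
)

// logLine holds only the APM Server log fields this sketch inspects.
// Keys such as "log.level" are flat JSON keys that contain a dot.
type logLine struct {
	Level   string `json:"log.level"`
	Message string `json:"message"`
	Error   struct {
		Message string `json:"message"`
	} `json:"error"`
}

// statusRe extracts the HTTP status from messages such as
// "flush failed (429): [429 Too Many Requests]" (assumed message format).
var statusRe = regexp.MustCompile(`flush failed \((\d{3})\)`)

// bulkFailureStatus (hypothetical helper) returns the HTTP status of a failed
// bulk indexing request, or false if the line is not such an error or not JSON.
func bulkFailureStatus(line []byte) (int, bool) {
	var l logLine
	if err := json.Unmarshal(line, &l); err != nil {
		return 0, false // skip non-JSON lines such as "..."
	}
	if l.Level != "error" || l.Message != "bulk indexing request failed" {
		return 0, false
	}
	m := statusRe.FindStringSubmatch(l.Error.Message)
	if m == nil {
		return 0, false
	}
	code, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return code, true
}

// countBulkFailures (hypothetical helper) tallies failed bulk requests by status.
func countBulkFailures(r io.Reader) (map[int]int, error) {
	counts := make(map[int]int)
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		if code, ok := bulkFailureStatus(sc.Bytes()); ok {
			counts[code]++
		}
	}
	return counts, sc.Err()
}

func main() {
	counts, err := countBulkFailures(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	for code, n := range counts {
		fmt.Printf("HTTP %d: %d failed bulk requests\n", code, n)
	}
}
```

For example, run it as `go run . < apm-server.log` (the log file name is hypothetical). A steadily growing count of 429 or 503 responses is consistent with the backpressure described above.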
+
+To gain better insight into APM Server health and performance, consider enabling the monitoring feature by following the steps in <<apm-monitor-apm-self-install,Monitor a Fleet-managed APM Server>>.
+When monitoring is enabled, APM Server additionally reports a set of vital metrics that can help you identify performance degradation.
+
+Pay careful attention to the following metric fields:
+
+* `beats_stats.metrics.libbeat.output.events.active`: the number of buffered documents waiting to be indexed.
+(_If this value is increasing rapidly, it indicates {es} backpressure._)
+* `beats_stats.metrics.libbeat.output.events.acked`: the number of indexing operations that completed successfully.
+* `beats_stats.metrics.libbeat.output.events.failed`: the number of indexing operations that failed, including all failure types.
+(_If this value is increasing rapidly, it indicates {es} backpressure._)
+* `beats_stats.metrics.libbeat.output.events.toomany`: the number of indexing operations that failed because {es} responded with `429 Too Many Requests`.
+(_If this value is increasing rapidly, it indicates {es} backpressure._)
+* `beats_stats.metrics.output.elasticsearch.bulk_requests.available`: the number of bulk indexers available for making bulk index requests.
+(_If this value is equal to 0, it indicates {es} backpressure._)
+* `beats_stats.metrics.output.elasticsearch.bulk_requests.completed`: the number of completed bulk requests.
+* `beats_stats.metrics.output.elasticsearch.indexers.active`: the number of active bulk indexers that are concurrently processing batches.
+
+See https://www.elastic.co/guide/en/beats/metricbeat/current/exported-fields-beat.html[{metricbeat} documentation] for the full list of exported metric fields.
+
+One likely cause of excessive indexing pressure or rejected requests is an undersized {es} cluster. To mitigate this, follow the guidance in {ref}/rejected-requests.html[Rejected requests].
+If scaling up {es} resources is not an option, you can work around the issue by adjusting the `flush_bytes`, `flush_interval`, `max_retries`, and `timeout` settings described in <<apm-elasticsearch-output,Configure the Elasticsearch output>> to reduce APM Server indexing pressure.
+However, keep in mind that increasing the number of buffered documents and/or reducing retries may lead to a higher rate of dropped APM events.
@@ -9,6 +9,7 @@ and processing and performance guidance.
 * <<apm-common-response-codes>>
 * <<apm-processing-and-performance>>
 * <<apm-enable-apm-server-debugging>>
+* <<apm-performance-diagnostic>>
 
 For additional help with other APM components, see the links below.
 
@@ -54,4 +55,6 @@ include::apm-response-codes.asciidoc[]
 
 include::processing-performance.asciidoc[]
 
-include::{observability-docs-root}/docs/en/observability/apm/debugging.asciidoc[]
+include::{observability-docs-root}/docs/en/observability/apm/debugging.asciidoc[]
+
+include::apm-performance-diagnostic.asciidoc[]