Clarify documentation of beats output errors #27763

jsoriano · 2021-09-06T11:44:48Z

Several metrics report output errors from the beat module. Their meaning should be clarified in the docs, focusing on how worrisome these errors are for users.

For example, it is not clear whether a non-zero beat.stats.libbeat.output.write.errors implies some data loss, though it probably does not if beat.stats.libbeat.output.events.dropped is zero.

For confirmed bugs, please report:

Discuss Forum URL: https://discuss.elastic.co/t/what-does-output-errors-mean-in-beats-monitoring-in-kibana/283365


elasticmachine · 2021-09-06T11:44:49Z

Pinging @elastic/integrations (Team:Integrations)

elasticmachine · 2021-09-06T11:44:49Z

Pinging @elastic/stack-monitoring (Stack monitoring)

nkakouros · 2021-09-08T12:07:19Z

The user-facing part of this (i.e. the Kibana interface) could also benefit from a bit more clarification. For instance, in the image below, what should I be worried about? Do I have dropped events that got lost forever without being logged? How does "Output Errors" relate to "Fail Rates"?

tom-x1 · 2021-10-10T17:09:44Z

After reading the discussion and the docs, I still don't know the exact meaning of those metrics in Stack Monitoring. I think we all agree that one of the most important questions a user has is: "Do I lose data or not?" I still can't answer this question. For example, I'm using Packetbeat to gather DNS traffic, and from time to time the "Failed in Pipeline" counter jumps from 0 to 1000. As stated above, I'm asking myself: do I lose those DNS queries or not?

Also, I don't really get the difference between the two graphs "Fail Rates" and "Output Errors". To me they sound like they show the same thing, but they probably don't?

sachin-frayne · 2022-05-10T14:33:15Z

I would also like to know the meaning of these values, especially beat.stats.libbeat.output.events.dropped. If that is larger than 0, am I losing data?

6fears7 · 2022-10-06T13:27:05Z

Inside the code for Libbeat:

type Stats struct {
	//
	// Output event stats
	//
	batches *monitoring.Uint // total number of batches processed by output
	events  *monitoring.Uint // total number of events processed by output

	acked      *monitoring.Uint // total number of events ACKed by output
	failed     *monitoring.Uint // total number of events failed in output
	active     *monitoring.Uint // events sent and waiting for ACK/fail from output
	duplicates *monitoring.Uint // total number of events rejected as duplicates (ID already indexed)
	dropped    *monitoring.Uint // total number of invalid events dropped by the output
	tooMany    *monitoring.Uint // total number of too many requests replies from output

	//
	// Output network connection stats
	//
	writeBytes  *monitoring.Uint // total amount of bytes written by output
	writeErrors *monitoring.Uint // total number of errors on write

	readBytes  *monitoring.Uint // total amount of bytes read
	readErrors *monitoring.Uint // total number of errors while waiting for response on output
}

When you query Elasticsearch for the Libbeat stats (see below), Output Errors is derived as the delta between readErrors + writeErrors at the earliest timestamp in the range and readErrors + writeErrors at the latest timestamp. According to the code commentary then, Output Errors is the number of network packets experiencing errors.

The example below uses apm-server as the beat type, but you can replace it to suit your needs.

GET _search

{
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "term": {
                    "data_stream.dataset": "beats.stats"
                  }
                },
                {
                  "term": {
                    "metricset.name": "stats"
                  }
                },
                {
                  "term": {
                    "type": "beats_stats"
                  }
                }
              ]
            }
          },
          {
            "term": {
              "cluster_uuid": "CLUSTER_UUID"
            }
          },
          {
            "range": {
              "beats_stats.timestamp": {
                "format": "epoch_millis",
                "gte": 1665053615330,
                "lte": 1665054515330
              }
            }
          },
          {
            "bool": {
              "must": {
                "term": {
                  "beats_stats.beat.type": "apm-server"
                }
              }
            }
          }
        ]
      }
    },
    "collapse": {
      "field": "beats_stats.metrics.beat.info.ephemeral_id",
      "inner_hits": {
        "name": "earliest",
        "size": 1,
        "sort": [
          {
            "beats_stats.timestamp": {
              "order": "asc",
              "unmapped_type": "long"
            }
          },
          {
            "@timestamp": {
              "order": "asc",
              "unmapped_type": "long"
            }
          }
        ]
      }
    },
    "sort": [
      {
        "beats_stats.beat.uuid": {
          "order": "asc",
          "unmapped_type": "long"
        }
      },
      {
        "timestamp": {
          "order": "desc",
          "unmapped_type": "long"
        }
      }
    ]
  }
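
The delta described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `outputSnapshot` type and `outputErrorsDelta` function are hypothetical names, not part of libbeat or Kibana, and the handling of a counter reset (for example after a Beat restart) is a guess at a reasonable behaviour rather than what the UI is known to do.

```go
package main

import "fmt"

// outputSnapshot holds the two cumulative counters taken from one
// beats stats document. Hypothetical type, for illustration only.
type outputSnapshot struct {
	ReadErrors  uint64
	WriteErrors uint64
}

// outputErrorsDelta returns how many read and write errors were added
// between the earliest and the latest snapshot in a time range.
// Assumption: if the counters went backwards, they were reset, so the
// latest total is used as the delta.
func outputErrorsDelta(earliest, latest outputSnapshot) uint64 {
	first := earliest.ReadErrors + earliest.WriteErrors
	last := latest.ReadErrors + latest.WriteErrors
	if last < first {
		return last
	}
	return last - first
}

func main() {
	earliest := outputSnapshot{ReadErrors: 2, WriteErrors: 5}
	latest := outputSnapshot{ReadErrors: 3, WriteErrors: 9}
	fmt.Println(outputErrorsDelta(earliest, latest)) // prints 5
}
```

Because both inputs are cumulative counters, a non-zero result only says that errors occurred in the window; by itself it says nothing about whether the affected events were retried successfully.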

zez3 · 2022-10-06T20:32:42Z

According to the code commentary then, Output Errors is the number of network packets experiencing errors.

@6fears7
So, do we have event loss or not?
I don't have any other errors, such as 429.

zez3 · 2022-10-06T20:36:06Z

I've added beat.stats.libbeat.output.events.dropped to my Agent dashboard.

I get a daily alarm like this:

monitor filebeat drops - * is in a state of ALERT

Reason:
beat.stats.libbeat.output.events.dropped is 3,313,220.83422 in the last 1 day for all hosts. Alert when > 0.

Those 3 million events per day are quite a lot from my customer's point of view.

tom-x1 · 2022-10-07T05:23:00Z

@6fears7
Well that still does not answer if I have lost events or not. An "error" doesn't necessarily mean that I have lost events, I don't know if after those errors occur some kind of mechanism tries to process those faulty events again or not. Telling the users just "Error" without clearly letting them know the consequences of it does not help. A graph which says "Events lost" would be clear and helpful in my opinion.

6fears7 · 2022-10-07T12:58:04Z

I've added beat.stats.libbeat.output.events.dropped to my Agent dashboard.
I get a daily alarm like this:
monitor filebeat drops - * is in a state of ALERT

Reason:
beat.stats.libbeat.output.events.dropped is 3,313,220.83422 in the last 1 day for all hosts. Alert when > 0.
Those 3 million events per day are quite a lot from my customer's point of view.

In client.go, we are given an initial struct of:

type bulkResultStats struct {
	acked        int // number of events ACKed by Elasticsearch
	duplicates   int // number of events failed with `create` due to ID already being indexed
	fails        int // number of failed events (can be retried)
	nonIndexable int // number of failed events (not indexable)
	tooMany      int // number of events receiving HTTP 429 Too Many Requests
}

Later, we see what constitutes a drop:

	failed := len(failedEvents)
	span.Context.SetLabel("events_failed", failed)
	if st := client.observer; st != nil {
		dropped := stats.nonIndexable
		duplicates := stats.duplicates
		acked := len(data) - failed - dropped - duplicates

		st.Acked(acked)
		st.Failed(failed)
		st.Dropped(dropped)
		st.Duplicate(duplicates)
		st.ErrTooMany(stats.tooMany)
	}

So dropped events would be those of the nonIndexable type.

In order to determine what a "nonIndexable" type is, the code iterates through the Bulk results:

... 
		if status < 500 { 
			if status == http.StatusTooManyRequests {
				stats.tooMany++
			} else {
				// hard failure, apply policy action
				result, _ := data[i].Content.Meta.HasKey(dead_letter_marker_field)
				if result {
					stats.nonIndexable++
					client.log.Errorf("Can't deliver to dead letter index event %#v (status=%v): %s", data[i], status, msg)
					// poison pill - this will clog the pipeline if the underlying failure is non transient.
				} else if client.NonIndexableAction == dead_letter_index {
					client.log.Warnf("Cannot index event %#v (status=%v): %s, trying dead letter index", data[i], status, msg)
					if data[i].Content.Meta == nil {
						data[i].Content.Meta = common.MapStr{
							dead_letter_marker_field: true,
						}
					} else {
						data[i].Content.Meta.Put(dead_letter_marker_field, true)
					}
					data[i].Content.Fields = common.MapStr{
						"message":       data[i].Content.Fields.String(),
						"error.type":    status,
						"error.message": string(msg),
					}
				} else { // drop
					stats.nonIndexable++
					client.log.Warnf("Cannot index event %#v (status=%v): %s, dropping event!", data[i], status, msg)
					continue
				}

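So, in this excerpt, an event rejected with a status below 500 other than 429 (that is, one that cannot be indexed) is handled in one of three ways. If client.NonIndexableAction is dead_letter_index, the event is tagged with dead_letter_marker_field and rewritten for delivery to the dead letter index. If it already carries that marker and fails again, it is counted as nonIndexable and an error is logged. Otherwise it is counted as nonIndexable and dropped from the batch, with a "dropping event!" warning in the log.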
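
The branch structure in the excerpt above can be condensed into a small standalone function. This is a simplified sketch, not the libbeat implementation: `classify`, the `outcome` type, and the two boolean parameters are hypothetical names introduced here, and statuses that the excerpt does not show (successes and 5xx responses) are lumped together as "not covered".

```go
package main

import (
	"fmt"
	"net/http"
)

// outcome names the counter or action a bulk item ends up with.
// All names below are illustrative, not libbeat identifiers.
type outcome string

const (
	outcomeNotCovered       outcome = "not covered by the excerpt"
	outcomeTooMany          outcome = "tooMany"
	outcomeDeadLetter       outcome = "rerouted to dead letter index"
	outcomeDeadLetterFailed outcome = "nonIndexable, dead letter delivery failed"
	outcomeDropped          outcome = "nonIndexable, dropped"
)

// classify mirrors the quoted branches for one failed bulk item.
// alreadyMarked stands for "the event carries dead_letter_marker_field";
// deadLetterEnabled stands for "NonIndexableAction == dead_letter_index".
func classify(status int, alreadyMarked, deadLetterEnabled bool) outcome {
	if status >= 500 {
		return outcomeNotCovered // handled outside the quoted excerpt
	}
	if status == http.StatusTooManyRequests {
		return outcomeTooMany
	}
	switch {
	case alreadyMarked:
		return outcomeDeadLetterFailed
	case deadLetterEnabled:
		return outcomeDeadLetter
	default:
		return outcomeDropped
	}
}

func main() {
	// A 400 mapping error with no dead letter index configured.
	fmt.Println(classify(http.StatusBadRequest, false, false))
}
```

Read this way, output.events.dropped grows only in the two nonIndexable branches, while 429 responses feed the separate tooMany counter.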

To answer your question, I do believe that those events are worth looking into. My first guess would be to check how the pipeline is being parsed and if there's some Grok that needs to be done.

zez3 · 2022-10-09T05:11:41Z

Thank you for clarifying @6fears7

I use official pipelines from Fleet integrations but also custom ones. I also use dissect and the rest of the Filebeat processors (dns is still missing from the Elasticsearch processors).
@sachin-frayne
Can you now help to further identify those events and/or check the pipelines, as suggested above?

zez3 · 2022-10-09T05:30:09Z

Here is what I see in my Agent Filebeat metrics:

{
"_index": ".ds-metrics-elastic_agent.filebeat-ece-2022.09.28-000017",
"_id": "H4Apu4MBF-Kn07JyC-C-",
"_version": 1,
"_score": 0,
"_source": {
"agent": {
"name": "my_host.my.dom",
"id": "4ba5a4aa-848d-4435-a13e-ace9584cddaa",
"ephemeral_id": "2a66e283-d27a-433b-8b39-ae7105878348",
"type": "metricbeat",
"version": "8.4.2"
},
"@timestamp": "2022-10-09T05:12:51.923Z",
"ecs": {
"version": "8.0.0"
},
"service": {
"address": "http://unix/stats",
"name": "beat",
"type": "beat"
},
"data_stream": {
"namespace": "ece",
"type": "metrics",
"dataset": "elastic_agent.filebeat"
},
"beat": {
"elasticsearch": {
"cluster": {
"id": "tOaF05W8QCG1Q3z25K4WZQ"
}
},
"stats": {
"handles": {
"limit": {
"hard": 524288,
"soft": 1024
},
"open": 148
},
"system": {
"load": {
"1": 7.16,
"5": 8.6,
"15": 8.53,
"norm": {
"1": 0.0559,
"5": 0.0672,
"15": 0.0666
}
},
"cpu": {
"cores": 128
}
},
"beat": {
"host": "my_host.my.dom",
"name": "my_host.my.dom",
"type": "filebeat",
"uuid": "a97378a4-c48f-45c2-901b-7b8b5f6b6502",
"version": "8.4.2"
},
"runtime": {
"goroutines": 289
},
"cpu": {
"total": {
"ticks": 13560390,
"time": {
"ms": 13560390
},
"value": 13560390
},
"system": {
"ticks": 1185830,
"time": {
"ms": 1185830
}
},
"user": {
"ticks": 12374560,
"time": {
"ms": 12374560
}
}
},
"memstats": {
"memory": {
"total": 1581247076272,
"alloc": 286642512
},
"rss": 546992128,
"gc_next": 322126080
},
"libbeat": {
"output": {
"read": {
"bytes": 515468121,
"errors": 0
},
"type": "elasticsearch",
"write": {
"bytes": 33985770829,
"errors": 0
},
"events": {
"batches": 479155,
"duplicates": 71111,
"total": 23936376,
"dropped": 23671,
"toomany": 0,
"active": 50,
"failed": 0,
"acked": 23841544
}
},
"pipeline": {
"clients": 20,
"queue": {
"acked": 23936326
},
"events": {
"total": 23940457,
"filtered": 12,
"dropped": 0,
"active": 4119,
"failed": 0,
"published": 23940442,
"retry": 2048
}
},
"config": {
"running": 0,
"stops": 0,
"starts": 20,
"reloads": 0
}
},
"cgroup": {
"memory": {
"mem": {
"usage": {
"bytes": 2462949376
},
"limit": {
"bytes": 53687091200
}
},
"id": "elastic-agent.service"
},
"cpu": {
"cfs": {
"period": {
"us": 100000
},
"quota": {
"us": 0
}
},
"stats": {
"periods": 0,
"throttled": {
"ns": 0,
"periods": 0
}
},
"id": "elastic-agent.service"
},
"cpuacct": {
"total": {
"ns": 4050343270691067
},
"id": "elastic-agent.service"
}
},
"uptime": {
"ms": 27149727
},
"info": {
"name": "filebeat",
"ephemeral_id": "09bd12a2-0e46-4f6a-9c34-cf67b4a8ae42",
"version": "8.4.2",
"uptime": {
"ms": 27149727
}
}
},
"id": "a97378a4-c48f-45c2-901b-7b8b5f6b6502",
"type": "filebeat"
},
"host": {
"hostname": "my_host.my.dom",
"os": {
"kernel": "5.4.0-104-generic",
"codename": "focal",
"name": "Ubuntu",
"type": "linux",
"family": "debian",
"version": "20.04.5 LTS (Focal Fossa)",
"platform": "ubuntu"
},
"containerized": false,
"ip": [
"fe80::1ac0:4dff:fe7a:5796",
"1.1.1.1",
"fe80::ba59:9fff:fed2:222",
"1.1.1.1",
"192.168.43.1",
"fe80::42:73ff:fe00:5c0a",
"10.0.3.1",
"fe80::c5e:4ff:feff:61bb",
"fe80::b4c7:89ff:fe39:ef4d",
"fe80::8014:f6ff:fe19:bd2a",
"fe80::f49b:71ff:fef1:1993",
"fe80::9454:a4ff:fee7:cdbf",
"fe80::dcd6:6ff:fee9:c14d",
"fe80::34fa:29ff:fe12:af8a",
"fe80::866:4ff:fe83:8873",
"fe80::7059:d7ff:fe3d:76a4",
"fe80::446d:53ff:febc:ede3",
"fe80::841e:e9ff:feda:42e",
"fe80::6086:33ff:feee:6f81",
"fe80::d45a:b8ff:fe91:c903",
"fe80::14a4:11ff:fed1:a6a5",
"fe80::dc56:88ff:fe6d:a3ba",
"fe80::2c5d:f2ff:fec6:b0eb",
"fe80::2803:d9ff:fe72:8edd",
"fe80::8ce0:aff:fef4:f9df",
"fe80::2c9d:77ff:fed3:b27a",
"fe80::8875:6ff:fe86:64f9",
"fe80::bc5e:83ff:feb7:348a",
"fe80::d483:27ff:fecb:8415",
"fe80::c0ba:99ff:fe90:cedf",
"fe80::ac20:56ff:fe39:da07",
"fe80::7c8b:dfff:fe76:c1ee",
"fe80::7016:58ff:fef8:e362",
"fe80::5482:78ff:fefc:2464",
"fe80::586d:18ff:fe90:f306"
],
"name": "my_host.my.dom",
"id": "06d09b925a0246dcbd044c0036e01c9c",
"mac": [
"00:16:3e:00:00:00",
"02:42:73:00:5c:0a",
"0a:66:04:83:88:73",
"0e:5e:04:ff:61:bb",
"16:a4:11:d1:a6:a5",
"18:c0:4d:7a:57:96",
"18:c0:4d:7a:57:97",
"2a:03:d9:72:8e:dd",
"2a:f8:b1:d1:37:35",
"2e:5d:f2:c6:b0:eb",
"2e:9d:77:d3:b2:7a",
"36:fa:29:12:af:8a",
"46:6d:53:bc:ed:e3",
"56:82:78:fc:24:64",
"5a:6d:18:90:f3:06",
"62:86:33:ee:6f:81",
"72:16:58:f8:e3:62",
"72:59:d7:3d:76:a4",
"7e:8b:df:76:c1:ee",
"82:14:f6:19:bd:2a",
"86:1e:e9:da:04:2e",
"8a:75:06:86:64:f9",
"8e:e0:0a:f4:f9:df",
"96:54:a4:e7:cd:bf",
"ae:20:56:39:da:07",
"b6:c7:89:39:ef:4d",
"b8:59:9f:d2:02:22",
"be:5e:83:b7:34:8a",
"c2:ba:99:90:ce:df",
"d6:5a:b8:91:c9:03",
"d6:83:27:cb:84:15",
"de:56:88:6d:a3:ba",
"de:d6:06:e9:c1:4d",
"f6:9b:71:f1:19:93"
],
"architecture": "x86_64"
},
"elastic_agent": {
"id": "4ba5a4aa-848d-4435-a13e-ace9584cddaa",
"version": "8.4.2",
"snapshot": false
},
"metricset": {
"period": 10000,
"name": "stats"
},
"event": {
"duration": 28331585,
"agent_id_status": "verified",
"ingested": "2022-10-09T05:12:52Z",
"module": "beat",
"dataset": "elastic_agent.filebeat"
}
},
"fields": {
"beat.stats.cgroup.cpu.id": [
"elastic-agent.service"
],
"beat.stats.libbeat.output.write.bytes": [
33985770829
],
"beat.stats.libbeat.pipeline.events.failed": [
0
],
"elastic_agent.version": [
"8.4.2"
],
"beat.stats.libbeat.pipeline.events.published": [
23940442
],
"host.os.name.text": [
"Ubuntu"
],
"host.hostname": [
"my_host.my.dom"
],
"host.mac": [
"00:16:3e:00:00:00",
"02:42:73:00:5c:0a",
"0a:66:04:83:88:73",
"0e:5e:04:ff:61:bb",
"16:a4:11:d1:a6:a5",
"18:c0:4d:7a:57:96",
"18:c0:4d:7a:57:97",
"2a:03:d9:72:8e:dd",
"2a:f8:b1:d1:37:35",
"2e:5d:f2:c6:b0:eb",
"2e:9d:77:d3:b2:7a",
"36:fa:29:12:af:8a",
"46:6d:53:bc:ed:e3",
"56:82:78:fc:24:64",
"5a:6d:18:90:f3:06",
"62:86:33:ee:6f:81",
"72:16:58:f8:e3:62",
"72:59:d7:3d:76:a4",
"7e:8b:df:76:c1:ee",
"82:14:f6:19:bd:2a",
"86:1e:e9:da:04:2e",
"8a:75:06:86:64:f9",
"8e:e0:0a:f4:f9:df",
"96:54:a4:e7:cd:bf",
"ae:20:56:39:da:07",
"b6:c7:89:39:ef:4d",
"b8:59:9f:d2:02:22",
"be:5e:83:b7:34:8a",
"c2:ba:99:90:ce:df",
"d6:5a:b8:91:c9:03",
"d6:83:27:cb:84:15",
"de:56:88:6d:a3:ba",
"de:d6:06:e9:c1:4d",
"f6:9b:71:f1:19:93"
],
"beat.stats.cpu.user.time.ms": [
12374560
],
"service.type": [
"beat"
],
"beat.stats.beat.type": [
"filebeat"
],
"beat.stats.libbeat.config.starts": [
20
],
"host.os.version": [
"20.04.5 LTS (Focal Fossa)"
],
"beat.stats.cgroup.cpu.cfs.quota.us": [
0
],
"agent.name": [
"my_host.my.dom"
],
"beat.stats.libbeat.config.stops": [
0
],
"beat.stats.cgroup.memory.id": [
"elastic-agent.service"
],
"event.agent_id_status": [
"verified"
],
"beat.stats.cpu.total.value": [
13560390
],
"beat.stats.libbeat.output.events.duplicates": [
71111
],
"host.os.type": [
"linux"
],
"beat.stats.libbeat.output.events.batches": [
479155
],
"beat.stats.libbeat.pipeline.events.filtered": [
12
],
"host.architecture": [
"x86_64"
],
"agent.id": [
"4ba5a4aa-848d-4435-a13e-ace9584cddaa"
],
"beat.stats.cpu.user.ticks": [
12374560
],
"beat.stats.cgroup.cpu.stats.periods": [
0
],
"host.containerized": [
false
],
"service.address": [
"http://unix/stats"
],
"beat.stats.libbeat.output.events.active": [
50
],
"beat.stats.libbeat.pipeline.events.total": [
23940457
],
"beat.stats.cpu.total.ticks": [
13560390
],
"beat.stats.info.version": [
"8.4.2"
],
"beat.stats.handles.limit.hard": [
524288
],
"beat.stats.runtime.goroutines": [
289
],
"beat.stats.beat.name": [
"my_host.my.dom"
],
"beat.stats.beat.version": [
"8.4.2"
],
"beat.stats.libbeat.pipeline.queue.acked": [
23936326
],
"beat.stats.libbeat.config.running": [
0
],
"host.ip": [
"fe80::1ac0:4dff:fe7a:5796",
"1.1.1.1",
"fe80::ba59:9fff:fed2:222",
"1.1.1.1",
"192.168.43.1",
"fe80::42:73ff:fe00:5c0a",
"10.0.3.1",
"fe80::c5e:4ff:feff:61bb",
"fe80::b4c7:89ff:fe39:ef4d",
"fe80::8014:f6ff:fe19:bd2a",
"fe80::f49b:71ff:fef1:1993",
"fe80::9454:a4ff:fee7:cdbf",
"fe80::dcd6:6ff:fee9:c14d",
"fe80::34fa:29ff:fe12:af8a",
"fe80::866:4ff:fe83:8873",
"fe80::7059:d7ff:fe3d:76a4",
"fe80::446d:53ff:febc:ede3",
"fe80::841e:e9ff:feda:42e",
"fe80::6086:33ff:feee:6f81",
"fe80::d45a:b8ff:fe91:c903",
"fe80::14a4:11ff:fed1:a6a5",
"fe80::dc56:88ff:fe6d:a3ba",
"fe80::2c5d:f2ff:fec6:b0eb",
"fe80::2803:d9ff:fe72:8edd",
"fe80::8ce0:aff:fef4:f9df",
"fe80::2c9d:77ff:fed3:b27a",
"fe80::8875:6ff:fe86:64f9",
"fe80::bc5e:83ff:feb7:348a",
"fe80::d483:27ff:fecb:8415",
"fe80::c0ba:99ff:fe90:cedf",
"fe80::ac20:56ff:fe39:da07",
"fe80::7c8b:dfff:fe76:c1ee",
"fe80::7016:58ff:fef8:e362",
"fe80::5482:78ff:fefc:2464",
"fe80::586d:18ff:fe90:f306"
],
"agent.type": [
"metricbeat"
],
"beat.stats.libbeat.output.read.errors": [
0
],
"beat.stats.memstats.gc_next": [
322126080
],
"elastic_agent.snapshot": [
false
],
"beat.id": [
"a97378a4-c48f-45c2-901b-7b8b5f6b6502"
],
"host.id": [
"06d09b925a0246dcbd044c0036e01c9c"
],
"beat.stats.system.load.15": [
8.53
],
"beat.stats.libbeat.pipeline.clients": [
20
],
"elastic_agent.id": [
"4ba5a4aa-848d-4435-a13e-ace9584cddaa"
],
"beat.stats.info.uptime.ms": [
27149727
],
"beat.stats.cgroup.cpuacct.total.ns": [
4050343270691067
],
"metricset.period": [
10000
],
"host.os.codename": [
"focal"
],
"beat.stats.libbeat.output.events.dropped": [
23671
],
"event.duration": [
28331585
],
"beat.stats.system.cpu.cores": [
128
],
"event.ingested": [
"2022-10-09T05:12:52.000Z"
],
"@timestamp": [
"2022-10-09T05:12:51.923Z"
],
"host.os.platform": [
"ubuntu"
],
"data_stream.dataset": [
"elastic_agent.filebeat"
],
"beat.stats.system.load.norm.15": [
0.0666
],
"agent.ephemeral_id": [
"2a66e283-d27a-433b-8b39-ae7105878348"
],
"beat.stats.handles.limit.soft": [
1024
],
"beat.stats.info.name": [
"filebeat"
],
"beat.type": [
"filebeat"
],
"beat.stats.libbeat.config.reloads": [
0
],
"beat.stats.libbeat.output.events.acked": [
23841544
],
"beat.stats.beat.uuid": [
"a97378a4-c48f-45c2-901b-7b8b5f6b6502"
],
"host.os.name": [
"Ubuntu"
],
"beat.stats.uptime.ms": [
27149727
],
"beat.stats.libbeat.output.type": [
"elasticsearch"
],
"host.name": [
"my_host.my.dom"
],
"beat.stats.libbeat.output.events.toomany": [
0
],
"beat.stats.libbeat.output.events.failed": [
0
],
"beat.stats.libbeat.pipeline.events.dropped": [
0
],
"beat.stats.beat.host": [
"idsece02.log.unibe.ch"
],
"data_stream.type": [
"metrics"
],
"ecs.version": [
"8.0.0"
],
"beat.stats.cgroup.cpu.cfs.period.us": [
100000
],
"agent.version": [
"8.4.2"
],
"host.os.family": [
"debian"
],
"beat.stats.cpu.total.time.ms": [
13560390
],
"beat.stats.libbeat.output.write.errors": [
0
],
"beat.stats.memstats.memory.total": [
1581247076272
],
"beat.stats.handles.open": [
148
],
"beat.stats.libbeat.output.events.total": [
23936376
],
"beat.stats.memstats.memory.alloc": [
286642512
],
"beat.stats.system.load.norm.5": [
0.0672
],
"beat.stats.system.load.norm.1": [
0.0559
],
"beat.stats.libbeat.pipeline.events.active": [
4119
],
"event.module": [
"beat"
],
"host.os.kernel": [
"5.4.0-104-generic"
],
"beat.stats.cgroup.cpu.stats.throttled.ns": [
0
],
"beat.stats.info.ephemeral_id": [
"09bd12a2-0e46-4f6a-9c34-cf67b4a8ae42"
],
"beat.stats.cgroup.cpu.stats.throttled.periods": [
0
],
"beat.stats.cpu.system.time.ms": [
1185830
],
"service.name": [
"beat"
],
"beat.elasticsearch.cluster.id": [
"tOaF05W8QCG1Q3z25K4WZQ"
],
"data_stream.namespace": [
"ece"
],
"beat.stats.system.load.1": [
7.16
],
"beat.stats.cgroup.memory.mem.usage.bytes": [
2462949376
],
"beat.stats.cpu.system.ticks": [
1185830
],
"beat.stats.cgroup.memory.mem.limit.bytes": [
53687091200
],
"beat.stats.libbeat.output.read.bytes": [
515468121
],
"beat.stats.system.load.5": [
8.6
],
"beat.stats.cgroup.cpuacct.id": [
"elastic-agent.service"
],
"metricset.name": [
"stats"
],
"beat.stats.libbeat.pipeline.events.retry": [
2048
],
"beat.stats.memstats.rss": [
546992128
],
"event.dataset": [
"elastic_agent.filebeat"
]
}
}

My issue is with:

"beat.stats.libbeat.output.events.dropped": [
23671
],

and not with

"beat.stats.libbeat.pipeline.events.dropped": [
0
],

@6fears7
Does this mean that my events are not dropped in the pipelines?
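
As a rough cross-check of the counters in the document above, the snippet below adds them up following the `acked := len(data) - failed - dropped - duplicates` relation quoted earlier in the thread. The `outputEvents` type and `reconcile` function are hypothetical helpers, and treating `active` as in-flight events that belong to the total is an assumption. The numbers are copied from the metrics document.

```go
package main

import "fmt"

// outputEvents mirrors beat.stats.libbeat.output.events from the
// document above. Hypothetical type, for illustration only.
type outputEvents struct {
	Total, Acked, Failed, Dropped, Duplicates, Active uint64
}

// reconcile reports whether the output counters add up to the total,
// and whether acked + dropped + duplicates equals pipeline.queue.acked.
func reconcile(e outputEvents, queueAcked uint64) (totalMatches, queueMatches bool) {
	sum := e.Acked + e.Failed + e.Dropped + e.Duplicates + e.Active
	return sum == e.Total, e.Acked+e.Dropped+e.Duplicates == queueAcked
}

func main() {
	e := outputEvents{
		Total:      23936376,
		Acked:      23841544,
		Failed:     0,
		Dropped:    23671,
		Duplicates: 71111,
		Active:     50,
	}
	totalOK, queueOK := reconcile(e, 23936326)
	fmt.Println(totalOK, queueOK) // prints "true true"
}
```

Both sums match for this sample. That suggests, without proving it, that events dropped by the output are still acknowledged back to the queue, which would explain why pipeline.events.dropped stays at 0 while output.events.dropped keeps growing.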

zez3 · 2022-10-09T07:23:50Z

So, I get a lot of such messages:


{
"_index": ".ds-logs-elastic_agent.filebeat-ece-2022.09.28-000029",
"_id": "u0iIu4MBozNqveEQYO8z",
"_version": 1,
"_score": 0,
"_source": {
"container": {
"id": "elastic-agent-d3eb3e"
},
"agent": {
"name": "myhost.my.dom",
"id": "7b48d297-3b7b-4993-9525-8d398719e062",
"type": "filebeat",
"ephemeral_id": "b841ef51-a943-47b0-b224-2aa86de88024",
"version": "8.4.2"
},
"service.name": "filebeat",
"log": {
"file": {
"path": "/opt/Elastic/Agent/data/elastic-agent-d3eb3e/logs/default/filebeat-20221009-8.ndjson"
},
"offset": 378916652
},
"elastic_agent": {
"id": "7b48d297-3b7b-4993-9525-8d398719e062",
"version": "8.4.2",
"snapshot": false
},
"message": "Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.October, 9, 8, 56, 53, 910440446, time.Local), Meta:{"input_id":"udp-fortinet_fortigate-4cd23f99-4dfb-401f-817f-5ef6f9d4ca05","raw_index":"logs-fortinet_fortigate.log-ece_forti","stream_id":"udp-fortinet_fortigate.log-4cd23f99-4dfb-401f-817f-5ef6f9d4ca05","truncated":false}, Fields:{"agent":{"ephemeral_id":"f754f48a-8030-41f9-ae3f-dd4e7a8edbd9","id":"7b48d297-3b7b-4993-9525-8d398719e062","name":"myhost.my.dom","type":"filebeat","version":"8.4.2"},"data_stream":{"dataset":"fortinet_fortigate.log","namespace":"ece_forti","type":"logs"},"ecs":{"version":"8.0.0"},"elastic_agent":{"id":"7b48d297-3b7b-4993-9525-8d398719e062","snapshot":false,"version":"8.4.2"},"event":{"dataset":"fortinet_fortigate.log"},"input":{"type":"udp"},"log":{"source":{"address":"1.1.1.1:24676"}},"message":"\u003c190\u003edate=2022-10-09 time=08:56:53 devname=\"UniLabFW\" devid=\"FG100FTK20025470\" eventtime=1665298613860494320 tz=\"+0200\" logid=\"0112053203\" type=\"event\" subtype=\"connector\" level=\"information\" vd=\"Border\" logdesc=\"Dynamic address updated.\" addr=\"FCTEMS_ALL_FORTICLOUD_SERVERS\" msg=\"Updated tag FCTEMS_ALL_FORTICLOUD_SERVERS\"","tags":["fortinet-fortigate","fortinet-firewall","forwarded"]}, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [fortinet.firewall.addr] of type [ip] in document with id 'VkiIu4MBozNqveEQSZDG'. Preview of field's value: 'FCTEMS_ALL_FORTICLOUD_SERVERS'","caused_by":{"type":"illegal_argument_exception","reason":"'FCTEMS_ALL_FORTICLOUD_SERVERS' is not an IP string literal."}}, dropping event!",
"log.logger": "elasticsearch",
"log.origin": {
"file.line": 429,
"file.name": "elasticsearch/client.go"
},
"input": {
"type": "filestream"
},
"@timestamp": "2022-10-09T06:56:54.742Z",
"ecs": {
"version": "8.0.0"
},
"data_stream": {
"namespace": "ece",
"type": "logs",
"dataset": "elastic_agent.filebeat"
},
"host": {
"hostname": "myhost.my.dom",
"os": {
"kernel": "5.4.0-125-generic",
"codename": "focal",
"name": "Ubuntu",
"type": "linux",
"family": "debian",
"version": "20.04.5 LTS (Focal Fossa)",
"platform": "ubuntu"
},
"containerized": false,
"ip": [
"fe80::1ac0:4dff:fe7a:560a",
"1.1.1.1",
"fe80::ba59:9fff:fed2:4e6",
"10.0.3.1",
"192.168.43.1",
"fe80::42:d3ff:fe9e:1acf",
"fe80::602f:cbff:fee8:aba1",
"fe80::b09f:c2ff:feda:9a3f",
"fe80::5c79:49ff:febe:1ace",
"fe80::d850:fbff:fe99:98fe",
"fe80::f872:d1ff:fece:32cc",
"fe80::ccd5:2dff:fe5b:32f8",
"fe80::a6:65ff:fe4d:3a83",
"fe80::80c8:c2ff:fe88:3e9d",
"fe80::88a3:97ff:feb0:d411",
"fe80::6c0e:2ff:fe33:8663",
"fe80::e8cd:80ff:fe0b:94e8",
"fe80::d429:7eff:fe77:efdf",
"fe80::b882:f4ff:fefa:32dc",
"fe80::cc7d:73ff:fe22:422f",
"fe80::f0cc:c5ff:fe65:d8ec",
"fe80::82:47ff:fe79:a2f9",
"fe80::680e:d3ff:fe45:9e6",
"fe80::2490:2aff:fe9f:1157",
"fe80::944b:eaff:fef8:1ce6",
"fe80::b4c8:fdff:fe59:6a6c",
"fe80::45b:edff:fe5d:305e",
"fe80::301b:16ff:fe60:7713",
"fe80::a42d:eff:fe18:442a",
"fe80::64cb:b9ff:fe97:6305",
"fe80::600b:ebff:fedb:2e4f",
"fe80::dc91:3bff:fe29:b75b",
"fe80::5042:7eff:fe93:f008",
"fe80::605f:84ff:fea7:9128"
],
"name": "myhost.my.dom",
"id": "aa3d81edad7148488d332274f0ec28e1",
"mac": [
"00:16:3e:00:00:00",
"02:42:d3:9e:1a:cf",
"02:82:47:79:a2:f9",
"02:a6:65:4d:3a:83",
"06:5b:ed:5d:30:5e",
"18:c0:4d:7a:56:0a",
"18:c0:4d:7a:56:0b",
"26:90:2a:9f:11:57",
"32:1b:16:60:77:13",
"52:42:7e:93:f0:08",
"5e:79:49:be:1a:ce",
"62:0b:eb:db:2e:4f",
"62:2f:cb:e8:ab:a1",
"62:5f:84:a7:91:28",
"66:cb:b9:97:63:05",
"68:05:ca:13:70:e2",
"68:05:ca:13:70:e3",
"6a:0e:d3:45:09:e6",
"6e:0e:02:33:86:63",
"82:c8:c2:88:3e:9d",
"8a:a3:97:b0:d4:11",
"96:4b:ea:f8:1c:e6",
"a6:2d:0e:18:44:2a",
"b2:9f:c2:da:9a:3f",
"b6:c8:fd:59:6a:6c",
"b8:59:9f:d2:04:e6",
"ba:82:f4:fa:32:dc",
"ce:7d:73:22:42:2f",
"ce:d5:2d:5b:32:f8",
"d6:29:7e:77:ef:df",
"da:50:fb:99:98:fe",
"de:91:3b:29:b7:5b",
"ea:cd:80:0b:94:e8",
"f2:cc:c5:65:d8:ec",
"fa:72:d1:ce:32:cc"
],
"architecture": "x86_64"
},
"log.level": "warn",
"event": {
"agent_id_status": "verified",
"ingested": "2022-10-09T06:57:00Z",
"dataset": "elastic_agent.filebeat"
}
},
"fields": {
"elastic_agent.version": [
"8.4.2"
],
"host.os.name.text": [
"Ubuntu"
],
"host.hostname": [
"myhost.my.dom"
],
"host.mac": [
"00:16:3e:00:00:00",
"02:42:d3:9e:1a:cf",
"02:82:47:79:a2:f9",
"02:a6:65:4d:3a:83",
"06:5b:ed:5d:30:5e",
"18:c0:4d:7a:56:0a",
"18:c0:4d:7a:56:0b",
"26:90:2a:9f:11:57",
"32:1b:16:60:77:13",
"52:42:7e:93:f0:08",
"5e:79:49:be:1a:ce",
"62:0b:eb:db:2e:4f",
"62:2f:cb:e8:ab:a1",
"62:5f:84:a7:91:28",
"66:cb:b9:97:63:05",
"68:05:ca:13:70:e2",
"68:05:ca:13:70:e3",
"6a:0e:d3:45:09:e6",
"6e:0e:02:33:86:63",
"82:c8:c2:88:3e:9d",
"8a:a3:97:b0:d4:11",
"96:4b:ea:f8:1c:e6",
"a6:2d:0e:18:44:2a",
"b2:9f:c2:da:9a:3f",
"b6:c8:fd:59:6a:6c",
"b8:59:9f:d2:04:e6",
"ba:82:f4:fa:32:dc",
"ce:7d:73:22:42:2f",
"ce:d5:2d:5b:32:f8",
"d6:29:7e:77:ef:df",
"da:50:fb:99:98:fe",
"de:91:3b:29:b7:5b",
"ea:cd:80:0b:94:e8",
"f2:cc:c5:65:d8:ec",
"fa:72:d1:ce:32:cc"
],
"log.logger": [
"elasticsearch"
],
"container.id": [
"elastic-agent-d3eb3e"
],
"host.ip": [
"fe80::1ac0:4dff:fe7a:560a",
"1.1.1.1",
"fe80::ba59:9fff:fed2:4e6",
"10.0.3.1",
"192.168.43.1",
"fe80::42:d3ff:fe9e:1acf",
"fe80::602f:cbff:fee8:aba1",
"fe80::b09f:c2ff:feda:9a3f",
"fe80::5c79:49ff:febe:1ace",
"fe80::d850:fbff:fe99:98fe",
"fe80::f872:d1ff:fece:32cc",
"fe80::ccd5:2dff:fe5b:32f8",
"fe80::a6:65ff:fe4d:3a83",
"fe80::80c8:c2ff:fe88:3e9d",
"fe80::88a3:97ff:feb0:d411",
"fe80::6c0e:2ff:fe33:8663",
"fe80::e8cd:80ff:fe0b:94e8",
"fe80::d429:7eff:fe77:efdf",
"fe80::b882:f4ff:fefa:32dc",
"fe80::cc7d:73ff:fe22:422f",
"fe80::f0cc:c5ff:fe65:d8ec",
"fe80::82:47ff:fe79:a2f9",
"fe80::680e:d3ff:fe45:9e6",
"fe80::2490:2aff:fe9f:1157",
"fe80::944b:eaff:fef8:1ce6",
"fe80::b4c8:fdff:fe59:6a6c",
"fe80::45b:edff:fe5d:305e",
"fe80::301b:16ff:fe60:7713",
"fe80::a42d:eff:fe18:442a",
"fe80::64cb:b9ff:fe97:6305",
"fe80::600b:ebff:fedb:2e4f",
"fe80::dc91:3bff:fe29:b75b",
"fe80::5042:7eff:fe93:f008",
"fe80::605f:84ff:fea7:9128"
],
"agent.type": [
"filebeat"
],
"host.os.version": [
"20.04.5 LTS (Focal Fossa)"
],
"host.os.kernel": [
"5.4.0-125-generic"
],
"host.os.name": [
"Ubuntu"
],
"log.level": [
"warn"
],
"agent.name": [
"myhost.my.dom"
],
"elastic_agent.snapshot": [
false
],
"host.name": [
"myhost.my.dom"
],
"event.agent_id_status": [
"verified"
],
"host.id": [
"aa3d81edad7148488d332274f0ec28e1"
],
"log.origin.file.line": [
429
],
"service.name": [
"filebeat"
],
"host.os.type": [
"linux"
],
"elastic_agent.id": [
"7b48d297-3b7b-4993-9525-8d398719e062"
],
"data_stream.namespace": [
"ece"
],
"host.os.codename": [
"focal"
],
"input.type": [
"filestream"
],
"log.offset": [
378916652
],
"message": [
"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.October, 9, 8, 56, 53, 910440446, time.Local), Meta:{"input_id":"udp-fortinet_fortigate-4cd23f99-4dfb-401f-817f-5ef6f9d4ca05","raw_index":"logs-fortinet_fortigate.log-ece_forti","stream_id":"udp-fortinet_fortigate.log-4cd23f99-4dfb-401f-817f-5ef6f9d4ca05","truncated":false}, Fields:{"agent":{"ephemeral_id":"f754f48a-8030-41f9-ae3f-dd4e7a8edbd9","id":"7b48d297-3b7b-4993-9525-8d398719e062","name":"myhost.my.dom","type":"filebeat","version":"8.4.2"},"data_stream":{"dataset":"fortinet_fortigate.log","namespace":"ece_forti","type":"logs"},"ecs":{"version":"8.0.0"},"elastic_agent":{"id":"7b48d297-3b7b-4993-9525-8d398719e062","snapshot":false,"version":"8.4.2"},"event":{"dataset":"fortinet_fortigate.log"},"input":{"type":"udp"},"log":{"source":{"address":"1.1.1.1:24676"}},"message":"\u003c190\u003edate=2022-10-09 time=08:56:53 devname=\"UniLabFW\" devid=\"FG100FTK20025470\" eventtime=1665298613860494320 tz=\"+0200\" logid=\"0112053203\" type=\"event\" subtype=\"connector\" level=\"information\" vd=\"Border\" logdesc=\"Dynamic address updated.\" addr=\"FCTEMS_ALL_FORTICLOUD_SERVERS\" msg=\"Updated tag FCTEMS_ALL_FORTICLOUD_SERVERS\"","tags":["fortinet-fortigate","fortinet-firewall","forwarded"]}, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [fortinet.firewall.addr] of type [ip] in document with id 'VkiIu4MBozNqveEQSZDG'. Preview of field's value: 'FCTEMS_ALL_FORTICLOUD_SERVERS'","caused_by":{"type":"illegal_argument_exception","reason":"'FCTEMS_ALL_FORTICLOUD_SERVERS' is not an IP string literal."}}, dropping event!"
],
"data_stream.type": [
"logs"
],
"host.architecture": [
"x86_64"
],
"event.ingested": [
"2022-10-09T06:57:00.000Z"
],
"@timestamp": [
"2022-10-09T06:56:54.742Z"
],
"log.origin.file.name": [
"elasticsearch/client.go"
],
"agent.id": [
"7b48d297-3b7b-4993-9525-8d398719e062"
],
"host.containerized": [
false
],
"ecs.version": [
"8.0.0"
],
"host.os.platform": [
"ubuntu"
],
"data_stream.dataset": [
"elastic_agent.filebeat"
],
"log.file.path": [
"/opt/Elastic/Agent/data/elastic-agent-d3eb3e/logs/default/filebeat-20221009-8.ndjson"
],
"agent.ephemeral_id": [
"b841ef51-a943-47b0-b224-2aa86de88024"
],
"agent.version": [
"8.4.2"
],
"host.os.family": [
"debian"
],
"event.dataset": [
"elastic_agent.filebeat"
]
}
}

In the above case they come from an official Fortinet integration, but I also see others from my custom parsing:


"message": "Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.October, 9, 7, 17, 38, 79000000, time.UTC), Meta:{\"input_id\":\"udp-udp-e455bb9c-05f0-4d42-bd13-60593992bb55\",\"raw_index\":\"logs-udp.generic-ece_wlc_logs\",\"stream_id\":\"udp-udp.generic-e455bb9c-05f0-4d42-bd13-60593992bb55\",\"truncated\":false}, Fields:{\"agent\":{\"ephemeral_id\":\"09bd12a2-0e46-4f6a-9c34-cf67b4a8ae42\",\"id\":\"4ba5a4aa-848d-4435-a13e-ace9584cddaa\",\"name\":\"myhost.my.dom\",\"type\":\"filebeat\",\"version\":\"8.4.2\"},\"client\":{\"mac\":\"22:7c:71:90:17:a5\"},\"data_stream\":{\"dataset\":\"udp.generic\",\"namespace\":\"ece_wlc_logs\",\"type\":\"logs\"},\"ecs\":{\"version\":\"8.0.0\"},\"elastic_agent\":{\"id\":\"4ba5a4aa-848d-4435-a13e-ace9584cddaa\",\"snapshot\":false,\"version\":\"8.4.2\"},\"event\":{\"action\":\"Client Authenticated\",\"dataset\":\"udp.generic\",\"provider\":\"APF-3-AUTHENTICATION_TRAP\",\"timezone\":\"CEST\"},\"input\":{\"type\":\"udp\"},\"log\":{\"logger\":\"haSSOServiceTask5\",\"source\":{\"address\":\"1.1.1.1:32837\"},\"syslog\":{\"hostname\":\"cisco-wlc-mgmt\"}},\"message\":\"\\u003c139\\u003ecisco-wlc-mgmt: *haSSOServiceTask5: Oct 09 09:17:38.079: %APF-3-AUTHENTICATION_TRAP: [PS]apf_80211.c:19558 Client Authenticated: MACAddress:22:7c:71:90:17:a5 Base Radio MAC:00:f2:8b:4d:bf:00 Slot:1 User Name:unknown Ip Address:unknown SSID:public-unibe\",\"oldwlc\":{\"orig\":{}},\"orig\":{\"timestamp\":\"Oct 09 09:17:38.079\"},\"source\":{\"ip\":\"unknown\",\"mac\":\"22:7c:71:90:17:a5\"},\"tags\":[\"wlc\",\"forwarded\",\"_dns_reverse_lookup_failed\"],\"user\":{\"name\":\"unknown\"},\"wlc\":{\"baseradiomac\":\"00:f2:8b:4d:bf:00\",\"ssid\":\"public-unibe\"}}, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse field [source.ip] of type [ip] in document with id 'YWObu4MBF-Kn07JyVca4'. 
Preview of field's value: 'unknown'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"'unknown' is not an IP string literal.\"}}, dropping event!",
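
For readers following the log above: Elasticsearch rejected the document with status 400 because the string 'unknown' was sent in `source.ip`, which is mapped as type `ip`, and the Beat then dropped the event. A minimal sketch of the kind of check that would avoid this particular rejection, written in plain Python rather than as Beats or ingest pipeline configuration. The helper names `is_ip_literal` and `strip_invalid_source_ip` are hypothetical and are not part of any Elastic API.

```python
import copy
import ipaddress


def is_ip_literal(value):
    """Return True if value is a string that parses as an IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def strip_invalid_source_ip(event):
    """Return a copy of event with source.ip removed when it is not an IP literal.

    Hypothetical helper: it only illustrates the idea of removing a value
    that an `ip`-typed field mapping would reject.
    """
    cleaned = copy.deepcopy(event)
    source = cleaned.get("source", {})
    if "ip" in source and not is_ip_literal(source["ip"]):
        del source["ip"]
    return cleaned


# Field values taken from the log message above.
event = {"source": {"ip": "unknown", "mac": "22:7c:71:90:17:a5"}}
cleaned = strip_invalid_source_ip(event)
```

With the event above, `cleaned` keeps `source.mac` but no longer carries `source.ip`, so the mapping for that field would have nothing to reject. In a real deployment the equivalent check would live in the custom parsing that populates `source.ip`.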

nkakouros · 2022-10-11T22:25:44Z

I think it is important, whatever the explanation given here, to update the Kibana UI to explain the metrics in layman's terms and to give visual feedback when data loss is occurring, together with links to docs on how to deal with it.

botelastic · 2023-10-11T23:04:12Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

ajax-shmyrko-o · 2024-01-05T12:54:37Z

+1, We are using Datadog to collect these metrics:

libbeat.output.events.dropped
libbeat.pipeline.events.dropped
libbeat.output.events.failed
libbeat.pipeline.events.failed

So it would be nice to have an explanation about their meaning in the documentation.

botelastic · 2025-01-04T13:16:01Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

jsoriano added the docs, Metricbeat, Team:Integrations, and Feature:Stack Monitoring labels Sep 6, 2021

botelastic bot added the Stalled label Oct 11, 2023

botelastic bot removed the Stalled label Jan 5, 2024

botelastic bot added the Stalled label Jan 4, 2025
