Clarify documentation of beats output errors #27763
Comments
Pinging @elastic/integrations (Team:Integrations)
Pinging @elastic/stack-monitoring (Stack monitoring)
After reading the discussion and the docs, I still don't know the exact meaning of those metrics in Stack Monitoring. I think we all agree that one of the most important questions a user has is: "Am I losing data or not?" I still can't answer this question. For example, I'm using Packetbeat to gather DNS traffic, and from time to time the "Failed in Pipeline" counter jumps from 0 to 1000. As stated above, I'm asking myself: am I losing those DNS queries or not? I also don't really understand the difference between the two graphs "Fail Rates" and "Output Errors". They sound like they measure the same thing, but they probably don't?
I would also like to know the meaning of these values, especially
Inside the libbeat code:
When you query Elasticsearch for libbeat results (see below), Output Errors is derived from the delta between readErrors + writeErrors at the first timestamp and readErrors + writeErrors at the latest timestamp. According to the code comments, then, Output Errors is the number of network packets that experienced errors. The example below uses apm-server as the beat type, but you can replace it to suit your needs.
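The following minimal Python sketch makes the delta computation concrete. It is separate from the apm-server example mentioned above. It assumes the monitoring samples have already been fetched and sorted by timestamp, and that readErrors and writeErrors are cumulative counters. OutputSample and output_errors are hypothetical names for illustration, not part of libbeat.

```python
from dataclasses import dataclass


@dataclass
class OutputSample:
    """One monitoring sample of the libbeat output error counters.

    Hypothetical type: the fields mirror the readErrors / writeErrors
    counters, but this is not a libbeat or Elasticsearch data structure.
    """
    timestamp: str
    read_errors: int
    write_errors: int


def output_errors(samples: list[OutputSample]) -> int:
    """Return the Output Errors value for the range covered by `samples`.

    Computed as (read_errors + write_errors) at the latest sample minus the
    same sum at the first sample. Assumes the samples are sorted by time and
    the counters are cumulative (not reset by a Beat restart in between).
    """
    if len(samples) < 2:
        return 0  # zero or one sample: there is no delta to compute
    first, last = samples[0], samples[-1]
    return (last.read_errors + last.write_errors) - (
        first.read_errors + first.write_errors
    )


# Made-up sample data for illustration only.
samples = [
    OutputSample("t0", read_errors=2, write_errors=5),
    OutputSample("t1", read_errors=2, write_errors=9),
    OutputSample("t2", read_errors=3, write_errors=12),
]
print(output_errors(samples))  # 8
```

Only the first and last samples matter here. If the counters are reset between them (for example, after a restart), the delta is wrong or even negative. A more careful version would sum the per-interval deltas and treat a decrease as a reset.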
@6fears7
I've added this, e.g.:
Those 3 million events per day are quite a lot from my customer's point of view.
@6fears7
In client.go, we are given an initial struct of:
Later, we see what constitutes a drop:
So dropped events are those classified as nonIndexable. To determine whether an event is "nonIndexable", the code iterates through the Bulk results:
So any event that cannot be indexed (as in this example) is tagged under the dead_letter_marker_field and dropped from the pipeline. To answer your question, I do believe those events are worth looking into. My first step would be to check how the pipeline parses them and whether some Grok parsing needs to be done.
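The drop logic described above can be approximated as a per-item classification of the Bulk API response. The sketch below is a simplified assumption, not a port of client.go. It treats 2xx as acknowledged, 409 as a duplicate, 429 and 5xx as retryable, and any other status as non-indexable. It also assumes an optional dead-letter index: when one is configured, non-indexable events are routed there instead of being dropped. classify_bulk_item and summarize are hypothetical helpers.

```python
from collections import Counter


def classify_bulk_item(status: int) -> str:
    """Map one bulk response item's HTTP status to an outcome.

    Simplified assumption, not the exact libbeat rules.
    """
    if 200 <= status < 300:
        return "acked"
    if status == 409:
        return "duplicate"  # a document with the same ID already exists
    if status == 429 or status >= 500:
        return "retry"  # treated as transient: the event is sent again later
    return "non_indexable"  # e.g. a 400 mapping error; retrying won't help


def summarize(statuses: list[int], dead_letter_index: bool = False) -> Counter:
    """Count outcomes for one bulk batch.

    Non-indexable events go to a dead-letter index when one is assumed to be
    configured; otherwise they are counted as dropped (lost).
    """
    stats: Counter = Counter()
    for status in statuses:
        outcome = classify_bulk_item(status)
        if outcome == "non_indexable":
            outcome = "dead_letter" if dead_letter_index else "dropped"
        stats[outcome] += 1
    return stats


batch = [200, 201, 400, 429, 503, 409, 400]
print(summarize(batch))
```

In this model, only the "dropped" bucket holds events that never reach Elasticsearch. Retried events show up as errors until a later attempt succeeds.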
Thank you for clarifying, @6fears7. I use official pipelines from Fleet integrations, but also custom ones. I also use dissect and the other Filebeat processors (DNS is still missing from the Elasticsearch processors).
Here is what I see in my Agent Filebeat metrics. My issue is with "beat.stats.libbeat.output.events.dropped", not with "beat.stats.libbeat.pipeline.events.dropped". @6fears7
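This sketch assumes the metrics come from the fields section of an Elasticsearch search response, where keys are dotted names and every value is a list. Under that assumption, a small Python helper can report the two dropped counters side by side. The helper names and the sample values are hypothetical.

```python
OUTPUT_DROPPED = "beat.stats.libbeat.output.events.dropped"
PIPELINE_DROPPED = "beat.stats.libbeat.pipeline.events.dropped"


def first_value(fields: dict[str, list], name: str, default: int = 0) -> int:
    """Return the first value of a dotted field from a `fields` map.

    Missing fields or empty lists fall back to `default`.
    """
    values = fields.get(name)
    return values[0] if values else default


def dropped_by_stage(fields: dict[str, list]) -> dict[str, int]:
    """Report events dropped by the output vs. by the publishing pipeline."""
    return {
        "output": first_value(fields, OUTPUT_DROPPED),
        "pipeline": first_value(fields, PIPELINE_DROPPED),
    }


# Made-up values for illustration only.
fields = {OUTPUT_DROPPED: [12], PIPELINE_DROPPED: [0]}
print(dropped_by_stage(fields))  # {'output': 12, 'pipeline': 0}
```

These values are presumably cumulative, like the output error counters. Comparing two samples, rather than reading a single one, would then give the number of events dropped in between.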
So I get a lot of messages like these. In the above case they come from an official Fortinet integration, but I also see others from my custom parsing:
I think it is important, whatever explanation is given here, to update the Kibana UI so that it explains the metrics in layman's terms and gives visual feedback when data loss occurs, together with links to docs on how to deal with it.
Hi! We're labeling this issue as
+1. We are using Datadog to collect these metrics:
So it would be nice to have an explanation of their meaning in the documentation.
There are several metrics reporting output errors from the beat module. Clarify their meaning in the docs, focusing on how worrisome these errors are for users. For example, it is not clear whether a non-zero beat.stats.libbeat.output.write.errors implies some data loss, though it probably doesn't if beat.stats.libbeat.output.events.dropped is zero.
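The rule of thumb in the issue can be written down as a small decision function. This is a hedged sketch of that reading only. It assumes failed writes are followed by retries, so write errors indicate data loss only when events.dropped also grows. Whether retries are unbounded depends on the Beat and its retry settings. assess_data_loss is a hypothetical helper that takes deltas over the same time range.

```python
def assess_data_loss(write_errors_delta: int, events_dropped_delta: int) -> str:
    """Interpret two libbeat output counters over the same time range.

    Both arguments are deltas (end minus start), not raw cumulative values.
    Assumption: failed writes are retried, so only dropped events mean loss.
    """
    if write_errors_delta < 0 or events_dropped_delta < 0:
        raise ValueError("negative delta: counters were probably reset")
    if events_dropped_delta > 0:
        return "data loss likely"
    if write_errors_delta > 0:
        return "errors, probably recovered by retries"
    return "no errors"


print(assess_data_loss(write_errors_delta=42, events_dropped_delta=0))
# errors, probably recovered by retries
```

The function checks the dropped counter first because, under this assumption, it is the only signal that events were actually lost. Write errors alone only show that some attempts failed.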