Alerts for 30s metrics #3853

Open · 4 tasks
leehinman opened this issue Nov 30, 2023 · 4 comments
Labels: Team:Elastic-Agent (Label for the Agent team), Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)

Describe the enhancement:

Alerts for 30s metrics

Describe a specific use case for the enhancement or feature:

We should have alerts built on the 30s metrics for common performance problems. For example, a queue that is constantly full should trigger an alert indicating that performance tuning is necessary (see the detection-query sketch after the checklist below).

What is the definition of done?

  • Alert for constantly full queue
  • Alert for retries
  • Additional alerts
  • Alerts can be loaded into Kibana
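
As a starting point, a "queue constantly full" condition could be detected with a simple aggregation over the self-monitoring data stream. The sketch below is only illustrative and uses the same map[string]interface{} style as the monitoring config quoted later in this thread; the field name (beat.stats.libbeat.pipeline.queue.filled.pct.events), the index pattern, and the 5-minute window are assumptions that would need to be checked against the real mapping before wiring this into an alert.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Hypothetical detection query for a "queue constantly full" alert.
    // Field and index names are illustrative; verify them against the
    // actual metrics-elastic_agent.* mapping before use.
    query := map[string]interface{}{
        "size": 0,
        "query": map[string]interface{}{
            "bool": map[string]interface{}{
                "filter": []interface{}{
                    map[string]interface{}{
                        "range": map[string]interface{}{
                            "@timestamp": map[string]interface{}{"gte": "now-5m"},
                        },
                    },
                },
            },
        },
        "aggs": map[string]interface{}{
            "avg_queue_fill": map[string]interface{}{
                "avg": map[string]interface{}{
                    // assumed field name for the queue fill percentage
                    "field": "beat.stats.libbeat.pipeline.queue.filled.pct.events",
                },
            },
        },
    }
    body, _ := json.MarshalIndent(query, "", "  ")
    fmt.Println(string(body))
}

An Elasticsearch query rule in Kibana could then fire when the returned average stays above a chosen threshold (for example 0.9) for the whole window; the same pattern would cover the retry alert by aggregating over an output retry counter instead.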
@leehinman added the Team:Elastic-Agent label Nov 30, 2023
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@leehinman self-assigned this Nov 30, 2023

@cmacknz commented Nov 30, 2023

Are these metrics also collected by the monitoring metricbeat we start? It is using the stack monitoring Beat module. If they aren't, should we add them there?

If they are, then we already have them at a higher frequency in the metrics-elastic_agent.* data streams. They are just duplicated in the logs so that we always have access to them in case the user has disabled metric monitoring or is using a standalone agent without Fleet.

// Self-monitoring stream: collects the agent's own /stats metrics and routes
// them into the metrics-elastic_agent.* data stream.
map[string]interface{}{
    idKey: "metrics-monitoring-agent",
    "data_stream": map[string]interface{}{
        "type":      "metrics",
        "dataset":   fmt.Sprintf("elastic_agent.%s", fixedAgentName),
        "namespace": monitoringNamespace,
    },
    "metricsets": []interface{}{"json"},
    "path":       "/stats",
    "hosts":      []interface{}{HttpPlusAgentMonitoringEndpoint(b.operatingSystem, b.config.C)},
    "namespace":  "agent",
    "period":     metricsCollectionIntervalString,
    "index":      fmt.Sprintf("metrics-elastic_agent.%s-%s", fixedAgentName, monitoringNamespace),
    "processors": []interface{}{
        map[string]interface{}{
            "add_fields": map[string]interface{}{
                "target": "data_stream",
                "fields": map[string]interface{}{
                    "type":      "metrics",
                    "dataset":   fmt.Sprintf("elastic_agent.%s", fixedAgentName),
                    "namespace": monitoringNamespace,
                },
            },
        },
        map[string]interface{}{
            "add_fields": map[string]interface{}{
                "target": "event",
                "fields": map[string]interface{}{
                    "dataset": fmt.Sprintf("elastic_agent.%s", fixedAgentName),
                },
            },
        },
        map[string]interface{}{
            "add_fields": map[string]interface{}{
                "target": "elastic_agent",
                "fields": map[string]interface{}{
                    "id":       b.agentInfo.AgentID(),
                    "version":  b.agentInfo.Version(),
                    "snapshot": b.agentInfo.Snapshot(),
                    "process":  "elastic-agent",
                },
            },
        },
        map[string]interface{}{
            "add_fields": map[string]interface{}{
                "target": "agent",
                "fields": map[string]interface{}{
                    "id": b.agentInfo.AgentID(),
                },
            },
        },
        map[string]interface{}{
            "copy_fields": map[string]interface{}{
                "fields":         httpCopyRules(),
                "ignore_missing": true,
                "fail_on_error":  false,
            },
        },
        map[string]interface{}{
            "drop_fields": map[string]interface{}{
                "fields": []interface{}{
                    "http",
                },
                "ignore_missing": true,
            },
        },
        map[string]interface{}{
            "add_fields": map[string]interface{}{
                "target": "component",
                "fields": map[string]interface{}{
                    "id":     "elastic-agent",
                    "binary": "elastic-agent",
                },
            },
        },
    },
},
}

// One additional metrics stream is generated per supported component binary.
for unit, binaryName := range componentIDToBinary {
    if !isSupportedMetricsBinary(binaryName) {
        continue
    }

    endpoints := []interface{}{prefixedEndpoint(utils.SocketURLWithFallback(unit, paths.TempDir()))}
    name := strings.ReplaceAll(strings.ReplaceAll(binaryName, "-", "_"), "/", "_") // conform with index naming policy

    if isSupportedBeatsBinary(binaryName) {
        beatsStreams = append(beatsStreams, map[string]interface{}{
            idKey: "metrics-monitoring-" + name,
            "data_stream": map[string]interface{}{
                "type":      "metrics",
                "dataset":   fmt.Sprintf("elastic_agent.%s", name),
                "namespace": monitoringNamespace,
            },
            "metricsets": []interface{}{"stats", "state"},

@leehinman commented

> Are these metrics also collected by the monitoring metricbeat we start? It is using the stack monitoring Beat module. If they aren't, should we add them there?

Yes, same metrics. And yes, these alerts should eventually be added to stack monitoring. But for this first pass of development I want to keep them separate so we can use them while debugging support cases and figure out what good limits are for the alerts and which kinds of alerts are useful.
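
For the "alerts can be loaded into Kibana" item, a minimal sketch of loading one rule through Kibana's alerting API (POST /api/alerting/rule) could look like the following. The rule type (.es-query), consumer, and parameter names are assumptions and should be checked against the Kibana version in use; the esQuery placeholder stands in for the real detection query.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // Sketch of creating an Elasticsearch query threshold rule via the
    // Kibana alerting API. Endpoint, credentials, and parameter names are
    // assumptions for illustration only.
    rule := map[string]interface{}{
        "name":         "elastic-agent queue constantly full",
        "rule_type_id": ".es-query",
        "consumer":     "alerts",
        "schedule":     map[string]interface{}{"interval": "1m"},
        "params": map[string]interface{}{
            "index":               []string{"metrics-elastic_agent.*"},
            "timeField":           "@timestamp",
            "esQuery":             `{"query":{"match_all":{}}}`, // replace with the real detection query
            "size":                0,
            "threshold":           []int{1},
            "thresholdComparator": ">=",
            "timeWindowSize":      5,
            "timeWindowUnit":      "m",
        },
    }

    body, _ := json.Marshal(rule)
    req, _ := http.NewRequest("POST", "http://localhost:5601/api/alerting/rule", bytes.NewReader(body))
    req.Header.Set("kbn-xsrf", "true")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}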

@pierrehilbert added the Team:Elastic-Agent-Data-Plane label Jun 4, 2024
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
