Windows agent getting unhealthy with System integration and no data under kafka output. #6049
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
@muskangulati-qasource Please review. |
Secondary review is Done for this ticket |
This is #5332 again |
#5332 was resolved last week. @amolnater-qasource would you mind re-testing this issue with the latest |
@ycombinator Thank you for the update. We have revalidated this issue on the latest available 8.17.0 snapshot and found that it is still reproducible.
Artifact: https://snapshots.elastic.co/8.17.0-1c58bcd8/downloads/beats/elastic-agent/elastic-agent-8.17.0-SNAPSHOT-windows-x86_64.zip
Logs:
Please let us know if we are missing anything here. Thanks!! |
Same problem as before:
{"log.level":"warn","@timestamp":"2024-11-26T04:43:43.726Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (HEALTHY->DEGRADED): Error fetching data for metricset beat.stats: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\K5P2Tc74wcqesMl7DzGLkORtOcGrPO1a.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
The intended fix for this is in 8.17 (https://github.com/elastic/elastic-agent/commits/8.17), but it looks like it isn't helping here for some reason. Assigning to @pchila to investigate. |
@cmacknz From the logs we can see that the error about opening the named pipe occurs more than once. (With the fix for #5332 we trigger the DEGRADED state by default at the second consecutive error, on the assumption that the fetch will be retried in 60s.) What we can see from the logs is that the unit
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.656Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (CONFIGURING->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"HEALTHY","old_state":"CONFIGURING"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-11-26T04:25:42.658Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (HEALTHY->DEGRADED): Error fetching data for metricset beat.stats: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\K5P2Tc74wcqesMl7DzGLkORtOcGrPO1a.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.657Z","message":"Beat ID: 779a746f-aa35-4946-8b9e-06ab08036161","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"winlog"},"log":{"source":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"service.name":"filebeat","ecs.version":"1.6.0","log.origin":{"file.line":1070,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-26T04:25:42.669Z","message":"Error fetching data for metricset http.json: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\K5P2Tc74wcqesMl7DzGLkORtOcGrPO1a.sock: The system cannot find the file specified.","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"log.origin":{"file.line":333,"file.name":"module/wrapper.go","function":"github.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).handleFetchError"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-26T04:25:42.671Z","message":"Error fetching data for metricset http.json: error making http request: Get \"http://npipe/inputs\": open \\\\.\\pipe\\K5P2Tc74wcqesMl7DzGLkORtOcGrPO1a.sock: The system cannot find the file specified.","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"service.name":"metricbeat","ecs.version":"1.6.0","log.origin":{"file.line":333,"file.name":"module/wrapper.go","function":"github.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).handleFetchError"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-26T04:25:42.671Z","message":"Error fetching data for metricset http.json: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\hC6H1faJ6uJdcqwMEc7XDxNvsCB7nGo1.sock: The system cannot find the file specified.","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"log.origin":{"file.line":333,"file.name":"module/wrapper.go","function":"github.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).handleFetchError"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.677Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (CONFIGURING->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"HEALTHY","old_state":"CONFIGURING"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-11-26T04:25:42.677Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (HEALTHY->DEGRADED): Error fetching data for metricset http.json: error making http request: Get \"http://npipe/inputs\": open \\\\.\\pipe\\K5P2Tc74wcqesMl7DzGLkORtOcGrPO1a.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.681Z","message":"Output reload is enabled, the beat will restart as needed on change of output config","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"winlog"},"log":{"source":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"log.logger":"centralmgmt","log.origin":{"file.line":204,"file.name":"management/managerV2.go","function":"github.com/elastic/beats/v7/x-pack/libbeat/management.NewV2AgentManagerWithClient"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.681Z","message":"Set gc percentage to: 100","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"winlog"},"log":{"source":"winlog-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"log.origin":{"file.line":1124,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-26T04:25:42.684Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed beat/metrics-monitoring-metrics-monitoring-beats (DEGRADED->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"beat/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"beat/metrics-monitoring-metrics-monitoring-beats","type":"input","state":"HEALTHY","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
For these cases, the default of triggering the DEGRADED state on the second consecutive error is probably not enough... I will test tomorrow to see if there's a more appropriate value for this input. |
The second instance of the problem may have been caused by the Beats restarting because of an output change. Beats decide to do this on their own and don't communicate their intent to restart to the agent. |
@amolnater-qasource since you already have the setup for this test, could you try to increase the To set a different
Another interesting test would be to disable the failure threshold completely using
With 0 as the failure threshold, the monitoring input should never degrade, no matter how many errors we get ;) In order to "reset" the value to the default (not exactly, but it will be close enough) you can reset the
Could you please run these values of failure_threshold to see if the issue reproduces? |
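The threshold behaviour discussed in this comment can be illustrated with a minimal Go sketch. This is not the actual metricbeat or elastic-agent implementation: `failureTracker`, `newFailureTracker`, `recordFailure` and `recordSuccess` are hypothetical names, and the only behaviour assumed is what the thread describes (degrade once the number of consecutive fetch errors reaches the threshold, never degrade when the threshold is 0).

```go
package main

import "fmt"

// failureTracker is a hypothetical helper (not taken from the beats code base)
// that counts consecutive fetch errors for one monitoring input.
type failureTracker struct {
	threshold   int // 0 means "never report DEGRADED", as described in the thread
	consecutive int // number of failures since the last successful fetch
}

func newFailureTracker(threshold int) *failureTracker {
	return &failureTracker{threshold: threshold}
}

// recordFailure counts one failed fetch and reports whether the unit
// should now be considered DEGRADED.
func (f *failureTracker) recordFailure() bool {
	f.consecutive++
	return f.threshold > 0 && f.consecutive >= f.threshold
}

// recordSuccess resets the streak of consecutive failures.
func (f *failureTracker) recordSuccess() {
	f.consecutive = 0
}

func main() {
	// Threshold 2 mirrors "trigger at the second consecutive error".
	tr := newFailureTracker(2)
	fmt.Println(tr.recordFailure()) // false: first error only
	fmt.Println(tr.recordFailure()) // true: second consecutive error
	tr.recordSuccess()
	fmt.Println(tr.recordFailure()) // false: the streak was reset

	// Threshold 0 disables degradation entirely.
	off := newFailureTracker(0)
	for i := 0; i < 10; i++ {
		off.recordFailure()
	}
	fmt.Println(off.recordFailure()) // false
}
```

Under this model, a larger threshold tolerates short windows in which the named pipe is not yet available, for example while a Beat is restarting.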
Hi @pchila, just an update: with each of the tests requested in #6049 (comment), the agent remained Healthy in the Fleet UI, and we observed no data under the Kafka topic. Please find below the logs for today's tests:
Threshold: 05
Threshold: 0
Threshold: 02
Observations:
Please let us know if we are missing anything here. Thanks! |
If the agent was displayed as healthy in the Fleet UI, the fix for #5332 is working as intended: the logs show some monitoring inputs becoming DEGRADED briefly, but the agent doesn't stay DEGRADED for long. Looking into the logs, though, I noticed a lot of panics in metricbeat that look related to Kafka output publishing for component
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"panic: runtime error: invalid memory address or nil pointer dereference","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"[signal 0xc0000005 code=0x0 addr=0x30 pc=0x1ebc014]","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"goroutine 2168 [running]:","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"github.com/elastic/beats/v7/libbeat/outputs/kafka.(*client).Publish(0xc000306c08, {0xc002d33f70?, 0x0?}, {0x8441540, 0xc00383a9c0})","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"github.com/elastic/beats/v7/libbeat/outputs/kafka/client.go:167 +0xf4","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*clientWorker).run(0xc0033fda40, {0x8427bc8, 0xc001c4b360})","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"github.com/elastic/beats/v7/libbeat/publisher/pipeline/client_worker.go:101 +0xc6","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"created by github.com/elastic/beats/v7/libbeat/publisher/pipeline.makeClientWorker in goroutine 2131","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-27T09:23:40.950Z","message":"github.com/elastic/beats/v7/libbeat/publisher/pipeline/client_worker.go:75 +0x1f0","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"system/metrics"},"log":{"source":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-27T09:23:40.964Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":645},"message":"Component state changed system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '6420' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","state":"STOPPED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-27T09:23:40.964Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b-system/metrics-system-667a20db-207d-4c4b-8901-8a7ac7c96640 (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '6420' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","state":"STOPPED"},"unit":{"id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b-system/metrics-system-667a20db-207d-4c4b-8901-8a7ac7c96640","type":"input","state":"STOPPED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-11-27T09:23:40.964Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":663},"message":"Unit state changed system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '6420' exited with code '2'","log":{"source":"elastic-agent"},"component":{"id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","state":"STOPPED"},"unit":{"id":"system/metrics-222f7bb5-918e-4247-87cc-d93c0f1d496b","type":"output","state":"STOPPED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
I am not really an expert on the Kafka output, so I guess this could be investigated more efficiently by somebody else... |
The Fleet checkin doesn't consider history; it is a report of the current health at the time of the checkin. The source of truth for this is what happened in the logs and what
The Kafka output panic is separate and concerning. CC @pierrehilbert we'll want someone to look at that. |
I opened elastic/beats#41823 to track the Kafka panic separately. |
Kibana Build details:
Artifact Link: https://snapshots.elastic.co/8.17.0-7a041bf5/downloads/beats/elastic-agent/elastic-agent-8.17.0-SNAPSHOT-windows-x86_64.zip
Preconditions:
Steps to reproduce:
Expected Result:
Windows agent should remain healthy with System integration and data should be generated under kafka output.
Screenshot:
Agent Logs:
elastic-agent-diagnostics-2024-11-18T10-38-10Z-00.zip
What's working fine: