
[Fleet][Inconsistent] Windows Agent goes Unhealthy on installing Elastic Defend. #4734

Closed
harshitgupta-qasource opened this issue May 10, 2024 · 8 comments
Labels
bug Something isn't working impact:medium QA:Validated Validated by the QA Team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@harshitgupta-qasource

Kibana Build details:

VERSION: 8.14.0 BC4
BUILD: 73836
COMMIT: 23ed1207772b3ae958cb05bc4cdbe39b83507707

Host OS and Browser version: All, All

Preconditions:

  1. 8.14.0 BC4 Kibana Cloud environment should be available.
  2. 8.14.0 Windows agent should be installed with Elastic Defend integration.

Steps to reproduce:

  1. Navigate to the Agents tab.
  2. Wait until the agent becomes unhealthy.
  3. Go to the Agent Details tab.
  4. Observe that the Windows agent alternates between unhealthy and healthy states.

Expected:

  • Windows Agent should remain Healthy throughout when installed with Elastic Defend.

Screenshots: image (5), image (4)

Agents Logs:
elastic-agent-diagnostics-2024-05-10T10-13-26Z-00.zip

@harshitgupta-qasource harshitgupta-qasource added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team impact:medium labels May 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@harshitgupta-qasource
Author

@karanbirsingh-qasource Kindly review

@ghost

ghost commented May 10, 2024

Secondary review is done.

@cmacknz
Member

cmacknz commented May 10, 2024

For a period, Endpoint fails to download artifacts, with "connection reset by peer" and HTTP 502 Bad Gateway errors:

{"@timestamp":"2024-05-10T09:35:30.6624237Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":680,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:680 Downloading artifact endpoint-hostisolationexceptionlist-windows-v1 without a proxy","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3224,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3224 HTTP code 502: Bad Gateway","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3235,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3235 Message: Get \"https://10.47.193.5:18516/api/fleet/artifacts/endpoint-hostisolationexceptionlist-windows-v1/d801aa1fb7ddcc330a5e3173372ea6af4a3d08ec58074478e85aa5603e926658\": read tcp 172.17.0.2:50022->10.47.193.5:18516: read: connection reset by peer\n","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3267,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3267 Failed to download artifact endpoint-hostisolationexceptionlist-windows-v1 - HTTP non-200 code received","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":734,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:734 Failed to initialize artifact, identifier: endpoint-hostisolationexceptionlist-windows-v1, reason: HTTP non-200 code received","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":1537,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:1537 All artifacts are being rejected because endpoint-hostisolationexceptionlist-windows-v1 is invalid","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":1565,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:1565 Failed to process artifact manifest","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:30.8675296Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":680,"name":"HttpLib.cpp"}}},"message":"HttpLib.cpp:680 Downloading artifact endpoint-exceptionlist-windows-v1 without a proxy","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3224,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3224 HTTP code 502: Bad Gateway","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3235,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3235 Message: {\"ok\":false,\"message\":\"backend closed connection\"}\n","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":3267,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:3267 Failed to download artifact endpoint-exceptionlist-windows-v1 - HTTP non-200 code received","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":734,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:734 Failed to initialize artifact, identifier: endpoint-exceptionlist-windows-v1, reason: HTTP non-200 code received","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":1537,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:1537 All artifacts are being rejected because endpoint-exceptionlist-windows-v1 is invalid","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":1565,"name":"Artifacts.cpp"}}},"message":"Artifacts.cpp:1565 Failed to process artifact manifest","process":{"pid":10144,"thread":{"id":2764}}}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","agent":{"id":"b79b3120-0f74-4d6d-b7ed-e320302f044b","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2109,"name":"Config.cpp"}}},"message":"Config.cpp:2109 Attempting to download user artifacts","process":{"pid":10144,"thread":{"id":2764}}}

@nfritts any ideas on the root cause here?
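To check a diagnostics bundle for the same pattern, a short script can pull the failed artifact downloads out of the Endpoint log. This is a minimal sketch, assuming the Endpoint log has already been extracted from the bundle as newline-delimited JSON (its exact path inside the bundle is not given in this thread). It keys on the `log.level` field and the "Failed to download artifact ..." message text seen in the excerpt in this comment; `failed_downloads` and `FAILED_RE` are hypothetical helpers, not part of any Elastic tool.

```python
import json
import re
from collections import Counter

# Hypothetical pattern matching the Endpoint message seen in the excerpt above.
FAILED_RE = re.compile(r"Failed to download artifact (\S+)")

# Three lines from the Endpoint log excerpt in this comment, trimmed to the fields used here.
SAMPLE = """\
{"@timestamp":"2024-05-10T09:35:30.8675296Z","log":{"level":"error"},"message":"Artifacts.cpp:3267 Failed to download artifact endpoint-hostisolationexceptionlist-windows-v1 - HTTP non-200 code received"}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","log":{"level":"error"},"message":"Artifacts.cpp:3267 Failed to download artifact endpoint-exceptionlist-windows-v1 - HTTP non-200 code received"}
{"@timestamp":"2024-05-10T09:35:31.0720975Z","log":{"level":"info"},"message":"Config.cpp:2109 Attempting to download user artifacts"}
"""


def failed_downloads(lines):
    """Yield (timestamp, artifact) for each error-level artifact download failure."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if event.get("log", {}).get("level") != "error":
            continue
        match = FAILED_RE.search(event.get("message", ""))
        if match:
            yield event.get("@timestamp"), match.group(1)


if __name__ == "__main__":
    failures = list(failed_downloads(SAMPLE.splitlines()))
    for ts, artifact in failures:
        print(ts, artifact)
    # Count failures per artifact to see which downloads were affected.
    print(Counter(artifact for _, artifact in failures))
```

Against a real log, pass an open file object instead of the sample, for example `failed_downloads(open(path, encoding="utf-8"))`; the timestamps then show how long the failure window lasted.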

@nfritts

nfritts commented May 13, 2024

Did this seem to magically resolve itself? Downloading Elastic from the internet was broken for a while around the time this ticket was opened, and I'm wondering if that was related.

@harshitgupta-qasource
Author

Hi @nfritts ,

We have re-validated this issue on the latest 8.14.0 BC5 Kibana Cloud environment and have not been able to reproduce it so far.

Observations:

  • Windows Agent remains Healthy throughout when installed or upgraded with Elastic Defend.

Build details:
VERSION: 8.14.0 BC5
BUILD: 73931
COMMIT: 7ea00b6178d67183a4def9bdd060b062cced043e

We can close this issue if no changes are required here.

Kindly let us know if anything else is required from our end.

Thanks

@nfritts

nfritts commented Jun 6, 2024

I think we can close this; it looks like it was a failure in the cloud environment.

@amolnater-qasource

Thank you @nfritts

@amolnater-qasource amolnater-qasource added the QA:Validated Validated by the QA Team label Jun 11, 2024