[Helm] Missing hostNetwork: true for managed agent in perNode preset #6324

Closed
eedugon opened this issue Dec 13, 2024 · 8 comments · Fixed by #6345
Labels
bug Something isn't working

Comments

@eedugon
Contributor

eedugon commented Dec 13, 2024

When following the instructions at https://www.elastic.co/guide/en/fleet/current/example-kubernetes-fleet-managed-agent-helm.html and installing a Fleet-managed agent with the perNode preset:

helm install demo ./deploy/helm/elastic-agent \
--set agent.fleet.enabled=true \
--set agent.fleet.url=https://fleet-svc.default.svc \
--set agent.fleet.token=myToken \
--set agent.fleet.preset=perNode \
--set agent.fleet.insecure=true

The agent works fine, but it cannot access the kubelet endpoint when the Kubernetes integration is configured:

"log.level":"error","@timestamp":"2024-12-13T08:47:25.573Z","message":"Error fetching data for metricset kubernetes.proxy: error getting metrics: error making http request: Get \"http://localhost:10249/metrics\": dial tcp 127.0.0.1:10249: connect: connection refused","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"log.origin":{"file.line":333,"file.name":"module/wrapper.go","function":"github.com/elastic/beats/v7/metricbeat/mb/module.(*metricSetWrapper).handleFetchError"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

If we want this preset to work with the default values of the Kubernetes integration, we should probably add hostNetwork: true.
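As a sketch, the change would amount to setting two fields on the DaemonSet pod spec (dnsPolicy is added here as the usual companion setting so cluster DNS keeps working; how the chart wires this in is up to the maintainers):

```yaml
# With hostNetwork the pod shares the node's network namespace, so
# http://localhost:10249 reaches kube-proxy on the node itself.
spec:
  hostNetwork: true
  # Recommended alongside hostNetwork for pods that still need cluster DNS.
  dnsPolicy: ClusterFirstWithHostNet
```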

I believe hostNetwork: true is also needed for system monitoring of the host's network interfaces.

If we don't want to use hostNetwork: true (which I would love to use), then we have to determine and document how to perform kubelet monitoring and system interface monitoring without it.

hostNetwork is apparently needed to also show the real hostnames in monitoring data instead of the pod name, which is important for infrastructure monitoring.

cc: @pkoutsovasilis ;)

@cmacknz
Member

cmacknz commented Dec 13, 2024

hostNetwork is apparently needed to also show the real hostnames in monitoring data instead of the pod name, which is important for infrastructure monitoring.

This is the biggest reason we do this by default for Fleet-managed agents: Fleet identifies agents by hostname in most of the UI, and pod names aren't stable and change constantly. Using the node hostname solves this.

@pkoutsovasilis
Contributor

hostNetwork is apparently needed to also show the real hostnames in monitoring data instead of the pod name, which is important for infrastructure monitoring.

This is the biggest reason we do this by default for Fleet-managed agents: Fleet identifies agents by hostname in most of the UI, and pod names aren't stable and change constantly. Using the node hostname solves this.

Sure, I can understand the Fleet-mode DaemonSet having hostNetwork: true by default, due to the system integration. You raise a valid point about the node hostname, @cmacknz, but if we make this the case for all Fleet-managed agents, then a separate Deployment/StatefulSet of Fleet-managed agents will fail to start with the following error:

Error: runtime manager: error starting tcp listener for runtime manager: listen tcp 127.0.0.1:6789: bind: address already in use

right?

@cmacknz
Member

cmacknz commented Dec 16, 2024

Right, yes. We could change that port 6789 to port 0 in the agent configuration, but hostNetwork would still be a general source of surprising conflicts for people whose other integrations bind to network ports.
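For reference, a hedged sketch of what that override could look like in elastic-agent.yml (the agent.grpc section configures the control-protocol listener; treat the exact key shape as an assumption):

```yaml
# Control-protocol listener used by the runtime manager; port 0 would ask
# the OS for any free port instead of the fixed default 6789.
agent.grpc:
  address: localhost
  port: 0
```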

So it's best to leave hostNetwork false unless it solves an actual functional problem for data collection; which hostname gets displayed is more of a nice-to-have for the UI.

@pkoutsovasilis
Contributor

pkoutsovasilis commented Dec 16, 2024

Even if we switched to port 0, would that mean that two agents would appear with the same name in Fleet?! Is this something we would want? I am assuming no, but assumptions are a little bit tricky. If we are okay with it, then maybe we can utilise other techniques, e.g.

env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

to get the node name without hostNetwork 🙂 However, if a pod is rescheduled onto another node, the reported name changes... The same can happen even with hostNetwork: true for anything other than a DaemonSet.

@cmacknz
Member

cmacknz commented Dec 16, 2024

Yeah this breaks down completely once the agent isn't a DaemonSet.

@pkoutsovasilis
Contributor

Yeah this breaks down completely once the agent isn't a DaemonSet.

Then I believe the best way forward is to have this enabled by default for the DaemonSet (i.e. the perNode preset) used by the kubernetes and system integrations.
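In values terms, that could look roughly like the following override (the key names under agent.presets are an assumption about the chart's preset schema, not its confirmed API):

```yaml
agent:
  presets:
    perNode:
      # Share the node's network namespace so kubelet/kube-proxy endpoints
      # and host interfaces are reachable, and the node hostname is reported.
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
```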

@cmacknz
Member

cmacknz commented Dec 16, 2024

Sounds good to me.

@eedugon
Contributor Author

eedugon commented Dec 17, 2024

@cmacknz

This is the biggest reason we do this by default for Fleet managed agents, Fleet identifies agent by hostname in most of the UI and the pod names aren't stable and change constantly. Using the node hostname solves this.

I think the reason to use hostNetwork: true should be something other than pod names. IMO we should use hostNetwork: true only in workloads that really require it because they need complete access to the host network (including inheriting its hostname).

The only legitimate case that I can think of is a DaemonSet with the system and kubernetes integrations enabled. I don't see any need to do it for a Deployment in charge of KSM metrics or any other Elastic Agent, but I could be wrong.

The issue you mention, getting new agent names over and over in the Fleet UI when agents are ephemeral and associated with pods with dynamic names, should be totally OK, as those agents actually are ephemeral. In 8.16, the new policy setting for automatically removing inactive agents works pretty well on Kubernetes, so I wouldn't consider that annoying anymore.
