Elastic agent uses too much memory per Pod in k8s #5835
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
Potential fixes I thought of below. Not sure how feasible they are given the provider architecture, but Vars themselves seem flexible enough to permit them:
|
@swiatekm You can assume that the data passed into the provider is not modified. The code copies to ensure that the provider doesn't update the structure after sending to |
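For context on what the defensive copy costs, here is a minimal sketch of deep-copying a nested mapping before handing it off. `deepCopyMap` and `deepCopyValue` are hypothetical helpers written for illustration; they are not the agent's actual `composable.cloneMap`, and the real data shapes may differ.

```go
package main

import "fmt"

// deepCopyValue returns a deep copy of v for the container types that usually
// appear in decoded YAML/JSON data. Scalars are returned unchanged.
// Hypothetical helper for illustration only.
func deepCopyValue(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		return deepCopyMap(t)
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, e := range t {
			out[i] = deepCopyValue(e)
		}
		return out
	default:
		return v
	}
}

// deepCopyMap copies a nested map so that later writes to the source cannot
// be observed through the copy. Every call allocates a full new tree.
func deepCopyMap(src map[string]interface{}) map[string]interface{} {
	if src == nil {
		return nil
	}
	dst := make(map[string]interface{}, len(src))
	for k, v := range src {
		dst[k] = deepCopyValue(v)
	}
	return dst
}

func main() {
	original := map[string]interface{}{
		"pod": map[string]interface{}{"name": "nginx"},
	}
	copied := deepCopyMap(original)

	// Mutating the source after the hand-off does not affect the copy.
	original["pod"].(map[string]interface{})["name"] = "changed"
	fmt.Println(copied["pod"].(map[string]interface{})["name"]) // prints "nginx"
}
```

The copy is only needed if the sender might mutate the data afterwards. If the data is treated as immutable once sent, sharing the same map avoids allocating a new tree on every update.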
One aspect of this issue is that the Agent Coordinator doesn't tell variable providers which variables it actually wants, which means providers need to collect all supported data even if it's known deterministically that it will never be needed. There's an issue already involving preprocessing variable substitutions in new policies, and a natural feature to include with it is to give providers an explicit list of variables that need to be monitored. In that case, an empty policy that doesn't actually use the kubernetes metadata could avoid querying it in the first place. |
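As a rough illustration of that idea, the sketch below scans a policy for `${...}` substitutions and collects the ones that belong to a given provider. It assumes a simplified variable syntax (`${provider.path}` with optional `|`-separated alternatives); `referencedVars` is a hypothetical helper, not an existing Coordinator API.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// varRef matches ${...} substitutions. This is a simplified assumption about
// the variable syntax, used only for illustration.
var varRef = regexp.MustCompile(`\$\{([^}]+)\}`)

// referencedVars returns the sorted, de-duplicated variable paths in policy
// that start with the given provider prefix. Hypothetical helper.
func referencedVars(policy, provider string) []string {
	seen := map[string]struct{}{}
	for _, m := range varRef.FindAllStringSubmatch(policy, -1) {
		// A substitution may list several alternatives separated by '|'.
		for _, alt := range strings.Split(m[1], "|") {
			alt = strings.TrimSpace(alt)
			if strings.HasPrefix(alt, provider+".") {
				seen[alt] = struct{}{}
			}
		}
	}
	out := make([]string, 0, len(seen))
	for v := range seen {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	policy := `
paths:
  - /var/log/containers/*${kubernetes.container.id}.log
processors:
  - add_fields:
      fields:
        name: ${kubernetes.pod.name}
        host: ${host.name}
`
	fmt.Println(referencedVars(policy, "kubernetes"))
	// prints [kubernetes.container.id kubernetes.pod.name]

	// An empty policy references nothing, so the provider could stay idle.
	fmt.Println(len(referencedVars("inputs: []", "kubernetes")) == 0) // prints true
}
```

With a list like this, a policy that never mentions `kubernetes.*` would give the provider nothing to watch.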
I think most of this data is actually used by filebeat. See an example filebeat input:
data_stream:
  dataset: kubernetes.container_logs
id: kubernetes-container-logs-nginx-d556bf558-vcdpl-4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
index: logs-kubernetes.container_logs-default
parsers:
- container:
    format: auto
    stream: all
paths:
- /var/log/containers/*4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995.log
processors:
- add_fields:
    fields:
      input_id: filestream-container-logs-4b47f8c5-5515-4267-a33d-4fb64806f81c-kubernetes-f483454b-e8f2-42b5-8c22-4da229e86b8a.nginx
    target: '@metadata'
- add_fields:
    fields:
      dataset: kubernetes.container_logs
      namespace: default
      type: logs
    target: data_stream
- add_fields:
    fields:
      dataset: kubernetes.container_logs
    target: event
- add_fields:
    fields:
      stream_id: kubernetes-container-logs-nginx-d556bf558-vcdpl-4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
    target: '@metadata'
- add_fields:
    fields:
      id: 5206cacb-562e-46a1-b256-0b6833a0d653
      snapshot: false
      version: 8.15.0
    target: elastic_agent
- add_fields:
    fields:
      id: 5206cacb-562e-46a1-b256-0b6833a0d653
    target: agent
- add_fields:
    fields:
      id: 4bd5a70737faebfa2fbd4d34b9003cf8f32cd086787133590342e24d331da995
      image:
        name: nginx:1.14.2
      runtime: containerd
    target: container
- add_fields:
    fields:
      cluster:
        name: kind
        url: kind-control-plane:6443
    target: orchestrator
- add_fields:
    fields:
      container:
        name: nginx
      labels:
        app: nginx
        pod-template-hash: d556bf558
      namespace: default
      namespace_labels:
        kubernetes_io/metadata_name: default
      namespace_uid: 9df9c3db-a0ca-426d-bbb5-0c63092a39ae
      node:
        hostname: kind-control-plane
        labels:
          beta_kubernetes_io/arch: amd64
          beta_kubernetes_io/os: linux
          kubernetes_io/arch: amd64
          kubernetes_io/hostname: kind-control-plane
          kubernetes_io/os: linux
          node-role_kubernetes_io/control-plane: ""
        name: kind-control-plane
        uid: 0b6d3cbf-7a86-4775-9351-86f5448c21d8
      pod:
        ip: 10.244.0.102
        name: nginx-d556bf558-vcdpl
        uid: f483454b-e8f2-42b5-8c22-4da229e86b8a
      replicaset:
        name: nginx-d556bf558
    target: kubernetes
- add_fields:
    fields:
      annotations:
        elastic_co/dataset: ""
        elastic_co/namespace: ""
        elastic_co/preserve_original_event: ""
    target: kubernetes
- drop_fields:
    fields:
    - kubernetes.annotations.elastic_co/dataset
    ignore_missing: true
    when:
      equals:
        kubernetes:
          annotations:
            elastic_co/dataset: ""
- drop_fields:
    fields:
    - kubernetes.annotations.elastic_co/namespace
    ignore_missing: true
    when:
      equals:
        kubernetes:
          annotations:
            elastic_co/namespace: ""
- drop_fields:
    fields:
    - kubernetes.annotations.elastic_co/preserve_original_event
    ignore_missing: true
    when:
      equals:
        kubernetes:
          annotations:
            elastic_co/preserve_original_event: ""
- add_tags:
    tags:
    - preserve_original_event
    when:
      and:
      - has_fields:
        - kubernetes.annotations.elastic_co/preserve_original_event
      - regexp:
          kubernetes:
            annotations:
              elastic_co/preserve_original_event: ^(?i)true$
prospector:
  scanner:
    symlinks: true
type: filestream
|
This could be true, but one thing that hasn't been obvious to me looking at examples is how much unused metadata implicitly comes with the current approach. E.g. in the example you give, even though it needs different categories of Kubernetes metadata for potentially many containers/pods, all the actual substituted fields together are still quite small, and would still not be a major memory footprint even with hundreds or thousands of pods/containers. (There may be limits on how well we can act on that, though, depending on what's cached by the Kubernetes support libraries themselves.) Anyway, if at some point there's reason to think that pruning fields would help, it would definitely be a feasible modification for the Agent Coordinator to pass the relevant/minimal field list through to variable providers. |
@swiatekm, your comment here #5835 (comment) seems to be related to something else https://github.com/elastic/ingest-dev/issues/2454#issuecomment-1737569099 that investigated the impact of |
To be clear, I wasn't making a statement about the performance of filebeat, just that the generated configuration for it appears to use a lot of the Pod metadata. This issue is about the elastic-agent binary exclusively. |
I think the complexity of only providing the fields that are used might outweigh the memory savings, unless we can show the savings to be very large. The complexity comes when the policy changes and references a new field that the provider is not supplying because the previous policy didn't use it. The new set of fields then needs to be sent to the provider, and the provider needs to update all mappings with the new variable. I think the patch that stops cloning the mappings might be a large enough win that omitting fields isn't required. I do think having #3609 solved would be very nice, and it is something that needs to be done for Hybrid OTel mode, because when the agent is only running OTel configuration it should not be running any providers. That would be a memory saving for all agents that do not reference anything Kubernetes-related in the policy. |
Experimenting with removing var cloning brought me to an unusual discovery: on agent start, we spend a lot of memory on getting the host ip addresses. See these heap profiles. I don't get why this would be the case. I verified we only call the host provider function once, and allocating 33 MB just to get ip addresses from Linux seems very excessive. But we only use Go stdlib functions to do this. Looking at interfaces via shell commands on the Node didn't yield anything unusual. I'll see if I can run a small program to show me if we're maybe iterating over a bunch of junk data - Kubernetes is known to mess with local networking a lot. EDIT: Looks like this is a kind of N+1 problem with syscalls. The code lists interfaces, and then lists addresses for each interface. The second call is not cheap, so if we have a lot of interfaces on a machine - which happens on a K8s Node with a lot of Pods - it adds up to a fair amount of memory allocations. There is actually a Go stdlib function that gets all the addresses in a single syscall, I'll try to use it and see if that helps. |
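A small diagnostic along those lines could look like the sketch below. It reproduces the N+1 pattern with `net.Interfaces` followed by `Addrs` on each interface, and uses `runtime.MemStats` to estimate how much the enumeration allocates. `countAddrsPerInterface` is a hypothetical name, and the numbers depend entirely on the machine it runs on.

```go
package main

import (
	"fmt"
	"net"
	"runtime"
)

// countAddrsPerInterface lists all interfaces and then asks each one for its
// addresses: one listing call plus one address lookup per interface.
func countAddrsPerInterface() (int, int, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return 0, 0, err
	}
	total := 0
	for _, iface := range ifaces {
		addrs, err := iface.Addrs() // repeated once per interface
		if err != nil {
			return 0, 0, err
		}
		total += len(addrs)
	}
	return len(ifaces), total, nil
}

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	ifaces, addrs, err := countAddrsPerInterface()
	runtime.ReadMemStats(&after)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d interfaces, %d addresses\n", ifaces, addrs)
	// TotalAlloc is cumulative, so the difference approximates the bytes
	// allocated while enumerating.
	fmt.Printf("allocated roughly %d KiB\n", (after.TotalAlloc-before.TotalAlloc)/1024)
}
```

Running this on a Node with many Pods versus an idle machine should show whether the allocation grows with the interface count.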
When fetching ip addresses for the host, we fetch all the network interfaces, and then the ip addresses for each interface. The latter call is surprisingly expensive on unix, as it involves opening a netlink socket, sending a request for routing information, and receiving and parsing the response. If the host has a lot of network interfaces, this can eat surprising amounts of memory - I got on the order of 10 MB on a Kubernetes Node with 100 Pods. See elastic/elastic-agent#5835 (comment) for some heap profiles from elastic-agent. Instead, get all the addresses in a single stdlib call. We don't actually care about which interface each ip address is attached to, we just want all of them. I've tested this in the real world scenario discussed in elastic/elastic-agent#5835, but I'm not sure how to include a self-contained test in this repo.
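A minimal sketch of the single-call approach, assuming the stdlib call in question is `net.InterfaceAddrs`. The helper names `hostIPs` and `ipFromCIDR` are hypothetical, and the actual change in go-sysinfo may be structured differently.

```go
package main

import (
	"fmt"
	"net"
)

// ipFromCIDR strips the prefix length from an address in CIDR notation,
// e.g. "10.244.0.102/24" becomes "10.244.0.102".
func ipFromCIDR(s string) (string, bool) {
	ip, _, err := net.ParseCIDR(s)
	if err != nil {
		return "", false
	}
	return ip.String(), true
}

// hostIPs returns every IP address configured on the host with a single
// net.InterfaceAddrs call, without resolving which interface owns each one.
func hostIPs() ([]string, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	ips := make([]string, 0, len(addrs))
	for _, addr := range addrs {
		// Skip any entry that is not reported in CIDR form.
		if ip, ok := ipFromCIDR(addr.String()); ok {
			ips = append(ips, ip)
		}
	}
	return ips, nil
}

func main() {
	ips, err := hostIPs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ips)
}
```

The trade-off is that the mapping from address to interface is lost, which is acceptable here because only the flat list of addresses is needed.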
@andrewkroh any quick ideas about how to bring down the memory usage from |
@cmacknz elastic/go-sysinfo#246 should already help a lot here. |
Nice! How much of an improvement was that out of curiosity? |
Around 4x less memory in my environment. |
Hi there, Is there anything that can be done to reduce the memory used? Here is the result of a
The agent is in version 8.14.3, with both System and Kubernetes integrations. I've removed the System integration to try to lower the memory footprint, and it does lead to a lower
For reference, the agent config (fleet-enrolled, standalone behaves the same):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    app: elastic-agent
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: elastic-agent
      hostNetwork: true
      hostPID: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.14.3
          env:
            - name: FLEET_ENROLL
              value: '1'
            - name: FLEET_INSECURE
              value: 'false'
            - name: FLEET_URL
              value: 'https://someid.fleet.westeurope.azure.elastic-cloud.com:443'
            - name: FLEET_ENROLLMENT_TOKEN
              value: 'the-token'
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ELASTIC_NETINFO
              value: 'false'
          securityContext:
            runAsUser: 0
          resources:
            # Commented-out the limits, otherwise we run into OOMkilling land.
            requests:
              cpu: 100m
              memory: 400Mi
          volumeMounts:
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: etc-full
              mountPath: /hostfs/etc
              readOnly: true
            - name: var-lib
              mountPath: /hostfs/var/lib
              readOnly: true
            - name: etc-mid
              mountPath: /etc/machine-id
              readOnly: true
            - name: sys-kernel-debug
              mountPath: /sys/kernel/debug
            - name: elastic-agent-state
              mountPath: /usr/share/elastic-agent/state
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        - name: etc-full
          hostPath:
            path: /etc
        - name: var-lib
          hostPath:
            path: /var/lib
        - name: etc-mid
          hostPath:
            path: /etc/machine-id
            type: File
        - name: sys-kernel-debug
          hostPath:
            path: /sys/kernel/debug
        - name: elastic-agent-state
          hostPath:
            path: /var/lib/elastic-agent-managed/kube-system/state-1
            type: DirectoryOrCreate
        #- name: universal-profiling-cache
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: elastic-agent
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: elastic-agent
    namespace: kube-system
roleRef:
  kind: Role
  name: elastic-agent-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: ['']
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
      - configmaps
      - serviceaccounts
      - persistentvolumes
      - persistentvolumeclaims
    verbs: ['get', 'list', 'watch']
  #- apiGroups: [""]
  - apiGroups: ['extensions']
    resources:
      - replicasets
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['apps']
    resources:
      - statefulsets
      - deployments
      - replicasets
      - daemonsets
    verbs: ['get', 'list', 'watch']
  - apiGroups:
      - ''
    resources:
      - nodes/stats
    verbs:
      - get
  - apiGroups: ['batch']
    resources:
      - jobs
      - cronjobs
    verbs: ['get', 'list', 'watch']
  - nonResourceURLs:
      - '/metrics'
    verbs:
      - get
  - apiGroups: ['rbac.authorization.k8s.io']
    resources:
      - clusterrolebindings
      - clusterroles
      - rolebindings
      - roles
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['policy']
    resources:
      - podsecuritypolicies
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['storage.k8s.io']
    resources:
      - storageclasses
    verbs: ['get', 'list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ['get', 'create', 'update']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
rules:
  - apiGroups: ['']
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ['get']
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
---
|
@Marchelune you could try setting |
In 8.16.2, which contains all of my improvements, the historical heap profile looks much more reasonable. Measuring memory consumption with 100 Pods on the Node, and Pods being created and deleted regularly, shows around 100 MB worth of savings. Unfortunately, there isn't any noticeable change in a steady state without Pod changes. |
In its default configuration, agent has the kubernetes provider enabled. In DaemonSet mode, this provider keeps track of data about Pods scheduled on the Node the agent is running on. This issue concerns the fact that the agent process itself uses an excessive amount of memory if the number of these Pods is high (for the purpose of this issue, this will mean close to the default Kubernetes limit of 110). This was originally discovered while troubleshooting #4729.
This effect is visible even if we disable all inputs and self-monitoring, leaving agent to run as a single process without any components. This strongly implies it has to do with configuration variable providers. I used this empty configuration in my testing to limit confounding variables from beats, but the effect is more pronounced when components using variables are present in the configuration.
Here's a graph of agent memory consumption as the number of Pods on the Node increases from 10 to 110:
A couple of observations from looking at how configuration changes affect this behaviour:
Test setup
More data
I resized my Deployment a couple times and looked at a heap profile of the agent process:
The churn appears to be coming primarily from needing to recreate all the variables whenever a Pod is updated. The call to composable.cloneMap is where we copy data from the host provider.
Root cause
The root cause appears to be a combination of behaviours in the variable pipeline: