Reduce the amount of stored ReplicaSet data #5580

swiatekm · 2024-09-20T12:53:15Z

What does this PR do?

Use a transform function to drop all ReplicaSet data in the kubernetes provider, except for the owner references, which we need to find the Deployment name.

Unfortunately, the autodiscovery library doesn't let us pass the transform function in, so I had to copy their implementation and make the change myself. The plan is to upstream this change later.

Why is it important?

In clusters with a lot of Deployments, we end up storing a lot of ReplicaSets in the local cache of each agent, resulting in significant unnecessary memory consumption.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool
I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

There isn't an easy way. You can start a local K8s cluster, create a lot of Deployments, deploy Agent using the Helm Chart here, and check the memory consumption.

Related issues

Partially resolves Kubernetes metadata overwhelms memory limits in the Agent process #4729
Closes Agent and beats store too much ReplicaSet data in K8s #5623

mergify · 2024-09-20T12:53:51Z

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify · 2024-09-20T12:53:52Z

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

swiatekm · 2024-09-20T15:48:48Z

Note: I'd like to verify that this actually improves the situation in a real cluster. I'm working with our SRE team to do this. Until then, I'm keeping this PR as Draft and not adding unit tests or changelog entries.

pkoutsovasilis · 2024-09-23T13:34:16Z

internal/pkg/composable/providers/kubernetes/pod.go

+	transformed.ObjectMeta = kubernetes.ObjectMeta{
+		Name:            old.GetName(),
+		Namespace:       old.GetNamespace(),
+		OwnerReferences: old.GetOwnerReferences(),


I think that if you don't include ResourceVersion: old.GetResourceVersion() here then you effectively make the opts.IsUpdated check inside UpdateFunc of the informer to be always false

pkoutsovasilis · 2024-09-23T13:53:02Z

Observations:

I like the introduction of the transform func and I left a comment on it

I am wondering the following; since it seems we want only the Metadata why aren't we using something like a metadata shared informer?! e.g. snippet for replicasets

metadataClient, err := metadata.NewForConfig(config) // "k8s.io/client-go/metadata"
if err != nil {
  panic(err.Error())
}

// Define the resource to watch metadata for
gvr := schema.GroupVersionResource{
    Group:    "apps",
Version:  "v1",
Resource: "replicasets",
}

resyncPeriod := 10 * time.Second

indexers := cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}

// Create the SharedIndexInformer using the provided code snippet
informer := cache.NewSharedIndexInformer(
    &cache.ListWatch{
        ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
            return metadataClient.Resource(gvr).List(ctx, options)
        },
        WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
            return metadataClient.Resource(gvr).Watch(ctx, options)
        },
    },
    &metav1.PartialObjectMetadata{},
    resyncPeriod,
    indexers,
)

after a quick local-testing because PartialObjectMetadata struct is way smaller than ReplicaSet struct used in this snippet the allocations to unmarshall the former (happening before calling transform func) are way less than the ones required for the latter and this is evident in List and Stream funcs of the informer.

internal/pkg/composable/providers/kubernetes/watcher.go

Use a transform function to drop all data except for the owner references, which we need to find the Deployment name.

elastic-sonarqube · 2024-09-27T14:55:19Z

Quality Gate failed

Failed conditions
9.8% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

swiatekm · 2024-10-01T15:54:53Z

Note: This code is going to move to the autodiscovery library, because we need to make a similar change in beats as well. I'm going to keep it here for now, and rebase once that's done. See #5580 for more details.

swiatekm · 2024-10-02T13:42:08Z

Closing in favor of elastic/elastic-agent-autodiscover#111. I'm going to recreate this PR after releasing a new version of autodiscover and updating go.mod in agent.

…cts (#109) We only use metadata from Jobs and ReplicaSets, but require that full resources are supplied. This change relaxes this requirement, allowing PartialObjectMetadata resources to be used. This allows callers to use metadata informers and avoid having to receive and deserialize non-metadata updates from the API Server. See elastic/elastic-agent#5580 for an example of how this could be used. I'm planning to add the metadata informer from that PR to this library as well. Together, these will allow us to greatly reduce memory used for processing and storing ReplicaSets and Jobs in beats and elastic-agent. This is will help elastic/elastic-agent#5580 and elastic/elastic-agent#4729 specifically, and elastic/elastic-agent#3801 in general.

Introduce metadata-only watchers to the kubernetes package. These are useful if we only need to track metadata for a resource - a good example are ReplicaSets, for which we usually only care about the OwnerReferences. As a result, we only store the metadata, reducing steady-state memory consumption, but also only get updates involving metadata, reducing churn greatly in larger clusters. The implementation introduces new constructors for the Watcher, allowing an informer to be passed in. Existing constructors are implemented using the new constructor, though none of the code actually changes. As a result, it is now possible to unit test the watcher, and I've added some basic unit tests for it. We also add two helper functions: - `GetKubernetesMetadataClient` creates a metadata-only kubernetes client, and is very similar to the existing `GetKubernetesClient` - `RemoveUnnecessaryReplicaSetData` is a transform function that can be passed into an informer so it only stores the metadata we actually use I tested these new functions in both beats and agent, in a kind cluster as well as one of our staging clusters. This is part of the solution to elastic/elastic-agent#5580. --------- Co-authored-by: Mauri de Souza Meneguzzo <[email protected]>

swiatekm added the enhancement New feature or request label Sep 20, 2024

mergify bot assigned swiatekm Sep 20, 2024

mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 20, 2024

swiatekm force-pushed the k8sprovider-replicaset-storage branch from a56dfca to 1e43602 Compare September 20, 2024 14:58

pkoutsovasilis reviewed Sep 23, 2024

View reviewed changes

henrikno reviewed Sep 25, 2024

View reviewed changes

internal/pkg/composable/providers/kubernetes/watcher.go Show resolved Hide resolved

swiatekm added 2 commits September 27, 2024 15:35

Reduce the amount of stored ReplicaSet data

57a9b92

Use a transform function to drop all data except for the owner references, which we need to find the Deployment name.

Use metadata informer for ReplicaSets

6412ce3

swiatekm force-pushed the k8sprovider-replicaset-storage branch from 1e43602 to 6412ce3 Compare September 27, 2024 13:35

swiatekm mentioned this pull request Sep 27, 2024

Allow ReplicaSet and Job metadata generators to use partial meta objects elastic/elastic-agent-autodiscover#109

Merged

swiatekm mentioned this pull request Sep 30, 2024

Agent and beats store too much ReplicaSet data in K8s #5623

Closed

6 tasks

swiatekm closed this Oct 2, 2024

swiatekm mentioned this pull request Oct 2, 2024

Add metadata watcher and informer elastic/elastic-agent-autodiscover#111

Merged

pkoutsovasilis mentioned this pull request Oct 4, 2024

Only watch metadata for ReplicaSets in K8s provider #5699

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce the amount of stored ReplicaSet data #5580

Reduce the amount of stored ReplicaSet data #5580

swiatekm commented Sep 20, 2024 •

edited by jlind23

Loading

mergify bot commented Sep 20, 2024

mergify bot commented Sep 20, 2024

swiatekm commented Sep 20, 2024

pkoutsovasilis Sep 23, 2024 •

edited

Loading

pkoutsovasilis commented Sep 23, 2024 •

edited

Loading

elastic-sonarqube bot commented Sep 27, 2024

swiatekm commented Oct 1, 2024

swiatekm commented Oct 2, 2024

Reduce the amount of stored ReplicaSet data #5580

Reduce the amount of stored ReplicaSet data #5580

Conversation

swiatekm commented Sep 20, 2024 • edited by jlind23 Loading

What does this PR do?

Why is it important?

Checklist

Disruptive User Impact

How to test this PR locally

Related issues

mergify bot commented Sep 20, 2024

mergify bot commented Sep 20, 2024

swiatekm commented Sep 20, 2024

pkoutsovasilis Sep 23, 2024 • edited Loading

Choose a reason for hiding this comment

pkoutsovasilis commented Sep 23, 2024 • edited Loading

elastic-sonarqube bot commented Sep 27, 2024

Quality Gate failed

swiatekm commented Oct 1, 2024

swiatekm commented Oct 2, 2024

swiatekm commented Sep 20, 2024 •

edited by jlind23

Loading

pkoutsovasilis Sep 23, 2024 •

edited

Loading

pkoutsovasilis commented Sep 23, 2024 •

edited

Loading