Reduce the amount of stored ReplicaSet data #5580
This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
Force-pushed from `a56dfca` to `1e43602`.
Note: I'd like to verify that this actually improves the situation in a real cluster. I'm working with our SRE team to do this. Until then, I'm keeping this PR as Draft and not adding unit tests or changelog entries.
```go
transformed.ObjectMeta = kubernetes.ObjectMeta{
	Name:            old.GetName(),
	Namespace:       old.GetNamespace(),
	OwnerReferences: old.GetOwnerReferences(),
}
```
I think that if you don't include `ResourceVersion: old.GetResourceVersion()` here, then you effectively make the `opts.IsUpdated` check inside the informer's `UpdateFunc` always return false.
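To illustrate the point above, here is a minimal, self-contained sketch. It assumes, as the comment implies, that the default update check compares the old and new objects' resource versions so that periodic resyncs are skipped. The `objectMeta` type and the `isUpdated`, `stripWithoutRV`, and `stripWithRV` functions are hypothetical, simplified stand-ins, not the client-go or autodiscovery types.

```go
package main

import "fmt"

// objectMeta is a hypothetical, trimmed-down stand-in for Kubernetes ObjectMeta.
type objectMeta struct {
	Name            string
	Namespace       string
	ResourceVersion string
}

// isUpdated mirrors the assumed default check: an update is only processed
// when the resource version changed, so resyncs are ignored.
func isUpdated(oldObj, newObj objectMeta) bool {
	return oldObj.ResourceVersion != newObj.ResourceVersion
}

// stripWithoutRV keeps only the name and namespace.
func stripWithoutRV(m objectMeta) objectMeta {
	return objectMeta{Name: m.Name, Namespace: m.Namespace}
}

// stripWithRV also keeps the resource version, as the review suggests.
func stripWithRV(m objectMeta) objectMeta {
	return objectMeta{Name: m.Name, Namespace: m.Namespace, ResourceVersion: m.ResourceVersion}
}

func main() {
	before := objectMeta{Name: "web-5d8f", Namespace: "default", ResourceVersion: "100"}
	after := objectMeta{Name: "web-5d8f", Namespace: "default", ResourceVersion: "101"}

	// Without ResourceVersion both sides compare as "", so the change is missed.
	fmt.Println(isUpdated(stripWithoutRV(before), stripWithoutRV(after))) // false
	// Keeping ResourceVersion lets the check see the change.
	fmt.Println(isUpdated(stripWithRV(before), stripWithRV(after))) // true
}
```

Because the transformed objects are what the informer hands to `UpdateFunc`, any field the update check relies on has to survive the transform.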
Observations:
Use a transform function to drop all data except for the owner references, which we need to find the Deployment name.
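The owner references alone are enough to map a ReplicaSet back to its Deployment. A minimal sketch, assuming the Deployment is the owner reference marked as controller with kind `Deployment`; `ownerReference` and `deploymentName` are hypothetical simplified names (the real `metav1.OwnerReference.Controller` field is a `*bool`, simplified to `bool` here).

```go
package main

import "fmt"

// ownerReference is a hypothetical, simplified stand-in for metav1.OwnerReference.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// deploymentName returns the name of the Deployment that controls a
// ReplicaSet, using nothing but the ReplicaSet's owner references.
func deploymentName(refs []ownerReference) (string, bool) {
	for _, ref := range refs {
		if ref.Controller && ref.Kind == "Deployment" {
			return ref.Name, true
		}
	}
	return "", false
}

func main() {
	refs := []ownerReference{{APIVersion: "apps/v1", Kind: "Deployment", Name: "web", Controller: true}}
	name, ok := deploymentName(refs)
	fmt.Println(name, ok) // web true
}
```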
Force-pushed from `1e43602` to `6412ce3`.
Quality Gate failed.
Note: This code is going to move to the autodiscovery library, because we need to make a similar change in Beats as well. I'm going to keep it here for now and rebase once that's done. See #5580 for more details.
Closing in favor of elastic/elastic-agent-autodiscover#111. I'm going to recreate this PR after releasing a new version of autodiscover and updating `go.mod` in Agent.
…cts (#109)

We only use metadata from Jobs and ReplicaSets, but require that full resources are supplied. This change relaxes this requirement, allowing PartialObjectMetadata resources to be used. This allows callers to use metadata informers and avoid having to receive and deserialize non-metadata updates from the API Server. See elastic/elastic-agent#5580 for an example of how this could be used.

I'm planning to add the metadata informer from that PR to this library as well. Together, these will allow us to greatly reduce the memory used for processing and storing ReplicaSets and Jobs in Beats and elastic-agent. This will help elastic/elastic-agent#5580 and elastic/elastic-agent#4729 specifically, and elastic/elastic-agent#3801 in general.
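Relaxing the requirement amounts to depending only on the metadata accessors rather than on the concrete full type. In client-go terms, both `*appsv1.ReplicaSet` and `*metav1.PartialObjectMetadata` satisfy `metav1.Object` because both embed `ObjectMeta`. The sketch below models that idea with hypothetical, stdlib-only types (`meta`, `metaAccessor`, `replicaSet`, `partialObjectMetadata`, `ownerOfKind`); it is an illustration, not the library's code.

```go
package main

import "fmt"

// ownerRef is a hypothetical, simplified owner reference.
type ownerRef struct {
	Kind string
	Name string
}

// meta is a hypothetical stand-in for ObjectMeta.
type meta struct {
	Name   string
	Owners []ownerRef
}

func (m meta) GetName() string                { return m.Name }
func (m meta) GetOwnerReferences() []ownerRef { return m.Owners }

// metaAccessor is the only capability the lookup needs.
type metaAccessor interface {
	GetName() string
	GetOwnerReferences() []ownerRef
}

// replicaSet models a full object: metadata plus spec data we never read.
type replicaSet struct {
	meta
	Replicas int32
}

// partialObjectMetadata models a metadata-only object.
type partialObjectMetadata struct {
	meta
}

// ownerOfKind returns the name of the first owner with the given kind,
// or "" if there is none. It accepts full and metadata-only objects alike.
func ownerOfKind(obj metaAccessor, kind string) string {
	for _, ref := range obj.GetOwnerReferences() {
		if ref.Kind == kind {
			return ref.Name
		}
	}
	return ""
}

func main() {
	owners := []ownerRef{{Kind: "Deployment", Name: "web"}}
	full := replicaSet{meta: meta{Name: "web-5d8f", Owners: owners}, Replicas: 3}
	partial := partialObjectMetadata{meta: meta{Name: "web-5d8f", Owners: owners}}
	fmt.Println(ownerOfKind(full, "Deployment"), ownerOfKind(partial, "Deployment")) // web web
}
```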
Introduce metadata-only watchers to the kubernetes package. These are useful if we only need to track metadata for a resource. A good example is ReplicaSets, for which we usually only care about the OwnerReferences. As a result, we store only the metadata, which reduces steady-state memory consumption, and we also receive only updates involving metadata, which greatly reduces churn in larger clusters.

The implementation introduces new constructors for the Watcher that allow an informer to be passed in. The existing constructors are implemented using the new constructor, though none of the code actually changes. As a result, it is now possible to unit test the watcher, and I've added some basic unit tests for it.

We also add two helper functions:
- `GetKubernetesMetadataClient` creates a metadata-only kubernetes client, and is very similar to the existing `GetKubernetesClient`.
- `RemoveUnnecessaryReplicaSetData` is a transform function that can be passed into an informer so that it only stores the metadata we actually use.

I tested these new functions in both Beats and Agent, in a kind cluster as well as in one of our staging clusters. This is part of the solution to elastic/elastic-agent#5580.

Co-authored-by: Mauri de Souza Meneguzzo
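Accepting an informer in the Watcher constructor is a form of dependency injection, which is what makes the watcher unit-testable. A minimal sketch of the pattern, where `informer`, `eventHandler`, `watcher`, `newWatcherWithInformer`, and `fakeInformer` are all hypothetical names (not the autodiscovery library's API): production code would pass a real, possibly metadata-only, informer, while a test passes a fake and drives events by hand.

```go
package main

import "fmt"

// eventHandler is a hypothetical callback set, loosely modelled on informer handlers.
type eventHandler struct {
	OnAdd    func(key string)
	OnDelete func(key string)
}

// informer is the hypothetical minimal interface the watcher depends on.
type informer interface {
	AddEventHandler(h eventHandler)
}

// watcher tracks which object keys are currently known.
type watcher struct {
	known map[string]bool
}

// newWatcherWithInformer wires the watcher to whatever informer it is given.
func newWatcherWithInformer(inf informer) *watcher {
	w := &watcher{known: map[string]bool{}}
	inf.AddEventHandler(eventHandler{
		OnAdd:    func(key string) { w.known[key] = true },
		OnDelete: func(key string) { delete(w.known, key) },
	})
	return w
}

// fakeInformer records the registered handler so a test can emit events directly.
type fakeInformer struct {
	handler eventHandler
}

func (f *fakeInformer) AddEventHandler(h eventHandler) { f.handler = h }

func main() {
	fake := &fakeInformer{}
	w := newWatcherWithInformer(fake)
	fake.handler.OnAdd("default/web-5d8f")
	fake.handler.OnAdd("default/api-7c9b")
	fake.handler.OnDelete("default/api-7c9b")
	fmt.Println(len(w.known), w.known["default/web-5d8f"]) // 1 true
}
```

The design choice is that the watcher never constructs its own informer when one is supplied, so tests need neither an API server nor a fake clientset.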
What does this PR do?
Use a transform function to drop all ReplicaSet data in the kubernetes provider, except for the owner references, which we need to find the Deployment name.
Unfortunately, the autodiscovery library doesn't let us pass the transform function in, so I had to copy their implementation and make the change myself. The plan is to upstream this change later.
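A minimal sketch of such a transform, using hypothetical simplified types rather than the real `apps/v1` ones; `trimReplicaSet` and `cache` are illustrative names, not the provider's or the library's code (the library's version is `RemoveUnnecessaryReplicaSetData`). `transformFunc` has the same shape as client-go's `cache.TransformFunc`, which, as far as I know, can be registered on an informer with `SetTransform` before it starts. Besides the owner references, the sketch keeps the name and namespace (the cache key) and the resource version (used for update detection).

```go
package main

import "fmt"

// Hypothetical, simplified types; the real code uses the apps/v1 and meta/v1 types.
type ownerReference struct {
	Kind string
	Name string
}

type objectMeta struct {
	Name            string
	Namespace       string
	ResourceVersion string
	Labels          map[string]string
	Annotations     map[string]string
	OwnerReferences []ownerReference
}

type replicaSet struct {
	ObjectMeta objectMeta
	Spec       map[string]string // stands in for the (large) spec and pod template
}

// transformFunc has the same shape as client-go's cache.TransformFunc.
type transformFunc func(obj interface{}) (interface{}, error)

// trimReplicaSet keeps only the fields the provider reads. Other types pass through.
func trimReplicaSet(obj interface{}) (interface{}, error) {
	old, ok := obj.(*replicaSet)
	if !ok {
		return obj, nil
	}
	return &replicaSet{ObjectMeta: objectMeta{
		Name:            old.ObjectMeta.Name,
		Namespace:       old.ObjectMeta.Namespace,
		ResourceVersion: old.ObjectMeta.ResourceVersion,
		OwnerReferences: old.ObjectMeta.OwnerReferences,
	}}, nil
}

// cache mimics an informer store that applies the transform before storing.
type cache struct {
	transform transformFunc
	items     map[string]interface{}
}

func (c *cache) add(key string, obj interface{}) error {
	t, err := c.transform(obj)
	if err != nil {
		return err
	}
	c.items[key] = t
	return nil
}

func main() {
	c := &cache{transform: trimReplicaSet, items: map[string]interface{}{}}
	rs := &replicaSet{
		ObjectMeta: objectMeta{
			Name:            "web-5d8f",
			Namespace:       "default",
			ResourceVersion: "42",
			Labels:          map[string]string{"app": "web"},
			OwnerReferences: []ownerReference{{Kind: "Deployment", Name: "web"}},
		},
		Spec: map[string]string{"image": "nginx"},
	}
	if err := c.add("default/web-5d8f", rs); err != nil {
		panic(err)
	}
	cached := c.items["default/web-5d8f"].(*replicaSet)
	fmt.Println(cached.ObjectMeta.OwnerReferences[0].Name, cached.ObjectMeta.Labels == nil, cached.Spec == nil) // web true true
}
```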
Why is it important?
In clusters with many Deployments, we end up storing many ReplicaSets in the local cache of each agent, resulting in significant unnecessary memory consumption.
Checklist
`./changelog/fragments` using the changelog tool
Disruptive User Impact
How to test this PR locally
There isn't an easy way. You can start a local K8s cluster, create a lot of Deployments, deploy Agent using the Helm Chart here, and check the memory consumption.
Related issues