
Kubernetes Secrets: Use a cached client #3464

Closed
wants to merge 10 commits

Conversation

barkbay
Contributor

@barkbay barkbay commented Sep 22, 2023

What does this PR do?

Use a namespace-scoped cached client to read Kubernetes Secrets.
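The core idea (one lazily created, cached reader per namespace, reused across lookups) can be sketched in plain Go. This is a minimal illustration with hypothetical names, not the PR's actual code; the real reader would wrap a controller-runtime cache instead of the stub below.

```go
package main

import (
	"fmt"
	"sync"
)

// reader stands in for a namespace-scoped cached client; in the real PR it
// would be backed by an informer cache rather than this stub.
type reader struct {
	namespace string
}

func (r *reader) GetSecret(name string) string {
	// A real reader serves this from its local cache instead of hitting
	// the Kubernetes API on every call.
	return fmt.Sprintf("secret %q from cache of namespace %q", name, r.namespace)
}

// provider lazily creates one cached reader per namespace, so repeated secret
// lookups in the same namespace never reach the API server again.
type provider struct {
	mu      sync.Mutex
	readers map[string]*reader
}

func (p *provider) readerFor(namespace string) *reader {
	p.mu.Lock()
	defer p.mu.Unlock()
	if r, ok := p.readers[namespace]; ok {
		return r // reuse the existing cached reader
	}
	r := &reader{namespace: namespace}
	p.readers[namespace] = r
	return r
}

func main() {
	p := &provider{readers: map[string]*reader{}}
	a := p.readerFor("default")
	b := p.readerFor("default")
	fmt.Println(a == b) // the same cached reader is reused
	fmt.Println(a.GetSecret("db-password"))
}
```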

Why is it important?

To avoid hammering the K8s API, refer to #3442

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • Not sure about the log levels...

How to test this PR locally

I don't know yet tbh 🤷

Related issues

Fixes #3442

Use cases

n/a

Screenshots

n/a

Logs

I'll update once I have a better understanding of the log levels in this project

Questions to ask yourself

  • How are we going to debug this?

Logs should be enough, if not using the k8s audit API can be an alternative.

  • What are the metrics I should take care of?

Readers created, non-cached requests, ...

@barkbay barkbay requested a review from a team as a code owner September 22, 2023 12:49
@barkbay barkbay requested review from blakerouse and faec September 22, 2023 12:49
@mergify
Contributor

mergify bot commented Sep 22, 2023

This pull request does not have a backport label. Could you fix it @barkbay? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Contributor

elasticmachine commented Sep 22, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-09-25T13:56:23.161+0000

  • Duration: 28 min 52 sec

Test stats 🧪

Test Results
Failed 0
Passed 6313
Skipped 59
Total 6372

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Contributor

elasticmachine commented Sep 22, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.78% (81/82)              👍
Files         66.102% (195/295)           👍
Classes       65.574% (360/549)           👎 -0.12
Methods       52.791% (1135/2150)         👍 0.022
Lines         38.214% (12873/33687)       👎 -0.014
Conditionals  100.0% (0/0)                💚

@barkbay
Contributor Author

barkbay commented Sep 22, 2023

Unscientific comparison with a Docker image I built myself on a small dev cluster:

[image: API request graph]

The remaining spikes come from an external secret sync operator.

@cmacknz cmacknz requested a review from gizas September 22, 2023 17:50
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Sep 22, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@barkbay barkbay changed the title Kubernetes Secrets: Used a cached client Kubernetes Secrets: Use a cached client Sep 25, 2023
@gizas
Contributor

gizas commented Sep 25, 2023

@barkbay are you gonna provide doc updates and samples on how to configure Elastic Agent with the secrets client?

I would expect some simple commands in the test section here to help me configure the reader for a given namespace. Can you please provide those?

@barkbay
Contributor Author

barkbay commented Sep 25, 2023

@barkbay are you gonna provide doc updates and samples on how to configure Elastic Agent with the secrets client?

This PR doesn't change anything from a user's point of view; we are just replacing one K8s client implementation with another. Do you have something specific in mind?

@gizas
Contributor

gizas commented Sep 25, 2023

@barkbay are you gonna provide doc updates and samples on how to configure Elastic Agent with the secrets client?

This PR doesn't change anything from a user's point of view; we are just replacing one K8s client implementation with another. Do you have something specific in mind?

Was referring to this page https://www.elastic.co/guide/en/fleet/current/kubernetes_secrets-provider.html as for now it is the only relevant one I can find. It could be enhanced with an example manifest referencing a secret, or with some logs to help troubleshoot references, like https://github.com/elastic/elastic-agent/pull/3464/files#diff-8ce13c7c88aad2289f0a95358cafa95589e40293ba2dddeb14832bd2d604737fR141

kind: enhancement

# Change summary; a 80ish characters long description of the change.
summary: Use a cached client to retrieve Kubernetes Secrets
Contributor

This doesn't really help our users when reading the changelog. Can this be updated to describe the benefits this provides? Fewer API calls, better performance, etc.

p.readers[namespace] = newReader
reader = newReader
go func() {
if err := newReader.Start(p.ctx); err != nil {
Contributor

How does this ever get stopped? Looks to me that it leaks. If a secret is pulled from a namespace, and then the policy is updated to pull from another namespace this will stay running for the life of the Elastic Agent.

Contributor

This is the blocker, by the way. I didn't make that clear in the review. Sorry about that.

Contributor Author

As mentioned in the main struct:

	// ctx is the context used to start the informers. Informers will be stopped once the context is done.

and then the policy is updated to pull from another namespace this will stay running for the life of the Elastic Agent.

I was assuming that the context was canceled in that case. Is there a way to get notified about such a change?

Contributor

The context for a provider is only cancelled when the Elastic Agent is being stopped.

The kubernetes context provider was the only "fetch" provider added to Elastic Agent, and it was added after the fact. We didn't want the Elastic Agent to read/watch every secret (and it doesn't). The original design of all context and dynamic providers was for each to keep a local cache of all keys/values and then signal the composable manager when a key/value changes.

Based on that, it was never required to alert a provider about a re-render (which happens any time the policy is updated, or when a key/value is changed by any provider in the Elastic Agent composable manager). I believe you will need to handle re-rendering for this type of caching to work correctly.

See my comment here #3442 (comment). Everything in that comment still applies even with the usage of https://github.com/kubernetes-sigs/controller-runtime/blob/d5bc8734caccabddac6a1bea250b0b9d771d318d/pkg/cache/cache.go#L61-L70

}

// ContextProviderBuilder builds the context provider.
func ContextProviderBuilder(logger *logger.Logger, c *config.Config, managed bool) (corecomp.ContextProvider, error) {
func ContextProviderBuilder(logger *logger.Logger, c *config.Config, _ bool) (corecomp.ContextProvider, error) {
Contributor

I assume that this change has the Elastic Agent now watching for changes in secrets, it would be great to see this also now trigger a re-render of the policy when a secret changes. Seems that the code is already listening to the API for those events, why not take it to the next level and improve it some more?

Contributor Author

it would be great to see this also now trigger a re-render of the policy when a secret changes.

I’m afraid I don’t know what "trigger a re-render of the policy" means.

Contributor

See my comment above for more context. It means that when a secret value is updated, you would call the comm Communicator to report that a value has changed. That signals the Elastic Agent composable manager to re-render the component model because something changed.

Think of it like this:

  1. Elastic Agent is running with a policy and inside of that policy it references a password from a kubernetes secret.
  2. The kubernetes secret is updated to change the password.
  3. Elastic Agent gets the event that the password has changed.
  4. It updates its cache for the secret.
  5. It alerts the composable manager that a value has changed.
  6. The Elastic Agent re-renders the policy with that new password substituted.
  7. The Elastic Agent sends the new configuration to the running component.
  8. The running component starts using the new password.

Elastic Agent already works that way today, just not for kubernetes secrets (everything else does). The kubernetes secrets provider just needs to do step 5 in that list and Elastic Agent would do the rest for you.
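Steps 3-5 of the flow above can be sketched with stdlib Go: an event handler updates the local cache and signals a change over a channel, coalescing duplicate signals. The names here are illustrative; the real agent would signal through its Communicator, not this stand-in.

```go
package main

import "fmt"

// notifier is a minimal stand-in for signaling the composable manager that a
// cached secret changed (step 5 above).
type notifier struct {
	changed chan string
}

// onSecretEvent is what an informer event handler would call when a watched
// secret is updated: refresh the local cache, then signal a re-render.
func (n *notifier) onSecretEvent(cache map[string]string, name, newValue string) {
	if cache[name] == newValue {
		return // value unchanged, no re-render needed
	}
	cache[name] = newValue
	select {
	case n.changed <- name:
	default: // a signal is already pending; coalesce instead of blocking
	}
}

func main() {
	n := &notifier{changed: make(chan string, 1)}
	cache := map[string]string{"db-password": "old"}
	n.onSecretEvent(cache, "db-password", "new")
	fmt.Println(<-n.changed, cache["db-password"])
}
```

In the real provider, whatever consumes the `changed` signal would be the composable manager, which then re-renders the policy and pushes the new configuration to the running components (steps 6-8).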

Contributor Author

I assume that this change has the Elastic Agent now watching for changes in secrets,

The benefit of using the controller-runtime client is to get an abstraction over the watch mechanism, and therefore an eventually up-to-date cache without managing watches. Given your explanations, I feel less and less confident in continuing down this path.


Labels
backport-skip Team:Elastic-Agent Label for the Agent team

Successfully merging this pull request may close these issues.

Performance issue with K8S Secret Provider
5 participants