-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend agent container initialisation logic #4925
Extend agent container initialisation logic #4925
Conversation
…d raises capabilities
This pull request does not have a backport label. Could you fix it @pkoutsovasilis? 🙏
NOTE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't made it through the whole PR. Just some early comments as I was going.
dev-tools/packaging/templates/docker/Dockerfile.elastic-agent.tmpl
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR only Relates or it actually resolves #4087 (comment) and https://github.com/elastic/ingest-dev/issues/3253
Hello @jlind23 👋 this PR actually resolves #4087 as it removes elastic-agent from root group. https://github.com/elastic/ingest-dev/issues/3253 essentially has two requirements rootless agent and a helm chart for agent; this PR resolves the rootless requirement |
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
What is the best way to add automated tests for this? We don't have a great template for testing agent in a container yet.
|
IMO having integration tests that target solely k8s is the way to go for testing agent in a container. I have seen the k8s tests you mention and from what I can tell these act as a mean to validate the agent k8s manifest, nothing more. On the same page, I have seen this file here which essentially sets up a standalone agent on k8s with kubernetes integration. I would expect an integration test to check that this k8s manifest produces the expected documents in the expected data-streams. However, after a quick search I can't find such a test; @cmacknz do you know if something like that exists or not? |
No, the state of our k8s specific testing is poor |
I don't think we should block this PR on having an integration test. We are not in a worse place with this PR, versus how it is today. Would really like to see if validated with an integration test, but we need to spend more time on enabling that. Blocking this PR because of that is not great. |
dev-tools/packaging/templates/docker/Dockerfile.elastic-agent.tmpl
Outdated
Show resolved
Hide resolved
We don't need to block the PR on this, but we should:
|
For another example may be able to get Buildkite to just directly invoke the agent container as a build step since Buildkite can run containers without modification. See elastic/elastic-agent-system-metrics#140 for an example buildkite pipeline that has a step running a specific container and a define command to run on that container. Edit: this is using pre-defined buildkite agent variants so perhaps not this simple, but getting CI to start a docker container is not an advanced use of CI. |
An ideal test IMO would be something similar to this. This particular test, does exactly what we need, deploys an Elasticsearch cluster though ECK operator, deploys agent with kubernetes integration enabled through a Helm chart (spoiler alert 🚨), and checks that the expected events are getting delivered to the expected data_streams. So some ways forward that pop up in my mind are:
|
… agent at base k8s files
This pull request is now in conflicts. Could you fix it? 🙏
|
…asilis/container_capabilities_chown # Conflicts: # NOTICE.txt # go.mod
… agent at base k8s files
@cmacknz I introduced the support for k8s inner tests, which are essentially go unit-tests that will run inside the agent pod. Some background of the design choices to support k8s inner tests:
Now elastic-agent standalone in k8s is tested that runs without issues against the following scenarios:
|
…doesn't report healthy
…nd elastic-agent uid
…random uid and gid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the initialization logic.
The whole inner testing logic of executing a separate binary inside of the container seems complex and error prone to debug. Would it be possible to just change this to running commands directly from like kubectl exec
instead of just running a golang binary inside of the container?
I see your point, but given the limitation of testing inside a pod mentioned here, this is the best approach I could come up with. However, I'm more than happy to remove the Kubernetes inner tests from here and isolate them in a separate PR to give it more thought. Does that sound like a good plan? 🙂 |
|
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am good with this change and I really appreciate all the testing you added. I think its better to get this in and go from there to sit on it any longer.
What does this PR do?
This PR affects only the
DOCKER
package of elastic-agent (code path ofelastic-agent container ...
). Specifically it does the following:allow all capabilities as permitted at the elastic-agent binary
adds init logic to raise all capabilities present in Permitted set when running as non-root
adds init logic to chown agent-related paths, including state, config, logs, data, etc. if either CAP_CHOWN or DAC_OVERRIDE capabilities is given
Flowchart for visual aid
Why is it important?
I consider this PR important for the following reasons:
DAC_READ_SEARCH
without the need to run as root or being part of the root group.Checklist
[ ] I have made corresponding change to the default configuration files./changelog/fragments
using the changelog tool[ ] I have added an integration test or an E2E testDisruptive User Impact
How to test this PR locally
Related issues