-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"install -f" uses exec to uninstall an existing agent #4965
"install -f" uses exec to uninstall an existing agent #4965
Conversation
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
This should have a simple integration test to ensure install -f fails without the tamper protection token. This check can probably just be added to one of the existing endpoint security tests. https://github.com/elastic/elastic-agent/blob/main/testing/integration/endpoint_security_test.go |
Looks like the integration tests are failing because if this change:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Please ensure integration tests exist for requiring the flag, and a test for not requiring the flag.
@elastic/support-tech-leads Heads up on the change of behavior as described in the Disruptive User Impact section of this bugfix PR. |
Hello @ycombinator - huge thanks for the heads up. May I kindly request 2 things:
|
Trying this out with an 8.14.1 agent using a policy with tamper protection enabled. Running the uninstall command fails when I omit the uninstall token with the expected error, but a force install succeeds with an orphaned endpoint as captured in the issue. An ubuntu@suave-ling:~/elastic-agent-8.14.1-linux-arm64$ sudo elastic-agent uninstall -f
[== ] Failed to uninstall service [1s] failed to uninstall component "endpoint-default": error uninstalling service: 2024-06-24 19:55:42: error: InstallLib.cpp:1253 Invalid uninstall token: exit status 28
[== ] Failed to uninstall agent [1s] Error uninstalling. Printing logs
2024-06-24T19:55:41.942Z DEBUG [uninstall.state_migration] state store /opt/Elastic/Agent/data/elastic-agent-8.14.1-1348b9/state.enc already exists
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable] Starting controller for composable inputs
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable] Started controller for composable inputs
2024-06-24T19:55:41.949Z INFO [uninstall.composable.providers.docker] Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable] Variable state changed for composable inputs; debounce started
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable] kubernetes_secrets provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable] Kubernetes leaderelection provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable.providers.kubernetes] Kubernetes provider for resource pod skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-24T19:55:41.949Z DEBUG [uninstall.composable.providers.kubernetes] Kubernetes provider for resource node skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-24T19:55:42.056Z DEBUG [uninstall.composable] Computing new variable state for composable inputs
2024-06-24T19:55:42.056Z DEBUG [uninstall.composable] Stopping controller for composable inputs
2024-06-24T19:55:42.181Z DEBUG [uninstall.composable] Stopped controller for composable inputs
2024-06-24T19:55:42.191Z INFO [uninstall] Capabilities file not found in /opt/Elastic/Agent/capabilities.yml
2024-06-24T19:55:42.192Z DEBUG [uninstall] uninstall endpoint-security service
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: debug: ProcFile.cpp:855 Found 1 cgroups for pid(7628) {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: debug: ProcFile.cpp:861 cgroup: id=0 type= path=/user.slice/user-1000.slice/session-99.scope {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: MainPosix.cpp:262 Executing uninstall {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: Internal.cpp:51 Found config path [/opt/Elastic/Endpoint/elastic-endpoint.yaml] {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: debug: ECSUtilities.cpp:420 Tamper protection enabled {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: InstallLib.cpp:961 Checking installed uninstall protection artifacts {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: Internal.cpp:51 Found config path [/opt/Elastic/Endpoint/elastic-endpoint.yaml] {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: InstallLib.cpp:720 No custom public key detected in Endpoint config {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: CryptoLib.cpp:1460 RSA signature verified {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: InstallLib.cpp:893 Failed to read os section of tamper-protection-config, continuing {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: InstallLib.cpp:982 Finished checking installed uninstall protection artifacts with result deny {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: info: InstallLib.cpp:1054 Finished checking command line provided uninstall resource result deny {"context": "command output"}
2024-06-24T19:55:42.207Z ERROR [uninstall] 2024-06-24 19:55:42: error: InstallLib.cpp:1253 Invalid uninstall token {"context": "command output"}
Error: error uninstalling agent: error uninstalling components: error uninstalling component: error uninstalling service: 2024-06-24 19:55:42: error: InstallLib.cpp:1253 Invalid uninstall token: exit status 28
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.14/fleet-troubleshooting.html
ubuntu@suave-ling:~/elastic-agent-8.14.1-linux-arm64$ sudo ./elastic-agent install -f
[ ==] Service Started [5s] Elastic Agent successfully installed, starting enrollment.
[ ==] Done [5s]
Elastic Agent has been successfully installed.
|
Running Running So it looks like the EDIT: ensuring that the uninstall configs match does not fix the issue |
When elastic-agent/internal/pkg/agent/install/uninstall.go Lines 250 to 252 in b4af28b
|
Ah, that seems like the real underlying source of the bug. |
Change the approach that is taken when "elastic-agent install -f" is ran to use an exec to run "elastic-agent uninstall -f" in cases where the agent is installed. this allows the process that runs the uninstall to use all the correct path values for the installed agent instead of the values associated with the binary that the install command is ran from.
// Uninstall will fail on protected agent. | ||
// The protected Agent will need to be uninstalled first before it can be installed. | ||
pt.Describe("Uninstalling current Elastic Agent") | ||
err = Uninstall(cfgFile, topPath, "", log, pt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this call to uninstall was failing because paths.Components()
would use a value that reflected the location from where elastic-agent install -f
was being ran and not the installed agent.
This would result in the uninstall detecting 0 components, and skipping calling uninstall on each of the services, so endpoint uninstall was never called
629dde3
to
b210ac2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one large side-effect to this change. Before this change it was actually the new version of the Elastic Agent performing the uninstallation. With this change now only the installed Elastic Agent is performing the uninstallation. The issue with that is that if there is a bug or issue in the currently installed Elastic Agent that prevents it from uninstalling, then the user is now really stuck.
I like this change and believe it is the correct change, but I do wonder if an optional flag should be added to allow it to uninstall the old-way just in-case? @cmacknz
internal/pkg/agent/cmd/install.go
Outdated
args := []string{ | ||
"uninstall", | ||
"--force", | ||
} | ||
execPath, err := exec.LookPath(paths.BinaryName) | ||
if err != nil { | ||
return fmt.Errorf("unable to find %s on path: %w", paths.BinaryName, err) | ||
} | ||
uninstall := exec.Command(execPath, args...) | ||
uninstall.Stdout = streams.Out | ||
uninstall.Stderr = streams.Err | ||
if err := uninstall.Start(); err != nil { | ||
return fmt.Errorf("unable to start elastic-agent uninstall: %w", err) | ||
} | ||
if err := uninstall.Wait(); err != nil { | ||
return fmt.Errorf("failed to uninstall elastic-agent: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should move this to its own function. Then cleanup the errors to be more generic in the function and then wrap the error output like so:
err := performUninstall(...)
if err != nil {
return fmt.Errorf("failed to uninstall current Elastic Agent: %w", err)
}
That ensures that the error back to the user is clear that its failing installing the currently installed Elastic Agent.
I wonder if this wouldn't leave a security flaw but I agree that having an escape hatch in such case would be nice anyway. |
The old way would be leaving an orphaned endpoint process behind that cannot get configuration updates from agent. On Linux for example this is the same result you get if you run What we should do is make it possible to run uninstall and ignore every error except tamper protection errors. Most of our recent pain has been around removing files on Windows with the dreaded "access is denied" error. Ignoring something like that is fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just add hide the extra flag, as discuss in the weekly.
buildkite test this |
@michel-laterman SonarQube is blocking your PR, but everything else looks good. You are going to need to get @ycombinator, @pierrehilbert, or @cmacknz to perform the merge because of it. |
Quality Gate failedFailed conditions |
Acknowledged. Will merge once Buildkite statuses are green. |
* Add explicit check for token and tamper protection in Uninstall func * fix typo * Load features from config, fix protection flag load * Change approach to execute elastic-agent uninstall command Change the approach that is taken when "elastic-agent install -f" is ran to use an exec to run "elastic-agent uninstall -f" in cases where the agent is installed. this allows the process that runs the uninstall to use all the correct path values for the installed agent instead of the values associated with the binary that the install command is ran from. * Add e2e test * lookup agent binary on path, fix test * fix test * Add flag that preserves old approach * fix typo * change args format in test * Use same fixture * Hide new option --------- Co-authored-by: Julien Lind <[email protected]> (cherry picked from commit 1d7b865) # Conflicts: # internal/pkg/agent/cmd/install.go # testing/integration/endpoint_security_test.go
…g agent (#5021) * "install -f" uses exec to uninstall an existing agent (#4965) * Add explicit check for token and tamper protection in Uninstall func * fix typo * Load features from config, fix protection flag load * Change approach to execute elastic-agent uninstall command Change the approach that is taken when "elastic-agent install -f" is ran to use an exec to run "elastic-agent uninstall -f" in cases where the agent is installed. this allows the process that runs the uninstall to use all the correct path values for the installed agent instead of the values associated with the binary that the install command is ran from. * Add e2e test * lookup agent binary on path, fix test * fix test * Add flag that preserves old approach * fix typo * change args format in test * Use same fixture * Hide new option --------- Co-authored-by: Julien Lind <[email protected]> (cherry picked from commit 1d7b865) # Conflicts: # internal/pkg/agent/cmd/install.go # testing/integration/endpoint_security_test.go * cleanup --------- Co-authored-by: Michel Laterman <[email protected]> Co-authored-by: michel-laterman <[email protected]>
What does this PR do?
When
elastic-agent install -f
is ran, and an agent install has been detected, execelastic-agent uninstall -f
instead of callingUninstall
from within theInstall
func.This causes the uninstall process to use paths associated with the installed agent binary instead of paths associated with the agent running the install command.
Why is it important?
If an agent with tamper protection (with the defend integration) is installed, the defend integration can be orphaned by using the a
--force/-f
flag with the install command.Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesI have added tests that prove my fix is effective or that my feature works./changelog/fragments
using the changelog toolDisruptive User Impact
install -f
by default will use the existing system's agent to perform the uninstall. Using the system's agent to uninstall can fail if we introduce regression into the uninstall process in a future release.The
--run-uninstall-from-binary
flag has been provided to restore old behaviour, but is hidden as it's considered an advanced option that can orphan the endpoint component.How to test this PR locally
Enroll the agent with a policy that contains the defend integration and has tamper protection enabled.
Attempt to re-install (using the
-f
flag) into another policy.With the changes the output is:
Related issues
install -f
Leaves Orphaned Tamper-Protected Endpoint #4506