Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic Agent should support a restart action #3367

Open
Tracked by #144585
jlind23 opened this issue Sep 7, 2023 · 6 comments
Open
Tracked by #144585

Elastic Agent should support a restart action #3367

jlind23 opened this issue Sep 7, 2023 · 6 comments
Labels
Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@jlind23
Copy link
Contributor

jlind23 commented Sep 7, 2023

In order to give our users the ability to remotely restart an Agent, the Elastic Agent should support a restart action.
We have two path forward:
This should have the exact same behaviour as the elastic-agent restart command.

@jlind23 jlind23 transferred this issue from elastic/kibana Sep 7, 2023
@jlind23 jlind23 added the Team:Elastic-Agent Label for the Agent team label Sep 7, 2023
@jlind23
Copy link
Contributor Author

jlind23 commented Sep 7, 2023

@cmacknz @pierrehilbert I believe this is the first piece to deliver before working on elastic/kibana#144585

@cmacknz
Copy link
Member

cmacknz commented Sep 7, 2023

Yes this needs to come first.

@pierrehilbert
Copy link
Contributor

From my point of view, we "just" need to add a new handler in the same way we did it for remote diag or upgrades.
What is the priority for this new feature compare to the other we have?

@blakerouse
Copy link
Contributor

The hardest part of adding the action is how to handle the ACK. Do we ACK before we restart or after we restart. If we do it after then we need to handle that, if we do it before we need to ensure the ACK is received by Fleet Server before we perform the restart.

@cmacknz
Copy link
Member

cmacknz commented Sep 7, 2023

We should acknowledge only after the restart has completed. This is basically the same problem as acknowledging an upgrade, and the easiest solution to implement without edge cases will be to report the state in as part of the periodic check in rather than trying to guarantee delivery of a single acknowledgment across restarts.

There are probably a few ways to accomplish this, off the top of my head including any kind of monotonic number in the check in that resets on a restart is one way for Fleet to detect this without an explicit acknowledgement. A counter for the number of restarts performed, the accumulated time since the last restart / since the agent process started, etc.

We could also persist the acknowledgement to disk and just retry it infinitely until it gets through, but this will have a lot more edge cases to test.

@ycombinator ycombinator added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 16, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

No branches or pull requests

6 participants