Fleet Agent Supports Node Failover [SURE-9419] #3096

Open
manno opened this issue Nov 25, 2024 · 0 comments

manno commented Nov 25, 2024

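The fleet-agent is deployed as a StatefulSet with a single replica. In this setup there is no automatic failover if the node hosting that pod fails. This is by design for StatefulSets: a StatefulSet guarantees at most one running pod per identity, so it will not start a replacement while the old pod on the unreachable node has not been confirmed deleted. An administrator has to delete the pod manually to get Fleet running again. This is unacceptable because recovery is reactive rather than automatic.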

To get fault tolerance, we either have to deploy the StatefulSet with more than one replica or switch to a Deployment.
A Deployment with a replica count of 1 can still take a long time to move to another node, so the replica count should be configurable.
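One way to picture the "make the replica count configurable" proposal is a small manifest generator, sketched here under stated assumptions: the function name `fleet_agent_deployment`, the `app: fleet-agent` label, and the placeholder image are hypothetical and not taken from Fleet's actual chart or code. It only shows the shape of an `apps/v1` Deployment whose `spec.replicas` is a parameter rather than a fixed single replica.

```python
import json


def fleet_agent_deployment(replicas: int = 1,
                           image: str = "example.invalid/fleet-agent:dev") -> dict:
    """Return a hypothetical apps/v1 Deployment manifest with a configurable replica count.

    Assumption: names, labels and image are placeholders, not Fleet's real values.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # Hypothetical label; the selector must match the pod template labels.
    labels = {"app": "fleet-agent"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "fleet-agent", "labels": dict(labels)},
        "spec": {
            # Configurable, instead of the hard-coded single replica.
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{"name": "fleet-agent", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON for inspection.
    print(json.dumps(fleet_agent_deployment(replicas=2), indent=2))
```

Running the script prints the manifest as JSON; `kubectl apply -f` accepts JSON as well as YAML.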

Business impact: High, as it causes downtime.

Repro steps:

  • Deploy fleet-agent
  • Power off the node hosting the fleet-agent pod

Acceptance Criteria

manno added this to Fleet on Nov 25, 2024
manno converted this from a draft issue on Nov 25, 2024
manno added this to the v2.11.0 milestone on Nov 25, 2024
manno moved this from To Triage to 📋 Backlog in Fleet on Nov 27, 2024
Projects
Status: 📋 Backlog
Development

No branches or pull requests

1 participant