Disruption budgets pertaining to node labels #1902

Open
tobbbles opened this issue Jan 6, 2025 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@tobbbles

tobbbles commented Jan 6, 2025

Description

What problem are you trying to solve?

When running distributed systems that are zone-aware (e.g. Mimir), it would be beneficial if a single Karpenter NodePool could serve all zones and have disruption budgets configured on node labels (e.g. only disrupt nodes in at most 1 of the unique values of the topology.kubernetes.io/zone label key at a time). This would in turn disrupt all nodes in zone a (subject to the other budgets), then zone b, then zone c.
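The requested behaviour can be sketched roughly as follows. This is a minimal, hypothetical Go illustration, not Karpenter code: it assumes the budget caps how many distinct values of a label key may have nodes under disruption at once, and that candidates are admitted in sorted label order. `Node`, `filterByLabelBudget`, and `names` are invented for illustration; only the `topology.kubernetes.io/zone` key comes from this issue. Other budgets (e.g. node counts) would still apply on top and are not modelled here.

```go
package main

import (
	"fmt"
	"sort"
)

// zoneKey is the well-known Kubernetes zone label, used as the example key.
const zoneKey = "topology.kubernetes.io/zone"

// Node is a hypothetical, simplified view of a node owned by a NodePool.
type Node struct {
	Name       string
	Labels     map[string]string
	Disrupting bool // true if the node is already being disrupted
}

// filterByLabelBudget (hypothetical) returns the candidates that may be
// disrupted without more than maxValues distinct values of labelKey being
// under disruption at the same time. Values already being disrupted may
// continue; new values are admitted in sorted order, so zone-a finishes
// before zone-b starts, and so on.
func filterByLabelBudget(nodes, candidates []Node, labelKey string, maxValues int) []Node {
	// Label values that already have at least one node being disrupted.
	active := map[string]bool{}
	for _, n := range nodes {
		if n.Disrupting {
			active[n.Labels[labelKey]] = true
		}
	}

	// Sort a copy by label value, then name, for a deterministic
	// zone-by-zone order without mutating the caller's slice.
	sorted := append([]Node(nil), candidates...)
	sort.SliceStable(sorted, func(i, j int) bool {
		vi, vj := sorted[i].Labels[labelKey], sorted[j].Labels[labelKey]
		if vi != vj {
			return vi < vj
		}
		return sorted[i].Name < sorted[j].Name
	})

	var allowed []Node
	for _, c := range sorted {
		if c.Disrupting {
			continue // already in progress, not a new candidate
		}
		v := c.Labels[labelKey]
		if !active[v] {
			if len(active) >= maxValues {
				continue // admitting this value would exceed the budget
			}
			active[v] = true
		}
		allowed = append(allowed, c)
	}
	return allowed
}

// names extracts node names for printing.
func names(ns []Node) []string {
	out := make([]string, 0, len(ns))
	for _, n := range ns {
		out = append(out, n.Name)
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "c-1", Labels: map[string]string{zoneKey: "zone-c"}},
		{Name: "a-2", Labels: map[string]string{zoneKey: "zone-a"}},
		{Name: "b-1", Labels: map[string]string{zoneKey: "zone-b"}},
		{Name: "a-1", Labels: map[string]string{zoneKey: "zone-a"}},
		{Name: "b-2", Labels: map[string]string{zoneKey: "zone-b"}},
	}

	// Nothing is disrupting yet: only zone-a is admitted.
	fmt.Println(names(filterByLabelBudget(nodes, nodes, zoneKey, 1))) // prints [a-1 a-2]

	// While zone-b is in progress, only the rest of zone-b may proceed.
	nodes[2].Disrupting = true
	fmt.Println(names(filterByLabelBudget(nodes, nodes, zoneKey, 1))) // prints [b-2]
}
```

Admitting label values in sorted order is just one possible policy; a real implementation might instead prefer the zone with the most drifted or underutilised nodes.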

How important is this feature to you?

This would make Karpenter usable for more zone-aware deployment models and let disruption budgets account for these architectural patterns.

I'm sure there are some other use cases this would be beneficial to serve, and would love to hear about them more in the comments.


  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
tobbbles added the kind/feature label Jan 6, 2025
k8s-ci-robot added the needs-triage label Jan 6, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jonathan-innis
Member

This is interesting -- so you are basically fine with Karpenter disrupting all of a zone at one time -- you just don't want multiple zones to be fully disrupted at the same time? I'm curious how you're configuring something like this today or if this is just an improvement on your existing configuration. Would you mind sharing your current configuration?

@tobbbles
Author

Previously we used a separate ASG per zone and performed maintenance manually, rotating through one zone at a time.

We haven't replicated this in Karpenter so far; we are just using a very strict disruption budget (1 node across the entire NodePool, which spans all availability zones).

For the specific cases of Mimir and Loki (the systems I have the most experience with), Grafana has created an operator to control the rollout and maintenance of regional sets, which may provide more information on this use case: https://github.com/grafana/rollout-operator

@sftim

sftim commented Jan 14, 2025

If we do this, we should consider whether we could also manage it with a NodePool per zone, each having equal priority. That might need more code to handle the priorities, but I think it's easier to teach than making the NodePool API itself more complicated.

Projects
None yet
Development

No branches or pull requests

4 participants