
[bug] Upgrades not working as expected #932

Open
wkonitzer opened this issue Jan 21, 2025 · 1 comment · May be fixed by #1063
Labels: bug Something isn't working

wkonitzer (Contributor) commented:

Describe the bug
After initiating the upgrade, only one node is updated. The expected outcome is that all nodes are upgraded.

To Reproduce
Setup and run the demos from https://github.com/k0rdent/demos

Run steps for demo 1, and then steps for demo 2.

The output shows that only one node was upgraded:

KUBECONFIG="kubeconfigs/k0rdent-aws-test1.kubeconfig" PATH=$PATH:./bin kubectl get node -w
NAME                               STATUS   ROLES           AGE   VERSION
k0rdent-aws-test1-cp-0             Ready    control-plane   13h   v1.31.3+k0s
k0rdent-aws-test1-md-vrsp4-twb48   Ready    <none>          13h   v1.31.2+k0s
k0rdent-aws-test1-md-vrsp4-z76tx   Ready    <none>          13h   v1.31.2+k0s

Expected behavior
I would expect all nodes to be running v1.31.3+k0s.

Additional context
Seen across all providers (e.g. AWS, OpenStack), so the issue must lie elsewhere, possibly in k0smotron?

@wkonitzer wkonitzer added the bug Something isn't working label Jan 21, 2025
@github-project-automation github-project-automation bot moved this to Todo in k0rdent Jan 21, 2025
@eromanova eromanova self-assigned this Jan 23, 2025
@eromanova eromanova moved this from Todo to In Progress in k0rdent Feb 11, 2025
eromanova (Contributor) commented:

Confirmed with the k0smotron team that there is a related issue regarding status reporting: k0sproject/k0smotron#911

While trying to reproduce the issue on the AWS provider, I discovered that the CAPI controller is failing on:

I0211 12:43:19.028375       1 machineset_controller.go:685] "MachineSet is scaling up to 1 replicas by creating 1 machines" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="kcm-system/aws-ekaz-md-mnl5k" namespace="kcm-system" name="aws-ekaz-md-mnl5k" reconcileID="25ead8b5-d3c7-4c0a-8847-9e95f7038330" Cluster="kcm-system/aws-ekaz" MachineDeployment="kcm-system/aws-ekaz-md" replicas=1 machineCount=0
I0211 12:43:19.029256       1 machineset_preflight.go:139] "Scale up on hold because K0sControlPlane kcm-system/aws-ekaz-cp is upgrading (\"ControlPlaneIsStable\" preflight check failed). The operation will continue after the preflight check(s) pass" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="kcm-system/aws-ekaz-md-mnl5k" namespace="kcm-system" name="aws-ekaz-md-mnl5k" reconcileID="25ead8b5-d3c7-4c0a-8847-9e95f7038330" Cluster="kcm-system/aws-ekaz" MachineDeployment="kcm-system/aws-ekaz-md"

We've recently faced similar preflight check errors on EKS deployments (#907), and there is a workaround: add the "machineset.cluster.x-k8s.io/skip-preflight-checks": "ControlPlaneIsStable" annotation to the MachineDeployment.
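[Editor's note] A minimal sketch of where the workaround annotation goes, assuming the MachineDeployment name and namespace from the log output above (`aws-ekaz-md` in `kcm-system`); substitute your own:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: aws-ekaz-md          # example name, taken from the log above
  namespace: kcm-system
  annotations:
    # Tells the MachineSet controller to skip the failing preflight check
    machineset.cluster.x-k8s.io/skip-preflight-checks: "ControlPlaneIsStable"
```

The same annotation can be applied to a live object with, for example, `kubectl -n kcm-system annotate machinedeployment aws-ekaz-md machineset.cluster.x-k8s.io/skip-preflight-checks=ControlPlaneIsStable`.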

After I applied the workaround, the upgrade proceeded and completed successfully. I'll check whether this workaround applies to other providers and whether we can fix it in the cluster templates.
