
K8s core component validation #116

Merged: 14 commits merged into suse-edge:main on Dec 11, 2024

Conversation

@ipetrov117 (Contributor) commented on Dec 10, 2024

This PR proposes an enhancement to the Kubernetes component version upgrade validation that we currently do.

A new optional field called coreComponents has been introduced for KubernetesDistributions in the release manifest. If specified, this field lists all components considered essential for a Kubernetes version upgrade.

With this addition, the upgrade-controller's Kubernetes verification workflow looks like this:

  1. Wait for each control-plane or worker node to be marked as Ready and Schedulable, and to report the correct K8s version.
  2. Go through each component in the coreComponents list (if it exists) and wait for the corresponding component resource to become ready and reach the correct version (a minimal sketch follows this list).
  3. Mark the K8s upgrade as completed.
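
A minimal Go sketch of the step-2 wait, assuming a hypothetical CoreComponent type that mirrors the new release manifest field and assuming the components live in the kube-system namespace; the actual logic sits in internal/controller/reconcile_kubernetes.go and may differ. Only the Deployment check is spelled out, with the HelmChart comparison elided.

package sketch

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// CoreComponent is a hypothetical stand-in for the new release manifest field.
type CoreComponent struct {
    Name    string
    Type    string // "HelmChart" or "Deployment"
    Version string
}

// coreComponentsReady reports whether every listed component has finished
// rolling out. Only the Deployment branch is shown; the HelmChart branch
// would compare the desired chart version against the deployed Helm release.
func coreComponentsReady(ctx context.Context, c client.Client, components []CoreComponent) (bool, error) {
    for _, comp := range components {
        switch comp.Type {
        case "Deployment":
            var dep appsv1.Deployment
            key := types.NamespacedName{Namespace: "kube-system", Name: comp.Name}
            if err := c.Get(ctx, key, &dep); err != nil {
                return false, err
            }
            desired := int32(1)
            if dep.Spec.Replicas != nil {
                desired = *dep.Spec.Replicas
            }
            // Ready once the status reflects the latest spec and all
            // replicas are updated and available.
            if dep.Status.ObservedGeneration != dep.Generation ||
                dep.Status.UpdatedReplicas != desired ||
                dep.Status.AvailableReplicas != desired {
                return false, nil
            }
        case "HelmChart":
            // Chart version comparison elided for brevity.
        default:
            return false, fmt.Errorf("unsupported core component type %q", comp.Type)
        }
    }
    return true, nil
}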

Where will core components be retrieved from?

  • RKE2's coreComponents are all deployed through HelmChart resources and their versions can be viewed in the Chart Versions table of each RKE2 release. Example for v1.31.2 - https://github.com/rancher/rke2/releases/tag/v1.31.2%2Brke2r1. Alternatively, the chart list can also be viewed in the repository. Every .yaml file from this directory will be deployed using the RKE2 auto-deploy manifests mechanism.
  • For K3s it is more complex, as its coreComponents are deployed through both HelmCharts and Deployments. To get the full core component list used for a K3s release, we need to look at the manifests directory of the specific K3s release. Example for 1.31.2 - https://github.com/k3s-io/k3s/tree/v1.31.2%2Bk3s1/manifests. Everything from this directory will be deployed through the K3s auto-deploy manifests mechanism.

What will the release manifest look like when coreComponents are defined?

A release manifest for the K3s/RKE2 1.31.2 versions would look like this:

apiVersion: lifecycle.suse.com/v1alpha1
kind: ReleaseManifest
metadata:
  labels:
    app.kubernetes.io/name: upgrade-controller
    app.kubernetes.io/managed-by: kustomize
  name: release-manifest-3-2-0
  namespace: upgrade-controller-system
spec:
  releaseVersion: 3.2.0
  components:
    kubernetes:
      k3s:
        version: v1.31.2+k3s1
        coreComponents:
        - name: traefik-crd
          version: 27.0.201+up27.0.2
          type: HelmChart
        - name: traefik
          version: 27.0.201+up27.0.2
          type: HelmChart
        - name: local-path-provisioner
          containers:
          - name: local-path-provisioner
            image: rancher/local-path-provisioner:v0.0.30
          type: Deployment
        - name: coredns
          containers:
          - name: coredns
            image: rancher/mirrored-coredns-coredns:1.11.3
          type: Deployment
        - name: metrics-server
          containers:
          - name: metrics-server
            image: rancher/mirrored-metrics-server:v0.7.2
          type: Deployment
      rke2:
        version: v1.31.2+rke2r1
        coreComponents:
        - name: rke2-cilium
          version: 1.16.201
          type: HelmChart
        - name: rke2-canal
          version: v3.28.2-build2024101601
          type: HelmChart
        - name: rke2-calico-crd
          version: v3.28.200
          type: HelmChart
        - name: rke2-calico
          version: v3.28.200
          type: HelmChart
        - name: rke2-coredns
          version: 1.33.002
          type: HelmChart
        - name: rke2-ingress-nginx
          version: 4.10.501
          type: HelmChart
        - name: rke2-metrics-server
          version: 3.12.004
          type: HelmChart
        - name: rancher-vsphere-csi
          version: 3.3.1-rancher100
          type: HelmChart
        - name: rancher-vsphere-cpi
          version: 1.9.000
          type: HelmChart
        - name: harvester-cloud-provider
          version: 0.2.600
          type: HelmChart
        - name: harvester-csi-driver
          version: 0.1.2000
          type: HelmChart
        - name: rke2-snapshot-controller-crd
          version: 3.0.601
          type: HelmChart
        - name: rke2-snapshot-controller
          version: 3.0.601
          type: HelmChart
        - name: rke2-snapshot-validation-webhook
          version: 1.9.001
          type: HelmChart

What will the upgrade of the K3s/RKE2 core components look like in the release manifest?

RKE2:

  1. Go to the RKE2 release page and find the desired release.
  2. Navigate to the Chart Versions table.
  3. Update the coreComponents for the RKE2 component with the versions from this table.

K3s:

  1. Go to the repository tag for the desired K3s release.
  2. Navigate to the manifests directory.
  3. Check the following HelmChart files for changes in their versions:
    • traefik.yaml
  4. Check the following Deployment files for changes in their image versions:
    • local-storage.yaml
    • coredns.yaml
    • metrics-server/metrics-server-deployment.yaml

What does this PR fix?

By applying this fix, we wait for all core components related to a specific K8s distribution before marking its upgrade as complete. This ensures that we do not start the Helm chart upgrade prematurely, and it avoids the problem of the Helm chart upgrade failing because it cannot communicate with a core component that is currently being recreated.

closes: #109
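
For illustration only, a hedged controller-runtime sketch of that ordering, reusing the hypothetical coreComponentsReady helper from the earlier sketch: the reconcile keeps requeueing until every core component is ready, and only then is the K8s upgrade marked complete and the Helm chart phase allowed to start. The real reconciler may structure this differently.

package sketch

import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForCoreComponents requeues until every core component is ready, so the
// Helm chart upgrades never start against a component that is still being
// recreated.
func waitForCoreComponents(ctx context.Context, c client.Client, components []CoreComponent) (ctrl.Result, error) {
    ready, err := coreComponentsReady(ctx, c, components)
    if err != nil {
        return ctrl.Result{}, err
    }
    if !ready {
        // A core component is still rolling out; check again later.
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }
    // All core components are ready: mark the K8s upgrade as completed and
    // proceed with the Helm chart upgrades.
    return ctrl.Result{}, nil
}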

Do all coreComponents need to be present on the machine?

No. RKE2/K3s support different use-cases, which require different core components. The upgrade-controller will only wait for the upgrade of coreComponents that are currently present on the cluster.
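
A minimal sketch of that behaviour for Deployment-type components, under the same assumptions as the earlier sketches (hypothetical helper names, kube-system namespace): a component that is not found on the cluster is skipped instead of being waited on.

package sketch

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// deploymentPresent reports whether a core component Deployment exists at all.
// Absent components (e.g. an unused CNI or cloud provider chart) are skipped.
func deploymentPresent(ctx context.Context, c client.Client, name string) (bool, error) {
    var dep appsv1.Deployment
    key := types.NamespacedName{Namespace: "kube-system", Name: name}
    if err := c.Get(ctx, key, &dep); err != nil {
        if apierrors.IsNotFound(err) {
            return false, nil // not deployed for this use-case; nothing to wait for
        }
        return false, err
    }
    return true, nil
}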

What use-cases have been tested?

  • K3s single cluster upgrade
  • K3s HA cluster upgrade
  • RKE2 single cluster upgrade
  • RKE2 HA cluster upgrade
  • No defined coreComponents upgrade

@atanasdinov (Contributor) left a comment:

LGTM! A couple of questions and suggestions before we merge this. Let me know what you think.

Resolved review threads:
  • internal/controller/reconcile_kubernetes.go
  • internal/upgrade/container_test.go
@ipetrov117 merged commit d8ae794 into suse-edge:main on Dec 11, 2024. 2 checks passed.
atanasdinov added a commit that referenced this pull request Dec 11, 2024
…y, annotations usage (#117)

* Use CustomValidator interface for webhook implementation (#97)

* Use CustomValidator interface for webhook implementation

Signed-off-by: Atanas Dinov <[email protected]>

* Unify webhook register

Signed-off-by: Atanas Dinov <[email protected]>

---------

Signed-off-by: Atanas Dinov <[email protected]>

* Improve node matching (#98)

* Improve node matching during Kubernetes upgrades

Signed-off-by: Atanas Dinov <[email protected]>

* Improve node matching during OS upgrades

Signed-off-by: Atanas Dinov <[email protected]>

---------

Signed-off-by: Atanas Dinov <[email protected]>

* Move away from using annotations in favour of labels (#100)

* Move to label usage

* Provide better readability for label addition

* Bring back release as annotation

* Bump helm.sh/helm/v3 from 3.15.4 to 3.16.2 (#99)

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.15.4 to 3.16.2.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](helm/helm@v3.15.4...v3.16.2)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Reduce concurrency for OS SUC workers (#107)

* Reduce concurrency for OS SUC workers

* Fix unit tests

* K8s core component validation (#116)

* Add core component definitions in release manifest

* Align helm chart CRD indentations with kubebuilder generated CRD

* Introduce new release manifest CRD changes

* make generate

* Introduce container comparison logic

* Update variable name to improve reusability

* Add deployment monitoring permissions to reconciler

* Introduce helm release comparison function

* Update function to parse K8s distribution

* Introduce wait mechanism for K8s core components

* Fix typos

* Don't fail on job not found errors

* Check deployment status conditions

* Fix typos

---------

Signed-off-by: Atanas Dinov <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Ivo Petrov <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Successfully merging this pull request may close these issues: Enhance K8s upgrade validation.

2 participants