The oldest k8s version we use is now in openscapes, with k8s 1.21. Let's get it upgraded so that the oldest version becomes 1.22. This issue was branched off from #2057.

I most recently upgraded an EKS cluster in #2085, and will use the notes from there, adjusting and iterating on them as I go.
# For reference, these are the steps I took when upgrading carbonplan from k8s 1.19 to
# k8s 1.24, Jan 24th 2023.
#
# 1. Updated the version field in this config from 1.19 to 1.20
#
# - It is not allowed to upgrade the control plane more than one minor version at a time
#
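# - For reference, the version field lives under metadata in the eksctl
#   ClusterConfig. A minimal sketch, with an illustrative cluster name and
#   region:
#
#      apiVersion: eksctl.io/v1alpha5
#      kind: ClusterConfig
#      metadata:
#        name: carbonplanhub
#        region: us-west-2
#        version: "1.20"
#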
# 2. Upgraded the control plane (takes ~10 minutes)
#
# - I ran into permission errors, so I visited the AWS cloud console to
# create an access key for my user and set it up as temporary environment
# variables.
#
# export AWS_ACCESS_KEY_ID="..."
# export AWS_SECRET_ACCESS_KEY="..."
#
# eksctl upgrade cluster --config-file eksctl-cluster-config.yaml --approve
#
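# - To confirm the control plane version after the upgrade, something like
#   this should work (assuming the AWS CLI is configured with the same
#   credentials):
#
#      aws eks describe-cluster --name carbonplanhub --query cluster.version --output text
#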
# 3. Deleted all non-core nodegroups
#
# - I had to add a --drain=false flag due to an error likely related to a
# very old EKS cluster.
#
# - I used --include="nb-*,dask-*" because I saw that the core node pool
# was named "core-a", and the other nodes started with "nb-" or "dask-".
#
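# - To list the existing nodegroups and confirm their names before deleting:
#
#      eksctl get nodegroup --cluster=carbonplanhub
#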
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "nb-*,dask-*" --approve --drain=false
#
# 4. Updated the version field in this config from 1.20 to 1.22
#
# - It is allowed to have a nodegroup +-2 minor versions away from the control plane version
#
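# - To see the current skew, the server and kubelet versions can be compared
#   with:
#
#      kubectl version
#      kubectl get nodes
#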
# 5. Created a new core nodepool (core-b)
#
# - I ran into "Unauthorized" errors and resolved them by first using the
# deployer to acquire credentials to modify a ConfigMap named "aws-auth"
# in the k8s namespace kube-system.
#
# deployer use-cluster-credentials carbonplan
#
# kubectl edit cm -n kube-system aws-auth
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "core-b" --install-nvidia-plugin=false
#
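# - For reference, the aws-auth ConfigMap maps IAM identities to k8s
#   users/groups; a typical mapRoles entry for a nodegroup's instance role
#   looks roughly like this (the role ARN is a placeholder):
#
#      mapRoles: |
#        - groups:
#            - system:bootstrappers
#            - system:nodes
#          rolearn: arn:aws:iam::<account-id>:role/<nodegroup-instance-role>
#          username: system:node:{{EC2PrivateDNSName}}
#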
# 6. Deleted the old core nodepool (core-a)
#
# - I first updated the eksctl config file to include a "core-a" entry,
# because I hadn't really added a separate "core-b" previously; I had just
# renamed "core-a" to "core-b".
#
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "core-a" --approve
#
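# - To check which nodes belong to which nodegroup before and after, the
#   nodegroup label that eksctl sets can be used (label name assumed from
#   eksctl's defaults):
#
#      kubectl get nodes --show-labels
#      kubectl get nodes -l alpha.eksctl.io/nodegroup-name=core-a
#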
# 7. Upgraded add-ons (takes ~5s per add-on, three add-ons)
#
# eksctl utils update-kube-proxy --cluster=carbonplanhub --approve
# eksctl utils update-aws-node --cluster=carbonplanhub --approve
# kubectl patch daemonset -n kube-system aws-node --patch='{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"aws-node"}],"containers":[{"name":"aws-node","securityContext":{"allowPrivilegeEscalation":null,"runAsNonRoot":null}}]}}}}'
# eksctl utils update-coredns --cluster=carbonplanhub --approve
#
# - I diagnosed two separate errors following this:
#
# kubectl get pod -n kube-system
# kubectl describe pod -n kube-system aws-node-7rcsw
#
# Warning Failed 9s (x7 over 69s) kubelet Error: container has runAsNonRoot and image will run as root
#
# - the aws-node daemonset's pods failed to start because an overly
# restrictive container securityContext prevented them from running as root.
#
# aws-node issue: https://github.com/weaveworks/eksctl/issues/6048.
#
# Resolved by removing `runAsNonRoot: true` and
# `allowPrivilegeEscalation: false`. Using --output-patch=true provided
# the `kubectl patch` command used above.
#
# kubectl edit ds -n kube-system aws-node --output-patch=true
#
# - the kube-proxy daemonset's pods failed to pull their image because it
# was not found.
#
# This didn't need to be resolved midway through the upgrades; the
# issue went away in k8s 1.23.
#
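# - A quick health check after the add-on upgrades:
#
#      kubectl get daemonset -n kube-system aws-node kube-proxy
#      kubectl get deployment -n kube-system coredns
#      kubectl get pod -n kube-system
#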
# 8. Updated the version field in this config from 1.22 down to 1.21, since the
# control plane was still at 1.20 and can only be upgraded one minor version at a time
#
# 9. Upgraded the control plane, as in step 2.
#
# A. Upgraded add-ons, as in step 7.
#
# B. Updated the version field in this config from 1.21 to 1.22
#
# C. Upgraded the control plane, as in step 2.
#
# D. Upgraded add-ons, as in step 7.
#
# E. I refreshed the ekscluster config's .jsonnet file based on
# template.jsonnet, which has been updated to declare an addon related to EBS
# storage. In practice, I realize this was probably not used by the
# subsequent commands, but it feels good to have it in the ekscluster config
# to reflect that the addon was added manually.
#
# addons: [
#   {
#     // aws-ebs-csi-driver ensures that our PVCs are bound to PVs that
#     // couple to AWS EBS based storage; without it, expect to see pods
#     // mounting a PVC failing to schedule and PVC resources that are
#     // unbound.
#     //
#     // Related docs: https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html
#     //
#     name: 'aws-ebs-csi-driver',
#     wellKnownPolicies: {
#       ebsCSIController: true,
#     },
#   },
# ],
#
# eksctl create iamserviceaccount \
# --name=ebs-csi-controller-sa \
# --namespace=kube-system \
# --cluster=carbonplanhub \
# --attach-policy-arn=arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
# --approve \
# --role-only \
# --role-name=AmazonEKS_EBS_CSI_DriverRole
#
# eksctl create addon --name=aws-ebs-csi-driver --cluster=carbonplanhub --service-account-role-arn=arn:aws:iam::631969445205:role/AmazonEKS_EBS_CSI_DriverRole --force
#
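# - To verify the addon and its pods afterwards:
#
#      eksctl get addon --cluster=carbonplanhub
#      kubectl get pod -n kube-system | grep ebs-csi
#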
# F. Updated the version field in this config from 1.22 to 1.23
#
# G. Upgraded the control plane, as in step 2.
#
# H. Upgraded add-ons, as in step 7.
#
# I. Updated the version field in this config from 1.23 to 1.24
#
# J. Upgraded the control plane, as in step 2.
#
# K. Upgraded add-ons, as in step 7.
#
# L. I created a new core node pool and deleted the old one, as in steps 5-6.
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "core-a" --install-nvidia-plugin=false
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "core-b" --approve
#
# M. I recreated all other nodegroups.
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "nb-*,dask-*" --install-nvidia-plugin=false
#
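# - As a final check, all nodes should now report a v1.24 kubelet:
#
#      kubectl get nodes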