Kubernetes Upgrade v1.25 -> v1.29 #279

Merged: 6 commits merged into develop on Aug 15, 2024

@ronardcaktus (Member) commented May 14, 2024

This PR serves as documentation and reference for the Kubernetes upgrade of the Philly Hip cluster. The cluster hosts both the staging and production sites.

Ingress Nginx

> helm -n ingress-nginx list                                    
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
ingress-nginx   ingress-nginx   7               2024-05-14 10:21:08.299385 -0400 EDT    deployed        ingress-nginx-4.9.1     1.9.6

Cert Manager

>  helm -n cert-manager list                                    
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
cert-manager    cert-manager    6               2024-05-13 14:39:06.888402 -0400 EDT    deployed        cert-manager-v1.14.3    v1.14.3

AWS Fluent Bit
Installing the updated version required three steps. First, run kubectl -n kube-system get secret | grep fluent to list the secrets. Then run kubectl -n kube-system delete secret sh.helm.release.v1.aws-for-fluent-bit.v1 to delete the old Helm release secret. Finally, rerun the playbook with the updated version.

 > helm -n kube-system list          
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION     
aws-for-fluent-bit      kube-system     1               2024-05-20 13:25:21.358038 -0400 EDT    deployed        aws-for-fluent-bit-0.1.32       2.31.12.20231011  
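
The secret deleted in this step follows Helm 3's naming pattern for release records, sh.helm.release.v1.<release>.v<revision>. Once it is gone, Helm has no stored manifest for the release, which is consistent with the reinstalled chart showing REVISION 1 above. A minimal Python sketch of picking out the records for one release from a list of secret names before deciding what to delete; the function names and the sample list are illustrative and not part of the playbook:

```python
import re

# Helm 3 stores each release revision in a secret named
# sh.helm.release.v1.<release>.v<revision>.
HELM_SECRET = re.compile(
    r"^sh\.helm\.release\.v1\.(?P<release>.+)\.v(?P<revision>\d+)$"
)


def parse_helm_secret(name):
    """Return (release, revision) for a Helm release secret name, else None."""
    match = HELM_SECRET.match(name)
    if match is None:
        return None
    return match.group("release"), int(match.group("revision"))


def release_secrets(names, release):
    """Filter secret names down to the stored revisions of one release."""
    found = []
    for name in names:
        parsed = parse_helm_secret(name)
        if parsed is not None and parsed[0] == release:
            found.append(name)
    return found


# Hypothetical sample of `kubectl -n kube-system get secret -o name` output,
# with the "secret/" prefix already stripped.
names = [
    "sh.helm.release.v1.aws-for-fluent-bit.v1",
    "sh.helm.release.v1.descheduler.v5",
    "some-other-secret",
]
print(release_secrets(names, "aws-for-fluent-bit"))
```

Filtering by the parsed release name, rather than by a plain grep for "fluent", avoids matching unrelated secrets that happen to contain the same substring.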

Cloudwatch Metrics

> helm -n amazon-cloudwatch list
NAME                    NAMESPACE               REVISION        UPDATED                                 STATUS          CHART                           APP VERSION   
aws-cloudwatch-metrics  amazon-cloudwatch       2               2024-05-14 10:30:13.814141 -0400 EDT    deployed        aws-cloudwatch-metrics-0.0.11   1.300032.2b361

Descheduler Version

name: descheduler
              app.kubernetes.io/instance: descheduler
              app.kubernetes.io/name: descheduler
            name: descheduler
              - /bin/descheduler
              image: registry.k8s.io/descheduler/descheduler:v0.29.0
              name: descheduler
            serviceAccount: descheduler
            serviceAccountName: descheduler

Cluster and pods version

Screenshot 2024-05-14 at 1 47 35 PM

Notes:

  • I could not update k8s_aws_fluent_bit_chart_version; it threw an error.

Closes

@copelco (Member) commented May 14, 2024

Task linked: CU-8687f7bgp Philly-hip

@ronardcaktus (Member, Author)

I tried to update k8s_aws_fluent_bit_chart_version from 0.1.18 to 0.1.32, but I got this error:

TASK [Add AWS for fluent bit helm chart (centralized logging)] *********************************************************************************************************************************
fatal: [production]: FAILED! => changed=false 
  command: /opt/homebrew/bin/helm --version=0.1.32 --repo=https://aws.github.io/eks-charts upgrade -i --reset-values --wait -f=/var/folders/fy/237g79n92sd9g4rnjh32ql8c0000gn/T/tmpcu3cxga1.yml aws-for-fluent-bit aws-for-fluent-bit
  msg: |-
    Failure when executing Helm command. Exited 1.
    stdout:
    stderr: Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "aws-for-fluent-bit" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
    ensure CRDs are installed first
  stderr: |-
    Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "aws-for-fluent-bit" namespace: "" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
    ensure CRDs are installed first
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

The resource is in kube-system

 > helm -n kube-system list
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
aws-for-fluent-bit      kube-system     1               2022-06-29 10:55:18.9521 -0400 EDT      deployed        aws-for-fluent-bit-0.1.18       2.21.5     
descheduler             kube-system     5               2024-05-14 13:41:35.362494 -0400 EDT    deployed        descheduler-0.29.0              0.29.0 

So I thought about uninstalling it using helm and reinstalling the latest version. Thoughts?
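
For context on the error: the stored manifest of the 0.1.18 release contains a PodSecurityPolicy in policy/v1beta1, an API that Kubernetes removed in v1.25, so Helm cannot rebuild the current release's objects before upgrading. A rough Python sketch of checking a rendered manifest (for example, the output of helm get manifest) for such kinds ahead of a cluster upgrade. The REMOVED table holds only the one pair from the error message, the function name is illustrative, and the sample manifest is made up:

```python
import re

# (apiVersion, kind) pairs no longer served by the cluster. Only the pair
# from the error message is listed here; extend as needed.
REMOVED = {("policy/v1beta1", "PodSecurityPolicy")}


def removed_kinds(manifest_text):
    """Return (apiVersion, kind) pairs in a multi-document manifest that are in REMOVED."""
    hits = []
    # YAML documents are separated by lines consisting of '---'.
    for doc in re.split(r"^---\s*$", manifest_text, flags=re.MULTILINE):
        # Only top-level (unindented) keys are considered.
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if api and kind and (api.group(1), kind.group(1)) in REMOVED:
            hits.append((api.group(1), kind.group(1)))
    return hits


# Made-up two-document manifest for illustration.
sample = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-for-fluent-bit
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: aws-for-fluent-bit
"""
print(removed_kinds(sample))
```

This is a text-level scan rather than a full YAML parse, so quoted values or unusual formatting would need a real parser.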

@copelco copelco left a comment

Looking good! Added a few comments.

> So I thought about uninstalling it using helm and reinstalling the latest version. Thoughts?

Yes, this sounds good to me! 👍🏻

Also, were you able to get past the aws_profile issue?

requirements/dev/dev.in
 kubernetes==12.0.0
-kubernetes-validate~=1.25.0
+kubernetes-validate~=1.29.1
@copelco (Member)

I see the following error running the deploy-cluster.yml playbook:

TASK [caktus.k8s-web-cluster : Grant access to IAM users if aws] *************************************************************************************************************
fatal: [production]: FAILED! => changed=false 
  msg: Failed to import the required Python library (kubernetes-validate) on rana.lan's Python /Users/copelco/projects/philly-hip/.direnv/python-3.10/bin/python3.10. Please read the module documentation and install it in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter

kubernetes-validate appears to be installed, but importing it throws an exception:

> python                             
Python 3.10.11 (v3.10.11:7d4cc5aa85, Apr  4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import kubernetes_validate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/copelco/projects/philly-hip/.direnv/python-3.10/lib/python3.10/site-packages/kubernetes_validate/__init__.py", line 1, in <module>
    from .utils import latest_version, validate, validate_file, validate_resource
  File "/Users/copelco/projects/philly-hip/.direnv/python-3.10/lib/python3.10/site-packages/kubernetes_validate/utils.py", line 14, in <module>
    from referencing import Registry, Resource
  File "/Users/copelco/projects/philly-hip/.direnv/python-3.10/lib/python3.10/site-packages/referencing/__init__.py", line 1, in <module>
    from referencing._core import (  # noqa: F401
  File "/Users/copelco/projects/philly-hip/.direnv/python-3.10/lib/python3.10/site-packages/referencing/_core.py", line 6, in <module>
    from attrs import evolve, field
ModuleNotFoundError: No module named 'attrs'

This is confusing to me, because pip freeze reports that attrs is installed:

> pip freeze | grep attrs                                                                                      
attrs==20.3.0
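
One likely explanation, offered as an assumption rather than a confirmed diagnosis: the attrs distribution only gained the importable attrs package name in release 21.3.0, and older releases such as 20.3.0 expose only the attr module. pip freeze can therefore list attrs while from attrs import ... still fails. A small Python sketch of that check; the helper name is illustrative:

```python
from importlib import metadata


def provides_attrs_namespace(version):
    """True if this attrs release ships the `attrs` import name (added in 21.3.0)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (21, 3)


# Look up the installed attrs distribution, if any, in the active environment.
try:
    installed = metadata.version("attrs")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("attrs is not installed")
elif provides_attrs_namespace(installed):
    print(f"attrs {installed}: `import attrs` should work")
else:
    print(f"attrs {installed}: only `import attr` is available; upgrade attrs")
```

Running this inside the same virtualenv that Ansible uses shows whether the pinned attrs is simply too old for the referencing package.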

Upgrading the dev requirements ended up working for me:

pip-compile --output-file=requirements/dev/dev.txt requirements/dev/dev.in --upgrade
make setup

But then the tests failed to run (maybe related to wagtail-factories?), so it may require additional investigation.

@ronardcaktus (Member, Author)

Interesting, the playbook runs for me.

@ronardcaktus (Member, Author)

> Looking good! Added a few comments.
>
> > So I thought about uninstalling it using helm and reinstalling the latest version. Thoughts?
>
> Yes, this sounds good to me! 👍🏻
>
> Also, were you able to get past the aws_profile issue?

I updated the version to 0.1.32.

@ronardcaktus (Member, Author)

@copelco tests are passing now.

@ronardcaktus ronardcaktus requested a review from copelco May 21, 2024 13:56
@ronardcaktus ronardcaktus merged commit 42326d3 into develop Aug 15, 2024
1 check passed
@ronardcaktus ronardcaktus deleted the CU-8687f7bgp-ph-k8s-upgrade branch August 15, 2024 13:59