gcp: k8s version updates, transitions to pd-balanced disks, towards n2- nodes #3131

@consideRatio consideRatio commented Sep 13, 2023

This resolves #2947: the 2i2c cluster's core node pool now has a pd-balanced disk and can therefore run the ingress-nginx controller with adequate performance.

Since the cloudbank and 2i2c clusters were actively used, I moved all existing core node pool workloads to a temporary node pool, created via the cloud console, as an intermediate step.

k8s cluster upgrades

  • meom-ige from 1.26 -> 1.27
  • callysto from 1.25 -> 1.27
  • 2i2c-uk from 1.24 -> 1.27
  • m2lines from 1.25 -> 1.27
  • linked-earth from 1.27 -> 1.27
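As an illustration of what these version jumps involve, here is a minimal Python sketch that expands each upgrade into single-minor-version steps. It assumes upgrades proceed one minor version at a time (skipping minor versions is generally unsupported for Kubernetes control planes) and that the major version stays the same. The helper names `parse_minor` and `upgrade_path` are hypothetical and not part of this repository.

```python
from typing import Dict, List, Tuple


def parse_minor(version: str) -> int:
    """Return the minor number of a 'major.minor' version string."""
    return int(version.split(".")[1])


def upgrade_path(current: str, target: str) -> List[str]:
    """List every version visited when stepping one minor version at a time.

    Assumes current and target share the same major version.
    """
    major = current.split(".")[0]
    first = parse_minor(current) + 1
    last = parse_minor(target)
    return [f"{major}.{minor}" for minor in range(first, last + 1)]


# Versions copied from the list above.
upgrades: Dict[str, Tuple[str, str]] = {
    "meom-ige": ("1.26", "1.27"),
    "callysto": ("1.25", "1.27"),
    "2i2c-uk": ("1.24", "1.27"),
    "m2lines": ("1.25", "1.27"),
    "linked-earth": ("1.27", "1.27"),
}

for cluster, (current, target) in upgrades.items():
    path = upgrade_path(current, target)
    # An empty path means the listed versions are identical.
    print(cluster, " -> ".join(path) if path else "no version change listed")
```

An entry whose current and target versions are identical yields an empty path, which makes such rows easy to spot.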

standard disk -> pd-balanced on core nodes

  • 2i2c
  • 2i2c-uk
  • callysto
  • cloudbank
  • meom-ige
  • m2lines

transitions to n2

  • 2i2c transitions from n1- to n2-
  • m2lines transitions from n1- to n2-
  • meom-ige transitions from n1- to n2- and, being a daskhub, also from -highmem-2 to -highmem-4, to ensure the node can fit the memory that prometheus-server consumes on a daskhub
  • linked-earth transitions from e2- to n2- (historically this was me testing it out)
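The machine type changes above follow the GCP naming pattern `family-class-vcpus`. A small Python sketch of that renaming is shown here; `switch_machine_type` is a hypothetical helper for illustration and does not reflect how the terraform configuration in this PR is written.

```python
from typing import Optional


def switch_machine_type(machine_type: str, family: str, vcpus: Optional[int] = None) -> str:
    """Swap the machine family and, optionally, the vCPU count.

    Expects names shaped like 'n1-highmem-2' (family-class-vcpus).
    """
    _old_family, machine_class, old_vcpus = machine_type.split("-")
    new_vcpus = old_vcpus if vcpus is None else str(vcpus)
    return f"{family}-{machine_class}-{new_vcpus}"


# meom-ige: family n1 -> n2 and size -highmem-2 -> -highmem-4, as listed above.
print(switch_machine_type("n1-highmem-2", "n2", vcpus=4))

# Family change only; the starting type here is a hypothetical example.
print(switch_machine_type("n1-highmem-4", "n2"))
```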

@consideRatio consideRatio requested a review from a team as a code owner September 13, 2023 14:32
@consideRatio consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from ac4f203 to a5e4262 Compare September 13, 2023 20:02
@consideRatio consideRatio changed the title from "2i2c, terraform: update core node from n1- to n2-highmem-4" to "2i2c and meom-ige, terraform: update core nodes to n2-highmem-4 with pd-balanced disks" Sep 13, 2023
@yuvipanda yuvipanda left a comment

I'd eventually like us to do a sweep through the various clusters and look at resizing prometheus as well, now that things are less broken there. There's no need to block this PR on that, although I'd have preferred this PR to deal with just the 2i2c cluster.

@yuvipanda
Member

Ah, I see perhaps that this resizing is in response to the oscillating pagerduty alerts? Is a bit unclear to me, but ok to try if that is the case.

@consideRatio
Contributor Author

> Ah, I see perhaps that this resizing is in response to the oscillating pagerduty alerts? Is a bit unclear to me, but ok to try if that is the case.

Yes! It was apparently very broken. For example, a user pod was stuck for 36 hours with DNS issues while trying to mount NFS.

This will force a recreation of core nodes, but not having this has
turned out to break the pilot-hubs cluster and meom-ige, so we really
need to do this if there is a project without it already.
@consideRatio consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from a5e4262 to de5d712 Compare September 13, 2023 22:29
@consideRatio consideRatio changed the title from "2i2c and meom-ige, terraform: update core nodes to n2-highmem-4 with pd-balanced disks" to "gcp: k8s version updates, transitions to pd-balanced disks" Sep 13, 2023
@consideRatio consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from de5d712 to 991d121 Compare September 13, 2023 22:46
@consideRatio consideRatio changed the title from "gcp: k8s version updates, transitions to pd-balanced disks" to "gcp: k8s version updates, transitions to pd-balanced disks, towards n2- nodes" Sep 13, 2023
@github-actions

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| gcp | linked-earth | No | | Yes | Following helm chart values files were modified: common.values.yaml |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | linked-earth | prod | Following helm chart values files were modified: common.values.yaml |

@consideRatio consideRatio merged commit 31ba2d8 into 2i2c-org:master Sep 13, 2023
8 checks passed
@github-actions

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/6179039712
