docs: ✏️ instance and node group changes #5090

Merged
merged 1 commit into from
Dec 13, 2023
33 changes: 33 additions & 0 deletions runbooks/source/node-group-changes.html.md.erb
@@ -0,0 +1,33 @@
---
title: Handling Node Group and Instance Changes
weight: 54
last_reviewed_on: 2023-12-13
review_in: 6 months
---

# <%= current_page.data.title %>

## Why?

You may need to change an EKS [cluster node group] or [instance type config]. We can't simply let Terraform apply these changes, because Terraform doesn't gracefully roll out the replacement nodes: it brings down all of the old nodes immediately, which causes outages for users.

## How?

To avoid bringing down all the nodes at once, follow these steps:

1. Add a new node group with your [updated changes].
1. Look up the old node group name (you can find this in the AWS console).
1. Once the new node group is merged in, drain the old node group using the following command:
   > `cloud-platform pipeline cordon-and-drain --cluster-name <cluster_name> --node-group <old_node_group_name>`
1. Raise a new [PR deleting] the old node group.
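
As an alternative to looking up the old node group name by hand in step 2, the names can also be listed from the command line. This is a minimal sketch, assuming the cluster uses EKS managed node groups and that the AWS CLI is installed and authenticated against the right account; the `list_node_groups` helper and the `AWS` override variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: the AWS CLI is on PATH. Override for a dry run, e.g. AWS=echo.
AWS="${AWS:-aws}"

# list_node_groups CLUSTER_NAME
# Prints the names of the managed node groups in the given EKS cluster.
list_node_groups() {
  "$AWS" eks list-nodegroups \
    --cluster-name "$1" \
    --query 'nodegroups[]' \
    --output text
}
```

Calling `list_node_groups` with the cluster name prints every node group, so the old one still has to be picked out by eye before it is passed to `--node-group`.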

Notes:

- When making changes to the default node group in live, it's handy to pause the pipelines for each of our environments for the duration of the change.
- The `cloud-platform pipeline cordon-and-drain` command [cordons and drains nodes] in a given node group, waiting 2 minutes between each drained node. The command runs remotely in Concourse.
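
To make the cordon-and-drain behaviour concrete, here is a rough sketch of the kind of loop such a pipeline might run. It is not the actual Concourse pipeline: the `cordon_and_drain` function, the `KUBECTL` and `WAIT_SECONDS` variables, the choice to cordon every node before draining any, and the `kubectl drain` flags are all assumptions made for illustration. Only the 2 minute pause between nodes comes from the note above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real pipeline may differ.
set -eu

# Assumption: kubectl is on PATH and pointed at the right cluster.
# Override for a dry run, e.g. KUBECTL=echo.
KUBECTL="${KUBECTL:-kubectl}"
# Pause between drained nodes (2 minutes, per the runbook note).
WAIT_SECONDS="${WAIT_SECONDS:-120}"

# cordon_and_drain NODE...
cordon_and_drain() {
  local node
  # Cordon every old node first, so evicted pods cannot be
  # rescheduled onto another node that is about to be drained.
  for node in "$@"; do
    "$KUBECTL" cordon "$node"
  done
  # Drain one node at a time, pausing so workloads can settle.
  for node in "$@"; do
    "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data
    sleep "$WAIT_SECONDS"
  done
}
```

Assuming managed node groups, the nodes to pass in could be selected with `kubectl get nodes -l eks.amazonaws.com/nodegroup=<old_node_group_name> -o jsonpath='{.items[*].metadata.name}'`. Draining serially with a pause is slower than draining in parallel, but it gives replacement pods time to become ready on the new node group before the next node is emptied.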

[cluster node group]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L60
[instance type config]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L43
[pr deleting]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2663
[updated changes]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2657
[cordons and drains nodes]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/cordon-and-drain-nodes.yaml