From 411c925c2b626f947cc0931e2d9375be814d0a22 Mon Sep 17 00:00:00 2001
From: jaskaransarkaria
Date: Thu, 14 Dec 2023 14:37:58 +0000
Subject: [PATCH 1/2] =?UTF-8?q?docs:=20=E2=9C=8F=EF=B8=8F=20update=20node?=
 =?UTF-8?q?=20group?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 runbooks/source/node-group-changes.html.md.erb | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/runbooks/source/node-group-changes.html.md.erb b/runbooks/source/node-group-changes.html.md.erb
index 946eb9f1..c3534664 100644
--- a/runbooks/source/node-group-changes.html.md.erb
+++ b/runbooks/source/node-group-changes.html.md.erb
@@ -13,22 +13,26 @@ You may need to make a change to an EKS [cluster node group] or [instance type c
 
 ## How?
 
-The method to avoid bringing down all the nodes at once is to follow these steps:
+To avoid bringing down all the nodes at once, follow these steps:
 
 1. add a new node group with your [updated changes]
 1. lookup the old node group name (you can find this in the aws gui)
 1. once merged in you can drain the old node group using the command below:
 1. raise a new [pr deleting] the old node group
 
-> ```cloud-platform pipeline cordon-and-drain --cluster-name --node-group ```
+> `cloud-platform pipeline cordon-and-drain --cluster-name --node-group`
 
-notes:
+Because this command runs remotely in Concourse ([script source]), you can't use it to drain the default node group on the manager cluster.
+
+### Notes:
 
 - When making changes to the default node group in live, it's handy to pause the pipelines for each of our environments for the duration of the change.
-- the `cloud-platform pipeline` command [cordons-and-drains-nodes] in a given node group waiting 2mins between each drained node. This command runs remotely in concourse.
+- The `cloud-platform pipeline` command [cordons-and-drains-nodes] in a given node group waiting 4mins between each drained node.
+- If you can avoid it, try not to change the target node group in the aws console (for example, reducing the desired node count): aws deletes nodes in an unpredictable way, which might cause the pipeline command to fail. It is possible if you need to, though.
 
 [cluster node group]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L60
 [instance type config]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L43
 [pr deleting]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2663
 [updated changes]: https://github.com/ministryofjustice/cloud-platform-infrastructure/pull/2657
 [cordons-and-drains-nodes]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/main/pipelines/manager/main/cordon-and-drain-nodes.yaml
+[script source]: https://github.com/ministryofjustice/cloud-platform-terraform-concourse/blob/7851f741e6c180ed868a97d51cec0cf1e109de8d/pipelines/manager/main/cordon-and-drain-nodes.yaml#L50

From a90edf13e4665db82dd0d6fb3a7557b06a9868b8 Mon Sep 17 00:00:00 2001
From: Jaskaran Sarkaria
Date: Thu, 14 Dec 2023 14:54:52 +0000
Subject: [PATCH 2/2] Update runbooks/source/node-group-changes.html.md.erb

Co-authored-by: Steve Williams <105657964+sj-williams@users.noreply.github.com>
---
 runbooks/source/node-group-changes.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/runbooks/source/node-group-changes.html.md.erb b/runbooks/source/node-group-changes.html.md.erb
index c3534664..95434f3f 100644
--- a/runbooks/source/node-group-changes.html.md.erb
+++ b/runbooks/source/node-group-changes.html.md.erb
@@ -27,7 +27,7 @@ To avoid bringing down all the nodes at once, follow these steps:
 ### Notes:
 
 - When making changes to the default node group in live, it's handy to pause the pipelines for each of our environments for the duration of the change.
-- The `cloud-platform pipeline` command [cordons-and-drains-nodes] in a given node group waiting 4mins between each drained node.
+- The `cloud-platform pipeline` command [cordons-and-drains-nodes] in a given node group, waiting 5 minutes between each drained node.
 - If you can avoid it, try not to change the target node group in the aws console (for example, reducing the desired node count): aws deletes nodes in an unpredictable way, which might cause the pipeline command to fail. It is possible if you need to, though.
 
 [cluster node group]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/97768bfd8b4e25df6f415035acac60cf531d88c1/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf#L60
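
The drain step this patch documents can be sketched as a small shell wrapper. This is a minimal sketch, not part of the patch: the cluster and node group names are hypothetical placeholders, and it composes and prints the command rather than running it, so the invocation can be reviewed before executing.

```shell
#!/bin/sh
# Sketch of the cordon-and-drain step from the runbook.
# CLUSTER and OLD_NODE_GROUP are assumed placeholder values --
# substitute your own (the old node group name comes from the AWS console).
CLUSTER="live"
OLD_NODE_GROUP="ng-2023-12-14"

# Compose the pipeline command documented in the runbook.
CMD="cloud-platform pipeline cordon-and-drain --cluster-name ${CLUSTER} --node-group ${OLD_NODE_GROUP}"

# Print it for review; to actually cordon and drain, run: eval "${CMD}"
echo "${CMD}"
# prints: cloud-platform pipeline cordon-and-drain --cluster-name live --node-group ng-2023-12-14
```

Note that, per the runbook, this command runs remotely in Concourse and drains one node at a time with a pause between nodes, so avoid resizing the same node group in the AWS console while it runs.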