From e5062069e469a0a80d06ec6b52b4caa19c5d0ea3 Mon Sep 17 00:00:00 2001 From: Mike Bell Date: Fri, 26 Jan 2024 10:28:21 +0000 Subject: [PATCH 1/5] Add creating a live-like cluster --- runbooks/source/creating-a-live-like.erb | 66 ++++++++++++++++++++++++ runbooks/source/eks-cluster.html.md.erb | 24 --------- 2 files changed, 66 insertions(+), 24 deletions(-) create mode 100644 runbooks/source/creating-a-live-like.erb diff --git a/runbooks/source/creating-a-live-like.erb b/runbooks/source/creating-a-live-like.erb new file mode 100644 index 00000000..9a7a222e --- /dev/null +++ b/runbooks/source/creating-a-live-like.erb @@ -0,0 +1,66 @@ +--- +title: Creating a live-like Cluster +weight: 350 +last_reviewed_on: 2024-01-16 +review_in: 3 months +--- + +# Creating a live-like cluster + +When testing cluster upgrades, it is useful to test the procedure which is as close to the live cluster as possible. The following steps will update an existing test cluster +to the configuration similar to the live cluster. + +## Pre-requisites + +- a test cluster created using the [cluster build pipeline] or manually + +## Setting Cluster size to match Live + +1. Set the node group desired size to 48 (check the live cluster for up-to-date number) in the AWS console under Compute +2. Set the node_groups_count to same as live cluster (64) and default_ng_min_count to 48 in [terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf] +3. Copy the node_size values from live to default, currently `["r6i.2xlarge", "r6i.xlarge", "r5.2xlarge"]` +4. Copy the monitoring_node_size values from live to default, currently `["r6i.8xlarge", "r5a.2xlarge"]` +5. Ensure that your Terraform workspace matches your cluster name +6. Run `terraform plan` and confirm that your changes are correct +7. Run `terraform apply` to apply the changes to your test cluster + +## Installing live components and test applications + +1. 
In [terraform/aws-accounts/cloud-platform-aws/vpc/eks/components] enable the following components: + * cluster_autoscaler + * large_nodegroup + * kibana_proxy + * ecr_exporter + * cloudwatch_exporter + * velero + +> To find components that are enabled in live but not in test you can search for `lookup(local.live_workspace, terraform.workspace, false)` in `components.tf`. + +2. Add the `starter_pack_count = 40` variable to the starter_pack module +3. Run `terraform plan` and confirm that your changes are correct +4. Run `terraform apply` to apply the changes to your test cluster +5. You may need to run `plan` and `apply` again as the starter pack addons don't like to be installed all at once + +## Upgrading a live-like test cluster + +See documentation for upgrading a [cluster](upgrade-eks-cluster.html). + +## Monitoring the upgrade + +* Setup pingdom alerts for starter-pack helloworld app +* @todo more + +## Final Tests + +1. Run `make run-tests` from the root cloud-platform repository +2. Update `cluster.tf` `cluster_version` to match version upgraded to +3. Run `terraform plan` to ensure there are no unexpected changes + +## Tearing down + +1. Run the delete cluster pipeline +2. 
Remove pingdom checks + +[cluster build pipeline]: https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/create-cluster +[terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf +[terraform/aws-accounts/cloud-platform-aws/vpc/eks/components]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components diff --git a/runbooks/source/eks-cluster.html.md.erb b/runbooks/source/eks-cluster.html.md.erb index 11ded9e4..bbd12fc0 100644 --- a/runbooks/source/eks-cluster.html.md.erb +++ b/runbooks/source/eks-cluster.html.md.erb @@ -154,27 +154,3 @@ terraform init terraform workspace new terraform apply ``` - -## Creating a live like test cluster - -When testing clusteer upgrades, it is useful to test the procedure which is as close to the live cluster as possible. The following steps will update an existing test cluster -to the configuration similar to the live cluster. - -**Pre-requisites:** - -- a test cluster created using the [cluster build pipeline] or manually -- The environment variables and pre-requisites as described [above](#pre-requisites) - -**Steps:** - -- Update the node group desired count to same as live cluster (say 50) in the console. 
The terraform way of applying doesnt work for desired count -- Set the node_groups_count to same as live cluster (say 64) and default_ng_min_count to 50 in [terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf] -- Apply the terraform code changes to the test cluster -- cd to [terraform/aws-accounts/cloud-platform-aws/vpc/eks/components] and enable ecr-exporter, cloudwatch_exporter, velero, overprovisioner and other components that are installed specific to live cluster -- Apply the terraform code changes to the test cluster -- Update the starter pack count to 40 and apply the terraform code changes to the test cluster -- Setup pingdom alerts for starter-pack helloworld app - -[cluster build pipeline]: https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/create-cluster -[terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf -[terraform/aws-accounts/cloud-platform-aws/vpc/eks/components]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components From e05d8bae7227ee9545a9e5ef3152203b00954321 Mon Sep 17 00:00:00 2001 From: Mike Bell Date: Fri, 26 Jan 2024 11:31:14 +0000 Subject: [PATCH 2/5] Rename file and add more to monitoring section --- ...-like.erb => creating-a-live-like.html.md.erb} | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) rename runbooks/source/{creating-a-live-like.erb => creating-a-live-like.html.md.erb} (80%) diff --git a/runbooks/source/creating-a-live-like.erb b/runbooks/source/creating-a-live-like.html.md.erb similarity index 80% rename from runbooks/source/creating-a-live-like.erb rename to runbooks/source/creating-a-live-like.html.md.erb index 9a7a222e..be8e57a0 100644 --- a/runbooks/source/creating-a-live-like.erb +++ b/runbooks/source/creating-a-live-like.html.md.erb 
@@ -47,8 +47,19 @@ See documentation for upgrading a [cluster](upgrade-eks-cluster.html). ## Monitoring the upgrade -* Setup pingdom alerts for starter-pack helloworld app -* @todo more +* Set up Pingdom alerts for the starter-pack helloworld and multi-container apps + +> When nodes recycle, the multi-container app may break, giving false positives. + +* Useful one-liners + * `watch -n 1 "kubectl get events"` - list all Kubernetes events + * `watch -n 1 "kubectl get pods -A | grep ContainerStatusUnknown"` - list all pods in the "ContainerStatusUnknown" state + * `watch -n 1 "kubectl get pods -A | grep Error"` - list all pods in the "Error" state + * `watch -n 1 "kubectl get nodes --sort-by=\".metadata.creationTimestamp\""` - list all nodes sorted by creation timestamp + +* Useful third-party tools + * [k9s](https://k9scli.io/) + * [Stern](https://github.com/stern/stern) ## Final Tests From e0f9960e321ff1e0c2e8ca11739330021b58d8c9 Mon Sep 17 00:00:00 2001 From: Mike Bell Date: Fri, 26 Jan 2024 11:31:49 +0000 Subject: [PATCH 3/5] Update review date --- runbooks/source/creating-a-live-like.html.md.erb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/runbooks/source/creating-a-live-like.html.md.erb b/runbooks/source/creating-a-live-like.html.md.erb index be8e57a0..765720f7 100644 --- a/runbooks/source/creating-a-live-like.html.md.erb +++ b/runbooks/source/creating-a-live-like.html.md.erb @@ -1,7 +1,7 @@ --- title: Creating a live-like Cluster weight: 350 -last_reviewed_on: 2024-01-16 +last_reviewed_on: 2024-01-26 review_in: 3 months --- From 3a3a42fc1adda7403fcaf242346970f6aba70820 Mon Sep 17 00:00:00 2001 From: Mike Bell Date: Fri, 26 Jan 2024 11:34:38 +0000 Subject: [PATCH 4/5] Fix capitalisation --- runbooks/source/creating-a-live-like.html.md.erb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/runbooks/source/creating-a-live-like.html.md.erb b/runbooks/source/creating-a-live-like.html.md.erb index
765720f7..6122c71f 100644 --- a/runbooks/source/creating-a-live-like.html.md.erb +++ b/runbooks/source/creating-a-live-like.html.md.erb @@ -14,7 +14,7 @@ to the configuration similar to the live cluster. - a test cluster created using the [cluster build pipeline] or manually -## Setting Cluster size to match Live +## Setting cluster size to match Live 1. Set the node group desired size to 48 (check the live cluster for up-to-date number) in the AWS console under Compute 2. Set the node_groups_count to same as live cluster (64) and default_ng_min_count to 48 in [terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf] From 672b5ac0f08ffd9efa8c480dcf00d9bbbf3bf27f Mon Sep 17 00:00:00 2001 From: Mike Bell Date: Fri, 26 Jan 2024 11:42:30 +0000 Subject: [PATCH 5/5] Review in 6 months --- runbooks/source/creating-a-live-like.html.md.erb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/runbooks/source/creating-a-live-like.html.md.erb b/runbooks/source/creating-a-live-like.html.md.erb index 6122c71f..f341f883 100644 --- a/runbooks/source/creating-a-live-like.html.md.erb +++ b/runbooks/source/creating-a-live-like.html.md.erb @@ -2,7 +2,7 @@ title: Creating a live-like Cluster weight: 350 last_reviewed_on: 2024-01-26 -review_in: 3 months +review_in: 6 months --- # Creating a live-like cluster
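
A note on the pod-state one-liners added to the monitoring section in patch 2/5: `grep Error` matches anywhere on the line, so a pod whose name happens to contain "Error" would also show up. The following is a minimal sketch of a stricter filter, assuming the default `kubectl get pods -A` column layout in which STATUS is the fourth column. The function name `unhealthy_pods` is hypothetical and not part of the runbook.

```shell
# Hypothetical helper, not part of the runbook: keep the header row plus any
# pod whose STATUS column is exactly Error or ContainerStatusUnknown.
# Assumes the default `kubectl get pods -A` layout:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk 'NR == 1 || $4 == "Error" || $4 == "ContainerStatusUnknown"'
}

# Against a live cluster (requires kubectl and cluster access):
#   kubectl get pods -A | unhealthy_pods
```

Because the function reads from stdin, it can also be tried on saved `kubectl` output. Note that `watch` runs its command in a fresh shell, so the function would have to be inlined as the `awk` expression to be used there.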