Update runbooks and bump review date
poornima-krishnasamy committed Sep 27, 2023
1 parent ca87944 commit 863e8ba
Showing 12 changed files with 74 additions and 191 deletions.
4 changes: 2 additions & 2 deletions runbooks/source/add-new-receiver-alert-manager.html.md.erb
@@ -1,8 +1,8 @@
---
title: Add a new Alertmanager receiver and a slack webhook
weight: 85
last_reviewed_on: 2023-06-12
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# Add a new Alertmanager receiver and a slack webhook
87 changes: 25 additions & 62 deletions runbooks/source/add-nodes-to-the-eks-cluster.html.md.erb
@@ -1,18 +1,16 @@
---
title: Add nodes/change the instance type of the AWS EKS cluster
title: Add nodes to the AWS EKS cluster
weight: 65
last_reviewed_on: 2023-05-22
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# Add nodes/change the instance type of the AWS EKS cluster
# Add nodes to the AWS EKS cluster

This runbook covers how to increase the number of nodes in an eks cluster and/or change the instance type (worker_node_machine_type)
This runbook covers how to increase the number of nodes in an eks cluster

This can address the problem of CPU high usage/load

## Add nodes to the eks cluster

### Requirements

#### 1. Ensure you have access to the Cloud Platform AWS account
@@ -30,79 +28,44 @@ Use
`git crypt unlock` to see the following code:

```
node_groups = {
default_ng = {
desired_capacity = var.cluster_node_count
max_capacity = 30
min_capacity = 1
subnets = data.aws_subnet_ids.private.ids

instance_type = var.worker_node_machine_type
k8s_labels = {
Terraform = "true"
Cluster = local.cluster_name
Domain = local.cluster_base_domain_name
}
additional_tags = {
default_ng = "true"
}
}
```

#### [Variable.tf](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/cloud-platform-eks/variables.tf)

```
variable "vpc_name" {
description = "The VPC name where the cluster(s) are going to be provisioned. VPCs are created in cloud-platform-network"
default = ""
node_groups_count = {
live = "64"
live-2 = "7"
manager = "4"
default = "3"
}

variable "cluster_node_count" {
description = "The number of worker node in the cluster"
default = "4"
}

variable "worker_node_machine_type" {
description = "The AWS EC2 instance types to use for worker nodes"
default = "m4.large"
# Default node group minimum capacity
default_ng_min_count = {
live = "45"
live-2 = "2"
manager = "4"
default = "2"
}
```
### Issue

There is an issue that you cannot update the default "cluster_node_count" (in isolation) with terraform
- unless you increase the default "worker_node_machine_type" too.<br>
The issue is to do with auto-scaling complexities when using Terraform - please see [here](https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md#notes)

Therefore you either have to update default "worker_node_machine_type" to - in above example "m4.xlarge" and also the default "cluster_node_count" to - in above example "5" or "6"

Or you have to edit the "Desired size" in the "AWS EKS dashboard Edit Node Group" (once you have carried out the AWS dashboard change - update the terraform config, `terraform apply` accordingly - so that it is in sync with the AWS dashboard):

#### [AWS dashboard EKS - Edit Node Group:](https://eu-west-2.console.aws.amazon.com/eks/home?region=eu-west-2#/clusters/manager/nodegroups/manager-default_ng-composed-sculpin/edit-nodegroup)
#### AWS dashboard EKS - Edit Node Group

```
Group size
Minimum size
Set the minimum number of nodes that the group can scale in to.
1
2
nodes
Maximum size
Set the maximum number of nodes that the group can scale out to.
30
85
nodes
Desired size
Set the desired number of nodes that the group should launch with initially.
4
3
nodes
```

## Change the AWS EKS instance type (worker_node_machine_type)

* update default "worker_node_machine_type" to - in above example "m4.xlarge"

* A 'terraform plan' will show that it will replace the existing nodes
Modifying `node_groups_count` in Terraform will not update the desired size of the EKS cluster or increase the actual node count. This is a design decision taken by the
module. Refer to issue [#835](https://github.com/terraform-aws-modules/terraform-aws-eks/issues/835).

* `terraform apply` the changes in the usual way
To increase or decrease the desired node group count, use the AWS dashboard. Log in to the AWS dashboard and navigate to EKS -> Select Cluster -> Select Compute tab.
Choose the node group you want to edit and click Edit. Change the desired size and click Save Changes.

* monitor how the update is going in the [AWS Autoscaling dashboard:](https://eu-west-2.console.aws.amazon.com/ec2/autoscaling/home?region=eu-west-2#AutoScalingGroups:view=tags;filter=eks)
Watch the number of nodes using `kubectl get nodes`. You should see the new nodes getting created to match the desired size.
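
One way to check progress against the desired size is to count the nodes that report `Ready`. The following is a minimal sketch and is not part of the runbook: the helper name `count_ready_nodes` is hypothetical, and it assumes the standard column layout of `kubectl get nodes --no-headers`, where the second column is STATUS.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Cordoned nodes ("Ready,SchedulingDisabled") and "NotReady" nodes are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | count_ready_nodes
# To stream changes as new nodes join instead:
#   kubectl get nodes --watch
```

Comparing the printed count with the desired size set in the node group shows when the scale-out has finished.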

Note that it will create the instances/nodes before it deletes the existing - so there should be no down time
4 changes: 2 additions & 2 deletions runbooks/source/auth0-rotation.html.md.erb
@@ -1,8 +1,8 @@
---
title: Credentials rotation for auth0 apps
weight: 68
last_reviewed_on: 2023-06-12
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# <%= current_page.data.title %>
4 changes: 2 additions & 2 deletions runbooks/source/aws-create-user.html.md.erb
@@ -1,8 +1,8 @@
---
title: AWS Console Access
weight: 115
last_reviewed_on: 2023-06-12
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# AWS Console Access
4 changes: 2 additions & 2 deletions runbooks/source/bastion-node.html.md.erb
@@ -1,8 +1,8 @@
---
title: Create and access bastion node
weight: 97
last_reviewed_on: 2023-06-12
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# Create and access bastion node.
8 changes: 4 additions & 4 deletions runbooks/source/delete-prometheus-metrics.html.md.erb
@@ -1,8 +1,8 @@
---
title: Delete Prometheus Metrics
weight: 170
last_reviewed_on: 2023-05-15
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# <%= current_page.data.title %>
@@ -19,8 +19,8 @@ More information in [this article](https://www.shellhacks.com/prometheus-delete-

* Filter the list and convert it to plain text

```
cat metrics-select-box.html | sed $'s/></>\\\n</g' | grep jenkins_node_ | sed 's/<.option>//' | sed 's/.*>//' > metrics
```shell
cat metrics-select-box.html | sed $'s/></>\n </g' | grep jenkins_node_ | sed 's/<.option>//' | sed 's/.*>//' > metrics
```

This example captures all metrics whose names include the string `jenkins_node_`
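
To see what each stage of that pipeline does, it can help to run it on a small inline sample. The sketch below is an illustration only: the function name `extract_metrics` and the sample HTML are hypothetical, and the newline in the first `sed` replacement is written as a backslash followed by a literal line break so that it works in any POSIX shell.

```shell
# Hypothetical wrapper around the pipeline: reads the select-box HTML on stdin
# and prints the metric names that contain the given string.
extract_metrics() {
  # 1. put each tag pair on its own line by breaking at every "><"
  # 2. keep only lines mentioning the requested metric prefix
  # 3. strip the closing </option> tag
  # 4. strip everything up to and including the last ">"
  sed 's/></>\
</g' | grep "$1" | sed 's/<.option>//' | sed 's/.*>//'
}

# Sample input standing in for metrics-select-box.html
printf '%s\n' '<select><option value="jenkins_node_a">jenkins_node_a</option><option value="up">up</option></select>' \
  | extract_metrics jenkins_node_
```

With a real export, the same function would be fed the file instead, for example `extract_metrics jenkins_node_ < metrics-select-box.html > metrics`.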
4 changes: 2 additions & 2 deletions runbooks/source/delete-state-lock.html.md.erb
@@ -1,8 +1,8 @@
---
title: Delete terraform state lock
weight: 199
last_reviewed_on: 2023-06-05
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# <%= current_page.data.title %>
4 changes: 2 additions & 2 deletions runbooks/source/export-elasticsearch-to-csv.html.md.erb
@@ -1,8 +1,8 @@
---
title: Export data from AWS Elasticsearch into a CSV file
weight: 190
last_reviewed_on: 2023-06-12
review_in: 3 months
last_reviewed_on: 2023-09-27
review_in: 6 months
---

# Export data from Elasticsearch into a CSV file