Commit

Merge branch 'main' into opensearch-updates
sj-williams authored Nov 27, 2024
2 parents 582c54e + 39bbe93 commit 04f91f5
Showing 15 changed files with 119 additions and 104 deletions.
2 changes: 1 addition & 1 deletion runbooks/source/add-new-opa-policy.html.md.erb
@@ -1,7 +1,7 @@
---
title: Add a new OPA policy
weight: 9000
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

2 changes: 1 addition & 1 deletion runbooks/source/auth0-rotation.html.md.erb
@@ -1,7 +1,7 @@
---
title: Credentials rotation for auth0 apps
weight: 68
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

2 changes: 1 addition & 1 deletion runbooks/source/bastion-node.html.md.erb
@@ -1,7 +1,7 @@
---
title: Create and access bastion node
weight: 97
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

146 changes: 75 additions & 71 deletions runbooks/source/container-images.html.md.erb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion runbooks/source/delete-prometheus-metrics.html.md.erb
@@ -1,7 +1,7 @@
---
title: Delete Prometheus Metrics
weight: 170
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

2 changes: 1 addition & 1 deletion runbooks/source/delete-state-lock.html.md.erb
@@ -1,7 +1,7 @@
---
title: Delete terraform state lock
weight: 199
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

4 changes: 2 additions & 2 deletions runbooks/source/disaster-recovery-scenarios.html.md.erb
@@ -1,7 +1,7 @@
---
title: Cloud Platform Disaster Recovery Scenarios
weight: 91
last_reviewed_on: 2024-05-20
last_reviewed_on: 2024-11-25
review_in: 6 months
---

@@ -152,7 +152,7 @@ This way of restoring the whole cluster have been tested with below procedure
Any namespaces over 3 hours old can be recovered using Velero (newer namespaces might not have been backed up before the incident occurred).

Create the cluster with the **same** name from the [source code](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/create-cluster.rb)
and provide the exisiting `vpc-name`. This will link the velero backup locations to the lost cluster.
and provide the existing `vpc-name`. This will link the velero backup locations to the lost cluster.

Find the name of the most recent backup of the `allnamespacebackup` schedule:

2 changes: 1 addition & 1 deletion runbooks/source/export-elasticsearch-to-csv.html.md.erb
@@ -1,7 +1,7 @@
---
title: Export data from AWS Elasticsearch into a CSV file
weight: 190
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

@@ -54,7 +54,7 @@ Locate the PR number for the namespace deletion PR, and execute the following co

```bash
cloud-platform environment destroy \
--prNumber [namespace-deletion-PR] \
--pr-number [namespace-deletion-PR] \
--cluster arn:aws:eks:eu-west-2:754256621582:cluster/live \
--kubecfg ~/.kube/config \
--clusterdir live.cloud-platform.service.justice.gov.uk \
2 changes: 1 addition & 1 deletion runbooks/source/resolve-opensearch-no-logs.html.md.erb
@@ -1,7 +1,7 @@
---
title: Resolving no logs in modsec OpenSearch
weight: 190
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

4 changes: 2 additions & 2 deletions runbooks/source/resolve-opensearch-shard-issues.html.md.erb
@@ -1,7 +1,7 @@
---
title: Resolving OpenSearch shard problems
weight: 190
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

@@ -52,7 +52,7 @@ kubectl run curl-pod -n <your-namespace> --image="alpine/curl" --restart=Never -

## Connecting to the OpenSearch api

Because we have fine-grained access enabled on OpenSearch connection isn't based on ip. It's based on SAML. To link your cli with OpenSearch there is a manual step of adding your aws user arn to the `all_access` OpenSearh role.
Because we have fine-grained access enabled on OpenSearch connection isn't based on ip. It's based on SAML. To link your cli with OpenSearch there is a manual step of adding your aws user arn to the `all_access` OpenSearch role.

1. login to the OpenSearch dashboard using github via saml
1. as a webops team member you have permissions to edit roles so head to Security -> Roles -> `all_access` (see screenshot below)
28 changes: 14 additions & 14 deletions runbooks/source/upgrade-eks-addons.html.md.erb
@@ -45,6 +45,8 @@ aws eks describe-addon-versions --kubernetes-version=$K8S_VERSION | jq '.addons[

This will pull out the default compatible value for the k8s version for your addon.

You can use the [helper script](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/scripts/addons-upgrade.bash) to get the most up-to-date available addon versions for each Kubernetes cluster version.
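
As a rough illustration of the kind of lookup described above, the sketch below lists the versions of one addon that are compatible with a given Kubernetes version, using the AWS CLI's built-in `--query` option rather than `jq`. It is not the contents of the helper script; the function name `list_addon_versions` and the example arguments are assumptions made for this example.

```bash
# Sketch only: list the versions of an EKS addon that are compatible with a
# Kubernetes version. The function name is a hypothetical helper, not part of
# the runbook or the helper script.
list_addon_versions() {
  # $1 = Kubernetes version (for example 1.29), $2 = addon name (for example coredns)
  # compatibilities[0].defaultVersion is true for the version EKS installs by default.
  aws eks describe-addon-versions \
    --kubernetes-version "$1" \
    --addon-name "$2" \
    --query 'addons[].addonVersions[].{Version: addonVersion, Default: compatibilities[0].defaultVersion}' \
    --output table
}

# Example call (needs valid AWS credentials):
#   list_addon_versions "$K8S_VERSION" coredns
```

Using `--query` keeps the example free of extra dependencies; the same filtering could equally be done with `jq`, as the runbook's own command does.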

## Preparing for upgrade

Check the changelog for each of the addons and determine if there are any breaking changes.
@@ -53,26 +55,24 @@ Create a thread in #cloud-platform notifying the team that upgrades are starting

## Starting the upgrade

1. Bump the version number in cloud-platform-terraform-eks-add-ons
2. Commit changes on a new branch and create a pull request
3. Request review from someone on the team
4. Merge pull request and create a new release through the Github UI
5. Bump the version number of the cloud-platform-terraform-eks-add-ons in cloud-platform-infrastructure
6. Commit changes on a new branch and create a pull request
7. Request review from someone on the team
8. Check the terraform plan in concourse and pause the following pipelines:
1. Run the helper [script](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/scripts/addons-upgrade.bash) before the upgrade
2. Bump the version of the addon
3. Commit changes on a new branch and create a pull request
4. Request review from someone on the team
5. Check the terraform plan in concourse and pause the following pipelines:
* bootstrap
* infrastructure-live
* infrastructure-manager
* infrastructure-live-2
9. Create an output of the configuration of a pod before the upgrade. `kubectl -n kube-system get pod $addon -oyaml` there is also a helper [script](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/scripts/addons-upgrade.bash).
10. Merge the pull request
11. Unpause an infrastructure pipeline and wait for it to complete
12. While running:
6. Create an output of the configuration of a pod before the upgrade: `kubectl -n kube-system get pod $addon -oyaml`
7. Merge the pull request
8. Unpause an infrastructure pipeline and wait for it to complete
9. While running:
* Keep an eye on pods recycling `watch -n 1 "kubectl -n kube-system get pods"`
* Keep an eye on events `watch -n 1 "kubectl -n kube-system get events"`
13. Run the reporting pipeline on the infrastructure environment
14. If everything is green repeat steps 11-14 on each environment.
10. Run the helper [script](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/scripts/addons-upgrade.bash) after the upgrade
11. Run the reporting pipeline on the infrastructure environment
12. If everything is green repeat steps 8 - 11 on each environment.
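
The "capture the pod configuration before the upgrade" step can be sketched as a pair of small shell functions that save the output of `kubectl -n kube-system get pod ... -oyaml` to a file and later diff the before and after snapshots. This is only an illustration under stated assumptions: `SNAPSHOT_DIR`, `snapshot_addon_pod` and `diff_addon_snapshots` are hypothetical names, and the helper script linked in the steps may work differently.

```bash
# Sketch only: save an addon pod's configuration before and after the upgrade
# and compare the two. SNAPSHOT_DIR and both function names are assumptions.
SNAPSHOT_DIR="${SNAPSHOT_DIR:-/tmp/addon-upgrade}"

snapshot_addon_pod() {
  # $1 = pod name, $2 = snapshot label ("before" or "after")
  mkdir -p "$SNAPSHOT_DIR" &&
    kubectl -n kube-system get pod "$1" -oyaml > "$SNAPSHOT_DIR/$1.$2.yaml"
}

diff_addon_snapshots() {
  # $1 = pod name used for the "before" snapshot
  # $2 = pod name used for the "after" snapshot (pods are usually recreated
  #      during an upgrade, so the name is likely to have changed)
  diff -u "$SNAPSHOT_DIR/$1.before.yaml" "$SNAPSHOT_DIR/$2.after.yaml"
}
```

`diff` exits non-zero whenever the snapshots differ, which is expected here: fields such as the image tag, UID and timestamps change when a pod is recycled, so the output is meant to be read rather than treated as pass or fail.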

## Finish the upgrade

15 changes: 13 additions & 2 deletions runbooks/source/upgrade-eks-cluster.html.md.erb
@@ -73,6 +73,8 @@ Pause the following pipelines:
* infrastructure-live-2
* infrastructure-manager

> **IMPORTANT:** Open a pull request to pause the Dependabot action in the infrastructure repository before pausing the pipelines, as you do not want any changes going through concourse after unpausing the pipeline.

Update `cluster.tf` in `cloud-platform-infrastructure` with the version of Kubernetes you are upgrading to.

Run a `tf plan` against the cluster you're upgrading to check that everything is as expected. The only changes should be to resources relating to the version upgrade.
@@ -106,7 +108,12 @@ As with preparing for the upgrade communication is really important, keep the th

#### Increasing coredns pods

To ensure that coredns stays up and running during the cluster upgrade replications should be scaled up to 10.
To ensure that coredns stays up and running during the cluster upgrade, the deployment should be scaled up to 10 replicas. This can be done with the following command:

```bash
kubectl scale deployment coredns --replicas=10 -n kube-system
```
> **NOTE:** This is a temporary measure. Check the deployment's current replica count before scaling up, as you will need it when you scale back down after the upgrade completes.
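
One way to avoid losing track of the original replica count is to record it at the moment of scaling up and read it back when scaling down. The sketch below shows that idea only; `REPLICA_FILE`, `scale_up_coredns` and `restore_coredns` are hypothetical names and are not part of the runbook.

```bash
# Sketch only: remember the coredns replica count before scaling up so the
# same value can be restored later. REPLICA_FILE and both function names
# are assumptions made for this example.
REPLICA_FILE="${REPLICA_FILE:-/tmp/coredns-replicas}"

scale_up_coredns() {
  # Record the current desired replica count, then scale to 10.
  # The scale only runs if the count was read successfully.
  kubectl -n kube-system get deployment coredns \
    -o jsonpath='{.spec.replicas}' > "$REPLICA_FILE" &&
    kubectl -n kube-system scale deployment coredns --replicas=10
}

restore_coredns() {
  # Scale back to the count recorded by scale_up_coredns.
  kubectl -n kube-system scale deployment coredns \
    --replicas="$(cat "$REPLICA_FILE")"
}
```

Run `scale_up_coredns` before upgrading the control plane and `restore_coredns` once the upgrade has finished. The file under `/tmp` does not survive a reboot, so it is worth noting the recorded value in the upgrade thread as well.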

#### Upgrading the control plane

@@ -130,7 +137,7 @@ Click `Update`

From the cluster control panel select `Compute` tab.

Select `Upgrade now` next to the monitoring node group.
Select `Upgrade now` next to the default node group.

For update strategy select "Force update"

@@ -154,6 +161,10 @@ Unpause the bootstrap pipeline.

Scale down the coredns pods.

```bash
kubectl scale deployment coredns --replicas=3 -n kube-system
```

### Finishing touches

The `kubectl` version in the `cloud-platform-cli` and `cloud-platform-tools-image` needs updating to match the current Kubernetes version.
8 changes: 4 additions & 4 deletions runbooks/source/upgrade-terraform-version.html.md.erb
@@ -1,7 +1,7 @@
---
title: Upgrade Terraform Version
weight: 54
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---

@@ -126,10 +126,10 @@ Here is a snapshot of how our directory looks but this is likely to change:
aws-accounts
├── cloud-platform-aws
│ ├── account # AWS Account specific configuration.
│ └── vpc # VPC creation. Workspaces for individual clusters
│ └── vpc # VPC creation. Workspaces for individual clusters
│ ├── eks # Holding EKS, workspaces for individual clusters.
│ │ └── components # EKS components. Workspaces for individual clusters
└── kops # Holding KOPS, workspaces for individual clusters.
│ │ └── core # EKS core. Workspaces for individual clusters
│ └── components # EKS components.
├── cloud-platform-dsd
│ └── main.tf
├── cloud-platform-ephemeral-test
2 changes: 1 addition & 1 deletion runbooks/source/upgrade-user-components.html.md.erb
@@ -1,7 +1,7 @@
---
title: Upgrade user components
weight: 55
last_reviewed_on: 2024-05-24
last_reviewed_on: 2024-11-25
review_in: 6 months
---
