
Commit

stash
consideRatio committed May 7, 2024
1 parent 2f5fa74 commit e7caaac
Showing 4 changed files with 53 additions and 103 deletions.
93 changes: 18 additions & 75 deletions docs/howto/upgrade-cluster/aws.md
@@ -3,73 +3,19 @@
# Upgrade Kubernetes cluster on AWS

```{warning}
This upgrade will cause disruptions for users and could trigger alerts for
[](uptime-checks). To help other engineers, communicate that you are starting a
cluster upgrade in the `#maintenance-notices` Slack channel.
```

```{warning}
We haven't yet established a policy for planning and communicating maintenance
procedures to users. So for now, only perform a k8s cluster upgrade while the
cluster is unused or when the maintenance has been communicated ahead of time.
Before proceeding, communicate that you are starting a cluster upgrade in the
`#maintenance-notices` Slack channel.
```

## Pre-requisites

1. *Install or upgrade CLI tools*

Install required tools as documented in [](new-cluster:prerequisites),
and ensure you have a recent version of eksctl.

```{warning}
Using a modern version of `eksctl` has historically been found important; make
sure to use the latest version to avoid debugging an already-fixed bug!
```
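
   As a quick check (a sketch, assuming `eksctl` is already on your PATH), you
   can print the locally installed version and compare it against the latest
   available release:

   ```bash
   # print the locally installed eksctl version
   eksctl version
   ```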

2. *Consider changes to `template.jsonnet`*

The eksctl config jinja2 template `eksctl/template.jsonnet` was once used to
generate the jsonnet template `eksctl/$CLUSTER_NAME.jsonnet`, which has since
been used to generate an actual eksctl config.
Install required tools as documented in [](new-cluster:aws-required-tools),
and ensure you have the latest version of `eksctl`. Without it you may be
unable to use modern versions of k8s.

Before upgrading an EKS cluster, it can be a good time to consider changes made
to `eksctl/template.jsonnet` since this cluster's jsonnet template was last
generated, which was done initially according to
[](new-cluster:generate-cluster-files).

To do this, first ensure `git status` reports no changes, then generate new
cluster files using the deployer script, and finally restore changes to
everything but the `eksctl/$CLUSTER_NAME.jsonnet` file.

```bash
export CLUSTER_NAME=<cluster-name>
export CLUSTER_REGION=<cluster-region-like ca-central-1>
export HUB_TYPE=<hub-type-like-basehub>
```

```bash
# only continue below if git status reports a clean state
git status

# generates a few new files
deployer generate dedicated-cluster aws --cluster-name=$CLUSTER_NAME --cluster-region=$CLUSTER_REGION --hub-type=$HUB_TYPE

# overview changed files
git status

# restore changes to all files but the .jsonnet files
git add *.jsonnet
git checkout .. # .. should be the git repo's root
git reset

# inspect changes
git diff
```

Finally, if you identify changes you think should be retained, add and commit
them. Discard the remaining changes with a `git checkout .` command.

3. *Learn how to generate an `eksctl` config file*
2. *Learn/recall how to generate an `eksctl` config file*

When upgrading an EKS cluster, we will use `eksctl` extensively and reference
a generated config file, `$CLUSTER_NAME.eksctl.yaml`. It's generated from the
@@ -90,11 +36,15 @@ cluster is unused or that the maintenance is communicated ahead of time.
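
The collapsed part of the diff hides the exact command, but as a rough sketch
(assuming the `jsonnet` CLI is installed and you run it from the `eksctl/`
directory), regenerating the config typically looks like:

```bash
# render the jsonnet template into an eksctl-readable config file
# (jsonnet emits JSON, which eksctl accepts as YAML)
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```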

## Cluster upgrade

### 1. Ensure in-cluster permissions
### 1. Acquire and configure AWS credentials

Refer to [](cloud-access:aws) for how to do this.
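
As a quick sanity check (not required by the guide, but assuming the AWS CLI is
installed), you can verify which identity the credentials resolve to before
continuing:

```bash
# prints the AWS account id and the user/role the CLI is authenticated as
aws sts get-caller-identity
```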

### 2. Ensure in-cluster permissions

The k8s api-server won't accept commands from you unless you have configured
a mapping from the AWS user to a k8s user, and `eksctl` needs to make some
commands behind the scenes.
a mapping from the AWS user to a k8s user, as `eksctl` needs to make some
k8s commands as a k8s user.

This mapping is done via a ConfigMap in kube-system called `aws-auth`, and
we can use an `eksctl` command to influence it.
@@ -108,20 +58,13 @@ eksctl create iamidentitymapping \
--group=system:masters
```
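
The diff only shows the tail of that command; for orientation, a fuller
invocation typically looks something like the sketch below, where the ARN and
username are illustrative placeholders rather than values from this repo:

```bash
# map an AWS IAM user to a k8s user in the aws-auth ConfigMap (sketch)
eksctl create iamidentitymapping \
    --cluster=$CLUSTER_NAME \
    --region=$CLUSTER_REGION \
    --arn=arn:aws:iam::<aws-account-id>:user/<aws-user-name> \
    --username=<k8s-user-name> \
    --group=system:masters
```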

### 2. Acquire and configure AWS credentials
Note that this doesn't make `kubectl` work against the k8s cluster, and if you
use `deployer use-cluster-credentials` you may disrupt the AWS user
credentials.
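
(As an aside, not part of this guide's workflow: if you do need `kubectl`
access and want to avoid `deployer use-cluster-credentials`, one common way,
assuming the AWS CLI credentials above are configured, is to write a kubeconfig
entry directly.)

```bash
# add/update a kubeconfig context for the cluster (sketch, optional)
aws eks update-kubeconfig --name=$CLUSTER_NAME --region=$CLUSTER_REGION
```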

Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.
### 3. Open a new terminal window

In case the AWS account isn't managed there, inspect
`config/$CLUSTER_NAME/cluster.yaml` to understand which AWS account number to
log in to at https://console.aws.amazon.com/.

Configure credentials like:

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```

### 3. Upgrade the k8s control plane one minor version

@@ -138,7 +81,7 @@ where the version must be updated.
{
name: "openscapeshub",
region: clusterRegion,
version: '1.27'
version: "1.29",
}
```
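
The rest of this step is collapsed in the diff; for context, after bumping the
version in the jsonnet file, the config is regenerated and the control plane
upgrade is typically run with something like the following sketch (flags
assumed from eksctl's documented interface):

```bash
# regenerate the eksctl config, then upgrade the control plane one minor version
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
```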

11 changes: 2 additions & 9 deletions docs/howto/upgrade-cluster/gcp.md
@@ -3,15 +3,8 @@
# Upgrade Kubernetes cluster on GCP

```{warning}
This upgrade could cause disruptions for users and trigger alerts for
[](uptime-checks). To help other engineers, communicate that you are starting a
cluster upgrade in the `#maintenance-notices` Slack channel.
```

```{warning}
We haven't yet established a policy for planning and communicating maintenance
procedures to users. So for now, only perform a k8s cluster upgrade while the
cluster is unused or when the maintenance has been communicated ahead of time.
Before proceeding, communicate that you are starting a cluster upgrade in the
`#maintenance-notices` Slack channel.
```

## Pre-requisites
20 changes: 15 additions & 5 deletions docs/howto/upgrade-cluster/index.md
@@ -4,12 +4,21 @@
How we upgrade a Kubernetes cluster is specific to the cloud provider. This
section covers topics related to upgrading an existing Kubernetes cluster.

(upgrade-cluster:planning)=
## Upgrade planning

```{warning}
As of now, we also only have written documentation for how to upgrade Kubernetes
clusters on AWS.
We haven't yet established a policy for planning and communicating maintenance
procedures to community champions and users.
Up until now we have made some k8s cluster upgrades opportunistically,
especially for clusters that have shown little to no activity during some
periods. Other cluster upgrades have been scheduled with community champions,
and some, in shared clusters, have been announced ahead of time.
```

## Upgrade policy
(upgrade-cluster:ambition)=
## Upgrade ambition

1. To keep our k8s clusters' control planes and node pools upgraded to the latest
_three_ and _four_ [official minor k8s versions] respectively at all times. For
example, if 1.30 is the latest minor k8s version, control planes should be on at
least 1.28 and node pools on at least 1.27.
@@ -35,7 +44,8 @@
```{toctree}
:maxdepth: 1
:caption: Upgrading Kubernetes clusters
kinds-of-upgrade-disruptions.md
aws.md
upgrade-disruptions.md
gcp.md
aws.md
azure.md
```
@@ -1,21 +1,29 @@
(upgrade-cluster:disruptions)=

# Overview of different kinds of upgrade disruptions
# About upgrade disruptions

When we upgrade our Kubernetes clusters we can cause different kinds of
disruptions; this is an overview of the kinds of disruptions we should consider.
disruptions; this text provides an overview of them.

## Kubernetes api-server disruption

K8s clusters' control plane (api-server etc.) can be either highly available
(HA) or not. EKS clusters and "regional" GKE clusters are HA, but "zonal" GKE
clusters are not. Upgrading an HA Kubernetes cluster's control plane should by
itself not cause any disruptions.
(HA) or not. EKS clusters, AKS clusters, and "regional" GKE clusters are HA, but
"zonal" GKE clusters are not. A few of our GKE clusters are still zonal, but as
the cost savings are minimal we only create regional clusters now.

If you upgrade a non-HA cluster's control plane, it typically takes less time,
only ~5 minutes. During this time user pods and JupyterHub remain accessible,
but JupyterHub won't be able to start new user servers, and user servers won't
be able to create or scale their dask-clusters.
If upgrading a zonal cluster, the single k8s api-server will be temporarily
unavailable, but that is not a big problem as user servers and JupyterHub will
remain accessible. The brief disruption is that JupyterHub won't be able to
start new user servers, and user servers won't be able to create or scale their
dask-clusters.

## Provider managed workload disruptions

When upgrading a cloud provider managed k8s cluster, it may upgrade some managed
workload that is part of the k8s cluster, such as calico, which enforces
NetworkPolicy rules. This could perhaps cause a disruption for users, but it's
not currently known to do so.

## Core node pool disruptions

@@ -40,10 +48,6 @@ will however have broken connections and user pods unable to establish new
directly if `kubectl delete` is used on this single pod, or `kubectl drain` is
used on the node.

As long as we have one replica, we should, to minimize user disruptions, run
`kubectl rollout restart -n support deploy/support-ingress-nginx-controller` to
migrate this pod from one node to another before `kubectl drain` is used.
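
As a rough sketch of that sequence (the node name and drain flags are
illustrative, not prescribed by this doc):

```bash
# move the ingress-nginx controller pod to another node first
kubectl rollout restart -n support deploy/support-ingress-nginx-controller

# then drain the node it used to run on
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```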

### hub pod disruptions

Our JupyterHub installations each have a single `hub` pod, and having more isn't
@@ -71,4 +75,4 @@ of five minutes.

## User node pool disruptions

Disruptions to a user node pool will disrupt user server pods running on it. A user
Disruptions to a user node pool will disrupt user server pods running on it.
