
Commit

stash
consideRatio committed May 7, 2024
1 parent 2f5fa74 commit e7caaac
Showing 4 changed files with 53 additions and 103 deletions.
93 changes: 18 additions & 75 deletions docs/howto/upgrade-cluster/aws.md
@@ -3,73 +3,19 @@
# Upgrade Kubernetes cluster on AWS

```{warning}
This upgrade will cause disruptions for users and could trigger alerts for
[](uptime-checks). To help other engineers, communicate that you are starting a
cluster upgrade in the `#maintenance-notices` Slack channel.
```

```{warning}
We haven't yet established a policy for planning and communicating maintenance
procedures to users. So for now, only perform a k8s cluster upgrade while the
cluster is unused or when the maintenance has been communicated ahead of time.
Before proceeding, communicate that you are starting a cluster upgrade in the
`#maintenance-notices` Slack channel.
```

## Pre-requisites

1. *Install or upgrade CLI tools*

Install required tools as documented in [](new-cluster:prerequisites),
and ensure you have a recent version of eksctl.

```{warning}
Using a modern version of `eksctl` has historically been found important; make
sure to use the latest version to avoid debugging an already-fixed bug!
```
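
   As a quick check (a sketch, assuming `eksctl` is already on your PATH), you
   can print the locally installed version and compare it against the latest
   available release:

   ```bash
   # print the locally installed eksctl version
   eksctl version
   ```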

2. *Consider changes to `template.jsonnet`*

The eksctl config jinja2 template `eksctl/template.jsonnet` was once used to
generate the jsonnet template `eksctl/$CLUSTER_NAME.jsonnet`, which has since
been used to generate an actual eksctl config.
Install required tools as documented in [](new-cluster:aws-required-tools),
and ensure you have the latest version of `eksctl`. Without it you may be
unable to use modern versions of k8s.

Before upgrading an EKS cluster, it can be a good time to consider changes made
to `eksctl/template.jsonnet` since this cluster's jsonnet template was last
generated, which was done initially according to
[](new-cluster:generate-cluster-files).

To do this, first ensure `git status` reports no changes, then generate new
cluster files using the deployer script, and finally restore changes to
everything but the `eksctl/$CLUSTER_NAME.jsonnet` file.

```bash
export CLUSTER_NAME=<cluster-name>
export CLUSTER_REGION=<cluster-region-like ca-central-1>
export HUB_TYPE=<hub-type-like-basehub>
```

```bash
# only continue below if git status reports a clean state
git status

# generates a few new files
deployer generate dedicated-cluster aws --cluster-name=$CLUSTER_NAME --cluster-region=$CLUSTER_REGION --hub-type=$HUB_TYPE

# overview changed files
git status

# restore changes to all files but the .jsonnet files
git add *.jsonnet
git checkout .. # .. should be the git repo's root
git reset

# inspect changes
git diff
```

Finally, if you identify changes you think should be retained, add and commit
them. Discard the remaining changes with a `git checkout .` command.

3. *Learn how to generate an `eksctl` config file*
2. *Learn/recall how to generate an `eksctl` config file*

When upgrading an EKS cluster, we will use `eksctl` extensively and reference
a generated config file, `$CLUSTER_NAME.eksctl.yaml`. It's generated from the
@@ -90,11 +36,15 @@ cluster is unused or that the maintenance is communicated ahead of time.
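
The collapsed part of the diff hides the exact command, but as a rough sketch
(assuming the `jsonnet` CLI is installed and you run it from the `eksctl/`
directory), regenerating the config typically looks like:

```bash
# render the jsonnet template into an eksctl-readable config file
# (jsonnet emits JSON, which eksctl accepts as YAML)
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```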

## Cluster upgrade

### 1. Ensure in-cluster permissions
### 1. Acquire and configure AWS credentials

Refer to [](cloud-access:aws) for how to do this.
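
As a quick sanity check (not required by the guide, but assuming the AWS CLI is
installed), you can verify which identity the credentials resolve to before
continuing:

```bash
# prints the AWS account id and the user/role the CLI is authenticated as
aws sts get-caller-identity
```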

### 2. Ensure in-cluster permissions

The k8s api-server won't accept commands from you unless you have configured
a mapping from the AWS user to a k8s user, and `eksctl` needs to make some
commands behind the scenes.
a mapping from the AWS user to a k8s user, as `eksctl` needs to make some
k8s commands as a k8s user.

This mapping is done via a ConfigMap in kube-system called `aws-auth`, and
we can use an `eksctl` command to influence it.
@@ -108,20 +58,13 @@ eksctl create iamidentitymapping \
--group=system:masters
```
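
The diff only shows the tail of that command; for orientation, a fuller
invocation typically looks something like the sketch below, where the ARN and
username are illustrative placeholders rather than values from this repo:

```bash
# map an AWS IAM user to a k8s user in the aws-auth ConfigMap (sketch)
eksctl create iamidentitymapping \
    --cluster=$CLUSTER_NAME \
    --region=$CLUSTER_REGION \
    --arn=arn:aws:iam::<aws-account-id>:user/<aws-user-name> \
    --username=<k8s-user-name> \
    --group=system:masters
```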

### 2. Acquire and configure AWS credentials
Note that this doesn't make `kubectl` work against the k8s cluster, and if you
use `deployer use-cluster-credentials` you may disrupt the AWS user
credentials.
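
(As an aside, not part of this guide's workflow: if you do need `kubectl`
access and want to avoid `deployer use-cluster-credentials`, one common way,
assuming the AWS CLI credentials above are configured, is to write a kubeconfig
entry directly.)

```bash
# add/update a kubeconfig context for the cluster (sketch, optional)
aws eks update-kubeconfig --name=$CLUSTER_NAME --region=$CLUSTER_REGION
```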

Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.
### 3. Open a new terminal window

In case the AWS account isn't managed there, inspect
`config/$CLUSTER_NAME/cluster.yaml` to understand which AWS account number to
log in to at https://console.aws.amazon.com/.

Configure credentials like:

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```

### 3. Upgrade the k8s control plane one minor version

@@ -138,7 +81,7 @@ where the version must be updated.
{
name: "openscapeshub",
region: clusterRegion,
version: '1.27'
version: "1.29",
}
```
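
The rest of this step is collapsed in the diff; for context, after bumping the
version in the jsonnet file, the config is regenerated and the control plane
upgrade is typically run with something like the following sketch (flags
assumed from eksctl's documented interface):

```bash
# regenerate the eksctl config, then upgrade the control plane one minor version
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
```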

11 changes: 2 additions & 9 deletions docs/howto/upgrade-cluster/gcp.md
@@ -3,15 +3,8 @@
# Upgrade Kubernetes cluster on GCP

```{warning}
This upgrade could cause disruptions for users and trigger alerts for
[](uptime-checks). To help other engineers, communicate that you are starting a
cluster upgrade in the `#maintenance-notices` Slack channel.
```

```{warning}
We haven't yet established a policy for planning and communicating maintenance
procedures to users. So for now, only perform a k8s cluster upgrade while the
cluster is unused or when the maintenance has been communicated ahead of time.
Before proceeding, communicate that you are starting a cluster upgrade in the
`#maintenance-notices` Slack channel.
```

## Pre-requisites
20 changes: 15 additions & 5 deletions docs/howto/upgrade-cluster/index.md
@@ -4,12 +4,21 @@
How we upgrade a Kubernetes cluster is specific to the cloud provider. This
section covers topics related to upgrading an existing Kubernetes cluster.

(upgrade-cluster:planning)=
## Upgrade planning

```{warning}
As of now, we also only have written documentation for how to upgrade Kubernetes
clusters on AWS.
We haven't yet established a policy for planning and communicating maintenance
procedures to community champions and users.
Up until now we have made some k8s cluster upgrades opportunistically,
especially for clusters that have shown little to no activity during some
periods. Other cluster upgrades have been scheduled with community champions,
and some, in shared clusters, have been announced ahead of time.
```

## Upgrade policy
(upgrade-cluster:ambition)=
## Upgrade ambition

1. To keep our k8s clusters' control planes and node pools upgraded to the latest
_three_ and _four_ [official minor k8s versions] respectively at all times. For
example, if 1.30 is the latest minor k8s version, control planes should be on at
least 1.28 and node pools on at least 1.27.
@@ -35,7 +44,8 @@
```{toctree}
:maxdepth: 1
:caption: Upgrading Kubernetes clusters
kinds-of-upgrade-disruptions.md
aws.md
upgrade-disruptions.md
gcp.md
aws.md
azure.md
```
@@ -1,21 +1,29 @@
(upgrade-cluster:disruptions)=

# Overview of different kinds of upgrade disruptions
# About upgrade disruptions

When we upgrade our Kubernetes clusters we can cause different kinds of
disruptions; this is an overview of the kinds of disruptions we should consider.
disruptions; this text provides an overview of them.

## Kubernetes api-server disruption

K8s clusters' control plane (api-server etc.) can be either highly available
(HA) or not. EKS clusters and "regional" GKE clusters are HA, but "zonal" GKE
clusters are not. Upgrading an HA Kubernetes cluster's control plane should by
itself not cause any disruptions.
(HA) or not. EKS clusters, AKS clusters, and "regional" GKE clusters are HA, but
"zonal" GKE clusters are not. A few of our GKE clusters are still zonal, but as
the cost savings are minimal we only create regional clusters now.

If you upgrade a non-HA cluster's control plane, it typically takes less time,
only ~5 minutes. During this time user pods and JupyterHub remain accessible,
but JupyterHub won't be able to start new user servers, and user servers won't
be able to create or scale their dask-clusters.
If upgrading a zonal cluster, the single k8s api-server will be temporarily
unavailable, but that is not a big problem as user servers and JupyterHub will
remain accessible. The brief disruption is that JupyterHub won't be able to
start new user servers, and user servers won't be able to create or scale their
dask-clusters.

## Provider managed workload disruptions

When upgrading a cloud provider managed k8s cluster, it may upgrade some managed
workload that is part of the k8s cluster, such as calico, which enforces
NetworkPolicy rules. This could perhaps cause a disruption for users, but it's
not currently known to do so.

## Core node pool disruptions

@@ -40,10 +48,6 @@ will however have broken connections and user pods unable to establish new
directly if `kubectl delete` is used on this single pod, or `kubectl drain` is
used on the node.

As long as we have one replica, we should, to minimize user disruptions, run
`kubectl rollout restart -n support deploy/support-ingress-nginx-controller` to
migrate this pod from one node to another before `kubectl drain` is used.
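
As a rough sketch of that sequence (the node name and drain flags are
illustrative, not prescribed by this doc):

```bash
# move the ingress-nginx controller pod to another node first
kubectl rollout restart -n support deploy/support-ingress-nginx-controller

# then drain the node it used to run on
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```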

### hub pod disruptions

Our JupyterHub installations each have a single `hub` pod, and having more isn't
@@ -71,4 +75,4 @@ of five minutes.

## User node pool disruptions

Disruptions to a user node pool will disrupt user server pods running on it. A user
Disruptions to a user node pool will disrupt user server pods running on it.
