From 1cf940cc2cc5861efe2b31ae25b2841ee70bccad Mon Sep 17 00:00:00 2001 From: shane knapp Date: Tue, 21 May 2024 13:54:06 -0700 Subject: [PATCH 01/19] WIP --- docs/admins/howto/clusterswitch.md | 39 +++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index a7485acfc..3a7f5b548 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -4,12 +4,49 @@ This document describes how to switch an existing hub to a new cluster. The exa ## Make a new cluster 1. Create a new cluster using the specifications here: - https://docs.datahub.berkeley.edu/en/latest/topic/cluster-config.html + https://docs.datahub.berkeley.edu/en/latest/admins/cluster-config.html 2. Set up helm on the cluster according to the instructions here: http://z2jh.jupyter.org/en/latest/setup-helm.html - Make sure the version of helm you're working with matches the version CircleCI is using. For example: https://github.com/berkeley-dsep-infra/datahub/blob/staging/.circleci/config.yml#L169 +## Setting the 'context' for kubectl to work on the new cluster. +1. Ensure you're logged in to GCP: `gcloud auth login` +2. Pull down the credentials from the new cluster: `gcloud container clusters get-credentials <cluster-name> --region us-central1` +3. Switch the kubectl context to this cluster: `kubectl config use-context gke_ucb-datahub-2018_us-central1_<cluster-name>` + +## Install and configure the certificate manager +Before you can deploy any of the hubs or support tooling, the certificate manager must be installed and +configured on the new cluster. Until this is done, `hubploy` and `helm` will fail with the following error: +`ensure CRDs are installed first`. + +1. Create a new feature branch and update your helm dependencies: `helm dep up` +2. At this point, it's usually wise to upgrade `cert-manager` to the latest version found in the chart repo. + You can find this by running the following command: + + CERT_MANAGER_VERSION=$(helm show all -n cert-manager jetstack/cert-manager | grep ^appVersion | awk '{print $2}') + +3. Then, you can install the latest version of `cert-manager`: + + kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/${CERT_MANAGER_VERSION}/cert-manager.yaml + +4. Change the corresponding entry in `support/requirements.yaml` to `$CERT_MANAGER_VERSION` and commit the changes (do not push). + +## Create the node-placeholder k8s namespace +The [calendar autoscaler](https://docs.datahub.berkeley.edu/en/latest/admins/howto/calendar-scaler.html) requires the `node-placeholder` namespace. Run the following command to create it: + + kubectl create namespace node-placeholder + +## Manually deploy the support and prometheus pools +Now we will manually deploy the `support` helm chart: + + sops -d support/secrets.yaml > /tmp/secrets.yaml + helm install -f support/values.yaml -f /tmp/secrets.yaml -n support support support/ --set installCRDs=true --debug --create-namespace + +## Manually deploy a cluster + +## Update CircleCI + ## Switch staging over to new cluster 1. Change the name of the cluster in hubploy.yaml to match the name you chose when creating your new cluster. 2. Make sure the staging IP is a 'static' IP - so we don't lose the IP. You can see the list of IPs used by the project by checking the google cloud console. 
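Taken together, the context-switching and cert-manager steps above reduce to a few commands. A minimal sketch, assuming a hypothetical cluster name of `spring-2024` (substitute the name you chose at creation time):

    # Hypothetical cluster name -- substitute your own.
    CLUSTER=spring-2024
    gcloud auth login
    gcloud container clusters get-credentials "${CLUSTER}" --region us-central1
    kubectl config use-context "gke_ucb-datahub-2018_us-central1_${CLUSTER}"
    # Sanity checks: the active context should name the new cluster, and the
    # cert-manager CRDs should exist before any helm or hubploy deploys are attempted.
    kubectl config current-context
    kubectl get crd | grep cert-manager.io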
From ee392230ce34c31e7de310774c704970d5b5478d Mon Sep 17 00:00:00 2001 From: shane knapp Date: Wed, 22 May 2024 13:26:11 -0700 Subject: [PATCH 02/19] more WIP --- docs/admins/howto/clusterswitch.md | 31 +++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 3a7f5b548..6ddfda0bd 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -9,12 +9,18 @@ This document describes how to switch an existing hub to a new cluster. The exa http://z2jh.jupyter.org/en/latest/setup-helm.html - Make sure the version of helm you're working with matches the version CircleCI is using. For example: https://github.com/berkeley-dsep-infra/datahub/blob/staging/.circleci/config.yml#L169 ## Setting the 'context' for kubectl to work on the new cluster. 1. Ensure you're logged in to GCP: `gcloud auth login` 2. Pull down the credentials from the new cluster: `gcloud container clusters get-credentials <cluster-name> --region us-central1` 3. Switch the kubectl context to this cluster: `kubectl config use-context gke_ucb-datahub-2018_us-central1_<cluster-name>` +## Recreate node pools +Re-create all existing node pools for hubs, support and prometheus deployments in the new cluster. + +If the old cluster is still up and running, you will probably run out of CPU quota, as the new node pools will immediately default to three nodes. Wait ~15m for the new pools to wind down to zero, and then continue. + ## Install and configure the certificate manager Before you can deploy any of the hubs or support tooling, the certificate manager must be installed and configured on the new cluster. Until this is done, `hubploy` and `helm` will fail with the following error: `ensure CRDs are installed first`. @@ -37,16 +43,39 @@ The [calendar autoscaler](https://docs.datahub.berkeley.edu/en/latest/admins/how kubectl create namespace node-placeholder +## Switch DNS to the new cluster's endpoint IP and point our deployment at it. +1. Grab the new endpoint: `gcloud container clusters describe --region us-central1 | grep ^endpoint` +2. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard entry for datahub to the IP from the previous step. +3. Create a new static IP. +4. Update `support/values.yaml`, under `ingress-nginx` with the newly created IP from infoblox: `loadBalancerIP: xx.xx.xx.xx` +5. Add and commit this change to your feature branch (still do not push). + ## Manually deploy the support and prometheus pools +First, update any node pools in the configs to point to the new cluster. Typically, this is just for the `ingress-nginx` controllers in `support/values.yaml`. + Now we will manually deploy the `support` helm chart: + + sops -d support/secrets.yaml > /tmp/secrets.yaml + helm install -f support/values.yaml -f /tmp/secrets.yaml -n support support support/ --set installCRDs=true --debug --create-namespace + -## Manually deploy a cluster +One special thing to note: our `prometheus` instance uses a persistent volume that contains historical monitoring data. 
This is specified in `support/values.yaml`, under the `prometheus:` block: + + persistentVolume: + size: 1000Gi + storageClass: ssd + existingClaim: prometheus-data-2024-05-15 + +## Manually deploy a hub +Finally, we can attempt to deploy a hub to the new cluster! Any hub will do, but we should start with a low-traffic hub (eg: https://dev.datahub.berkeley.edu). + + + hubploy deploy dev hub staging + +When the deploy is done, visit that hub and confirm that things are working. ## Update CircleCI + ## Switch staging over to new cluster 1. Change the name of the cluster in hubploy.yaml to match the name you chose when creating your new cluster. 2. Make sure the staging IP is a 'static' IP - so we don't lose the IP. You can see the list of IPs used by the project by checking the google cloud console. From e2a898dbfb706f9d37f17860fdc0146956e3e691 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Tue, 28 May 2024 10:25:57 -0700 Subject: [PATCH 03/19] more edits --- docs/admins/howto/clusterswitch.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 6ddfda0bd..ec33cfb1a 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -65,16 +65,26 @@ One special thing to note: our `prometheus` instance uses a persistent volume th storageClass: ssd existingClaim: prometheus-data-2024-05-15 -## Manually deploy a hub +## Manually deploy a hub to staging Finally, we can attempt to deploy a hub to the new cluster! Any hub will do, but we should start with a low-traffic hub (eg: https://dev.datahub.berkeley.edu). +First, check the hub's configs for any node pools that need updating. Typically, this is just the core pool. After this is done, add the changes to your feature branch (but don't push). After that, deploy a hub manually: hubploy deploy dev hub staging When the deploy is done, visit that hub and confirm that things are working. +## Manually deploy remaining hubs to staging +Now, update the remaining hubs' configs to point to the new core pool and use `hubploy` to deploy them to staging as with the previous step. The easiest way to do this is to have a list of hubs in a text file, and iterate over it with a `for` loop: + + for x in $(cat hubs.txt); do hubploy deploy ${x} hub staging; done + +When done, add the modified configs to your feature branch (and again, don't push yet). + ## Update CircleCI +Once you've successfully deployed the hubs manually via `hubploy`, it's time to update CircleCI to point to the new cluster. +All you need to do is `grep` for the old cluster name in `.circleci/config.yml` and change this to the name of the new cluster. There should just be four entries: two for the `gcloud get credentials <cluster-name>`, and two in comments. Make these changes and add them to your existing feature branch, but don't push yet. ## Switch staging over to new cluster 1. Change the name of the cluster in hubploy.yaml to match the name you chose when creating your new cluster. 
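The single-line `for` loop above works but silently continues past failures. A slightly more defensive sketch of the same idea, using the same `hubs.txt` convention (one hub name per line) and collecting failed hubs for a second pass instead of aborting the run:

    # Deploy every hub listed in hubs.txt to staging, recording any failures.
    failed=""
    for x in $(cat hubs.txt); do
      hubploy deploy "${x}" hub staging || failed="${failed} ${x}"
    done
    # Report anything that needs a re-run before moving on to the CircleCI changes.
    [ -n "${failed}" ] && echo "failed deploys:${failed}"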
From 995098ccb62134adf675189704be2a7f3a7311f6 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Wed, 5 Jun 2024 11:55:01 -0700 Subject: [PATCH 04/19] moar edits --- docs/admins/howto/clusterswitch.md | 40 +++++------------------------- 1 file changed, 6 insertions(+), 34 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index ec33cfb1a..24266326b 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -74,10 +74,11 @@ First, check the hub's configs for any node pools that need updating. Typically When the deploy is done, visit that hub and confirm that things are working. -## Manually deploy remaining hubs to staging +## Manually deploy remaining hubs to staging and prod Now, update the remaining hubs' configs to point to the new core pool and use `hubploy` to deploy them to staging as with the previous step. The easiest way to do this is to have a list of hubs in a text file, and iterate over it with a `for` loop: for x in $(cat hubs.txt); do hubploy deploy ${x} hub staging; done + for x in $(cat hubs.txt); do hubploy deploy ${x} hub prod; done When done, add the modified configs to your feature branch (and again, don't push yet). @@ -86,41 +87,12 @@ Once you've successfully deployed the hubs manually via `hubploy`, it's time All you need to do is `grep` for the old cluster name in `.circleci/config.yml` and change this to the name of the new cluster. There should just be four entries: two for the `gcloud get credentials <cluster-name>`, and two in comments. Make these changes and add them to your existing feature branch, but don't push yet. -## Switch staging over to new cluster -1. Change the name of the cluster in hubploy.yaml to match the name you chose when creating your new cluster. -2. Make sure the staging IP is a 'static' IP - so we don't lose the IP. You can see the list of IPs used by the project by checking the google cloud console. - For example: https://console.cloud.google.com/networking/addresses/list?project=data8x-scratch - Make sure you are in the right project! -3. If the staging IP (which you can find in staging.yaml) is marked as 'ephemeral', mark it as 'static' -4. Make a PR that includes your hubploy.yaml change, but don't merge it just yet. +## Create and merge your PR! +Now you can finally push your changes to GitHub. Create a PR, merge to `staging`, and immediately kill off the deploy jobs for `node-placeholder`, `support` and `deploy`. -Now we will perform the IP switch over from the old cluster to the new cluster. There will be downtime during the switchover! +Create another PR to merge to `prod` and that deploy should work just fine. -The current easiest way to do this is: -1. Merge the PR. -2. Immediately delete the service 'proxy-public' in the appropriate staging namespace in the old cluster. Make sure you have the command ready for this so that you can execute reasonably quickly. - - gcloud container clusters list - gcloud container clusters get-credentials ${OLDCLUSTER} --region=us-central1 - kubectl --namespace=data8x-staging get svc - kubectl --namespace=data8x-staging delete svc proxy-public - -As the PR deploys, staging on the new cluster should pick up the IP we released from the old cluster. This way we don't have to wait for DNS propagation time. - -At this time you can switch to the new cluster and watch the pods come up. - -Once done, poke around and make sure the staging cluster works fine. 
Since data8x requires going through EdX in order to load a hub, testing can be tricky. If you're able, the easiest way is to edit an old course you have access to and point one the notebooks to the staging instance. - -Assuming everything worked correctly, you can follow the above steps to switch production over. - -## Get hub logs from old cluster -Prior to deleting the old cluster, fetch the usage logs. - - HUB=data8x - kubectl --namespace=${HUB}-prod exec -it $(kubectl --namespace=${HUB}-prod get pod -l component=hub -o name | sed 's_pod/__') -- grep -a 'seconds to ' jupyterhub.log > ${HUB}-usage.log - -Currently these are being placed on google drive here: - https://drive.google.com/open?id=1bUIJYGdFZCgmFXkhkPzFalJ1v9T8v7__ +FIN! ## Deleting the old cluster From 5d3716001b63093d58578a2b8be63a4ea58d1359 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Wed, 5 Jun 2024 17:41:44 -0700 Subject: [PATCH 05/19] more more edits --- docs/admins/howto/clusterswitch.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 24266326b..75f7a5dfe 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -68,14 +68,20 @@ One special thing to note: our `prometheus` instance uses a persistent volume th ## Manually deploy a hub to staging Finally, we can attempt to deploy a hub to the new cluster! Any hub will do, but we should start with a low-traffic hub (eg: https://dev.datahub.berkeley.edu). -First, check the hub's configs for any node pools that need updating. Typically, this is just the core pool. After this is done, add the changes to your feature branch (but don't push). After that, deploy a hub manually: +First, check the hub's configs for any node pools that need updating. Typically, this is just the core pool. + +Second, update `hubploy.yaml` for this hub and point it to the new cluster you've created. + +After this is done, add the changes to your feature branch (but don't push). After that, deploy a hub manually: hubploy deploy dev hub staging When the deploy is done, visit that hub and confirm that things are working. ## Manually deploy remaining hubs to staging and prod -Now, update the remaining hubs' configs to point to the new core pool and use `hubploy` to deploy them to staging as with the previous step. The easiest way to do this is to have a list of hubs in a text file, and iterate over it with a `for` loop: +Now, update the remaining hubs' configs to point to the new node pools and `hubploy.yaml` to the cluster. + +Then use `hubploy` to deploy them to staging as with the previous step. The easiest way to do this is to have a list of hubs in a text file, and iterate over it with a `for` loop: for x in $(cat hubs.txt); do hubploy deploy ${x} hub staging; done for x in $(cat hubs.txt); do hubploy deploy ${x} hub prod; done From d36522857a9932297e1ba2fe0ac7eae93233ba5a Mon Sep 17 00:00:00 2001 From: Balaji Alwar Date: Fri, 14 Jun 2024 18:14:36 -0700 Subject: [PATCH 06/19] Enable enhanced privileges for Data 100 summer instructors --- deployments/data100/config/common.yaml | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/deployments/data100/config/common.yaml b/deployments/data100/config/common.yaml index 5168ba6e7..2f44c20a8 100644 --- a/deployments/data100/config/common.yaml +++ b/deployments/data100/config/common.yaml @@ -32,18 +32,18 @@ jupyterhub: # this role will be assigned to... 
groups: - course::1524699::group::all-admins - # Data 100, Spring 2024, https://github.com/berkeley-dsep-infra/datahub/issues/5376 - #course-staff-1531798: + #Data 100, Summer 2024, https://github.com/berkeley-dsep-infra/datahub/issues/5802 + course-staff-1535115: # description: Enable course staff to view and access servers. # this role provides permissions to... - # scopes: - # - admin-ui - # - list:users!group=course::1531798 - # - admin:servers!group=course::1531798 - # - access:servers!group=course::1531798 + scopes: + - admin-ui + - list:users!group=course::1535115 + - admin:servers!group=course::1535115 + - access:servers!group=course::1535115 # this role will be assigned to... - # groups: - # - course::1531798::group::Admins + groups: + - course:: 1535115::group::Admins # Econ 148, Spring 2024, DH-225 #course-staff-1532866: # description: Enable course staff to view and access servers. From 6ca379958822ccf8abc6a0631e570935f28d7cd0 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Sat, 15 Jun 2024 10:18:48 -0700 Subject: [PATCH 07/19] removing errant space --- deployments/data100/config/common.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deployments/data100/config/common.yaml b/deployments/data100/config/common.yaml index 2f44c20a8..72c79b7e7 100644 --- a/deployments/data100/config/common.yaml +++ b/deployments/data100/config/common.yaml @@ -43,7 +43,7 @@ jupyterhub: - access:servers!group=course::1535115 # this role will be assigned to... groups: - - course:: 1535115::group::Admins + - course::1535115::group::Admins # Econ 148, Spring 2024, DH-225 #course-staff-1532866: # description: Enable course staff to view and access servers. From 020a485850d816e91426a797b0d8b39f789435d5 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Sat, 15 Jun 2024 10:56:36 -0700 Subject: [PATCH 08/19] quick readme update --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 6ec62b08e..bbdd68e55 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,10 @@ # Berkeley JupyterHubs Contains a fully reproducible configuration for JupyterHub on datahub.berkeley.edu, -as well as its single user image. +as well as the single user images. + +[UC Berkeley Datahub](https://cdss.berkeley.edu/data) +[UC Berkeley CDSS](https://cdss.berkeley.edu) ## Branches From fca19b9d0d0d30dcca2c85ec7059ac49a0ff546f Mon Sep 17 00:00:00 2001 From: shane knapp Date: Sat, 15 Jun 2024 10:59:08 -0700 Subject: [PATCH 09/19] added newline --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index bbdd68e55..bbe842c5b 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,7 @@ Contains a fully reproducible configuration for JupyterHub on datahub.berkeley.e as well as the single user images. [UC Berkeley Datahub](https://cdss.berkeley.edu/data) + [UC Berkeley CDSS](https://cdss.berkeley.edu) ## Branches From 0e5f0ac72585d2678e84500084e7cb3a20f00e35 Mon Sep 17 00:00:00 2001 From: ryanlovett Date: Mon, 17 Jun 2024 00:15:09 -0700 Subject: [PATCH 10/19] Bump jupyter-server-proxy for security update. 
https://github.com/jupyterhub/jupyter-server-proxy/security/advisories/GHSA-fvcq-4x64-hqxr --- deployments/astro/image/environment.yml | 2 +- deployments/biology/image/environment.yml | 2 +- deployments/cee/image/environment.yml | 2 +- deployments/data101/image/environment.yml | 2 +- deployments/datahub/images/default/environment.yml | 2 +- deployments/dev/images/default/environment.yml | 2 +- deployments/eecs/image/environment.yml | 2 +- deployments/ischool/image/environment.yml | 2 +- deployments/julia/image/environment.yml | 2 +- deployments/publichealth/image/environment.yml | 2 +- deployments/shiny/image/environment.yml | 2 +- deployments/stat159/image/environment.yml | 2 +- deployments/stat20/image/environment.yml | 2 +- 13 files changed, 13 insertions(+), 13 deletions(-) diff --git a/deployments/astro/image/environment.yml b/deployments/astro/image/environment.yml index 5edf61d8a..16d4ab815 100644 --- a/deployments/astro/image/environment.yml +++ b/deployments/astro/image/environment.yml @@ -6,7 +6,7 @@ channels: dependencies: - python=3.11.* -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 # A linux desktop environment - websockify diff --git a/deployments/biology/image/environment.yml b/deployments/biology/image/environment.yml index a8997208f..e4c9924eb 100644 --- a/deployments/biology/image/environment.yml +++ b/deployments/biology/image/environment.yml @@ -9,7 +9,7 @@ dependencies: - nb_conda_kernels=2.3.1 # proxy web applications -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.0.1 # Packages from bioconda for IB134L diff --git a/deployments/cee/image/environment.yml b/deployments/cee/image/environment.yml index c084aa216..1c7f9b056 100644 --- a/deployments/cee/image/environment.yml +++ b/deployments/cee/image/environment.yml @@ -5,7 +5,7 @@ channels: # Only libraries *not* available in PyPI should be here dependencies: - python=3.11.* -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 #adding math functionality - matplotlib=3.7.* - scipy=1.10.* diff --git a/deployments/data101/image/environment.yml b/deployments/data101/image/environment.yml index 92a0e4a13..99cf7e93f 100644 --- a/deployments/data101/image/environment.yml +++ b/deployments/data101/image/environment.yml @@ -34,7 +34,7 @@ dependencies: - jupyter-archive==3.4.0 - jupyter-book==0.15.1 - jupyter-resource-usage==1.0.0 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter_bokeh - jupyterlab==4.0.11 - jupyterlab-favorites==3.0.0 diff --git a/deployments/datahub/images/default/environment.yml b/deployments/datahub/images/default/environment.yml index e84ff3859..7e21be4d2 100644 --- a/deployments/datahub/images/default/environment.yml +++ b/deployments/datahub/images/default/environment.yml @@ -73,7 +73,7 @@ dependencies: # data8; foundation - datascience==0.17.6 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.2.0 - folium==0.12.1.post1 diff --git a/deployments/dev/images/default/environment.yml b/deployments/dev/images/default/environment.yml index 6e357b7b6..b8fabc4d7 100644 --- a/deployments/dev/images/default/environment.yml +++ b/deployments/dev/images/default/environment.yml @@ -5,7 +5,7 @@ dependencies: # bug w/notebook and traitlets: https://github.com/jupyter/notebook/issues/7048 - traitlets=5.9.* -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.2.0 - syncthing==1.23.5 diff --git a/deployments/eecs/image/environment.yml 
b/deployments/eecs/image/environment.yml index 8f5d709d8..c481bdd5a 100644 --- a/deployments/eecs/image/environment.yml +++ b/deployments/eecs/image/environment.yml @@ -7,7 +7,7 @@ dependencies: - python=3.11.* - nbclassic==1.0.0 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 # Visual Studio Code! - jupyter-vscode-proxy=0.1 - code-server=4.5.2 diff --git a/deployments/ischool/image/environment.yml b/deployments/ischool/image/environment.yml index 28ce5e586..877fbc307 100644 --- a/deployments/ischool/image/environment.yml +++ b/deployments/ischool/image/environment.yml @@ -8,7 +8,7 @@ dependencies: - jupyter-rsession-proxy==2.2.0 # https://github.com/berkeley-dsep-infra/datahub/issues/5251 - nodejs=16 # code-server requires node < 17 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-vscode-proxy==0.5 - code-server==4.10.1 # bug w/notebook and traitlets: https://github.com/jupyter/notebook/issues/7048 diff --git a/deployments/julia/image/environment.yml b/deployments/julia/image/environment.yml index 859659c6e..d2faefc2e 100644 --- a/deployments/julia/image/environment.yml +++ b/deployments/julia/image/environment.yml @@ -1,5 +1,5 @@ dependencies: -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - nodejs==20.8.1 - pip==22.3.1 - python==3.11.* diff --git a/deployments/publichealth/image/environment.yml b/deployments/publichealth/image/environment.yml index d87507720..6a1ca02fd 100644 --- a/deployments/publichealth/image/environment.yml +++ b/deployments/publichealth/image/environment.yml @@ -1,7 +1,7 @@ dependencies: - pip - syncthing==1.18.6 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.2.0 - pip: # bug w/notebook and traitlets: https://github.com/jupyter/notebook/issues/7048 diff --git a/deployments/shiny/image/environment.yml b/deployments/shiny/image/environment.yml index fc84d5b45..abc7bc0b1 100644 --- a/deployments/shiny/image/environment.yml +++ b/deployments/shiny/image/environment.yml @@ -3,7 +3,7 @@ dependencies: - ipywidgets==8.1.2 - jupyter-archive==3.4.0 - jupyter-resource-usage==1.0.1 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.2.0 - jupyter-syncthing-proxy==1.0.3 - jupyterhub==4.1.5 diff --git a/deployments/stat159/image/environment.yml b/deployments/stat159/image/environment.yml index 9f941f17b..eb154eb65 100644 --- a/deployments/stat159/image/environment.yml +++ b/deployments/stat159/image/environment.yml @@ -137,7 +137,7 @@ dependencies: - syncthing==1.23.0 - websockify==0.11.0 - - jupyter-server-proxy==4.1.2 + - jupyter-server-proxy==4.2.0 # VS Code support - jupyter-vscode-proxy==0.2 - code-server==4.10.1 diff --git a/deployments/stat20/image/environment.yml b/deployments/stat20/image/environment.yml index ff7148386..8d91f0335 100644 --- a/deployments/stat20/image/environment.yml +++ b/deployments/stat20/image/environment.yml @@ -5,7 +5,7 @@ channels: dependencies: - syncthing==1.22.2 -- jupyter-server-proxy==4.1.2 +- jupyter-server-proxy==4.2.0 - jupyter-rsession-proxy==2.2.0 # bug w/notebook and traitlets: https://github.com/jupyter/notebook/issues/7048 - traitlets=5.9.* From 98508ac95af555522f7a02fac39867482e660b77 Mon Sep 17 00:00:00 2001 From: ryanlovett Date: Mon, 17 Jun 2024 09:47:39 -0700 Subject: [PATCH 11/19] Bump and disable paup. 
--- deployments/biology/image/bio1b-packages.bash | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/deployments/biology/image/bio1b-packages.bash b/deployments/biology/image/bio1b-packages.bash index d46adb68a..d926eaddb 100644 --- a/deployments/biology/image/bio1b-packages.bash +++ b/deployments/biology/image/bio1b-packages.bash @@ -1,5 +1,11 @@ # Install PAUP* for BIO 1B # https://github.com/berkeley-dsep-infra/datahub/issues/1699 -wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz -O ${CONDA_DIR}/bin/paup.gz + +# This package was requested in 2020 for the instructor to try out. +# The 168 version doesn't exist so I've bumped it to 169, but also disabled +# it in case the package is no longer needed. +return + +wget https://phylosolutions.com/paup-test/paup4a169_ubuntu64.gz -O ${CONDA_DIR}/bin/paup.gz gunzip ${CONDA_DIR}/bin/paup.gz chmod +x ${CONDA_DIR}/bin/paup From 7022c95de0e206cd2ae936ac1ca924a68ef3a0d6 Mon Sep 17 00:00:00 2001 From: ryanlovett Date: Mon, 17 Jun 2024 09:51:05 -0700 Subject: [PATCH 12/19] Temporarily unpin DataFrames version. There's a dependency problem in CI. --- deployments/julia/image/install-julia-packages.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deployments/julia/image/install-julia-packages.jl b/deployments/julia/image/install-julia-packages.jl index d1cf6324e..c30f854eb 100755 --- a/deployments/julia/image/install-julia-packages.jl +++ b/deployments/julia/image/install-julia-packages.jl @@ -15,7 +15,7 @@ Pkg.add.([ Pkg.PackageSpec(;name="VegaLite", version="2.6.0"), Pkg.PackageSpec(;name="CSVFiles", version="1.0.1"), Pkg.PackageSpec(;name="Distributions", version="0.23.11"), - Pkg.PackageSpec(;name="DataFrames", version="0.21.8"), + Pkg.PackageSpec(;name="DataFrames"), Pkg.PackageSpec(;name="Plots", version="1.24.3"), Pkg.PackageSpec(;name="Images", version="0.24.1"), Pkg.PackageSpec(;name="PyPlot", version="2.10.0"), From 2401d1d517478887de452a3bcdcf14a462b6493f Mon Sep 17 00:00:00 2001 From: Balaji Alwar Date: Mon, 17 Jun 2024 10:34:43 -0700 Subject: [PATCH 13/19] Scale down workshop hub RAM --- deployments/workshop/config/common.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/deployments/workshop/config/common.yaml b/deployments/workshop/config/common.yaml index 2d1a2853c..1efc418de 100644 --- a/deployments/workshop/config/common.yaml +++ b/deployments/workshop/config/common.yaml @@ -49,5 +49,5 @@ jupyterhub: subPath: "{username}" memory: # As low a guarantee as possible - guarantee: 4G - limit: 4G + guarantee: 1G + limit: 1G From 79bcd14f1f76242984e362c92bffabb9ebb66261 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Mon, 17 Jun 2024 13:17:35 -0700 Subject: [PATCH 14/19] expanding upon DNS black magic *waves hands* --- docs/admins/howto/clusterswitch.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 75f7a5dfe..4ab024d17 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -43,13 +43,15 @@ The [calendar autoscaler](https://docs.datahub.berkeley.edu/en/latest/admins/how kubectl create namespace node-placeholder -## Switch DNS to the new cluster's endpoint IP and point our deployment at it. -1. Grab the new endpoint: `gcloud container clusters describe --region us-central1 | grep ^endpoint` -2. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard entry for datahub to the IP from the previous step. 
-3. Create a new static IP. -4. Update `support/values.yaml`, under `ingress-nginx` with the newly created IP from infoblox: `loadBalancerIP: xx.xx.xx.xx` +## Create a new static endpoint IP and switch DNS to point our new deployment at it. +1. Create a new static endpoint IP in the [GCP console](https://console.cloud.google.com/networking/addresses/add?project=ucb-datahub-2018). +2. Grab the new endpoint: `gcloud container clusters describe --region us-central1 | grep ^endpoint` +3. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard and empty entries for datahub.berkeley.edu to point to the IP from the previous step. +4. Update `support/values.yaml`, under `ingress-nginx` with the newly created IP from infoblox: `loadBalancerIP: xx.xx.xx.xx`. 5. Add and commit this change to your feature branch (still do not push). +You will re-deploy the support chart in the next step. + ## Manually deploy the support and prometheus pools First, update any node pools in the configs to point to the new cluster. Typically, this is just for the `ingress-nginx` controllers in `support/values.yaml`. @@ -58,6 +60,8 @@ Now we will manually deploy the `support` helm chart: sops -d support/secrets.yaml > /tmp/secrets.yaml helm install -f support/values.yaml -f /tmp/secrets.yaml -n support support support/ --set installCRDs=true --debug --create-namespace +Before continuing, confirm via the GCP console that the IP that was defined in step 1 is now [bound to a forwarding rule](https://console.cloud.google.com/networking/addresses/list?project=ucb-datahub-2018). You can further confirm by listing the services in the [support chart](https://github.com/berkeley-dsep-infra/datahub/blob/staging/support/requirements.yaml) and making sure the ingress-controller is using the newly defined IP. + One special thing to note: our `prometheus` instance uses a persistent volume that contains historical monitoring data. This is specified in `support/values.yaml`, under the `prometheus:` block: persistentVolume: From 0558bb96f43f4fe25308d2af4f48b102c4735046 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Mon, 17 Jun 2024 13:26:15 -0700 Subject: [PATCH 15/19] adding some reasoning behind doing this herculean task --- docs/admins/howto/clusterswitch.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 4ab024d17..391a0fad1 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -1,8 +1,10 @@ # Switching over a hub to a new cluster -This document describes how to switch an existing hub to a new cluster. The example used here refers to the data8x hub. +This document describes how to switch an existing hub to a new cluster. The example used here refers to moving all UC Berkeley Datahubs. -## Make a new cluster +You might find it easier to switch to a new cluster if you're running a [very old k8s version](https://cloud.google.com/kubernetes-engine/docs/release-notes), or in lieu of performing a [cluster credential rotation](https://cloud.google.com/kubernetes-engine/docs/how-to/credential-rotation). + +## Create a new cluster 1. Create a new cluster using the specifications here: https://docs.datahub.berkeley.edu/en/latest/admins/cluster-config.html 2. 
Set up helm on the cluster according to the instructions here: http://z2jh.jupyter.org/en/latest/setup-helm.html - Make sure the version of helm you're working with matches the version CircleCI is using. For example: https://github.com/berkeley-dsep-infra/datahub/blob/staging/.circleci/config.yml#L169 From a788ff38d3fcc878d4ad994c8536566fb56e46fd Mon Sep 17 00:00:00 2001 From: shane knapp Date: Mon, 17 Jun 2024 13:27:57 -0700 Subject: [PATCH 16/19] more verbiage --- docs/admins/howto/clusterswitch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 391a0fad1..8abe29574 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -2,7 +2,7 @@ This document describes how to switch an existing hub to a new cluster. The example used here refers to moving all UC Berkeley Datahubs. -You might find it easier to switch to a new cluster if you're running a [very old k8s version](https://cloud.google.com/kubernetes-engine/docs/release-notes), or in lieu of performing a [cluster credential rotation](https://cloud.google.com/kubernetes-engine/docs/how-to/credential-rotation). +You might find it easier to switch to a new cluster if you're running a [very old k8s version](https://cloud.google.com/kubernetes-engine/docs/release-notes), or in lieu of performing a [cluster credential rotation](https://cloud.google.com/kubernetes-engine/docs/how-to/credential-rotation). Sometimes starting from scratch is easier than an iterative and potentially destructive series of operations. ## Create a new cluster 1. Create a new cluster using the specifications here: From 7e477530b186f6536eac76bbfbdd8846f279e1e2 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Mon, 17 Jun 2024 13:32:03 -0700 Subject: [PATCH 17/19] more end-of-switchover task verbiage --- docs/admins/howto/clusterswitch.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 8abe29574..7bef54ff7 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -104,6 +104,9 @@ Now you can finally push your changes to GitHub. Create a PR, merge to `staging Create another PR to merge to `prod` and that deploy should work just fine. +## Update log and billing sinks, BigQuery queries, etc. +I would recommend searching the GCP console for all occurrences of the old cluster name, and fixing any bits that might be left over. This should only take a few minutes, but should definitely be done. + FIN! ## Deleting the old cluster From b82684e05f5079465a208d6b38606cbd5eb41b08 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Mon, 17 Jun 2024 13:55:59 -0700 Subject: [PATCH 18/19] update dns/ip stuff according to felder's feedback --- docs/admins/howto/clusterswitch.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index 7bef54ff7..f9348f820 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -47,10 +47,9 @@ The [calendar autoscaler](https://docs.datahub.berkeley.edu/en/latest/admins/how ## Create a new static endpoint IP and switch DNS to point our new deployment at it. 1. Create a new static endpoint IP in the [GCP console](https://console.cloud.google.com/networking/addresses/add?project=ucb-datahub-2018). -2. Grab the new endpoint: `gcloud container clusters describe --region us-central1 | grep ^endpoint` -3. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard and empty entries for datahub.berkeley.edu to point to the IP from the previous step. -4. 
Update `support/values.yaml`, under `ingress-nginx` with the newly created IP from infoblox: `loadBalancerIP: xx.xx.xx.xx`. -5. Add and commit this change to your feature branch (still do not push). +2. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard and empty entries for datahub.berkeley.edu to point to the IP from the previous step. +3. Update `support/values.yaml`, under `ingress-nginx` with the newly created static IP: `loadBalancerIP: xx.xx.xx.xx`. +4. Add and commit this change to your feature branch (still do not push). You will re-deploy the support chart in the next step. From cec4201079743b301d57828058852fe4ba5ca6ae Mon Sep 17 00:00:00 2001 From: felder Date: Mon, 17 Jun 2024 14:10:06 -0700 Subject: [PATCH 19/19] Update clusterswitch.md --- docs/admins/howto/clusterswitch.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/admins/howto/clusterswitch.md b/docs/admins/howto/clusterswitch.md index f9348f820..de1d16b8d 100644 --- a/docs/admins/howto/clusterswitch.md +++ b/docs/admins/howto/clusterswitch.md @@ -45,8 +45,8 @@ The [calendar autoscaler](https://docs.datahub.berkeley.edu/en/latest/admins/how kubectl create namespace node-placeholder -## Create a new static endpoint IP and switch DNS to point our new deployment at it. -1. Create a new static endpoint IP in the [GCP console](https://console.cloud.google.com/networking/addresses/add?project=ucb-datahub-2018). +## Create a new static IP and switch DNS to point our new deployment at it. +1. Create a new static IP in the [GCP console](https://console.cloud.google.com/networking/addresses/add?project=ucb-datahub-2018). 2. Open [infoblox](https://infoblox.net.berkeley.edu) and change the wildcard and empty entries for datahub.berkeley.edu to point to the IP from the previous step. 3. Update `support/values.yaml`, under `ingress-nginx` with the newly created static IP: `loadBalancerIP: xx.xx.xx.xx`. 4. Add and commit this change to your feature branch (still do not push).
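After the support chart is redeployed with the new `loadBalancerIP`, it's worth confirming that both kubernetes and DNS agree on the address. A minimal sketch; the exact controller service name below is an assumption based on ingress-nginx's usual naming, so list the services first if it differs:

    # Find the ingress-nginx controller service and the IP it was actually given.
    kubectl -n support get svc
    kubectl -n support get svc support-ingress-nginx-controller \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    # Once the infoblox entries have propagated, DNS should resolve to the same static IP.
    dig +short datahub.berkeley.edu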