diff --git a/.nojekyll b/.nojekyll index 054e1cb48..0d6820a75 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -2b60b09f \ No newline at end of file +69080d1e \ No newline at end of file diff --git a/admins/cluster-config.html b/admins/cluster-config.html index 4f81d4831..d02987929 100644 --- a/admins/cluster-config.html +++ b/admins/cluster-config.html @@ -469,7 +469,7 @@

Kubernetes Cluster Configuration

Google Kubernetes Engine

In our experience, Google Kubernetes Engine (GKE) has been the most stable, performant, and reliable managed kubernetes service. We prefer running on this when possible.

-

A gcloud container clusters create command can succintly express the configuration of our kubernetes cluster. The following command represents the currently favored configuration.

+

A gcloud container clusters create command can succinctly express the configuration of our kubernetes cluster. The following command represents the currently favored configuration.

This creates the GKE cluster. It may host one or more node pools:

gcloud container clusters create \
      --enable-ip-alias \
diff --git a/admins/howto/calendar-scaler.html b/admins/howto/calendar-scaler.html
index 877fcc462..45f0af95b 100644
--- a/admins/howto/calendar-scaler.html
+++ b/admins/howto/calendar-scaler.html
@@ -493,7 +493,7 @@ 

pip install -r requirements.txt

Any changes to the scaler code will require you to run chartpress to redeploy the scaler to GCP.

Here is an example of how you can test any changes to scaler/calendar.py locally in the python interpreter:

-
# these tests will use somes dates culled from the calendar with varying numbers of events.
+
# these tests will use some dates culled from the calendar with varying numbers of events.
 import scaler.calendar
 import datetime
 import zoneinfo
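The full interactive example continues in the source document; purely as a hedged illustration of the kind of objects involved, a local session might build a timezone-aware test date like this (the call into scaler.calendar is deliberately left as a comment, since the real function names live in scaler/calendar.py):

tz = zoneinfo.ZoneInfo("America/Los_Angeles")
test_date = datetime.datetime(2024, 3, 15, 9, 0, tzinfo=tz)
# then call the function under test from scaler.calendar with test_date
# (function names omitted here; see scaler/calendar.py for the real entry points)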
diff --git a/admins/howto/course-config.html b/admins/howto/course-config.html
index de2d155bc..8060e4652 100644
--- a/admins/howto/course-config.html
+++ b/admins/howto/course-config.html
@@ -457,12 +457,12 @@ 

Course Configuration

Allocating Resources

-

It is possible to alter administrative priviliges or resources allocations (such as memory or extra volumes) of user servers from within the deployment configuration. This is mostly useful for when resources need to be increased based on users' class enrollments. The hub must be configured to use the CanvasOAuthenticator which is our default. Hubs that use dummy, Google, Generic OAuth, or other authenticators are not configured to allocate additional resources in this way.

+

It is possible to alter administrative privileges or resource allocations (such as memory or extra volumes) of user servers from within the deployment configuration. This is mostly useful for when resources need to be increased based on users' class enrollments. The hub must be configured to use the CanvasOAuthenticator which is our default. Hubs that use dummy, Google, Generic OAuth, or other authenticators are not configured to allocate additional resources in this way.

It is also possible to allocate resources based on students' membership in Canvas groups. This is useful if the instructor wants to dynamically grant additional resources without CI round-trips. Group management can be performed by the course staff directly from bCourses.

Implementation

-

The authenticator reads users Canvas enrollments when they login, and then assigns them to JupyterHub groups based on those affiliations. Groups are named with the format "course::{canvas_id}::enrollment_type::{canvas_role}", e.g. "course::123456::enrollment_type::teacher" or "course::234567::enrollment_type::student". Our custom kubespawner, which we define in hub/values.yaml, reads users' group memberships prior to spawning. It then overrides various KubeSpawner paramters based on configuration we define, using the canvas ID as the key. (see below)

+

The authenticator reads users' Canvas enrollments when they log in, and then assigns them to JupyterHub groups based on those affiliations. Groups are named with the format "course::{canvas_id}::enrollment_type::{canvas_role}", e.g. "course::123456::enrollment_type::teacher" or "course::234567::enrollment_type::student". Our custom kubespawner, which we define in hub/values.yaml, reads users' group memberships prior to spawning. It then overrides various KubeSpawner parameters based on configuration we define, using the canvas ID as the key (see below).
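As a hedged illustration of this group-name convention (not the deployment's actual override code, which lives in hub/values.yaml), a group name can be split into the canvas ID and role that key the configuration:

def parse_course_group(group_name):
    # "course::{canvas_id}::enrollment_type::{canvas_role}" -> (canvas_id, canvas_role)
    parts = group_name.split("::")
    if len(parts) == 4 and parts[0] == "course" and parts[2] == "enrollment_type":
        return parts[1], parts[3]
    return None

parse_course_group("course::123456::enrollment_type::teacher")  # ('123456', 'teacher')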

Note that if a user is assigned to a new Canvas group (e.g. by the instructor manually, or by an automated Canvas/SIS system) while their server is already running, they will need to log out and then log back in in order for the authenticator to see the new affiliations. Restarting the user server is not sufficient.

The canvas ID is somewhat opaque to infrastructure staff -- we cannot look it up ourselves nor predict what it would be based on the name of the course. This is why we must request it from the instructor.

There are a number of other Canvas course attributes we could have substituted for the ID, but all had various drawbacks. An SIS ID attribute uses a consistent format that is relatively easy to predict, however it is only exposed to instructor accounts on hub login. In testing, when the Canvas admin configured student accounts to be able to read the SIS ID, we discovered that other protected SIS attributes would have been visible to all members of the course in the Canvas UI. Various friendly name attributes (e.g. "Statistics 123, Spring '24") were inconsistent in structure or were modifiable by the instructor. So while the Canvas ID is not predictable or easily discoverable by hub staff, it is immutable and the instructor can find it in the URL for their course.

@@ -487,7 +487,7 @@

Assigning Scopes

groups:
  - course::1234567::enrollment_type::teacher
  - course::1234567::enrollment_type::ta

-

This configuration is headed by a comment which describes the course and term and links to the github issue where the staff made the request. It defines a new role, course-staff-1234567, for a course with bCourse ID 1234567. It assigns scopes for accessing and administering the servers for users in group course::1234567. Members of that group include all students and course staff. It also assigns scopes for viewing lists of users at /hub/admin. It assignes these scopes to members of the affiliated course staff groups.

+

This configuration is headed by a comment which describes the course and term and links to the github issue where the staff made the request. It defines a new role, course-staff-1234567, for a course with bCourse ID 1234567. It assigns scopes for accessing and administering the servers for users in group course::1234567. Members of that group include all students and course staff. It also assigns scopes for viewing lists of users at /hub/admin. It assigns these scopes to members of the affiliated course staff groups.

This stanza is more verbose than inserting lists of users under admin_users, but the privileges are more granular. We don't need to know who the individual course staff are, and they won't have more permissions than they need.

The configuration causes JupyterHub to update information in its jupyterhub.sqlite database file. When this configuration is removed, the hub does not automatically flush out the roles and scopes from the database. So after the semester is over, it is advisable to remove this configuration and also to flush out the information in the database. There is no formal process for this, although we should develop one. We can delete the database, or we can manually remove entries from the sqlite file.
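For reference, here is a minimal sketch of such a role stanza, written as raw JupyterHub Python configuration (the deployment expresses the equivalent in YAML under hub/values.yaml; scope names follow standard JupyterHub RBAC, and the IDs mirror the 1234567 example above):

c.JupyterHub.load_roles = [
    {
        "name": "course-staff-1234567",
        # access to and administration of servers for everyone in the course,
        # plus the user list at /hub/admin
        "scopes": [
            "admin-ui",
            "list:users!group=course::1234567",
            "admin:servers!group=course::1234567",
            "access:servers!group=course::1234567",
        ],
        # granted only to the course staff groups
        "groups": [
            "course::1234567::enrollment_type::teacher",
            "course::1234567::enrollment_type::ta",
        ],
    },
]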

diff --git a/admins/howto/managing-multiple-user-image-repos.html b/admins/howto/managing-multiple-user-image-repos.html index a9194b219..f82c70267 100644 --- a/admins/howto/managing-multiple-user-image-repos.html +++ b/admins/howto/managing-multiple-user-image-repos.html @@ -556,7 +556,7 @@

stage

List of files to stage in the repositories. Optional, and defaults to all modified files in the repository
-m MESSAGE, --message MESSAGE
Commit message to use for the changes.
-

stage combines both git add ... and git commit -m, adding and commiting one or more files to the staging area before you push to a remote.

+

stage combines both git add ... and git commit -m, adding one or more files to the staging area and committing them before you push to a remote.

The commit message must be a text string enclosed in quotes.

By default, --files is set to ., which will add all modified files to the staging area. You can also specify any number of files, separated by a space.

diff --git a/admins/howto/new-hub.html b/admins/howto/new-hub.html index 1a482428f..0032971cc 100644 --- a/admins/howto/new-hub.html +++ b/admins/howto/new-hub.html @@ -511,7 +511,7 @@

Determine deplo

Small courses (and some general usage courses) can use either or both of a shared node pool and filestore to save money (Basic HDD filestore instances start at 1T).

This is also a good time to determine if there are any specific software packages/libraries that need to be installed, as well as what language(s) the course will be using. This will determine which image to use, and if we will need to add additional packages to the image build.

If you’re going to use an existing node pool and/or filestore instance, you can skip either or both of the following steps and pick back up at the cookiecutter.

-

When creating a new hub, we also make sure to label the filestore and GKE/node pool resouces with both hub and <nodepool|filestore>-deployment. 99.999% of the time, the values for all three of these labels will be <hubname>.

+

When creating a new hub, we also make sure to label the filestore and GKE/node pool resources with both hub and <nodepool|filestore>-deployment. 99.999% of the time, the values for all three of these labels will be <hubname>.

Creating a new node pool

@@ -645,7 +645,7 @@

Create placeh
• Get the node name (it will look something like gke-spring-2024-user-datahub-2023-01-04-fc70ea5b-67zs): kubectl get nodes | grep <hubname> | awk '{print $1}'

• Get the total amount of memory allocatable to pods on this node and convert to bytes: kubectl get node <nodename> -o jsonpath='{.status.allocatable.memory}'

• Get the total memory used by non-user pods/containers on this node. We explicitly ignore notebook and pause. Convert to bytes and get the sum: kubectl get -A pod -l 'component!=user-placeholder' \ --field-selector spec.nodeName=<nodename> \ -o jsonpath='{range .items[*].spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\n"}{end}' \ | egrep -v 'pause|notebook'

  • -
  • Subract the second number from the first, and then subtract another 277872640 bytes (256Mi) for “wiggle room”.

  • +
  • Subtract the second number from the first, and then subtract another 277872640 bytes (256Mi) for “wiggle room”.

  • Add an entry for the new placeholder node config in values.yaml:

  • data102:
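To make the byte arithmetic in the steps above concrete, here is a minimal Python sketch; the kubectl outputs shown are made-up placeholders, so substitute the real values you collected:

UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def to_bytes(quantity):
    # convert a Kubernetes memory quantity such as "26059544Ki" or "200Mi" to bytes
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)

allocatable = to_bytes("26059544Ki")                      # from the first kubectl command
non_user = sum(to_bytes(q) for q in ("200Mi", "100Mi"))   # requests reported by the second command
placeholder_request = allocatable - non_user - 277872640  # minus the wiggle room from the step above
print(placeholder_request)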
    diff --git a/admins/howto/new-packages.html b/admins/howto/new-packages.html
    index d4fa73b14..bfdafcfff 100644
    --- a/admins/howto/new-packages.html
    +++ b/admins/howto/new-packages.html
    @@ -456,7 +456,7 @@ 

    Testing and Upgrading New Packages

    -

    It is helpful to test package additions and upgrades for yourself before they are installed for all users. You can make sure the change behaves as you think it should, and does not break anything else. Once tested, request that the change by installed for all users by by creating a new issue in github,contacting cirriculum support staff, or creating a new pull request. Ultimately, thouroughly testing changes locally and submitting a pull request will result in the software being rolled out to everyone much faster.

    +

It is helpful to test package additions and upgrades for yourself before they are installed for all users. You can make sure the change behaves as you think it should, and does not break anything else. Once tested, request that the change be installed for all users by creating a new issue in github, contacting curriculum support staff, or creating a new pull request. Ultimately, thoroughly testing changes locally and submitting a pull request will result in the software being rolled out to everyone much faster.

    Install a python package in your notebook

When testing a notebook with a new version of the package, add the following line to a cell at the beginning of your notebook.

    diff --git a/admins/howto/rebuild-hub-image.html b/admins/howto/rebuild-hub-image.html index 89761c1ab..5c5f646fc 100644 --- a/admins/howto/rebuild-hub-image.html +++ b/admins/howto/rebuild-hub-image.html @@ -416,7 +416,7 @@

    Customize the Hub Docker Image

• Run gcloud auth configure-docker us-central1-docker.pkg.dev once per machine to set up docker for authentication with the gcloud credential helper.
  • Modify the image in images/hub and make a git commit.
  • Run chartpress --push. This will build and push the hub image, and modify hub/values.yaml appropriately.
  • -
  • Make a commit with the hub/values.yaml file, so the new hub image name and tag are comitted.
  • +
  • Make a commit with the hub/values.yaml file, so the new hub image name and tag are committed.
  • Proceed to deployment as normal.
  • Some of the following commands may be required to configure your environment to run the above chartpress workflow successfully:

    diff --git a/admins/howto/rebuild-postgres-image.html b/admins/howto/rebuild-postgres-image.html index 43e67e05f..8c990c0a9 100644 --- a/admins/howto/rebuild-postgres-image.html +++ b/admins/howto/rebuild-postgres-image.html @@ -410,11 +410,11 @@

    Customize the Per-User Postgres Docker Image

    -

    We provide each student on data100 witha postgresql server. We want the python extension installed. So we inherit from the upstream postgresql docker image, and add the appropriate package.

    +

    We provide each student on data100 with a postgresql server. We want the python extension installed. So we inherit from the upstream postgresql docker image, and add the appropriate package.

    This image is in images/postgres. If you update it, you need to rebuild and push it.

1. Modify the image in images/postgres and make a git commit.

-

2. Run chartpress --push. This will build and push the image, but not put anything in YAML. There is no place we can put thi in values.yaml, since this is only used for data100.

+

2. Run chartpress --push. This will build and push the image, but not put anything in YAML. There is no place we can put this in values.yaml, since this is only used for data100.

3. Notice the image name + tag from the chartpress --push command, and put it in the appropriate place (under extraContainers) in data100/config/common.yaml.
4. Make a commit with the new tag in data100/config/common.yaml.
5. Proceed to deploy as normal.

diff --git a/datahub.svg b/datahub.svg index 58439789f..46d84ae98 100644 --- a/datahub.svg +++ b/datahub.svg @@ -16,4 +16,4 @@ - \ No newline at end of file + diff --git a/incidents/2017-03-20-too-many-volumes.html b/incidents/2017-03-20-too-many-volumes.html index 558d446ff..398ed8a34 100644 --- a/incidents/2017-03-20-too-many-volumes.html +++ b/incidents/2017-03-20-too-many-volumes.html @@ -446,7 +446,7 @@

      Summary

      Timeline

      March 18, 16:30

      -

      RAM per student is reduced from 2G to 1G, as a resource optimization measure. The size of our nodes remains the same (26G RAM), and many are cordonned off and slowly decomissioned over the coming few days.

      +

RAM per student is reduced from 2G to 1G, as a resource optimization measure. The size of our nodes remains the same (26G RAM), and many are cordoned off and slowly decommissioned over the coming few days.

      Life seems fine, given the circumstances.

      @@ -467,7 +467,7 @@

      13:03

      13:04

      -

      The simple autoscaler is stopped, on fear that it’ll be confused by the unusal mixed state of the nodes and do something wonky.

      +

      The simple autoscaler is stopped, on fear that it’ll be confused by the unusual mixed state of the nodes and do something wonky.

      13:11

      diff --git a/incidents/2017-03-23-kernel-deaths-incident.html b/incidents/2017-03-23-kernel-deaths-incident.html index a74a18634..07d8fac7f 100644 --- a/incidents/2017-03-23-kernel-deaths-incident.html +++ b/incidents/2017-03-23-kernel-deaths-incident.html @@ -518,7 +518,7 @@

      15:30

      17:25

      A very involved and laborious revert of the offending part of the patch is done in https://github.com/jupyterhub/kubespawner/pull/37. Core Jupyter Notebook dev continues to confirm this makes no sense.

      -

      https://github.com/data-8/jupyterhub-k8s/pull/152 is also merged, and deployed shortly after verifiying that everything (including starting kernels & executing code) works fine on dev. Deployed to prod and everything is fine.

      +

      https://github.com/data-8/jupyterhub-k8s/pull/152 is also merged, and deployed shortly after verifying that everything (including starting kernels & executing code) works fine on dev. Deployed to prod and everything is fine.

      diff --git a/incidents/2019-05-01-service-account-leak.html b/incidents/2019-05-01-service-account-leak.html index 976277846..cc47f55be 100644 --- a/incidents/2019-05-01-service-account-leak.html +++ b/incidents/2019-05-01-service-account-leak.html @@ -443,7 +443,7 @@

      Impact

      Timeline

      May 1 2019, 3:18 PM

      -

      A template + documentation for creating new hubs easily is pushed to GitHub as a pull request. This inadvertantly contained live credentials for pushing & pulling our (already public) docker images, and for access to our kubernetes clusters.

      +

      A template + documentation for creating new hubs easily is pushed to GitHub as a pull request. This inadvertently contained live credentials for pushing & pulling our (already public) docker images, and for access to our kubernetes clusters.

      Google immediately notified us via email within seconds that this might be a breach.

      diff --git a/incidents/2022-01-20-package-dependency-upgrade-incident.html b/incidents/2022-01-20-package-dependency-upgrade-incident.html index 96da80ed4..75bf3a6e0 100644 --- a/incidents/2022-01-20-package-dependency-upgrade-incident.html +++ b/incidents/2022-01-20-package-dependency-upgrade-incident.html @@ -449,7 +449,7 @@

      Hubs throwing 505 errors

      Summary

PR 1 and PR 2 were merged to prod between 2 AM and 2.30 AM PST on 1/20. The difference due to the commits can be viewed here

Due to these changes, an image rebuild happened which broke multiple hubs that used that image, including the Datahub, ISchool, R, Data 100, and Data 140 hubs.

      -

      One of the dependenices highlighted as part of the image build had an upgrade which resulted in R hub throwing 505 error and Data 100/140 hub throwing “Error starting Kernel”. [Yuvi to fill in the right technical information]

      +

One of the dependencies highlighted as part of the image build had an upgrade which resulted in the R hub throwing a 505 error and the Data 100/140 hubs throwing “Error starting Kernel”. [Yuvi to fill in the right technical information]

      User Impact: