merging 6416/6418/6419/6422/6423/6424/6425/6426/6427/6431 to prod #6432

Merged 28 commits into prod · Oct 28, 2024

Commits

2b4a0c8
update logodev image tag to bc7919aa9814
Oct 24, 2024
fc818a0
Merge pull request #6416 from berkeley-dsep-infra/update-logodev-imag…
shaneknapp Oct 24, 2024
cb4db73
add CI/CD documentation
shaneknapp Oct 24, 2024
2bbcd58
stick a link to the proposal
shaneknapp Oct 24, 2024
d22eb91
remove comments
shaneknapp Oct 24, 2024
c735196
Merge pull request #6419 from shaneknapp/remove-comments-from-dev-hub…
shaneknapp Oct 24, 2024
beaa60b
small formatting updates
shaneknapp Oct 25, 2024
6c14be4
more formatting updates
shaneknapp Oct 25, 2024
ba1f6b1
additional docs for the merge and pr creation commands
shaneknapp Oct 25, 2024
7ca3574
Merge pull request #6418 from shaneknapp/add-cicd-docs
shaneknapp Oct 25, 2024
131fa8a
Add CI/CD doc to navigation.
ryanlovett Oct 25, 2024
2973908
Fix headings. Use callouts, term lists. Fix a11y.
ryanlovett Oct 25, 2024
11a16c3
updates to a bunch of docs
shaneknapp Oct 25, 2024
26b085d
Merge pull request #6423 from shaneknapp/add-cicd-doc-to-index
shaneknapp Oct 25, 2024
9347e73
Fix capitalization.
ryanlovett Oct 25, 2024
1893489
Merge branch 'staging' into docs-cicd-update
ryanlovett Oct 25, 2024
e5ef2ce
Example of a listing on an index page.
ryanlovett Oct 25, 2024
32265d6
Reference documentation's CI/CD process.
ryanlovett Oct 25, 2024
45a2459
Merge pull request #6424 from ryanlovett/docs-cicd-update
ryanlovett Oct 25, 2024
d6f6ec9
Merge pull request #6425 from ryanlovett/docs-admin-listing
ryanlovett Oct 25, 2024
fa32240
adding workflow diagram
shaneknapp Oct 25, 2024
19bd87c
use a mermaid diagram of ultimate coolness
shaneknapp Oct 25, 2024
a10420f
Merge pull request #6426 from shaneknapp/add-cicd-workflow-diagram
shaneknapp Oct 25, 2024
d37e4be
This fixes the alternative color scheme.
ryanlovett Oct 25, 2024
1325039
Merge pull request #6427 from ryanlovett/docs-mermaid-colors
ryanlovett Oct 25, 2024
e0d1139
Merge pull request #6422 from shaneknapp/update-manage-repo-doc
shaneknapp Oct 26, 2024
c90202f
update nature image tag to 829049c6fba4: deployments/nature/hubploy.yaml
Oct 27, 2024
a467f76
Merge pull request #6431 from berkeley-dsep-infra/update-nature-image…
shaneknapp Oct 27, 2024
4 changes: 0 additions & 4 deletions deployments/dev/hubploy.yaml
@@ -1,7 +1,3 @@
-# you will also need to update config/common.yaml to include the following for
-# the secondary image tag:
-# kubespawner_override:
-#   image: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-secondary-image:df11f4f1caa1
 images:
   - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-primary-image:6000a5694eab
2 changes: 1 addition & 1 deletion deployments/logodev/hubploy.yaml
@@ -1,7 +1,7 @@
 images:
   # temporary update
-  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:6432da59b518
+  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:bc7919aa9814

 cluster:
   provider: gcloud
2 changes: 1 addition & 1 deletion deployments/nature/hubploy.yaml
@@ -1,6 +1,6 @@
 images:
-  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:fc53f089643a
+  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:829049c6fba4

 cluster:
   provider: gcloud
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -1,2 +1,3 @@
 /.quarto/
 _site
+en/
7 changes: 4 additions & 3 deletions docs/_quarto.yml
@@ -13,7 +13,7 @@ website:
       - icon: github
         href: https://github.com/berkeley-dsep-infra/datahub
     left:
-      - text: "Contributing"
+      - text: "Architecture and contributing"
        href: admins/pre-reqs.qmd
       - text: "Admin Tasks"
        href: tasks/documentation.qmd
@@ -28,13 +28,14 @@
       text: Home
     - href: hubs.qmd
       text: JupyterHub Deployments
-    - section: "Contributing to DataHub"
+    - section: "Architecture and Contribution Overview"
      contents:
        - admins/pre-reqs.qmd
        - admins/structure.qmd
-       - admins/storage.qmd
+       - admins/cicd-github-actions.qmd
        - admins/cluster-config.qmd
        - admins/credentials.qmd
+       - admins/storage.qmd
    - section: "Common Administrator Tasks"
      contents:
        - tasks/documentation.qmd
172 changes: 172 additions & 0 deletions docs/admins/cicd-github-actions.qmd
@@ -0,0 +1,172 @@
---
title: Continuous Integration and Deployment
---

## Overview

DataHub's continuous integration and deployment system is built on
[GitHub Actions](https://github.com/features/actions)
[workflows](https://docs.github.com/en/actions/writing-workflows).

These workflows are stored in the DataHub repo in the
[.github/workflows/](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows) directory.

The basic order of operations is as follows:

1. A pull request is created in the datahub repo.
1. The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml) of the changed files (see the sketch after this list).
1. When the pull request is merged to `staging`, if the labels match any hub, support, or node placeholder deployments, those specific systems are deployed.
1. When the pull request is merged to `prod`, only the hubs that have been modified are deployed (again, based on labels).
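
For illustration, labeler rules take roughly this shape. The labels and globs
below are hypothetical; the real mappings live in
[.github/labeler.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml).

```yaml
# Hypothetical labeler rules: each key is a label, and each glob selects the
# changed files that cause that label to be applied.
"hub: datahub":
  - deployments/datahub/**
"hub: logodev":
  - deployments/logodev/**
support:
  - support/**
node-placeholder:
  - node-placeholder/**
```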

The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy),
which is our custom wrapper for `gcloud`, `sops` and `helm`.

## GitHub Actions Architecture

### Secrets and Variables

All of these workflows depend on a few Actions secrets and variables, with
some at the organization level, and others at the repository level.

#### Organization secrets and variables

[GitHub Actions settings](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions) contain all of the organizational secrets and variables.

##### Organization Secrets

DATAHUB_CREATE_PR
: This secret is a fine-grained [personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) with the following permissions defined:

* Select repositories (only berkeley-dsep-infra/datahub)
* Repository permissions: Contents (read/write), Metadata (read only), Pull requests (read/write)

When adding a new image repository to the berkeley-dsep-infra org, you must
edit this secret and manually add the new repository to the access list.

::: {.callout-important}
This PAT has a lifetime of 366 days, and should be rotated at the beginning of
every maintenance window.
:::

GAR_SECRET_KEY and GAR_SECRET_KEY_EDX
: These secrets hold keys for the GCP IAM service accounts in each GCP project that are granted `roles/storage.admin` permissions. This allows us to push the built images to the Artifact Registry.

When adding a new image repository to the berkeley-dsep-infra org, you must
edit this secret and manually add the new repository to the access list.
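
As a sketch of how a workflow consumes such a secret (an illustrative step,
not the repo's exact job):

```yaml
# Illustrative authentication step using the GAR_SECRET_KEY secret.
- name: Authenticate to GCP
  uses: google-github-actions/auth@v2
  with:
    credentials_json: ${{ secrets.GAR_SECRET_KEY }}
```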

##### Organization Variables

IMAGE_BUILDER_BOT_EMAIL and IMAGE_BUILDER_BOT_NAME
: These are used to set the git identity in the image build workflow step that pushes a commit and creates a PR in the datahub repo.
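
A step of roughly this shape (illustrative only) is what consumes them:

```yaml
# Illustrative: set the bot's git identity before committing the tag bump.
- name: Configure git identity
  run: |
    git config --global user.name "${{ vars.IMAGE_BUILDER_BOT_NAME }}"
    git config --global user.email "${{ vars.IMAGE_BUILDER_BOT_EMAIL }}"
```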

#### DataHub repository secrets

GCP_PROJECT_ID
: This is the name of our GCP project.

GKE_KEY
: This key is used in the workflows that deploy the `support` and `node-placeholder` namespaces. It's attached to the `hubploy` service account, and has the assigned roles of `roles/container.clusterViewer` and `roles/container.developer`.

SOPS_KEY
: This key is used to decrypt our secrets using `sops`. It is attached to the `sopsaccount` service account, which provides KMS access.
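
A decryption step in these workflows looks roughly like this (the file path
here is hypothetical):

```yaml
# Illustrative sops invocation; in practice hubploy wires this up.
- name: Decrypt hub secrets
  run: sops --decrypt deployments/datahub/secrets/staging.yaml
```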

#### User Image Repository Variables

Each image repository contains two variables, which identify the name of the
hub and the path within the Artifact Registry that the image is published to.

HUB
: The name of the hub: `datahub`, `data100`, etc.

IMAGE
: The path within the Artifact Registry: `ucb-datahub-2018/user-images/<hubname>-user-image`
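
A hypothetical build step showing how these two variables might be consumed
(everything other than the HUB and IMAGE variables is an assumption):

```yaml
# Illustrative: tag and push a build using the per-repo variables.
- name: Push user image
  run: |
    TAG="${GITHUB_SHA::12}"
    docker tag "${{ vars.HUB }}-user-image" "us-central1-docker.pkg.dev/${{ vars.IMAGE }}:${TAG}"
    docker push "us-central1-docker.pkg.dev/${{ vars.IMAGE }}:${TAG}"
```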

### Single user server image modification workflow

Each hub's user image lives in its own repository in the berkeley-dsep-infra organization.
When a pull request is submitted, there are two workflows that run:

1. [YAML lint](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/yaml-lint.yaml)
2. [Build and test the image](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-test-image.yaml)

When both tests pass and the pull request is merged into the `main` branch,
a third and final workflow is run:

3. [Build push and create PR](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-push-create-pr.yaml)

This builds the image again and, when successful, pushes it to our Google
Artifact Registry and creates a pull request in the datahub repository with the
updated image tag for that hub's deployment.
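
Conceptually, the final step looks like this sketch (assuming the `gh` CLI and
the `DATAHUB_CREATE_PR` token described earlier; the real workflow's flags may
differ):

```yaml
# Sketch: open a pull request against the datahub repo with the new tag.
- name: Create PR in the datahub repo
  env:
    GITHUB_TOKEN: ${{ secrets.DATAHUB_CREATE_PR }}
  run: |
    gh pr create \
      --repo berkeley-dsep-infra/datahub \
      --base staging \
      --title "update ${{ vars.HUB }} image tag" \
      --body "Automated image tag bump for ${{ vars.HUB }}."
```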

### Updating the datahub repository

#### Single user server image tag updates

When a pull request is opened to update one or more image tags for deployments,
the [labeler](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml)
will apply the appropriate `hub: <hubname>` labels. When this pull request is
merged, the [deploy-hubs workflow](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-hubs.yaml)
is triggered.

This workflow grabs the labels from the merged pull request and checks whether
any hubs need to be deployed. If so, it executes a [python script](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/scripts/determine-hub-deployments.py)
that compares the workflow's environment variables against the labeled hubs
and emits a list of the deployments to perform.

That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy)
is used to deploy only the flagged hubs.
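
Put together, the deploy job behaves roughly like this sketch (the script path
is real; the loop and the hubploy arguments are assumptions):

```yaml
# Sketch of the deploy loop over the hubs flagged by the labels.
- name: Deploy modified hubs
  run: |
    for hub in $(python .github/scripts/determine-hub-deployments.py); do
      hubploy deploy "$hub" hub staging
    done
```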


```{mermaid}
%% State diagram documentation at
%% https://mermaid.js.org/syntax/stateDiagram.html

stateDiagram-v2
image_repo: github.com/berkeley-dsep-infra/hubname-user-image
user_repo: github.com/username/hubname-user-image
image_test_build: Image is built and tested
image_push_build: Image is built and pushed to registry
pr_created: A pull request is automatically<br/>created in the Datahub repo
deploy_to_staging: Hub is deployed to staging
contributor_tests: The contributor logs into the<br/>staging hub and tests the image.
deploy_to_prod: Hub is deployed to prod

image_repo --> user_repo: Contributor forks the image repo.
user_repo --> image_repo: Contributor creates a PR.
image_repo --> image_test_build
image_test_build --> image_push_build: Test build passes and Datahub staff merge pull request
image_push_build --> pr_created
pr_created --> deploy_to_staging: Datahub staff review and merge to staging
deploy_to_staging --> contributor_tests
contributor_tests --> deploy_to_prod: Datahub staff create a PR to merge to prod
```

#### Support and node-placeholder charts

Each of these deployments has its own workflow, which only runs on pushes to
`staging`:

* [deploy-support.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-support.yaml)
* [deploy-node-placeholder.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-node-placeholder.yaml)

If the correct labels are found, the workflow uses the **GKE_KEY** secret to
run `helm upgrade` for the necessary deployments.
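
Conceptually, each deploy step reduces to something like the following (the
chart and values paths are assumptions):

```yaml
# Illustrative helm invocation for the support chart.
- name: Upgrade support chart
  run: |
    helm upgrade --install --wait support ./support \
      --namespace support \
      -f support/values.yaml
```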

#### Miscellaneous workflows

There are also a couple of other workflows in the datahub repository:

[prevent-prod-merges.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/prevent-prod-merges.yml)
: This workflow only allows merges to `prod` that come from `staging` (sketched after this list).

[quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml)
: This builds, renders, and pushes our docs to GitHub Pages.
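
As a sketch, the prod-merge guard has this general shape (illustrative, not
the workflow's actual contents):

```yaml
# Sketch: fail any pull request into prod whose source branch isn't staging.
on:
  pull_request:
    branches: [prod]

jobs:
  check-source-branch:
    runs-on: ubuntu-latest
    steps:
      - name: Only staging may merge to prod
        if: github.head_ref != 'staging'
        run: |
          echo "Pull requests to prod must come from staging" >&2
          exit 1
```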

### Documentation's Workflow

This documentation is also [deployed by GitHub Actions](../tasks/documentation.html#action).

## Original Design Document

[Slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) describe the process in some more detail.
34 changes: 12 additions & 22 deletions docs/admins/index.qmd
@@ -1,22 +1,12 @@
-=======================
-Contributing to DataHub
-=======================
-
-.. toctree::
-   :titlesonly:
-   :maxdepth: 2
-
-   pre-reqs
-   structure
-   storage
-   cluster-config
-   credentials
-   incidents/index
-
-.. toctree::
-   :titlesonly:
-   :maxdepth: 2
-
-   howto/index
-
-   deployments/index
+---
+title: Architecture and Contribution Overview
+listing:
+  contents:
+    - pre-reqs.qmd
+    - structure.qmd
+    - cicd-github-actions.qmd
+    - cluster-config.qmd
+    - credentials.qmd
+    - storage.qmd
+  sort: false
+---
27 changes: 17 additions & 10 deletions docs/admins/storage.qmd
@@ -4,19 +4,22 @@ title: User home directory storage

 All users on all the hubs get a home directory with persistent storage.

-## Why NFS?
+## Why Google Filestore?

-NFS isn\'t a particularly cloud-native technology. It isn\'t highly
-available nor fault tolerant by default, and is a single point of
-failure. However, it is currently the best of the alternatives available
-for user home directories, and so we use it.
+After hosting our own NFS server for user home directories, we found that NFS
+was much more difficult to manage at our scale.
+
+Filestore has been rock-solid since we moved to it in early 2023, and we are
+happy with the performance and cost.

 Our basic requirements for user storage are as follows:

 1. Home directories need to be fully POSIX compliant file systems that
    work with minimal edge cases, since this is what most instructional
    code assumes. This rules out object-store backed filesystems such as
    [s3fs](https://github.com/s3fs-fuse/s3fs-fuse).

-2. Users don\'t usually need guaranteed space or IOPS, so providing
+2. Users don't usually need guaranteed space or IOPS, so providing
    them each a [persistent cloud
    disk](https://cloud.google.com/persistent-disk/) gets unnecessarily
    expensive - since we are paying for it whether it is used or not.
@@ -56,24 +59,28 @@ Filestore](https://cloud.google.com/filestore/). This was mostly due to
 NFS daemon stability issues, which caused many outages and impacted
 thousands of our users and courses.

-Currently each hub has it\'s own filestore instance, except for a few
+Currently each hub has its own filestore instance, except for a few
 small courses that share one. This has proven to be much more stable and
 able to handle the load.

+We also still have our legacy NFS server VM running, which we use to mount the
+Filestore shares and access home directories for troubleshooting and running
+the archiver tool at the end of each semester.
+
 ## Home directory paths

 Each user on each hub gets their own directory on the server that gets
 treated as their home directory. The staging & prod servers share home
 directory paths, so users get the same home directories on both.

-For most hubs, the user\'s home directory path relative to the exported
+For most hubs, the user's home directory path relative to the exported
 filestore share is
 `<hub-name>-filestore/<hub-name>/<prod|staging>/home/<user-name>`.
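
For illustration, a hypothetical user `jovyan` on the `datahub` hub would
resolve to `datahub-filestore/datahub/prod/home/jovyan`.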

 ## NFS Client

-We currently have two approaches for mounting the user\'s home directory
-into each user\'s pod.
+We currently have two approaches for mounting the user's home directory
+into each user's pod.

 1. Mount the NFS Share once per node to a well known location, and use
    [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
33 changes: 18 additions & 15 deletions docs/admins/structure.qmd
@@ -9,20 +9,6 @@ for that particular hub is stored in a standard format. For example, all
 the configuration for the primary hub used on campus (*datahub*) is
 stored under `deployments/datahub/`.

-### User Image (`image/`)
-
-The contents of the `image/` directory determine the environment
-provided to the user. For example, it controls:
-
-1. Versions of Python / R / Julia available
-2. Libraries installed, and which versions of those are installed
-3. Specific config for Jupyter Notebook or IPython
-
-[repo2docker](https://repo2docker.readthedocs.io) is used to
-build the actual user image, so you can use any of the [supported config
-files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
-to customize the image as you wish.
-
 ### Hub Config (`config/` and `secrets/`)

 All our JupyterHubs are based on [Zero to JupyterHub
@@ -53,7 +39,7 @@ Files are further split into:

 ### `hubploy.yaml`

-We use [hubploy](https://github.com/yuvipanda/hubploy) to deploy our
+We use [hubploy](https://github.com/berkeley-dsep-infra/hubploy) to deploy our
 hubs in a repeatable fashion. `hubploy.yaml` contains information
 required for hubploy to work - such as cluster name, region, provider,
 etc.
@@ -68,3 +54,20 @@ Documentation is under the `docs/` folder, and is generated with
 [markdown](https://quarto.org/docs/authoring/markdown-basics.html).
 Documentation is published to <https://docs.datahub.berkeley.edu/> via a
 [GitHub Action workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/quarto-docs.yml).
+
+## User Images
+
+Each user image is stored in its own repository in the `berkeley-dsep-infra`
+organization. You can find them [here](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all).
+
+These repositories determine the environment provided to the user. For example,
+they control:
+
+1. Versions of Python / R / Julia available
+2. Libraries installed, and which versions of those are installed
+3. Specific config for Jupyter Notebook or IPython
+
+[repo2docker](https://repo2docker.readthedocs.io) is used to
+build the actual user image, so you can use any of the [supported config
+files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
+to customize the image as you wish.
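
For illustration, a minimal, hypothetical repo2docker `environment.yml` for a
user image might look like this (real images pin many more packages):

```yaml
# Hypothetical repo2docker environment.yml for a user image.
channels:
  - conda-forge
dependencies:
  - python=3.11
  - jupyterlab
  - numpy
  - pandas
```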