From 2b4a0c883ac0b042a902bc556f142d76d5f05cef Mon Sep 17 00:00:00 2001 From: "Image Builder Bot[tm]" Date: Thu, 24 Oct 2024 20:18:53 +0000 Subject: [PATCH 01/17] update logodev image tag to bc7919aa9814 --- deployments/logodev/hubploy.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deployments/logodev/hubploy.yaml b/deployments/logodev/hubploy.yaml index 22b37d2be..d4fc3195f 100644 --- a/deployments/logodev/hubploy.yaml +++ b/deployments/logodev/hubploy.yaml @@ -1,7 +1,7 @@ images: images: # temporary update - - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:6432da59b518 + - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:bc7919aa9814 cluster: provider: gcloud From cb4db73bf36f954ffeb60843de7541b1ef1da8d3 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Thu, 24 Oct 2024 14:19:04 -0700 Subject: [PATCH 02/17] add CI/CD documentation --- docs/.gitignore | 1 + docs/admins/cicd-github-actions.qmd | 156 ++++++++++++++++++++++++++++ 2 files changed, 157 insertions(+) create mode 100644 docs/admins/cicd-github-actions.qmd diff --git a/docs/.gitignore b/docs/.gitignore index 4d7fb32e9..224974b30 100644 --- a/docs/.gitignore +++ b/docs/.gitignore @@ -1,2 +1,3 @@ /.quarto/ _site +en/ diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd new file mode 100644 index 000000000..c0937b70c --- /dev/null +++ b/docs/admins/cicd-github-actions.qmd @@ -0,0 +1,156 @@ +--- +title: Datahub CI/CD +--- + +## Datahub CI/CD + +Datahub's continuous integration and deployment system uses both +[Github Actions](https://github.com/features/actions) and +[workflows](https://docs.github.com/en/actions/writing-workflows). + +These workflows are stored in the Datahub repo in the +[.github/workflows/ directory](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows). + +The basic order of operations is as follows: + +1. PR is created in the datahub repo. +2. 
The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml). +3. On PR merge to staging, if the labels match any hub, support or node placeholder deployments those specific systems are deployed. +4. On PR merge to prod, only hubs are deployed (again based on labels). + +The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy), +which is our custom wrapper for `gcloud`, `sops` and `helm`. + +## Github Actions architecture + +### Secrets and variables + +All of these workflows depend on a few Actions secrets and variables, with +some at the organization level, and others at the repository level. + +#### Organization secrets and variables + +All of the organizational secrets and variables are located [here](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). + +##### Organization Secrets + +**DATAHUB_CREATE_PR** + +This secret is a fine-grained personal [access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), and has the following permissions defined: + +* Select repositories (only berkeley-dsep-infra/datahub) +* Repository permissions: Contents (read/write), Metadata (read only), Pull requests (read/write) + +When adding a new image repository in the berkeley-dsep-infra org, you must +edit this secret and manually add this repository to the access list. + +*IMPORTANT!* This PAT has an lifetime of 366 days, and should be rotated at the +beginning of every maintenance window! + +**GAR_SECRET_KEY** and **GAR_SECRET_KEY_EDX** + +These secrets are for the GCP IAM roles for each GCP project given +`roles/storage.admin` permissions. This allows us to push the built images to +the Artifact Registry. + +When adding a new image repository in the berkeley-dsep-infra org, you must +edit this secret and manually add this repository to the access list. 
+ +##### Organization Variables + +**IMAGE_BUILDER_BOT_EMAIL** and **IMAGE_BUILDER_BOT_NAME** + +These are used to set the git identity in the image build workflow step that +pushes a commit and creates a PR in the datahub repo. + +#### Repository secrets and variables + +##### Datahub repository secrets + +**GCP_PROJECT_ID** + +This is the name of our GCP project. + +**GKE_KEY** + +This key is used in the workflows that deploy the `support` and +`node-placeholder` namespaces. It's attached to the `hubploy` service account, +and has the assigned roles of `roles/container.clusterViewer` and +`roles/container.developer`. + +**SOPS_KEY** + +This key is used to decrypt our secrets using `sops`, and is attached to the +`sopsaccount` service account and provides KMS access. + +##### Image repository variables + +Each image repository contains two variables, which are used to identify the +name of the hub, and the path within the Artifact Registry that it's published +to. + +**HUB** + +The name of the hub, natch! datahub, data100, etc. + +**IMAGE** + +The path within the artifact registry: `ucb-datahub-2018/user-images/-user-image` + +### Single user server image modification workflow + +Each hub's user image is located in the berkeley-dsep-infra's organization. +When a pull request is submitted, there are two workflows that run: + +1. [YAML lint](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/yaml-lint.yaml) +2. [Build and test the image](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-test-image.yaml) + +When both tests pass, and the pull request is merged in to the `main` branch, +a third and final workflow is run: + +3. 
[Build push and create PR](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-push-create-pr.yaml) + +This builds the image again, and when successful pushes it to our Google +Artifact Registry and creates a pull request in the datahub repository with the +updated image tag for that hub's deployment. + +### Updating the datahub repository + +#### Single user server image tag updates + +When a pull request is opened to update one or more image tags for deployments, +the [labeler](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml) +will apply the `hub: ` label upon creation. When this pull request is +merged, the [deploy-hubs workflow](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-hubs.yaml) +is triggered. + +This workflow will then grab the labels from the merged pull request, see if +any hubs need to be deployed and if so, execute a [python script](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/scripts/determine-hub-deployments.py) +that checks the environment variables within that workflow for hubs, and emits +a list of what's to be deployed. + +That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy) +is used to deploy only the flagged hubs. + +#### Support and node-placeholder charts + +Each of these deployments has their own workflow, which only runs on pushes to +`staging`: + +* [deploy-support.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-support.yaml) +* [deploy-node-placeholder.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-node-placeholder.yaml) + +If the correct labels are found, it will use the **GKE_KEY** secret to run +`helm upgrade` for the necessary deployments.
+ +#### Misc workflows + +There are also a couple of other workflows in the datahub repository: + +* [ prevent-prod-merges.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/prevent-prod-merges.yml) + +This workflow will only allow us to merge to `prod` from `staging`. + +* [quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml) + +This builds, renders and pushes our docs to Github Pages. From 2bbcd58c33700105103be21257ce586e214409ee Mon Sep 17 00:00:00 2001 From: shane knapp Date: Thu, 24 Oct 2024 14:22:17 -0700 Subject: [PATCH 03/17] stick a link to the proposal --- docs/admins/cicd-github-actions.qmd | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index c0937b70c..866034ebe 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -154,3 +154,7 @@ This workflow will only allow us to merge to `prod` from `staging`. * [quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml) This builds, renders and pushes our docs to Github Pages. 
+ +## Original design document + +https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing From d22eb91c9e8065ea00f12087de1702a9aff99f2a Mon Sep 17 00:00:00 2001 From: shane knapp Date: Thu, 24 Oct 2024 14:47:39 -0700 Subject: [PATCH 04/17] remove comments --- deployments/dev/hubploy.yaml | 4 ---- 1 file changed, 4 deletions(-) diff --git a/deployments/dev/hubploy.yaml b/deployments/dev/hubploy.yaml index e69af6021..60990c8c0 100644 --- a/deployments/dev/hubploy.yaml +++ b/deployments/dev/hubploy.yaml @@ -1,7 +1,3 @@ -# you will also need to update config/common.yaml to include the following for -# the secondary image tag: -# kubespawner_override: -# image: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-secondary-image:df11f4f1caa1 images: images: - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-primary-image:6000a5694eab From beaa60b1c61a04d9c5e71c13fb3136bc7a331b13 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Thu, 24 Oct 2024 21:48:21 -0700 Subject: [PATCH 05/17] small formatting updates --- docs/admins/cicd-github-actions.qmd | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 866034ebe..8fff61dfe 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -2,7 +2,7 @@ title: Datahub CI/CD --- -## Datahub CI/CD +# Datahub CI/CD Datahub's continuous integration and deployment system uses both [Github Actions](https://github.com/features/actions) and @@ -21,18 +21,18 @@ The basic order of operations is as follows: The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy), which is our custom wrapper for `gcloud`, `sops` and `helm`. 
-## Github Actions architecture +# Github Actions architecture -### Secrets and variables +## Secrets and variables All of these workflows depend on a few Actions secrets and variables, with some at the organization level, and others at the repository level. -#### Organization secrets and variables +### Organization secrets and variables All of the organizational secrets and variables are located [here](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). -##### Organization Secrets +#### Organization Secrets **DATAHUB_CREATE_PR** @@ -44,7 +44,7 @@ This secret is a fine-grained personal [access token](https://docs.github.com/en When adding a new image repository in the berkeley-dsep-infra org, you must edit this secret and manually add this repository to the access list. -*IMPORTANT!* This PAT has an lifetime of 366 days, and should be rotated at the +*IMPORTANT:* This PAT has an lifetime of 366 days, and should be rotated at the beginning of every maintenance window! **GAR_SECRET_KEY** and **GAR_SECRET_KEY_EDX** @@ -56,7 +56,7 @@ the Artifact Registry. When adding a new image repository in the berkeley-dsep-infra org, you must edit this secret and manually add this repository to the access list. -##### Organization Variables +#### Organization Variables **IMAGE_BUILDER_BOT_EMAIL** and **IMAGE_BUILDER_BOT_NAME** @@ -83,7 +83,7 @@ and has the assigned roles of `roles/container.clusterViewer` and This key is used to decrypt our secrets using `sops`, and is attached to the `sopsaccount` service account and provides KMS access. -##### Image repository variables +#### Image repository variables Each image repository contains two variables, which are used to identify the name of the hub, and the path within the Artifact Registry that it's published @@ -97,7 +97,7 @@ The name of the hub, natch! datahub, data100, etc. 
The path within the artifact registry: `ucb-datahub-2018/user-images/-user-image` -### Single user server image modification workflow +## Single user server image modification workflow Each hub's user image is located in the berkeley-dsep-infra's organization. When a pull request is submitted, there are two workflows that run: @@ -114,9 +114,9 @@ This builds the image again, and when successful pushes it to our Google Artifact Registry and creates a pull request in the datahub repository with the updated image tag for that hub's deployment. -### Updating the datahub repository +## Updating the datahub repository -#### Single user server image tag updates +### Single user server image tag updates When a pull request is opened to update one or more image tags for deployments, the [labeler](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml) @@ -132,7 +132,7 @@ a list of what's to be deployed. That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy) is used to deploy only the flagged hubs. -#### Support and node-placeholder charts +### Support and node-placeholder charts Each of these deployments has their own workflow, which only runs on pushes to `staging`: @@ -143,7 +143,7 @@ Each of these deployments has their own workflow, which only runs on pushes to If the correct labels are found, it will use the **GKE_KEY** secret to run `helm upgrade` for the necessary deployments. -#### Misc workflows +### Misc workflows There are also a couple of other workflows in the datahub repository: @@ -155,6 +155,6 @@ This workflow will only allow us to merge to `prod` from `staging`. This builds, renders and pushes our docs to Github Pages. 
-## Original design document +# Original design document https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing From 6c14be4b63ae6f303d099a7f30f7ce112f5ee7ca Mon Sep 17 00:00:00 2001 From: shane knapp Date: Thu, 24 Oct 2024 21:54:28 -0700 Subject: [PATCH 06/17] more formatting updates --- docs/admins/cicd-github-actions.qmd | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 8fff61dfe..350b17a15 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -2,7 +2,7 @@ title: Datahub CI/CD --- -# Datahub CI/CD +# Datahub CI/CD overview Datahub's continuous integration and deployment system uses both [Github Actions](https://github.com/features/actions) and @@ -157,4 +157,5 @@ This builds, renders and pushes our docs to Github Pages. # Original design document -https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing +Here are the [slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) +that describe the process in some more detail. From ba1f6b1b9530c3478c610b149d3516d3085ea4b8 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Fri, 25 Oct 2024 12:25:57 -0700 Subject: [PATCH 07/17] additional docs for the merge and pr creation commands --- .../managing-multiple-user-image-repos.qmd | 58 ++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/docs/tasks/managing-multiple-user-image-repos.qmd b/docs/tasks/managing-multiple-user-image-repos.qmd index 7e0436e60..b801155bd 100644 --- a/docs/tasks/managing-multiple-user-image-repos.qmd +++ b/docs/tasks/managing-multiple-user-image-repos.qmd @@ -40,6 +40,11 @@ those changes usable without reinstalling or needing to hack your `PATH`. 
python3 -m pip install --no-cache git+https://github.com/berkeley-dsep-infra/manage-repos ``` +### Installing the `gh` tool + +To use the `pr` and `merge` sub-commands, you will also need to install the +Github CLI tool: https://github.com/cli/cli#installation + ## Usage ### Overview of git operations included in `manage-repos`: @@ -50,8 +55,10 @@ of similar repositories: * `branch`: Create a feature branch * `clone`: Clone all repositories in the config file to a location on the filesystem specified by the `--destination` argument. +* `merge`: Merge the most recent pull request in the managed repositories. * `patch`: Apply a [git patch](https://git-scm.com/docs/git-apply) to all repositories in the config file. +* `pr`: Create pull requests in the managed repositories. * `push`: Push a branch from all repos to a remote. The remote defaults to `origin`. * `stage`: Performs a `git add` and `git commit` to stage changes before @@ -78,6 +85,7 @@ options: directory. -d DESTINATION, --destination DESTINATION Location on the filesystem of the directory containing the managed repositories. Defaults to the current working directory. + --version show program's version number and exit ``` `--config` is required, and setting `--destination` is recommended. @@ -126,6 +134,31 @@ passing a different remote name with the `--set-remote` argument). After cloning, `git remote -v` will be executed for each repository to allow you to confirm that the remotes are properly set. +#### `merge` + +``` +$ usage: manage-repos merge [-h] [-b BODY] [-d] [-s {merge,rebase,squash}] + +Using the gh tool, merge the most recent pull request in the managed +repositories. Before using this command, you must authenticate with gh to +ensure that you have the correct permission for the required scopes. + +options: + -h, --help show this help message and exit + -b BODY, --body BODY The commit message to apply to the merge (optional). 
+ -d, --delete Delete your local feature branch after the pull request + is merged (optional). + -s {merge,rebase,squash}, --strategy {merge,rebase,squash} + The pull request merge strategy to use, defaults to + 'merge'. +``` + +Be aware that the default behavior is to merge only the newest pull request in +the managed repositories. The reasoning behind this is that if you have created +pull requests across many repositories, the pull request numbers will almost +certainly be different, and adding interactive steps to merge specific pull +requests will be cumbersome. + #### `patch` ``` @@ -154,6 +187,29 @@ and the script will attempt to apply the patch to all of the repositories. If it is unable to apply the patch, the script will continue to run and notify you when complete which repositories failed to accept the patch. +#### `pr` +``` +$ manage-repos pr --help +usage: manage-repos pr [-h] [-t TITLE] [-b BODY] [-B BRANCH_DEFAULT] + [-g GITHUB_USER] + +Using the gh tool, create a pull request after pushing. + +options: + -h, --help show this help message and exit + -t TITLE, --title TITLE + Title of the pull request. + -b BODY, --body BODY Body of the pull request (optional). + -B BRANCH_DEFAULT, --branch-default BRANCH_DEFAULT + Default remote branch that the pull requests will be + merged to. This is optional and defaults to 'main'. + -g GITHUB_USER, --github-user GITHUB_USER + The GitHub username used to create the pull request. +``` + +After you've `stage`d and `push`ed your changes, this command will then create +a pull request using the `gh` tool. + #### `push` ``` @@ -191,7 +247,7 @@ options: -f FILES [FILES ...], --files FILES [FILES ...] Space-delimited list of files to stage in the repositories. Optional, and if left blank will default - to all modified files. + to all modified files in the directory. -m MESSAGE, --message MESSAGE Commit message to use for the changes. 
``` From 131fa8a17048dc1103069d27f0830dd90dd85164 Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 13:04:56 -0700 Subject: [PATCH 08/17] Add CI/CD doc to navigation. --- docs/_quarto.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/_quarto.yml b/docs/_quarto.yml index f8759eee9..eb09e7c28 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -35,6 +35,7 @@ website: - admins/storage.qmd - admins/cluster-config.qmd - admins/credentials.qmd + - admins/cicd-github-actions.qmd - section: "Common Administrator Tasks" contents: - tasks/documentation.qmd From 29739088010c3904f6810465ebb85d268a709ccf Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 13:05:19 -0700 Subject: [PATCH 09/17] Fix headings. Use callouts, term lists. Fix a11y. In-page headings should start with level 2 since the title is considered 1. Replace some syntactic sugar with semantic syntax. Also replace use of "here" links for accessibility. --- docs/admins/cicd-github-actions.qmd | 114 ++++++++++++---------------- 1 file changed, 48 insertions(+), 66 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 350b17a15..dfced3c0c 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -2,102 +2,87 @@ title: Datahub CI/CD --- -# Datahub CI/CD overview +## Datahub CI/CD overview Datahub's continuous integration and deployment system uses both [Github Actions](https://github.com/features/actions) and [workflows](https://docs.github.com/en/actions/writing-workflows). These workflows are stored in the Datahub repo in the -[.github/workflows/ directory](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows). +[.github/workflows/](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows) directory. The basic order of operations is as follows: 1. PR is created in the datahub repo. -2. 
The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml). -3. On PR merge to staging, if the labels match any hub, support or node placeholder deployments those specific systems are deployed. -4. On PR merge to prod, only hubs are deployed (again based on labels). +1. The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml). +1. On PR merge to staging, if the labels match any hub, support or node placeholder deployments those specific systems are deployed. +1. On PR merge to prod, only hubs are deployed (again based on labels). The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy), which is our custom wrapper for `gcloud`, `sops` and `helm`. -# Github Actions architecture +## Github Actions architecture -## Secrets and variables +### Secrets and Variables All of these workflows depend on a few Actions secrets and variables, with some at the organization level, and others at the repository level. -### Organization secrets and variables +#### Organization secrets and variables -All of the organizational secrets and variables are located [here](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). +[GitHub Actions settings](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions) contain all of the organizational secrets and variables. 
-#### Organization Secrets +##### Organization Secrets -**DATAHUB_CREATE_PR** +DATAHUB_CREATE_PR +: This secret is a fine-grained personal [access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), and has the following permissions defined: -This secret is a fine-grained personal [access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), and has the following permissions defined: + * Select repositories (only berkeley-dsep-infra/datahub) + * Repository permissions: Contents (read/write), Metadata (read only), Pull requests (read/write) -* Select repositories (only berkeley-dsep-infra/datahub) -* Repository permissions: Contents (read/write), Metadata (read only), Pull requests (read/write) - -When adding a new image repository in the berkeley-dsep-infra org, you must + When adding a new image repository in the berkeley-dsep-infra org, you must edit this secret and manually add this repository to the access list. -*IMPORTANT:* This PAT has an lifetime of 366 days, and should be rotated at the -beginning of every maintenance window! - -**GAR_SECRET_KEY** and **GAR_SECRET_KEY_EDX** +::: {.callout-important} +This PAT has a lifetime of 366 days, and should be rotated at the beginning of +every maintenance window. +::: -These secrets are for the GCP IAM roles for each GCP project given -`roles/storage.admin` permissions. This allows us to push the built images to -the Artifact Registry. +GAR_SECRET_KEY and GAR_SECRET_KEY_EDX +: These secrets are for the GCP IAM roles for each GCP project given `roles/storage.admin` permissions. This allows us to push the built images to the Artifact Registry. -When adding a new image repository in the berkeley-dsep-infra org, you must + When adding a new image repository in the berkeley-dsep-infra org, you must edit this secret and manually add this repository to the access list.
-#### Organization Variables - -**IMAGE_BUILDER_BOT_EMAIL** and **IMAGE_BUILDER_BOT_NAME** - -These are used to set the git identity in the image build workflow step that -pushes a commit and creates a PR in the datahub repo. - -#### Repository secrets and variables - -##### Datahub repository secrets +##### Organization Variables -**GCP_PROJECT_ID** +IMAGE_BUILDER_BOT_EMAIL and IMAGE_BUILDER_BOT_NAME +: These are used to set the git identity in the image build workflow step that pushes a commit and creates a PR in the datahub repo. -This is the name of our GCP project. +###### Datahub repository secrets -**GKE_KEY** +GCP_PROJECT_ID +: This is the name of our GCP project. -This key is used in the workflows that deploy the `support` and -`node-placeholder` namespaces. It's attached to the `hubploy` service account, -and has the assigned roles of `roles/container.clusterViewer` and -`roles/container.developer`. +GKE_KEY +: This key is used in the workflows that deploy the `support` and `node-placeholder` namespaces. It's attached to the `hubploy` service account, and has the assigned roles of `roles/container.clusterViewer` and `roles/container.developer`. -**SOPS_KEY** +SOPS_KEY +: This key is used to decrypt our secrets using `sops`, and is attached to the `sopsaccount` service account and provides KMS access. -This key is used to decrypt our secrets using `sops`, and is attached to the -`sopsaccount` service account and provides KMS access. - -#### Image repository variables +##### User Image Repository Variables Each image repository contains two variables, which are used to identify the name of the hub, and the path within the Artifact Registry that it's published to. -**HUB** - -The name of the hub, natch! datahub, data100, etc. - -**IMAGE** +HUB +: The name of the hub, natch! `datahub`, `data100`, etc. 
-The path within the artifact registry: `ucb-datahub-2018/user-images/-user-image` +IMAGE +: The path within the artifact registry: `ucb-datahub-2018/user-images/-user-image` -## Single user server image modification workflow +### Single user server image modification workflow Each hub's user image is located in the berkeley-dsep-infra's organization. When a pull request is submitted, there are two workflows that run: @@ -114,9 +99,9 @@ This builds the image again, and when successful pushes it to our Google Artifact Registry and creates a pull request in the datahub repository with the updated image tag for that hub's deployment. -## Updating the datahub repository +### Updating the datahub repository -### Single user server image tag updates +#### Single user server image tag updates When a pull request is opened to update one or more image tags for deployments, the [labeler](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml) @@ -132,7 +117,7 @@ a list of what's to be deployed. That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy) is used to deploy only the flagged hubs. -### Support and node-placeholder charts +#### Support and node-placeholder charts Each of these deployments has their own workflow, which only runs on pushes to `staging`: @@ -143,19 +128,16 @@ Each of these deployments has their own workflow, which only runs on pushes to If the correct labels are found, it will use the **GKE_KEY** secret to run `helm upgrade` for the necessary deployments. -### Misc workflows +#### Miscellaneous workflows There are also a couple of other workflows in the datahub repository: -* [ prevent-prod-merges.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/prevent-prod-merges.yml) - -This workflow will only allow us to merge to `prod` from `staging`. 
- -* [quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml) +[ prevent-prod-merges.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/prevent-prod-merges.yml) +: This workflow will only allow us to merge to `prod` from `staging`. -This builds, renders and pushes our docs to Github Pages. +[quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml) +: This builds, renders and pushes our docs to Github Pages. -# Original design document +## Original Design Document -Here are the [slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) -that describe the process in some more detail. +[Slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) describe the process in some more detail. From 11a16c3dad8630de6a7da3d88b04d5fe53598b54 Mon Sep 17 00:00:00 2001 From: shane knapp Date: Fri, 25 Oct 2024 13:05:36 -0700 Subject: [PATCH 10/17] updates to a bunch of docs --- docs/_quarto.yml | 7 ++++--- docs/admins/index.qmd | 9 +++++---- docs/admins/storage.qmd | 27 +++++++++++++++++---------- docs/admins/structure.qmd | 33 ++++++++++++++++++--------------- 4 files changed, 44 insertions(+), 32 deletions(-) diff --git a/docs/_quarto.yml b/docs/_quarto.yml index f8759eee9..33f401eba 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -13,7 +13,7 @@ website: - icon: github href: https://github.com/berkeley-dsep-infra/datahub left: - - text: "Contributing" + - text: "Architecture and contributing" href: admins/pre-reqs.qmd - text: "Admin Tasks" href: tasks/documentation.qmd @@ -28,13 +28,14 @@ website: text: Home - href: hubs.qmd text: JupyterHub Deployments - - section: "Contributing to DataHub" + - section: "Datahub architecture and contribution overview" contents: - admins/pre-reqs.qmd - admins/structure.qmd - - admins/storage.qmd + 
- admins/cicd-github-actions.qmd - admins/cluster-config.qmd - admins/credentials.qmd + - admins/storage.qmd - section: "Common Administrator Tasks" contents: - tasks/documentation.qmd diff --git a/docs/admins/index.qmd b/docs/admins/index.qmd index 943ba64b8..ee3a27a5b 100644 --- a/docs/admins/index.qmd +++ b/docs/admins/index.qmd @@ -1,6 +1,6 @@ -======================= -Contributing to DataHub -======================= +============================================== +Datahub architecture and contribution overview +============================================== .. toctree:: :titlesonly: @@ -8,9 +8,10 @@ Contributing to DataHub pre-reqs structure - storage + cicd-github-actions cluster-config credentials + storage incidents/index .. toctree:: diff --git a/docs/admins/storage.qmd b/docs/admins/storage.qmd index 6063706ee..040b6fcce 100644 --- a/docs/admins/storage.qmd +++ b/docs/admins/storage.qmd @@ -4,19 +4,22 @@ title: User home directory storage All users on all the hubs get a home directory with persistent storage. -## Why NFS? +## Why Google Filestore? -NFS isn\'t a particularly cloud-native technology. It isn\'t highly -available nor fault tolerant by default, and is a single point of -failure. However, it is currently the best of the alternatives available -for user home directories, and so we use it. +After hosting our own NFS server for user home directories, we found that NFS +is much more difficult to manage at the scale we were at. + +Filestore has been rock-solid after moving to it in early 2023, and we are +happy with the performance and cost. + +Our basic requirements for user storage are as follows: 1. Home directories need to be fully POSIX compliant file systems that work with minimal edge cases, since this is what most instructional code assumes. This rules out object-store backed filesystems such as [s3fs](https://github.com/s3fs-fuse/s3fs-fuse). -2. Users don\'t usually need guaranteed space or IOPS, so providing +2. 
Users don't usually need guaranteed space or IOPS, so providing them each a [persistent cloud disk](https://cloud.google.com/persistent-disk/) gets unnecessarily expensive - since we are paying for it whether it is used or not. @@ -56,24 +59,28 @@ Filestore](https://cloud.google.com/filestore/). This was mostly due to NFS daemon stability issues, which caused many outages and impacted thousands of our users and courses. -Currently each hub has it\'s own filestore instance, except for a few +Currently each hub has its own filestore instance, except for a few small courses that share one. This has proven to be much more stable and able to handle the load. +We also still have our legacy NFS server VM running, which we use to mount the +Filestore shares and access home directories for troubleshooting and running +the archiver tool at the end of each semester. + ## Home directory paths Each user on each hub gets their own directory on the server that gets treated as their home directory. The staging & prod servers share home directory paths, so users get the same home directories on both. -For most hubs, the user\'s home directory path relative to the exported +For most hubs, the user's home directory path relative to the exported filestore share is `-filestore///home/`. ## NFS Client -We currently have two approaches for mounting the user\'s home directory -into each user\'s pod. +We currently have two approaches for mounting the user's home directory +into each user's pod. 1. Mount the NFS Share once per node to a well known location, and use [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) diff --git a/docs/admins/structure.qmd b/docs/admins/structure.qmd index 5cc715872..ecfe74177 100644 --- a/docs/admins/structure.qmd +++ b/docs/admins/structure.qmd @@ -9,20 +9,6 @@ for that particular hub is stored in a standard format. 
For example, all the configuration for the primary hub used on campus (*datahub*) is stored under `deployments/datahub/`. -### User Image (`image/`) - -The contents of the `image/` directory determine the environment -provided to the user. For example, it controls: - -1. Versions of Python / R / Julia available -2. Libraries installed, and which versions of those are installed -3. Specific config for Jupyter Notebook or IPython - -[repo2docker](https://repo2docker.readthedocs.io) is used to -build the actual user image, so you can use any of the [supported config -files](https://repo2docker.readthedocs.io/en/latest/config_files.html) -to customize the image as you wish. - ### Hub Config (`config/` and `secrets/`) All our JupyterHubs are based on [Zero to JupyterHub @@ -53,7 +39,7 @@ Files are further split into: ### `hubploy.yaml` -We use [hubploy](https://github.com/yuvipanda/hubploy) to deploy our +We use [hubploy](https://github.com/berkeley-dsep-infra/hubploy) to deploy our hubs in a repeatable fashion. `hubploy.yaml` contains information required for hubploy to work - such as cluster name, region, provider, etc. @@ -68,3 +54,20 @@ Documentation is under the `docs/` folder, and is generated with [markdown](https://quarto.org/docs/authoring/markdown-basics.html). Documentation is published to via a [GitHub Action workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/quarto-docs.yml). + +## User Images + +Each user image is stored in its own repository in the `berkeley-dsep-infra` +organization. You can find them [here](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all). + +These repositories determine the environment provided to the user. For example, +they control: + +1. Versions of Python / R / Julia available +2. Libraries installed, and which versions of those are installed +3. 
Specific config for Jupyter Notebook or IPython + +[repo2docker](https://repo2docker.readthedocs.io) is used to +build the actual user image, so you can use any of the [supported config +files](https://repo2docker.readthedocs.io/en/latest/config_files.html) +to customize the image as you wish. From 9347e73fed98ecd05f4bfc1d5fc968c3e33fc21a Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 13:11:59 -0700 Subject: [PATCH 11/17] Fix capitalization. --- docs/admins/cicd-github-actions.qmd | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index dfced3c0c..3ff3ebece 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -1,14 +1,14 @@ --- -title: Datahub CI/CD +title: DataHub CI/CD --- -## Datahub CI/CD overview +## Overview -Datahub's continuous integration and deployment system uses both +DataHub's continuous integration and deployment system uses both [Github Actions](https://github.com/features/actions) and [workflows](https://docs.github.com/en/actions/writing-workflows). -These workflows are stored in the Datahub repo in the +These workflows are stored in the DataHub repo in the [.github/workflows/](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows) directory. The basic order of operations is as follows: @@ -59,7 +59,7 @@ edit this secret and manually add this repository to the access list. IMAGE_BUILDER_BOT_EMAIL and IMAGE_BUILDER_BOT_NAME : These are used to set the git identity in the image build workflow step that pushes a commit and creates a PR in the datahub repo. -###### Datahub repository secrets +###### DataHub repository secrets GCP_PROJECT_ID : This is the name of our GCP project. From e5ef2ced059024ab85da441ac8b7346120a20fd4 Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 13:28:11 -0700 Subject: [PATCH 12/17] Example of a listing on an index page. 
--- docs/_quarto.yml | 2 +- docs/admins/index.qmd | 35 ++++++++++++----------------------- 2 files changed, 13 insertions(+), 24 deletions(-) diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 33f401eba..f7b946b1d 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -28,7 +28,7 @@ website: text: Home - href: hubs.qmd text: JupyterHub Deployments - - section: "Datahub architecture and contribution overview" + - section: "Architecture and Contribution Overview" contents: - admins/pre-reqs.qmd - admins/structure.qmd diff --git a/docs/admins/index.qmd b/docs/admins/index.qmd index ee3a27a5b..64e70605d 100644 --- a/docs/admins/index.qmd +++ b/docs/admins/index.qmd @@ -1,23 +1,12 @@ -============================================== -Datahub architecture and contribution overview -============================================== - -.. toctree:: - :titlesonly: - :maxdepth: 2 - - pre-reqs - structure - cicd-github-actions - cluster-config - credentials - storage - incidents/index - -.. toctree:: - :titlesonly: - :maxdepth: 2 - - howto/index - - deployments/index +--- +title: Architecture and Contribution Overview +listing: + contents: + - pre-reqs.qmd + - structure.qmd + - cicd-github-actions.qmd + - cluster-config.qmd + - credentials.qmd + - storage.qmd + sort: false +--- From 32265d6b0ed9f4c6c2930c39ccb78db5f6f91502 Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 14:06:50 -0700 Subject: [PATCH 13/17] Reference documentation's CI/CD process. 
--- docs/admins/cicd-github-actions.qmd | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 3ff3ebece..7f29b1868 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -1,5 +1,5 @@ --- -title: DataHub CI/CD +title: Continuous Integration and Deployment --- ## Overview @@ -21,7 +21,7 @@ The basic order of operations is as follows: The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy), which is our custom wrapper for `gcloud`, `sops` and `helm`. -## Github Actions architecture +## Github Actions Architecture ### Secrets and Variables @@ -138,6 +138,10 @@ There are also a couple of other workflows in the datahub repository: [quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml) : This builds, renders and pushes our docs to Github Pages. +### Documentation's Workflow + +This documentation is also [deployed by GitHub Actions](../tasks/documentation.html#action). + ## Original Design Document [Slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) describe the process in some more detail. 
From fa32240c677e0b4120e59e57f4c310430c1f1abc Mon Sep 17 00:00:00 2001 From: shane knapp Date: Fri, 25 Oct 2024 14:45:35 -0700 Subject: [PATCH 14/17] adding workflow diagram --- docs/admins/ci-cd-workflow.svg | 1 + docs/admins/cicd-github-actions.qmd | 8 +++++--- 2 files changed, 6 insertions(+), 3 deletions(-) create mode 100644 docs/admins/ci-cd-workflow.svg diff --git a/docs/admins/ci-cd-workflow.svg b/docs/admins/ci-cd-workflow.svg new file mode 100644 index 000000000..25eb04206 --- /dev/null +++ b/docs/admins/ci-cd-workflow.svg @@ -0,0 +1 @@ + diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 7f29b1868..8491cb936 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -13,10 +13,10 @@ These workflows are stored in the DataHub repo in the The basic order of operations is as follows: -1. PR is created in the datahub repo. +1. A pull request is created in the datahub repo. 1. The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml). -1. On PR merge to staging, if the labels match any hub, support or node placeholder deployments those specific systems are deployed. -1. On PR merge to prod, only hubs are deployed (again based on labels). +1. When the pull request is merged to `staging`, if the labels match any hub, support or node placeholder deployments, those specific systems are deployed. +1. When the pull request is merged to `prod`, only the hubs that have been modified are deployed (again based on labels). The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy), which is our custom wrapper for `gcloud`, `sops` and `helm`. @@ -117,6 +117,8 @@ a list of what's to be deployed. That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy) is used to deploy only the flagged hubs. 
+![CI/CD workflow for single-user server images](ci-cd-workflow.svg) + #### Support and node-placeholder charts Each of these deployments has their own workflow, which only runs on pushes to From 19bd87c9136da385c90f343e16b302f8400e2f7e Mon Sep 17 00:00:00 2001 From: shane knapp Date: Fri, 25 Oct 2024 16:03:05 -0700 Subject: [PATCH 15/17] use a mermaid diagram of ultimate coolness --- docs/admins/ci-cd-workflow.svg | 1 - docs/admins/cicd-github-actions.qmd | 25 ++++++++++++++++++++++++- 2 files changed, 24 insertions(+), 2 deletions(-) delete mode 100644 docs/admins/ci-cd-workflow.svg diff --git a/docs/admins/ci-cd-workflow.svg b/docs/admins/ci-cd-workflow.svg deleted file mode 100644 index 25eb04206..000000000 --- a/docs/admins/ci-cd-workflow.svg +++ /dev/null @@ -1 +0,0 @@ - diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 8491cb936..03353d051 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -117,7 +117,30 @@ a list of what's to be deployed. That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy) is used to deploy only the flagged hubs. -![CI/CD workflow for single-user server images](ci-cd-workflow.svg) + +```{mermaid} +%% State diagram documentation at +%% https://mermaid.js.org/syntax/stateDiagram.html + +stateDiagram-v2 + image_repo: github.com/berkeley-dsep-infra/hubname-user-image + forked_repo: github.com/github username/hubname-user-image + image_test_build: Image is built and tested + image_push_build: Image is built and pushed to registry + pr_created: A pull request is automatically
created in the Datahub repo + deploy_to_staging: Hub is deployed to staging + contributor_tests: The contributor logs into the
staging hub and tests the image. + deploy_to_prod: Hub is deployed to prod + + image_repo --> forked_repo: Contributor forks the image repo. + forked_repo --> image_repo: Contributor creates a PR. + image_repo --> image_test_build + image_test_build --> image_push_build: Test build passes and Datahub staff merge pull request + image_push_build --> pr_created + pr_created --> deploy_to_staging: Datahub staff review and merge to staging + deploy_to_staging --> contributor_tests + contributor_tests --> deploy_to_prod: Datahub staff create a PR to merge to prod +``` #### Support and node-placeholder charts From d37e4be7518474b10790c493d68fb66d5b5d893b Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Fri, 25 Oct 2024 16:55:38 -0700 Subject: [PATCH 16/17] This fixes the alternative color scheme. Not sure why -- it was only affecting one node. --- docs/admins/cicd-github-actions.qmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/admins/cicd-github-actions.qmd b/docs/admins/cicd-github-actions.qmd index 03353d051..eb60320b7 100644 --- a/docs/admins/cicd-github-actions.qmd +++ b/docs/admins/cicd-github-actions.qmd @@ -124,7 +124,7 @@ is used to deploy only the flagged hubs. stateDiagram-v2 image_repo: github.com/berkeley-dsep-infra/hubname-user-image - forked_repo: github.com/github username/hubname-user-image + user_repo: github.com/username/hubname-user-image image_test_build: Image is built and tested image_push_build: Image is built and pushed to registry pr_created: A pull request is automatically
created in the Datahub repo @@ -132,8 +132,8 @@ stateDiagram-v2 contributor_tests: The contributor logs into the
staging hub and tests the image. deploy_to_prod: Hub is deployed to prod - image_repo --> forked_repo: Contributor forks the image repo. - forked_repo --> image_repo: Contributor creates a PR. + image_repo --> user_repo: Contributor forks the image repo. + user_repo --> image_repo: Contributor creates a PR. image_repo --> image_test_build image_test_build --> image_push_build: Test build passes and Datahub staff merge pull request image_push_build --> pr_created From c90202fd17fcd99e70cb261f14231714000c5042 Mon Sep 17 00:00:00 2001 From: "Image Builder Bot[tm]" Date: Sun, 27 Oct 2024 18:36:55 +0000 Subject: [PATCH 17/17] update nature image tag to 829049c6fba4: deployments/nature/hubploy.yaml --- deployments/nature/hubploy.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/deployments/nature/hubploy.yaml b/deployments/nature/hubploy.yaml index 27499607d..fee06c382 100644 --- a/deployments/nature/hubploy.yaml +++ b/deployments/nature/hubploy.yaml @@ -1,6 +1,6 @@ images: images: - - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:fc53f089643a + - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:829049c6fba4 cluster: provider: gcloud