merging 6416/6418/6419/6422/6423/6424/6425/6426/6427/6431 to prod #6432

Merged 28 commits into prod · Oct 28, 2024

Commits

2b4a0c8
update logodev image tag to bc7919aa9814
Oct 24, 2024
fc818a0
Merge pull request #6416 from berkeley-dsep-infra/update-logodev-imag…
shaneknapp Oct 24, 2024
cb4db73
add CI/CD documentation
shaneknapp Oct 24, 2024
2bbcd58
stick a link to the proposal
shaneknapp Oct 24, 2024
d22eb91
remove comments
shaneknapp Oct 24, 2024
c735196
Merge pull request #6419 from shaneknapp/remove-comments-from-dev-hub…
shaneknapp Oct 24, 2024
beaa60b
small formatting updates
shaneknapp Oct 25, 2024
6c14be4
more formatting updates
shaneknapp Oct 25, 2024
ba1f6b1
additional docs for the merge and pr creation commands
shaneknapp Oct 25, 2024
7ca3574
Merge pull request #6418 from shaneknapp/add-cicd-docs
shaneknapp Oct 25, 2024
131fa8a
Add CI/CD doc to navigation.
ryanlovett Oct 25, 2024
2973908
Fix headings. Use callouts, term lists. Fix a11y.
ryanlovett Oct 25, 2024
11a16c3
updates to a bunch of docs
shaneknapp Oct 25, 2024
26b085d
Merge pull request #6423 from shaneknapp/add-cicd-doc-to-index
shaneknapp Oct 25, 2024
9347e73
Fix capitalization.
ryanlovett Oct 25, 2024
1893489
Merge branch 'staging' into docs-cicd-update
ryanlovett Oct 25, 2024
e5ef2ce
Example of a listing on an index page.
ryanlovett Oct 25, 2024
32265d6
Reference documentation's CI/CD process.
ryanlovett Oct 25, 2024
45a2459
Merge pull request #6424 from ryanlovett/docs-cicd-update
ryanlovett Oct 25, 2024
d6f6ec9
Merge pull request #6425 from ryanlovett/docs-admin-listing
ryanlovett Oct 25, 2024
fa32240
adding workflow diagram
shaneknapp Oct 25, 2024
19bd87c
use a mermaid diagram of ultimate coolness
shaneknapp Oct 25, 2024
a10420f
Merge pull request #6426 from shaneknapp/add-cicd-workflow-diagram
shaneknapp Oct 25, 2024
d37e4be
This fixes the alternative color scheme.
ryanlovett Oct 25, 2024
1325039
Merge pull request #6427 from ryanlovett/docs-mermaid-colors
ryanlovett Oct 25, 2024
e0d1139
Merge pull request #6422 from shaneknapp/update-manage-repo-doc
shaneknapp Oct 26, 2024
c90202f
update nature image tag to 829049c6fba4: deployments/nature/hubploy.yaml
Oct 27, 2024
a467f76
Merge pull request #6431 from berkeley-dsep-infra/update-nature-image…
shaneknapp Oct 27, 2024
4 changes: 0 additions & 4 deletions deployments/dev/hubploy.yaml
@@ -1,7 +1,3 @@
-# you will also need to update config/common.yaml to include the following for
-# the secondary image tag:
-# kubespawner_override:
-#   image: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-secondary-image:df11f4f1caa1
 images:
   - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/dev-primary-image:6000a5694eab
2 changes: 1 addition & 1 deletion deployments/logodev/hubploy.yaml
@@ -1,7 +1,7 @@
 images:
   # temporary update
-  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:6432da59b518
+  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/logodev-user-image:bc7919aa9814

 cluster:
   provider: gcloud
2 changes: 1 addition & 1 deletion deployments/nature/hubploy.yaml
@@ -1,6 +1,6 @@
 images:
-  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:fc53f089643a
+  - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/nature-user-image:829049c6fba4

 cluster:
   provider: gcloud
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -1,2 +1,3 @@
 /.quarto/
 _site
+en/
7 changes: 4 additions & 3 deletions docs/_quarto.yml
@@ -13,7 +13,7 @@ website:
       - icon: github
         href: https://github.com/berkeley-dsep-infra/datahub
     left:
-      - text: "Contributing"
+      - text: "Architecture and contributing"
        href: admins/pre-reqs.qmd
       - text: "Admin Tasks"
        href: tasks/documentation.qmd
@@ -28,13 +28,14 @@
       text: Home
     - href: hubs.qmd
       text: JupyterHub Deployments
-    - section: "Contributing to DataHub"
+    - section: "Architecture and Contribution Overview"
      contents:
        - admins/pre-reqs.qmd
        - admins/structure.qmd
-       - admins/storage.qmd
+       - admins/cicd-github-actions.qmd
        - admins/cluster-config.qmd
        - admins/credentials.qmd
+       - admins/storage.qmd
    - section: "Common Administrator Tasks"
      contents:
        - tasks/documentation.qmd
172 changes: 172 additions & 0 deletions docs/admins/cicd-github-actions.qmd
@@ -0,0 +1,172 @@
---
title: Continuous Integration and Deployment
---

## Overview

DataHub's continuous integration and deployment system is built on
[GitHub Actions](https://github.com/features/actions)
[workflows](https://docs.github.com/en/actions/writing-workflows).

These workflows are stored in the DataHub repo in the
[.github/workflows/](https://github.com/berkeley-dsep-infra/datahub/tree/staging/.github/workflows) directory.

The basic order of operations is as follows:

1. A pull request is created in the datahub repo.
1. The labeler workflow applies labels based on the [file type and/or location](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml) of the changed files (see the sketch after this list).
1. When the pull request is merged to `staging`, if the labels match any hub, support, or node placeholder deployments, those specific systems are deployed.
1. When the pull request is merged to `prod`, only the hubs that have been modified are deployed (again, based on labels).
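
For illustration, labeler rules take roughly this shape. The labels and globs
below are hypothetical; the real mappings live in
[.github/labeler.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml).

```yaml
# Hypothetical labeler rules: each key is a label, and each glob selects the
# changed files that cause that label to be applied.
"hub: datahub":
  - deployments/datahub/**
"hub: logodev":
  - deployments/logodev/**
support:
  - support/**
node-placeholder:
  - node-placeholder/**
```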

The hubs are deployed via [hubploy](https://github.com/berkeley-dsep-infra/hubploy),
which is our custom wrapper for `gcloud`, `sops` and `helm`.

## GitHub Actions Architecture

### Secrets and Variables

All of these workflows depend on a few Actions secrets and variables, with
some at the organization level, and others at the repository level.

#### Organization secrets and variables

[GitHub Actions settings](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions) contain all of the organizational secrets and variables.

##### Organization Secrets

DATAHUB_CREATE_PR
: This secret is a fine-grained [personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) with the following permissions defined:

* Select repositories (only berkeley-dsep-infra/datahub)
* Repository permissions: Contents (read/write), Metadata (read only), Pull requests (read/write)

When adding a new image repository to the berkeley-dsep-infra org, you must
edit this secret and manually add the new repository to the access list.

::: {.callout-important}
This PAT has a lifetime of 366 days, and should be rotated at the beginning of
every maintenance window.
:::

GAR_SECRET_KEY and GAR_SECRET_KEY_EDX
: These secrets hold keys for the GCP IAM service accounts in each GCP project that are granted `roles/storage.admin` permissions. This allows us to push the built images to the Artifact Registry.

When adding a new image repository to the berkeley-dsep-infra org, you must
edit this secret and manually add the new repository to the access list.
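
As a sketch of how a workflow consumes such a secret (an illustrative step,
not the repo's exact job):

```yaml
# Illustrative authentication step using the GAR_SECRET_KEY secret.
- name: Authenticate to GCP
  uses: google-github-actions/auth@v2
  with:
    credentials_json: ${{ secrets.GAR_SECRET_KEY }}
```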

##### Organization Variables

IMAGE_BUILDER_BOT_EMAIL and IMAGE_BUILDER_BOT_NAME
: These are used to set the git identity in the image build workflow step that pushes a commit and creates a PR in the datahub repo.
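
A step of roughly this shape (illustrative only) is what consumes them:

```yaml
# Illustrative: set the bot's git identity before committing the tag bump.
- name: Configure git identity
  run: |
    git config --global user.name "${{ vars.IMAGE_BUILDER_BOT_NAME }}"
    git config --global user.email "${{ vars.IMAGE_BUILDER_BOT_EMAIL }}"
```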

#### DataHub repository secrets

GCP_PROJECT_ID
: This is the name of our GCP project.

GKE_KEY
: This key is used in the workflows that deploy the `support` and `node-placeholder` namespaces. It's attached to the `hubploy` service account, and has the assigned roles of `roles/container.clusterViewer` and `roles/container.developer`.

SOPS_KEY
: This key is used to decrypt our secrets using `sops`. It is attached to the `sopsaccount` service account, which provides KMS access.
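
A decryption step in these workflows looks roughly like this (the file path
here is hypothetical):

```yaml
# Illustrative sops invocation; in practice hubploy wires this up.
- name: Decrypt hub secrets
  run: sops --decrypt deployments/datahub/secrets/staging.yaml
```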

#### User Image Repository Variables

Each image repository contains two variables, which identify the name of the
hub and the path within the Artifact Registry that the image is published to.

HUB
: The name of the hub: `datahub`, `data100`, etc.

IMAGE
: The path within the Artifact Registry: `ucb-datahub-2018/user-images/<hubname>-user-image`
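
A hypothetical build step showing how these two variables might be consumed
(everything other than the HUB and IMAGE variables is an assumption):

```yaml
# Illustrative: tag and push a build using the per-repo variables.
- name: Push user image
  run: |
    TAG="${GITHUB_SHA::12}"
    docker tag "${{ vars.HUB }}-user-image" "us-central1-docker.pkg.dev/${{ vars.IMAGE }}:${TAG}"
    docker push "us-central1-docker.pkg.dev/${{ vars.IMAGE }}:${TAG}"
```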

### Single user server image modification workflow

Each hub's user image lives in its own repository in the berkeley-dsep-infra organization.
When a pull request is submitted, there are two workflows that run:

1. [YAML lint](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/yaml-lint.yaml)
2. [Build and test the image](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-test-image.yaml)

When both tests pass and the pull request is merged into the `main` branch,
a third and final workflow is run:

3. [Build push and create PR](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/.github/workflows/build-push-create-pr.yaml)

This builds the image again and, when successful, pushes it to our Google
Artifact Registry and creates a pull request in the datahub repository with the
updated image tag for that hub's deployment.
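
Conceptually, the final step looks like this sketch (assuming the `gh` CLI and
the `DATAHUB_CREATE_PR` token described earlier; the real workflow's flags may
differ):

```yaml
# Sketch: open a pull request against the datahub repo with the new tag.
- name: Create PR in the datahub repo
  env:
    GITHUB_TOKEN: ${{ secrets.DATAHUB_CREATE_PR }}
  run: |
    gh pr create \
      --repo berkeley-dsep-infra/datahub \
      --base staging \
      --title "update ${{ vars.HUB }} image tag" \
      --body "Automated image tag bump for ${{ vars.HUB }}."
```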

### Updating the datahub repository

#### Single user server image tag updates

When a pull request is opened to update one or more image tags for deployments,
the [labeler](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/labeler.yml)
will apply the appropriate `hub: <hubname>` labels. When this pull request is
merged, the [deploy-hubs workflow](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-hubs.yaml)
is triggered.

This workflow grabs the labels from the merged pull request and checks whether
any hubs need to be deployed. If so, it executes a [python script](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/scripts/determine-hub-deployments.py)
that compares the workflow's environment variables against the labeled hubs
and emits a list of the deployments to perform.

That list is iterated over, and [hubploy](https://github.com/berkeley-dsep-infra/hubploy)
is used to deploy only the flagged hubs.
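
Put together, the deploy job behaves roughly like this sketch (the script path
is real; the loop and the hubploy arguments are assumptions):

```yaml
# Sketch of the deploy loop over the hubs flagged by the labels.
- name: Deploy modified hubs
  run: |
    for hub in $(python .github/scripts/determine-hub-deployments.py); do
      hubploy deploy "$hub" hub staging
    done
```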


```{mermaid}
%% State diagram documentation at
%% https://mermaid.js.org/syntax/stateDiagram.html

stateDiagram-v2
image_repo: github.com/berkeley-dsep-infra/hubname-user-image
user_repo: github.com/username/hubname-user-image
image_test_build: Image is built and tested
image_push_build: Image is built and pushed to registry
pr_created: A pull request is automatically<br/>created in the Datahub repo
deploy_to_staging: Hub is deployed to staging
contributor_tests: The contributor logs into the<br/>staging hub and tests the image.
deploy_to_prod: Hub is deployed to prod

image_repo --> user_repo: Contributor forks the image repo.
user_repo --> image_repo: Contributor creates a PR.
image_repo --> image_test_build
image_test_build --> image_push_build: Test build passes and Datahub staff merge pull request
image_push_build --> pr_created
pr_created --> deploy_to_staging: Datahub staff review and merge to staging
deploy_to_staging --> contributor_tests
contributor_tests --> deploy_to_prod: Datahub staff create a PR to merge to prod
```

#### Support and node-placeholder charts

Each of these deployments has its own workflow, which only runs on pushes to
`staging`:

* [deploy-support.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-support.yaml)
* [deploy-node-placeholder.yaml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/deploy-node-placeholder.yaml)

If the correct labels are found, the workflow uses the **GKE_KEY** secret to
run `helm upgrade` for the necessary deployments.
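
Conceptually, each deploy step reduces to something like the following (the
chart and values paths are assumptions):

```yaml
# Illustrative helm invocation for the support chart.
- name: Upgrade support chart
  run: |
    helm upgrade --install --wait support ./support \
      --namespace support \
      -f support/values.yaml
```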

#### Miscellaneous workflows

There are also a couple of other workflows in the datahub repository:

[prevent-prod-merges.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/prevent-prod-merges.yml)
: This workflow only allows merges to `prod` that come from `staging` (sketched after this list).

[quarto-docs.yml](https://github.com/berkeley-dsep-infra/datahub/blob/staging/.github/workflows/quarto-docs.yml)
: This builds, renders, and pushes our docs to GitHub Pages.
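
As a sketch, the prod-merge guard has this general shape (illustrative, not
the workflow's actual contents):

```yaml
# Sketch: fail any pull request into prod whose source branch isn't staging.
on:
  pull_request:
    branches: [prod]

jobs:
  check-source-branch:
    runs-on: ubuntu-latest
    steps:
      - name: Only staging may merge to prod
        if: github.head_ref != 'staging'
        run: |
          echo "Pull requests to prod must come from staging" >&2
          exit 1
```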

### Documentation's Workflow

This documentation is also [deployed by GitHub Actions](../tasks/documentation.html#action).

## Original Design Document

[Slides](https://docs.google.com/presentation/d/1o_P4H8XfbdgI5NMPnjojHZOcSNHRoP5twl0E8Ern1z4/edit?usp=sharing) describe the process in some more detail.
34 changes: 12 additions & 22 deletions docs/admins/index.qmd
@@ -1,22 +1,12 @@
-=======================
-Contributing to DataHub
-=======================
-
-.. toctree::
-   :titlesonly:
-   :maxdepth: 2
-
-   pre-reqs
-   structure
-   storage
-   cluster-config
-   credentials
-   incidents/index
-
-.. toctree::
-   :titlesonly:
-   :maxdepth: 2
-
-   howto/index
-
-   deployments/index
+---
+title: Architecture and Contribution Overview
+listing:
+  contents:
+    - pre-reqs.qmd
+    - structure.qmd
+    - cicd-github-actions.qmd
+    - cluster-config.qmd
+    - credentials.qmd
+    - storage.qmd
+  sort: false
+---
27 changes: 17 additions & 10 deletions docs/admins/storage.qmd
@@ -4,19 +4,22 @@ title: User home directory storage

 All users on all the hubs get a home directory with persistent storage.

-## Why NFS?
+## Why Google Filestore?

-NFS isn\'t a particularly cloud-native technology. It isn\'t highly
-available nor fault tolerant by default, and is a single point of
-failure. However, it is currently the best of the alternatives available
-for user home directories, and so we use it.
+After hosting our own NFS server for user home directories, we found that NFS
+was much more difficult to manage at our scale.
+
+Filestore has been rock-solid since we moved to it in early 2023, and we are
+happy with the performance and cost.

 Our basic requirements for user storage are as follows:

 1. Home directories need to be fully POSIX compliant file systems that
    work with minimal edge cases, since this is what most instructional
    code assumes. This rules out object-store backed filesystems such as
    [s3fs](https://github.com/s3fs-fuse/s3fs-fuse).

-2. Users don\'t usually need guaranteed space or IOPS, so providing
+2. Users don't usually need guaranteed space or IOPS, so providing
    them each a [persistent cloud
    disk](https://cloud.google.com/persistent-disk/) gets unnecessarily
    expensive - since we are paying for it whether it is used or not.
@@ -56,24 +59,28 @@ Filestore](https://cloud.google.com/filestore/). This was mostly due to
 NFS daemon stability issues, which caused many outages and impacted
 thousands of our users and courses.

-Currently each hub has it\'s own filestore instance, except for a few
+Currently each hub has its own filestore instance, except for a few
 small courses that share one. This has proven to be much more stable and
 able to handle the load.

+We also still have our legacy NFS server VM running, which we use to mount the
+Filestore shares and access home directories for troubleshooting and running
+the archiver tool at the end of each semester.
+
 ## Home directory paths

 Each user on each hub gets their own directory on the server that gets
 treated as their home directory. The staging & prod servers share home
 directory paths, so users get the same home directories on both.

-For most hubs, the user\'s home directory path relative to the exported
+For most hubs, the user's home directory path relative to the exported
 filestore share is
 `<hub-name>-filestore/<hub-name>/<prod|staging>/home/<user-name>`.
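
For illustration, a hypothetical user `jovyan` on the `datahub` hub would
resolve to `datahub-filestore/datahub/prod/home/jovyan`.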

 ## NFS Client

-We currently have two approaches for mounting the user\'s home directory
-into each user\'s pod.
+We currently have two approaches for mounting the user's home directory
+into each user's pod.

 1. Mount the NFS Share once per node to a well known location, and use
    [hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
33 changes: 18 additions & 15 deletions docs/admins/structure.qmd
@@ -9,20 +9,6 @@ for that particular hub is stored in a standard format. For example, all
 the configuration for the primary hub used on campus (*datahub*) is
 stored under `deployments/datahub/`.

-### User Image (`image/`)
-
-The contents of the `image/` directory determine the environment
-provided to the user. For example, it controls:
-
-1. Versions of Python / R / Julia available
-2. Libraries installed, and which versions of those are installed
-3. Specific config for Jupyter Notebook or IPython
-
-[repo2docker](https://repo2docker.readthedocs.io) is used to
-build the actual user image, so you can use any of the [supported config
-files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
-to customize the image as you wish.
-
 ### Hub Config (`config/` and `secrets/`)

 All our JupyterHubs are based on [Zero to JupyterHub
@@ -53,7 +39,7 @@ Files are further split into:

 ### `hubploy.yaml`

-We use [hubploy](https://github.com/yuvipanda/hubploy) to deploy our
+We use [hubploy](https://github.com/berkeley-dsep-infra/hubploy) to deploy our
 hubs in a repeatable fashion. `hubploy.yaml` contains information
 required for hubploy to work - such as cluster name, region, provider,
 etc.
@@ -68,3 +54,20 @@ Documentation is under the `docs/` folder, and is generated with
 [markdown](https://quarto.org/docs/authoring/markdown-basics.html).
 Documentation is published to <https://docs.datahub.berkeley.edu/> via a
 [GitHub Action workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/quarto-docs.yml).
+
+## User Images
+
+Each user image is stored in its own repository in the `berkeley-dsep-infra`
+organization. You can find them [here](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all).
+
+These repositories determine the environment provided to the user. For example,
+they control:
+
+1. Versions of Python / R / Julia available
+2. Libraries installed, and which versions of those are installed
+3. Specific config for Jupyter Notebook or IPython
+
+[repo2docker](https://repo2docker.readthedocs.io) is used to
+build the actual user image, so you can use any of the [supported config
+files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
+to customize the image as you wish.
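
For illustration, a minimal, hypothetical repo2docker `environment.yml` for a
user image might look like this (real images pin many more packages):

```yaml
# Hypothetical repo2docker environment.yml for a user image.
channels:
  - conda-forge
dependencies:
  - python=3.11
  - jupyterlab
  - numpy
  - pandas
```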