Merge branch 'staging' into docs-cicd-update
ryanlovett authored Oct 25, 2024
2 parents 9347e73 + 26b085d commit 1893489
Showing 4 changed files with 44 additions and 33 deletions.
8 changes: 4 additions & 4 deletions docs/_quarto.yml
@@ -13,7 +13,7 @@ website:
- icon: github
href: https://github.com/berkeley-dsep-infra/datahub
left:
- text: "Contributing"
- text: "Architecture and contributing"
href: admins/pre-reqs.qmd
- text: "Admin Tasks"
href: tasks/documentation.qmd
@@ -28,14 +28,14 @@ website:
text: Home
- href: hubs.qmd
text: JupyterHub Deployments
- section: "Contributing to DataHub"
- section: "Datahub architecture and contribution overview"
contents:
- admins/pre-reqs.qmd
- admins/structure.qmd
- admins/storage.qmd
- admins/cicd-github-actions.qmd
- admins/cluster-config.qmd
- admins/credentials.qmd
- admins/cicd-github-actions.qmd
- admins/storage.qmd
- section: "Common Administrator Tasks"
contents:
- tasks/documentation.qmd
9 changes: 5 additions & 4 deletions docs/admins/index.qmd
@@ -1,16 +1,17 @@
=======================
Contributing to DataHub
=======================
==============================================
Datahub architecture and contribution overview
==============================================

.. toctree::
:titlesonly:
:maxdepth: 2

pre-reqs
structure
storage
cicd-github-actions
cluster-config
credentials
storage
incidents/index

.. toctree::
27 changes: 17 additions & 10 deletions docs/admins/storage.qmd
@@ -4,19 +4,22 @@ title: User home directory storage

All users on all the hubs get a home directory with persistent storage.

## Why NFS?
## Why Google Filestore?

NFS isn\'t a particularly cloud-native technology. It isn\'t highly
available nor fault tolerant by default, and is a single point of
failure. However, it is currently the best of the alternatives available
for user home directories, and so we use it.
After hosting our own NFS server for user home directories, we found that NFS
is much more difficult to manage at the scale we were at.

Filestore has been rock-solid after moving to it in early 2023, and we are
happy with the performance and cost.

Our basic requirements for user storage are as follows:

1. Home directories need to be fully POSIX compliant file systems that
work with minimal edge cases, since this is what most instructional
code assumes. This rules out object-store backed filesystems such as
[s3fs](https://github.com/s3fs-fuse/s3fs-fuse).

2. Users don\'t usually need guaranteed space or IOPS, so providing
2. Users don't usually need guaranteed space or IOPS, so providing
them each a [persistent cloud
disk](https://cloud.google.com/persistent-disk/) gets unnecessarily
expensive - since we are paying for it whether it is used or not.
@@ -56,24 +59,28 @@ Filestore](https://cloud.google.com/filestore/). This was mostly due to
NFS daemon stability issues, which caused many outages and impacted
thousands of our users and courses.

Currently each hub has it\'s own filestore instance, except for a few
Currently each hub has its own filestore instance, except for a few
small courses that share one. This has proven to be much more stable and
able to handle the load.

We also still have our legacy NFS server VM running, which we use to mount the
Filestore shares and access home directories for troubleshooting and running
the archiver tool at the end of each semester.

## Home directory paths

Each user on each hub gets their own directory on the server that gets
treated as their home directory. The staging & prod servers share home
directory paths, so users get the same home directories on both.

For most hubs, the user\'s home directory path relative to the exported
For most hubs, the user's home directory path relative to the exported
filestore share is
`<hub-name>-filestore/<hub-name>/<prod|staging>/home/<user-name>`.
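
As a minimal sketch of how the path template above expands, the following Python helper fills in the placeholders. The function name `home_directory_path` and the example user name are hypothetical, chosen only for illustration; the template applies to most hubs, not all.

```python
# Hypothetical helper: expands the documented template
# <hub-name>-filestore/<hub-name>/<prod|staging>/home/<user-name>
# It is an illustration only and is not part of the datahub repository.

VALID_ENVIRONMENTS = ("prod", "staging")


def home_directory_path(hub_name: str, environment: str, user_name: str) -> str:
    """Return a user's home directory path relative to the exported Filestore share."""
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"environment must be one of {VALID_ENVIRONMENTS}")
    return f"{hub_name}-filestore/{hub_name}/{environment}/home/{user_name}"


# "datahub" is the primary campus hub; "example-user" is a made-up user name.
print(home_directory_path("datahub", "prod", "example-user"))
```

Hubs that deviate from the common layout would need their own handling, which this sketch does not attempt.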

## NFS Client

We currently have two approaches for mounting the user\'s home directory
into each user\'s pod.
We currently have two approaches for mounting the user's home directory
into each user's pod.

1. Mount the NFS Share once per node to a well known location, and use
[hostpath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
33 changes: 18 additions & 15 deletions docs/admins/structure.qmd
@@ -9,20 +9,6 @@ for that particular hub is stored in a standard format. For example, all
the configuration for the primary hub used on campus (*datahub*) is
stored under `deployments/datahub/`.

### User Image (`image/`)

The contents of the `image/` directory determine the environment
provided to the user. For example, it controls:

1. Versions of Python / R / Julia available
2. Libraries installed, and which versions of those are installed
3. Specific config for Jupyter Notebook or IPython

[repo2docker](https://repo2docker.readthedocs.io) is used to
build the actual user image, so you can use any of the [supported config
files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
to customize the image as you wish.

### Hub Config (`config/` and `secrets/`)

All our JupyterHubs are based on [Zero to JupyterHub
@@ -53,7 +39,7 @@ Files are further split into:

### `hubploy.yaml`

We use [hubploy](https://github.com/yuvipanda/hubploy) to deploy our
We use [hubploy](https://github.com/berkeley-dsep-infra/hubploy) to deploy our
hubs in a repeatable fashion. `hubploy.yaml` contains information
required for hubploy to work - such as cluster name, region, provider,
etc.
@@ -68,3 +54,20 @@ Documentation is under the `docs/` folder, and is generated with
[markdown](https://quarto.org/docs/authoring/markdown-basics.html).
Documentation is published to <https://docs.datahub.berkeley.edu/> via a
[GitHub Action workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/quarto-docs.yml).

## User Images

Each user image is stored in its own repository in the `berkeley-dsep-infra`
organization. You can find them [here](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all).

These repositories determine the environment provided to the user. For example,
they control:

1. Versions of Python / R / Julia available
2. Libraries installed, and which versions of those are installed
3. Specific config for Jupyter Notebook or IPython

[repo2docker](https://repo2docker.readthedocs.io) is used to
build the actual user image, so you can use any of the [supported config
files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
to customize the image as you wish.
