Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Write docs on how to set up workload-identity
- Create features/ subsection of the docs
- Remove older instructions
Showing 17 changed files with 259 additions and 150 deletions.
@@ -0,0 +1,4 @@
basehub:
  userServiceAccount:
    annotations:
      iam.gke.io/gcp-service-account: [email protected]
@@ -0,0 +1,4 @@
basehub:
  userServiceAccount:
    annotations:
      iam.gke.io/gcp-service-account: meom-ige-staging-workload-sa@meom-ige-cnrs.iam.gserviceaccount.com
This file was deleted.
@@ -4,5 +4,4 @@
auth-management.md
update-env.md
culling.md
data-access.md
```
@@ -0,0 +1,148 @@
# Enable user access to cloud features

Users of our hubs often need to be granted specific cloud permissions
so they can use features of the cloud provider they are on, without
having to do a bunch of cloud-provider specific setup themselves. This
helps keep code cloud provider agnostic as much as possible, while also
improving the security posture of our hubs.

This page lists various features we offer around access to cloud resources,
and how to enable them.

## GCP

### How it works

On Google Cloud Platform, we use [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity)
to map a particular [Kubernetes Service Account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/)
to a particular [Google Cloud Service Account](https://cloud.google.com/iam/docs/service-accounts).
All pods using the Kubernetes Service Account (users' Jupyter notebook pods
as well as Dask worker pods)
will have the permissions assigned to the Google Cloud Service Account.
This Google Cloud Service Account is managed via terraform.
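
One way to confirm the mapping from inside a running user pod is to ask the GKE metadata server which Google Cloud Service Account the pod is acting as. The following is a minimal sketch, not part of our tooling: the function names are hypothetical, and it assumes the pod runs on a GKE cluster with Workload Identity enabled, where the standard metadata endpoint is reachable.

```python
import urllib.request

# Standard metadata server endpoint; on GKE with Workload Identity it
# reports the Google Cloud Service Account the pod is mapped to.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_identity_request() -> urllib.request.Request:
    """Build the metadata request (hypothetical helper, for illustration)."""
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})


def fetch_active_service_account(timeout: float = 5.0) -> str:
    """Return the service account email the current pod is acting as.

    Only works from inside a pod on the cluster; elsewhere the request fails.
    """
    with urllib.request.urlopen(build_identity_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

If the setup described on this page is in place, calling `fetch_active_service_account()` from a notebook on the hub should return the same email that appears in the Kubernetes Service Account annotation.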

(howto:features:cloud-access:gcp:access-perms)=
### Enabling specific cloud access permissions

1. In the `.tfvars` file for the project in which this hub is based,
   create (or modify) the `hub_cloud_permissions` variable. The config
   looks like:

   ```terraform
   hub_cloud_permissions = {
     "<hub-name-slug>": {
       requestor_pays : true,
       bucket_admin_access : ["bucket-1", "bucket-2"],
       hub_namespace : "<hub-name>"
     }
   }
   ```

   where:

   1. `<hub-name-slug>` is the name of the hub, but restricted in length. This
      and the cluster name together can't be more than 29 characters. `terraform`
      will complain if you go over this limit, so in general just use the name
      of the hub and shorten it only if `terraform` complains.
   2. `requestor_pays` enables permissions for user pods and Dask worker
      pods to identify as the project while making requests to Google Cloud Storage
      buckets marked as 'requestor pays'. More details [here](topic:features:cloud:gcp:requestor-pays).
   3. `bucket_admin_access` lists the bucket names (as specified in the `user_buckets`
      terraform variable) that all users on this hub should have full read/write
      access to. Used along with the [user_buckets](howto:features:cloud-access:gcp:storage-buckets)
      terraform variable to enable the [scratch buckets](topic:features:cloud:gcp:scratch-buckets)
      feature.
   4. `hub_namespace` is the full name of the hub, as hubs are put in Kubernetes
      Namespaces that are the same as their names. This is explicitly specified here
      because `<hub-name-slug>` could possibly be truncated.

2. Run `terraform apply -var-file=projects/<cluster-var-file>.tfvars`, and look at the
   plan carefully. It should only create or modify IAM related objects (such as roles
   and service accounts), and not touch anything else. When it looks good, accept
   the changes and apply them. This provisions a Google Cloud Service Account (if needed)
   and grants it the appropriate permissions.

3. We now need to connect the Kubernetes Service Account used by the Jupyter and Dask pods
   with this Google Cloud Service Account. This is done by setting an annotation on the
   Kubernetes Service Account.

4. Run `terraform output kubernetes_sa_annotations`. This should
   show you a list of hubs and the annotation that needs to be set on each:

   ```
   $ terraform output kubernetes_sa_annotations
   {
     "prod" = "iam.gke.io/gcp-service-account: [email protected]"
     "staging" = "iam.gke.io/gcp-service-account: [email protected]"
   }
   ```

   This shows the annotations for all the hubs configured to provide cloud access
   in this cluster. You only need to care about the hub you are currently dealing with.

5. (If needed) create a `.values.yaml` file specific to this hub under `config/clusters/<cluster-name>`,
   and add it under `helm_chart_values_files` for the appropriate hub in `config/clusters/<cluster-name>/cluster.yaml`.

6. Specify the annotation from step 4, nested under `userServiceAccount.annotations`.

   ```yaml
   userServiceAccount:
     annotations:
       iam.gke.io/gcp-service-account: [email protected]
   ```

   ```{note}
   If the hub is a `daskhub`, nest the config under a `basehub` key.
   ```

7. Get this change deployed, and users should now be able to use the cloud features
   you enabled (for example, requestor pays)!
   Currently running users might have to restart their pods for the change to take effect.
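
Steps 1, 4 and 6 above can be sketched in Python. This is an illustrative sketch rather than part of the deployment tooling: both function names are hypothetical, the example email is made up, and the length check assumes the limit applies to the simple sum of the two name lengths (the real check is done by `terraform` and may count separators differently).

```python
ANNOTATION_KEY = "iam.gke.io/gcp-service-account"
MAX_COMBINED_LENGTH = 29  # limit quoted in step 1


def slug_fits(cluster_name: str, hub_slug: str) -> bool:
    """Assumed check: cluster name plus hub slug must not exceed 29 characters."""
    return len(cluster_name) + len(hub_slug) <= MAX_COMBINED_LENGTH


def annotation_to_values(annotation: str, is_daskhub: bool = False) -> dict:
    """Turn one `terraform output` annotation string into helm values.

    `annotation` looks like "iam.gke.io/gcp-service-account: <email>".
    """
    key, sep, email = annotation.partition(":")
    key, email = key.strip(), email.strip()
    if not sep or key != ANNOTATION_KEY or not email:
        raise ValueError(f"unexpected annotation: {annotation!r}")
    values = {"userServiceAccount": {"annotations": {key: email}}}
    # A daskhub nests the same config under a `basehub` key (see the note in step 6).
    return {"basehub": values} if is_daskhub else values


# Made-up email, for illustration only.
example = "iam.gke.io/gcp-service-account: hub-sa@example-project.iam.gserviceaccount.com"
print(annotation_to_values(example, is_daskhub=True))
```

The returned dictionary has the same shape as the YAML in step 6, so it can be compared against the hub's `.values.yaml` file when reviewing a change.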

(howto:features:cloud-access:gcp:storage-buckets)=
### Creating storage buckets for use with the hub

See [the relevant topic page](topic:features:cloud:gcp:scratch-buckets) for
why users want this!

1. In the `.tfvars` file for the project in which this hub is based,
   create (or modify) the `user_buckets` variable. The config
   looks like:

   ```terraform
   user_buckets = ["bucket1", "bucket2"]
   ```

   Since storage bucket names need to be globally unique across all of Google Cloud,
   the actual created names are `<prefix>-<bucket-name>`, where `<prefix>` is
   set by the `prefix` variable in the `.tfvars` file.

2. Enable access to these buckets from the hub by [editing `hub_cloud_permissions`](howto:features:cloud-access:gcp:access-perms)
   in the same `.tfvars` file. Follow all the steps listed there. This
   should create the storage buckets and provide all users access to them!

3. You can set the `SCRATCH_BUCKET` (and the deprecated `PANGEO_SCRATCH`)
   env vars on all user pods so users can use the created bucket without
   having to hard-code the bucket name in their code. In the hub-specific
   `.values.yaml` file at `config/clusters/<cluster-name>/<hub-name>.values.yaml`,
   set:

   ```yaml
   jupyterhub:
     singleuser:
       extraEnv:
         SCRATCH_BUCKET: gcs://<bucket-name>/$(JUPYTERHUB_USER)
   ```

   ```{note}
   If the hub is a `daskhub`, nest the config under a `basehub` key.
   ```

   The `$(JUPYTERHUB_USER)` expands to the name of the current user for
   each user, so everyone gets a little prefix inside the bucket to store
   their own stuff without stepping on other people's objects. But this is
   **not a security mechanism** - everyone can access everyone else's objects!

   You can also add other env vars pointing to other buckets users requested.

4. Get this change deployed, and users should now be able to use the buckets!
   Currently running users might have to restart their pods for the change to take effect.
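
To make the naming concrete, here is a small sketch of how the pieces in steps 1 and 3 fit together. It is only an illustration: the function names and the example values (`example-prefix`, `scratch`) are hypothetical, and it assumes `<bucket-name>` in `SCRATCH_BUCKET` should be the full, prefixed name that was actually created.

```python
def full_bucket_name(prefix: str, bucket: str) -> str:
    """Name of the bucket as created: `<prefix>-<bucket-name>`."""
    return f"{prefix}-{bucket}"


def scratch_bucket_value(prefix: str, bucket: str) -> str:
    """Value for the SCRATCH_BUCKET entry under `extraEnv`.

    `$(JUPYTERHUB_USER)` is left literal: it is expanded per user when
    the pod starts, not by this function.
    """
    return f"gcs://{full_bucket_name(prefix, bucket)}/$(JUPYTERHUB_USER)"


# Made-up prefix and bucket name, for illustration only.
print(scratch_bucket_value("example-prefix", "scratch"))
```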
@@ -0,0 +1,5 @@
# Hub Features

```{toctree}
cloud-access.md
```
@@ -0,0 +1,43 @@
# Features available on the hubs

This document is a concise description of various features we can
optionally enable on a given JupyterHub. Explicit instructions on how to
do so should be provided in a linked how-to document.

## Cloud Permissions

Users of our hubs often need to be granted specific cloud permissions
so they can use features of the cloud provider they are on, without
having to do a bunch of cloud-provider specific setup themselves. This
helps keep code cloud provider agnostic as much as possible, while also
improving the security posture of our hubs.

### GCP

(topic:features:cloud:gcp:requestor-pays)=
#### 'Requestor Pays' access to Google Cloud Storage buckets

By default, the organization *hosting* data on Google Cloud pays for both
storage and bandwidth costs of the data. However, Google Cloud also offers
a [requestor pays](https://cloud.google.com/storage/docs/requester-pays)
option, where the bandwidth costs are paid for by the organization *requesting*
the data. This is very commonly used by organizations that provide big datasets
on Google Cloud Storage, to sustainably share costs of maintaining the data.

When this feature is enabled, users on a hub accessing cloud buckets from
other organizations marked as 'requestor pays' will increase our cloud bill.
Hence, this is an opt-in feature.

(topic:features:cloud:gcp:scratch-buckets)=
#### 'Scratch' Buckets on Google Cloud Storage

Users often want one or more Google Cloud Storage [buckets](https://cloud.google.com/storage/docs/json_api/v1/buckets)
to store intermediate results, share big files with other users, or
to store raw data that should be accessible to everyone within the hub.
We can create one or more buckets and provide *all* users on the hub
*equal* access to these buckets, allowing users to create objects in them.
A single bucket can also be designated as a *scratch bucket*, which will
set a `SCRATCH_BUCKET` (and a deprecated `PANGEO_SCRATCH`) environment variable
of the form `gcs://<bucket-name>/<user-name>`. This can be used by individual
users to store objects temporarily for their own use, although there is nothing
preventing other users from accessing these objects!
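
From a user's point of view, this means code can build storage paths from the environment instead of hard-coding a bucket name. A minimal sketch, assuming only that one of the two environment variables described above is set in the user's pod; the function name and the example values are hypothetical:

```python
import os


def scratch_path(filename: str) -> str:
    """Return a path for `filename` under the current user's scratch prefix."""
    # Prefer SCRATCH_BUCKET, fall back to the deprecated PANGEO_SCRATCH.
    base = os.environ.get("SCRATCH_BUCKET") or os.environ.get("PANGEO_SCRATCH")
    if not base:
        raise RuntimeError("no scratch bucket is configured on this hub")
    return f"{base.rstrip('/')}/{filename}"


# Simulate the variable the hub would set (made-up values).
os.environ["SCRATCH_BUCKET"] = "gcs://example-bucket/example-user"
print(scratch_path("intermediate.zarr"))
```

The resulting string can then be handed to whichever storage library the user prefers, which keeps the same notebook usable on hubs with different bucket names.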
@@ -1,10 +1,7 @@
{{ if .Values.userServiceAccount.enabled -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    {{- if .Values.jupyterhub.custom.cloudResources.scratchBucket.enabled }}
    {{- if eq .Values.jupyterhub.custom.cloudResources.provider "gcp" }}
    iam.gke.io/gcp-service-account: {{ include "cloudResources.gcp.serviceAccountName" .}}@{{ .Values.jupyterhub.custom.cloudResources.gcp.projectId }}.iam.gserviceaccount.com
    {{- end }}
    {{- end }}
  annotations: {{ .Values.userServiceAccount.annotations | toJson}}
  name: user-sa
{{- end }}
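
The `annotations` line that uses `toJson` works because JSON is also valid YAML, so the rendered mapping can sit inline after the key. The sketch below approximates that rendering in Python; it is not how Helm is implemented, `render_annotations_line` is a hypothetical name, the email is made up, and details such as key ordering or character escaping in Helm's `toJson` may differ.

```python
import json


def render_annotations_line(annotations: dict) -> str:
    """Approximate the output of `annotations: {{ ... | toJson }}`."""
    # Compact separators mimic the single-line JSON a template would emit.
    return "annotations: " + json.dumps(annotations, separators=(",", ":"))


# Made-up email, for illustration only.
values = {"iam.gke.io/gcp-service-account": "hub-sa@example-project.iam.gserviceaccount.com"}
print(render_annotations_line(values))
print(render_annotations_line({}))  # no annotations configured
```

Rendering the whole mapping this way means any annotation set under `userServiceAccount.annotations` ends up on the Service Account, without the template needing to know about a specific cloud provider.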