From 90833135cce82e3aac74315648d1225dd49315c1 Mon Sep 17 00:00:00 2001
From: YuviPanda
Date: Fri, 22 Mar 2024 14:17:45 -0700
Subject: [PATCH 1/4] Cross-link setting up GPUs in AWS cluster creation

When we are *creating* an AWS cluster, if we already know that GPUs are
needed, we should add the nodegroup at that point. This cross-links the
relevant docs to streamline new hub turn-ups.

Ref https://github.com/2i2c-org/meta/issues/897
---
 docs/howto/features/gpu.md                   | 1 +
 docs/hub-deployment-guide/new-cluster/aws.md | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/docs/howto/features/gpu.md b/docs/howto/features/gpu.md
index 006621aa1d..48f70c19ef 100644
--- a/docs/howto/features/gpu.md
+++ b/docs/howto/features/gpu.md
@@ -100,6 +100,7 @@ series nodes.
 7. Ask for the increase, and wait. This can take *several working days*, so
    do it as early as possible!
 
+(howto:features:gpu:aws:nodegroup)=
 #### Setup GPU nodegroup on eksctl
 
 We use `eksctl` with `jsonnet` to provision our kubernetes clusters on
diff --git a/docs/hub-deployment-guide/new-cluster/aws.md b/docs/hub-deployment-guide/new-cluster/aws.md
index f517c18d59..6c985d698f 100644
--- a/docs/hub-deployment-guide/new-cluster/aws.md
+++ b/docs/hub-deployment-guide/new-cluster/aws.md
@@ -72,6 +72,11 @@ This will generate the following files:
 4. `terraform/aws/projects/$CLUSTER_NAME.tfvars`, a terraform variables file that will setup
    most of the non EKS infrastructure.
 
+### Add GPU nodegroup if needed
+
+If this cluster is going to have GPUs, you should edit the generated jsonnet file
+to [include a GPU nodegroup](howto:features:gpu:aws:nodegroup).
+
 ### Create and render an eksctl config file
 
 We use an eksctl [config file](https://eksctl.io/usage/schema/) in YAML to specify

From 73392f60f3f180bfb8cc3696db6c3f491481217a Mon Sep 17 00:00:00 2001
From: YuviPanda
Date: Fri, 22 Mar 2024 17:01:14 -0700
Subject: [PATCH 2/4] Document that we don't need to ask for extra quota on AWS

Fixes https://github.com/2i2c-org/infrastructure/issues/3780
---
 .../cloud-accounts/new-aws-account.md | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md b/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
index 7e0000c3c2..1bbc76f6ee 100644
--- a/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
+++ b/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
@@ -51,13 +51,9 @@ increase_[^2] for any substantial use of their services. Quotas act as an upper
 bound of for example the number of CPUs from a certain machine type and the
 amount of public IPs that the account can acquire.
 
-When an AWS account is created under our AWS Organization, a Service Quota
-increase request is automatically submitted thanks to what AWS refer to
-"Organization templates", "Quota request template", and "Template
-association"[^3].
-
-Following account creation, make sure to check our emails to see what is being
-requested and if its approved.
+When an AWS account is created under our AWS Organization, the default quotas
+that AWS applies to our organization are already set up for the new account.
+By default, we don't need to request quota increases here.
 
 We typically need to increase three kinds of quotas described below. The values
 of these are all 'Total CPUs' and hence larger nodes consume more quota.
@@ -67,7 +63,7 @@ of these are all 'Total CPUs' and hence larger nodes consume more quota.
   These instances are what we use for everything besides the exceptions noted
   below.
 
-  All our hubs will require an increase in this quota.
+  By default, AWS grants us 640 quota here.
 
 - **Spot instance quota** (`All Standard (A, C, D, H, I, M, R, T, Z) Spot
   Instance Requests`)
   standard instances are. We configure these to be used by dask worker pods as
   created for dask-gateway provided clusters.
 
-  Our `daskhub` hubs will require an increase in this quota.
+  By default, AWS grants us 640 quota here.
 
 - **GPU instance or high memory instance quota**
 
   High Memory instances`) is requested specifically to be able to use GPU
   powered machines or machines with high amounts of RAM memory.
 
-  Our custom tailored hubs will require an increase in this quota.
+  By default, AWS grants us 640 quota here for GPU instances and 448 for
+  high memory instances.
 
 ### Manually requesting a quota increase
 
 [^1]: AWS documentation on creating new accounts in an Organization:
 [^2]: AWS documentation on service quotas:
-[^3]: AWS documentation on request templates:

From 4d363093261c179e244333d615461e5dd6b6fb2b Mon Sep 17 00:00:00 2001
From: YuviPanda
Date: Fri, 22 Mar 2024 17:06:53 -0700
Subject: [PATCH 3/4] Get rid of a pesky 0

---
 docs/hub-deployment-guide/cloud-accounts/new-aws-account.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md b/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
index 1bbc76f6ee..58feb5e404 100644
--- a/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
+++ b/docs/hub-deployment-guide/cloud-accounts/new-aws-account.md
@@ -80,7 +80,7 @@ of these are all 'Total CPUs' and hence larger nodes consume more quota.
   High Memory instances`) is requested specifically to be able to use GPU
   powered machines or machines with high amounts of RAM memory.
 
-  By default, AWS grants us 640 quota here for GPU instances and 448 for
+  By default, AWS grants us 64 quota here for GPU instances and 448 for
   high memory instances.
 
 ### Manually requesting a quota increase

From 6156e9d4565121566c58ae557f5df3519c01168f Mon Sep 17 00:00:00 2001
From: YuviPanda
Date: Fri, 22 Mar 2024 19:11:24 -0700
Subject: [PATCH 4/4] Enable 'allusers' for smithsonian hubs

Also amend the allusers config to have it set up in the rstudio home as well
---
 config/clusters/smithsonian/common.values.yaml | 16 ++++++++++++++++
 docs/topic/infrastructure/storage-layer.md     |  4 ++++
 2 files changed, 20 insertions(+)

diff --git a/config/clusters/smithsonian/common.values.yaml b/config/clusters/smithsonian/common.values.yaml
index 07f200e906..80d357e233 100644
--- a/config/clusters/smithsonian/common.values.yaml
+++ b/config/clusters/smithsonian/common.values.yaml
@@ -19,6 +19,22 @@ basehub:
         add_staff_user_ids_of_type: "github"
       jupyterhubConfigurator:
         enabled: false
+
+      singleuserAdmin:
+        extraVolumeMounts:
+          - name: home
+            mountPath: /home/jovyan/allusers
+          - name: home
+            mountPath: /home/rstudio/allusers
+          # mounts below are copied from basehub's values that we override by
+          # specifying extraVolumeMounts (lists get overridden when helm values
+          # are combined)
+          - name: home
+            mountPath: /home/jovyan/shared-readwrite
+            subPath: _shared
+          - name: home
+            mountPath: /home/rstudio/shared-readwrite
+            subPath: _shared
       homepage:
         templateVars:
           org:
diff --git a/docs/topic/infrastructure/storage-layer.md b/docs/topic/infrastructure/storage-layer.md
index ec7570357b..75655f9d5b 100644
--- a/docs/topic/infrastructure/storage-layer.md
+++ b/docs/topic/infrastructure/storage-layer.md
@@ -87,6 +87,10 @@ jupyterhub:
           mountPath: /home/jovyan/allusers
           # Uncomment the line below to make the directory readonly for admins
           # readOnly: true
+        - name: home
+          mountPath: /home/rstudio/allusers
+          # Uncomment the line below to make the directory readonly for admins
+          # readOnly: true
         # mounts below are copied from basehub's values that we override by
         # specifying extraVolumeMounts (lists get overridden when helm values
         # are combined)
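
As background for PATCH 1/4: the cross-linked GPU docs have the operator add a GPU
nodegroup to the generated jsonnet file before rendering the eksctl config. The
sketch below shows roughly what such a nodegroup can look like once rendered to
eksctl YAML. It is illustrative only; the cluster name, region, instance type,
sizes, labels, and taints are assumptions for this sketch, not the actual output
of our jsonnet templates — the real steps live in the gpu.md howto linked above.

```yaml
# Illustrative sketch only -- not the rendered output of the 2i2c jsonnet
# templates. All names and values below are assumptions for illustration.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # hypothetical cluster name
  region: us-west-2       # hypothetical region
nodeGroups:
  - name: nb-gpu-g4dn-xlarge    # hypothetical nodegroup name
    instanceType: g4dn.xlarge   # instance type with one NVIDIA T4 GPU
    minSize: 0                  # scale to zero when no GPU user servers are running
    maxSize: 4
    volumeSize: 80              # root EBS volume size, in GiB
    labels:
      hub.jupyter.org/node-purpose: user
    taints:
      - key: hub.jupyter.org/dedicated
        value: user
        effect: NoSchedule      # keep non-user pods off the (expensive) GPU nodes
```

The `minSize: 0` plus a dedicated taint is the usual pattern here: GPU capacity
only exists (and only costs money) while a user server that tolerates the taint
is actually scheduled on it.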