
[Hub] - Jupyter Meets the Earth #433

Closed
4 tasks done
consideRatio opened this issue May 27, 2021 · 16 comments

@consideRatio (Contributor) commented May 27, 2021

Background

A collaboration space created for Jupyter Meets the Earth.

Setup Information

  • Hub auth type: GitHub
  • Hub administrators: @consideRatio, @andersy005
  • Hub url: hub.jupytearth.org
  • Hub logo URL: https://pangeo-data.github.io/jupyter-earth/_static/jupyter-earth.png
  • Hub type: z2jh, dask-gateway, shared filesystem storage, shared object storage space
  • Hub cluster: External AWS account 286354552638 in us-west-2 region
  • Hub image: We want something based on pangeo-notebook, perhaps via a custom Dockerfile with a FROM statement referencing pangeo-notebook (a config sketch follows below).
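
As a sketch, the resulting image would then be referenced in the z2jh config roughly like this (registry, name, and tag below are hypothetical placeholders):

    singleuser:
      image:
        name: quay.io/example/jmte-user-env  # hypothetical registry/repo
        tag: "2021.05.27"                    # hypothetical tag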

Important Information

  • Link to leads issue: Private discussion in a Zoom call on May 26th between me, @andersy005, and @fperez
  • Hub config name: jmte
  • Community champion: @consideRatio / Erik Sundell
  • Hub start date: As soon as possible
  • Hub end date: None
  • Hub important dates:

Deploy To Do

  • Initial Hub deployment
  • Administrators able to log on
  • Community Champion satisfied with hub environment
  • Hub now in steady-state
@consideRatio changed the title from "[Hub] - [Hub name]" to "[Hub] - Jupyter Meets the Earth" on May 27, 2021
@yuvipanda (Member) commented:

@consideRatio \o/ - #391 (comment) has our draft docs for setting up a hub on AWS in this repo. I'd love for you to try them out so we can see how it goes!

@consideRatio (Author) commented May 27, 2021

@yuvipanda thanks for that pointer, it is very relevant for me to have right now so I better align with various infrastructure choices. I figure it's better for me to go with kops, even though my experience at this point is with eksctl, so as not to set up something different from the other clusters.

At the same time, is the main reason for abandoning EKS the inability to scale a managed node group to zero? I ended up opting for non-managed node groups and everything has been fine doing that. Wait! This belongs in #431; I'll ask it there.
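
For reference, the eksctl cluster config I have in mind is roughly this sketch (nodegroup name and sizes are hypothetical):

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: jmte
      region: us-west-2
    nodeGroups:               # unmanaged nodegroups, as opposed to managedNodeGroups
      - name: core-a
        instanceType: m5.large
        minSize: 0            # unmanaged nodegroups can scale to zero
        maxSize: 4
        desiredCapacity: 1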

@yuvipanda (Member) commented:

I opened #431 to discuss EKS vs kops - I think your experience there will be invaluable. 2i2c-org/farallon-image#28 has current information on the switch - it eventually came down to costs, but I've found EKS somewhat clunky to use.

@consideRatio (Author) commented:

@yuvipanda I've now read through all the documentation at:

Questions raised and answered

Reading it from the perspective that I'll create the cloud infrastructure myself, I was uncertain about what I would create myself and what 2i2c's various scripts would handle.

Will an NFS server be deployed within the k8s cluster?

  • No, it must be self-hosted alongside or within the k8s cluster you deploy. I'll use AWS EFS.

Do the hubs in pilot-hubs assume they run in 2i2c-managed cloud projects?

  • No, I can provide a kubeconfig for access to any k8s cluster.

Do the hubs in pilot-hubs configure anything besides what's inside the k8s cluster?

  • No, but it is possible in GCP: there is automation to set up scratch buckets in GCP projects by creating k8s resources of a GCP-specific kind.

Do the hubs in pilot-hubs assume I set up a keychain or similar in some KMS service?

  • No. Secrets stored encrypted in the repo and decrypted during deployment are encrypted/decrypted via Google KMS, managed by 2i2c, to which only 2i2c engineers have access (I assume).
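
For reference, sops is pointed at such a KMS key via a .sops.yaml file at the repo root, roughly like this sketch (project and key names are hypothetical):

    creation_rules:
      - path_regex: secrets/.*\.yaml
        gcp_kms: projects/example-project/locations/global/keyRings/sops/cryptoKeys/sops-key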

Does the hub allow for managing custom images?

  • Yes, but only via /services/configurator, where the image can be set. This means the image must first have been built and pushed manually by the user wanting to provide a custom image.

List of misc questions:

  • What is included in basehub?
    • JupyterHub
    • A k8s ServiceAccount for the users (user-sa)
    • GCP Cloud resources for scratch buckets
    • A docs k8s Service and Deployment
    • NFS PVC to reference
    • NFS Share creator job
      • Seems to ensure a folder on the NFS server is created and chowned
  • What is included in daskhub?
    • Basehub
    • Dask-Gateway
  • What is centralized within each k8s cluster?
    • The support chart, including prometheus, grafana, cert-manager, and ingress-nginx (sketched after this list)
  • What is centralized outside the various k8s clusters?
    • Google KMS for use by SOPS?
    • Auth0 and an OAuth2 application registered for the hub specifically?
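
As a sketch of how I understand the support chart composes those components as Helm dependencies (the versions and repository URLs below are illustrative, not the repo's actual pins):

    apiVersion: v2
    name: support
    version: 0.1.0
    dependencies:
      - name: prometheus
        version: "14.*"       # illustrative version constraint
        repository: https://prometheus-community.github.io/helm-charts
      - name: grafana
        version: "6.*"
        repository: https://grafana.github.io/helm-charts
      - name: cert-manager
        version: "1.*"
        repository: https://charts.jetstack.io
      - name: ingress-nginx
        version: "3.*"
        repository: https://kubernetes.github.io/ingress-nginx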

How does the configurator influence the user image chosen, and how do its settings override the Helm chart's configuration?

  • A custom Spawner class is defined, augmenting KubeSpawner with the ConfiguratorSpawnerMixin
  • The ConfiguratorSpawnerMixin accesses the configurator service, presumably running on localhost, which responds with the configuration state - this state is then used to overwrite traitlets whenever the Spawner is about to start a user pod.
  • I believe the configurator's state is lost on restart of the hub pod, because its StorageBackend writes to a file that I think won't be persisted in the hub container.

@consideRatio (Author) commented May 28, 2021

For transparency and to help me think clearly, I'm writing up my thoughts on using 2i2c-org/pilot-hubs as the configuration base for the JMTE deployment.

  • I think open source contributions are more likely to become fruitful as part of 2i2c-org than in a standalone repo.
  • I'd like to better learn about the 2i2c-org infrastructure
  • I'd like to deliver a deployed hub quickly
  • I'm worried about abandoning my experience with hubploy, a functional standalone project, in favor of the repo's deployer script, which is new to me. It worries me that the deployer script is locked into 2i2c infrastructure by being part of this repo, in a way hubploy isn't. For that reason, a contribution to hubploy would feel more valuable to the open source community than one to the 2i2c deployer script.
  • I'm not confident about choosing between kops and EKS via eksctl to deploy the k8s cluster.
    • EKS costing ~$50/month more is not an issue
    • Maximum of ~30 users per node is not an issue
    • I have no experience with kops, but I have experience with eksctl.
  • I'm not confident about what it would mean to use Auth0 instead of GitHub directly, and I'm worried that coupling to Auth0 rather than GitHub directly could cause trouble if we ever need to decouple from 2i2c.
  • I'm worried about the added complexity of the configurator
  • I'm positive about the shared / shared-readwrite folders
  • I'm worried about what happens if we want a custom Helm chart, e.g. because we need to deploy some additional template for something very custom. Then we wouldn't change basehub, but instead create another meta-chart (sketched below).
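
To make that concrete, such a meta-chart would roughly be a Chart.yaml declaring basehub as a dependency, plus our own templates - a hypothetical sketch:

    apiVersion: v2
    name: jmte-hub            # hypothetical meta-chart name, analogous to daskhub
    version: 0.1.0
    dependencies:
      - name: basehub
        version: 0.1.0        # illustrative; would track the repo's basehub chart
        repository: file://../basehub
    # ...plus a templates/ directory holding the very custom extras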

@yuvipanda (Member) commented:

This is all beautiful, thank you for writing this up, @consideRatio!

The configurator is not required - in fact in most cases I just set the image tag in the config right now - like in https://github.com/2i2c-org/pilot-hubs/blob/fef7da6a93284d006a8536b144f3fd0a0be5a936/config/hubs/carbonplan.cluster.yaml#L59. So you can basically ignore the configurator now.

I actually think it'll be great for you to use eksctl in this case. I think we should pick and choose which one we want to use based on the circumstances - I think the ideal outcome of #431 is to determine when to use kops vs EKS. I suspect we'll end up using both for a while.

hubploy is in an interesting space. I think the current setup of 1 directory per deployment hasn't scaled well in repos with a lot of deployments, IME - too much duplication. I also think that introducing jsonnet could reduce complexity in the longer term, but that would be a radical change to hubploy, since the set directory structure is one of its core parts. Many parts of the current scripts are cannibalized from it (particularly around sops - but perhaps that should be its own small library). hubploy grew out of the deploy scripts I had for https://github.com/berkeley-dsep-infra/datahub, and perhaps something can grow out of what we have here? I personally don't plan on doing any more work on hubploy...

But, it's extremely important to have an off-ramp from this repo - that's an essential part of right-to-replicate. My earlier thinking was to make sure we have a way to just extract out a values.yaml + base-chart config from this repo so people can continue using the same deployment without any changes. But perhaps a better way is to use this opportunity to think of a way to decouple the deployment script from this repo?

Auth0 is primarily used for automating the creation of credentials - you don't have to use it! However, we currently don't have a way to store secret values in the repo to be merged in (something hubploy has), so we'll have to build that.

Can you give me an example of a super-custom thing you might want to deploy? My intuition would be that anything useful for JMTE will also be useful for others, so we could just incorporate that into one of the charts we have. Alternatively we can create a new meta chart.

I hope this is all helpful. Everything is nascent and malleable in this repo - I look forward to your experience and contributions shaping how things happen in this repo :)

@consideRatio (Author) commented:

@yuvipanda ❤️ thanks for your quick and thorough response!

An example of a super custom thing could be a conda-store server, but that is of course quite standalone and could run in a separate namespace and such. But if we want to maintain that, we would need to set up some automation in a separate repo, set up sops, set up a KMS location, etc. for that repo as well.

I'd like to get some sleep now, but I'd love to speak with you 1on1, brainstorm a bit, and then try to go at full speed with practical steps towards a functional hub for JMTE.

Would you have time to chat with me sometime during 14:00-18:00 in your timezone today? I assume it's 06:40 for you now, btw, and I'll sleep ~7 hours. I'll be available on Slack at your convenience, and you could also schedule a time slot here if you want.

@yuvipanda (Member) commented:

An example of a super custom thing could be a conda-store server, but that is of course quite standalone and could run in a separate namespace and such. But if we want to maintain that, we would need to set up some automation in a separate repo, set up sops, set up a KMS location, etc. for that repo as well.

I think of conda-store as something that would indeed be broadly useful! It could live in basehub, in fact.

I'll try to book a slot now - I see it's almost 4 AM for you?!

@consideRatio (Author) commented:

@yuvipanda thank you so much for your care and effort to help me learn a lot about the 2i2c setup!

Here are some notes I scribbled down while speaking with you, for future reference:

  • eksctl folder created in 2i2c-org/pilot-hubs
    • eksctl config regarding k8s cluster setup
    • CloudFormation config regarding S3 buckets etc.
    • EFS: setup-efs.py script regarding security groups etc.
    • CloudFormation stuff should become Terraform stuff in the long run
  • Our image is defined in pangeo-data/jupyter-earth repo
    • The image could be updated by:
      • JupyterHub admin using the configurator UI
      • GitHub workflow to automatically create a PR if an image was built and pushed
      • A manual PR to 2i2c-org/pilot-hubs
      • A workflow dispatched in 2i2c-org/pilot-hubs following an image build/push in some remote repo
    • We start out having the JupyterHub admins use the configurator UI under https://hub.jupytearth.org/services/configurator
  • Z2JH's native configuration options hub.extraFiles / singleuser.extraFiles are used to mount additional files (example below)
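
For example, mounting an extra ipython config file could look like this sketch (the path and contents are hypothetical):

    singleuser:
      extraFiles:
        ipython-startup:                             # arbitrary key naming this file
          mountPath: /etc/ipython/ipython_config.py  # hypothetical mount path
          stringData: |
            c.InteractiveShellApp.exec_lines = ["print('JMTE environment ready')"]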

@consideRatio (Author) commented May 31, 2021

PRs representing current progress

Status summary

  • I have an issue getting proxy-public (or any k8s Service of type LoadBalancer) to provide a public IP; this is the error (tracked in AWS Load Balancer Controller v2.1.3 + EKSCTL (Multiple tagged SGs) eksctl-io/eksctl#3459):

    Events:
      Type     Reason                  Age               From                Message
      ----     ------                  ----              ----                -------
      Normal   EnsuringLoadBalancer    3s (x2 over 10s)  service-controller  Ensuring load balancer
      Warning  SyncLoadBalancerFailed  3s (x2 over 8s)   service-controller  Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-0a24c650cd9fbfc68; ensure only the k8s security group is tagged; the tagged groups were sg-0731df37a5a8e6844(eksctl-jmte-cluster-ClusterSharedNodeSecurityGroup-2LRGJW3MQ4O8) sg-00efe23d69c9c0c4a(eksctl-jmte-nodegroup-core-a-SG-LGMTEQ7JLTNX)
    
  • I have an issue mounting the EFS server from the k8s pods
    I've created MountTargets and an AccessPoint for the FileSystem resource. Kubelet reports a timeout trying to mount a PVC, which in turn is reported to be bound to an NFS-specific PV (a generic sketch of this PV/PVC pairing follows after this list).

    Events:
      Type     Reason       Age    From                                                   Message
      ----     ------       ----   ----                                                   -------
      Normal   Scheduled    2m36s  default-scheduler                                      Successfully assigned prod/nfs-test-fsmrn to ip-192-168-31-119.us-west-2.compute.internal
      Warning  FailedMount  33s    kubelet, ip-192-168-31-119.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[home-base], unattached volumes=[home-base default-token-7pm2j]: timed out waiting for the condition
      Warning  FailedMount  26s    kubelet, ip-192-168-31-119.us-west-2.compute.internal  MountVolume.SetUp failed for volume "home-base" : mount failed: exit status 32
      Mounting command: systemd-run
      Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/18402c62-bfe0-454c-a84c-439f5be4b319/volumes/kubernetes.io~nfs/home-base --scope -- mount -t nfs fs-01707b06.efs.us-west-2.amazonaws.com:/homes/ /var/lib/kubelet/pods/18402c62-bfe0-454c-a84c-439f5be4b319/volumes/kubernetes.io~nfs/home-base
      Output: Running scope as unit run-27728.scope.
      mount.nfs: Connection timed out
    
  • No S3 storage buckets set up yet

  • No user-environment image build automation set up yet, but a Dockerfile is defined for use.
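
For reference, the PV/PVC pairing mentioned in the second item above is roughly the following sketch (names simplified; the EFS DNS name is the one from the mount log):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: home-base
    spec:
      capacity:
        storage: 1Mi               # NFS ignores this, but the field is required
      accessModes: [ReadWriteMany]
      nfs:
        server: fs-01707b06.efs.us-west-2.amazonaws.com
        path: /homes/
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: home-base
      namespace: prod
    spec:
      storageClassName: ""         # bind to the pre-created PV instead of a dynamic one
      volumeName: home-base
      accessModes: [ReadWriteMany]
      resources:
        requests:
          storage: 1Mi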

@2i2c-org/tech-team anyone with a guess on what to do about the first two situations above? Note for the first issue, I've provided quite an exhaustive report in the linked issue.

@yuvipanda (Member) commented:

Yay!

For EFS, you need one mount target per subnet your EKS cluster is in, added to all the security groups in that subnet. Access points aren't used yet.
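
Since your cloud setup is partly CloudFormation, one mount target per subnet can be expressed roughly like this sketch (the subnet ID is hypothetical; the SG IDs are the ones from your error message):

    Resources:
      EfsMountTargetA:                        # repeat per subnet the cluster uses
        Type: AWS::EFS::MountTarget
        Properties:
          FileSystemId: fs-01707b06           # the EFS filesystem from your mount log
          SubnetId: subnet-0123456789abcdef0  # hypothetical subnet ID
          SecurityGroups:                     # at most five security groups per mount target
            - sg-0731df37a5a8e6844
            - sg-00efe23d69c9c0c4a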

@yuvipanda (Member) commented:

What kinda functionality do you want for storage buckets? A PANGEO_SCRATCH and SCRATCH_BUCKET setup per hub?

@consideRatio (Author) commented:

Yay!

For EFS, you need one mount target per subnet your EKS cluster is in, added to all the security groups in that subnet. Access points aren't used yet.

Noooooo! Yuck, ALL security groups? Not just one or a few? It turns out you can have at most five per mount target, and I have multiple worker nodegroups...

@consideRatio (Author) commented:

A PANGEO_SCRATCH and SCRATCH_BUCKET setup per hub?

Is this documented somewhere? What it means is not clearly defined in my mind yet, even though I saw a reference to an environment variable in a startup script for the pangeo-notebook image.

@yuvipanda (Member) commented:

Is this documented somewhere what it means?

pangeo-data/pangeo-cloud-federation#610 is the upstream discussion. With https://github.com/2i2c-org/pilot-hubs/blob/fef7da6a93284d006a8536b144f3fd0a0be5a936/hub-templates/basehub/values.yaml#L322 the customization in the image is not necessary.
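
The gist of that values.yaml line is injecting per-user scratch-bucket environment variables, roughly like this sketch (the bucket name is hypothetical):

    singleuser:
      extraEnv:
        # $(JUPYTERHUB_USER) is expanded by k8s from the user pod's own env
        SCRATCH_BUCKET: s3://example-scratch-bucket/$(JUPYTERHUB_USER)
        PANGEO_SCRATCH: s3://example-scratch-bucket/$(JUPYTERHUB_USER)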

@consideRatio (Author) commented:

We have a JMTE deployment active and functional; I think this can be closed now!
