[New Hub] LEAP Pangeo #1050

choldgraf · 2022-03-02T22:54:36Z

Hub Description

LEAP Pangeo is an extension of the Pangeo project to new communities around research and education with Machine Learning. The hub's environment will be nearly identical to the Pangeo Hubs, and run on GKE, though the setup might be slightly different and we should get clarifications from @rabernat.

This website has a lot of useful information about the project: https://leap-stc.github.io/leap-pangeo/architecture.html
Lead ref: https://github.com/2i2c-org/leads/issues/63

Community Representative(s)

@rabernat

Not sure if there are others serving as leads on the project.

Important dates

Required start date: March 14th
Target start date: ASAP - they would like to get this running whenever we can get the hub set up
Any important dates for usage: Not that I know of

Hub Authentication Type

Other (may not be possible, please specify in comments)

Hub logo information

TODO: @rabernat does this look correct?

URL to Hub Image: https://leap-stc.github.io/_static/LEAP_logo.png
URL for Image Link: https://leap-stc.github.io/index.html

Hub user image

TODO: @rabernat can you advise here? Is this the Pangeo user image?

Repository for user image: { REPO LINK IF IT EXISTS }
User image registry: { REGISTRY IF ONE ALREADY EXISTS }
User image tag and name: { NAME AND TAG IF IT EXISTS }

Extra features you'd like to enable

TODO: @rabernat does it need to be in a specific data center?

Specific cloud provider or datacenter:
Dedicated Kubernetes cluster
Scalable Dask Cluster
GPUs available to users

Other relevant information

There is a GCP billing account with credits for this hub. It is under the 2i2c.org GCP organization. Here are the details:

Name: community-LEAP-NSF
ID: 01A164-923D17-3199D9

Hub URL

leap.pangeo.2i2c.cloud

Hub Type

daskhub

Tasks to deploy the hub

Engineer who will deploy the hub is assigned
Deploy information filled in above
Initial Hub deployment PR: Add LEAP hub #1074
Administrators able to log on
Community Representative satisfied with hub environment
Hub now in steady-state

Follow-up issues

choldgraf · 2022-03-02T22:56:15Z

Hey all - I put down some details for the new LEAP Pangeo hub that we're deploying for @rabernat . I think that we need to clarify some of the information above in order to know what kind of environment / hardware to set up. @rabernat could you take a look at the questions in the top comment and resolve them w/ answers or discussion?


yuvipanda · 2022-03-09T10:24:41Z

@rabernat ok I've deployed a standard Dask-based hub at https://leap.2i2c.cloud! It's configured to be similar to the pangeo hub.

Next steps:

Create a GitHub team under https://github.com/leap-stc for people who will have access to this hub and let me know, so I can grant access to people who are part of that team. Right now, everyone who has access to the pangeo hub has access to this one.
Try it out and let me know what else needs to change.

I'll work on adding GPUs as well.

rabernat · 2022-03-09T12:38:27Z

Hi Folks! This is awesome. Sorry for not responding earlier to this issue. Somehow I missed the notification.

> Not sure if there are others serving as leads on the project.

Just me for now. May add others later.

> URL to Hub Image: https://leap-stc.github.io/_static/LEAP_logo.png
>
> URL for Image Link: https://leap-stc.github.io/index.html

👍

> Repository for user image: { REPO LINK IF IT EXISTS }
>
> User image registry: { REGISTRY IF ONE ALREADY EXISTS }
>
> User image tag and name: { NAME AND TAG IF IT EXISTS }

We would like to use the latest image from https://github.com/pangeo-data/pangeo-docker-images/tags, currently at 2022.02.04. However, that is probably not possible due to #1031, which is preventing us from updating to the latest image due to dask gateway incompatibilities.

We also want all of the different machine types to have the option to launch the ML-version of the image. However, the ML notebook image has a small 🐛 right now (see pangeo-data/pangeo-docker-images#294).

We would like to add a larger machine type, something equivalent to e2-standard-8, with 8 vcpus and 32 GB memory.

We will need an option to attach a GPU. I am not sure which one and would appreciate a rundown of the options / costs.

Going forward, it would be great to be able to select any of the tags from a dropdown, as part of a matrix of profile-list spawner options (see jupyterhub/kubespawner#307).

> Create a GitHub team under https://github.com/leap-stc for people who will have access to this hub and let me know,

I have created the leap-stc/leap-pangeo-users group. Please DO NOT retain access to the broader pangeo group.

> TODO: @rabernat does it need to be in a specific data center?

GCP us-central1 would probably be ideal, like the other cluster.

yuvipanda · 2022-03-10T06:12:28Z

@rabernat I've investigated #1031 (comment) and I think it's sorted out. The LEAP hub now has the latest pangeo image.

Next things to do:

Add an even larger instance
Investigate GPU options

yuvipanda · 2022-03-10T09:01:14Z

@rabernat I've redone the size options available:

These put one user per node as well, which I think is a better fit for research hubs.

rabernat · 2022-03-10T19:05:20Z

Has access been granted to the @leap-stc/leap-pangeo-users group? I just had a report that it was not working.

Also, where does this hub configuration live?

yuvipanda · 2022-03-10T19:14:11Z

@rabernat it's currently in this PR: #1074. Can you add me to that GitHub team, and I'll debug that?

rabernat · 2022-03-10T19:46:44Z

I actually just had a new report that it IS working, so I think we are good in terms of authorization.

rabernat · 2022-03-10T19:59:14Z

So I just got some good feedback from the LEAP Executive Committee about this hub. First off, everyone is very excited and happy to have the hub up! 🎉 So thanks for getting this off the ground. 🙏

Most of the feedback is from PIs who are very experienced at using HPC resources for supporting large research groups. I think these points will be quite universal for large and complex communities like LEAP, so I hope they can stimulate some useful discussion.

Onboarding Tutorials

As specified in our contract, 2i2c will provide onboarding training for the hubs.

Question for 2i2c: What is the timeframe and process for organizing these training sessions?

Offboarding

Over the 5-10 years of this project, people will exit the project. We need a sustainable approach to not only onboarding but offboarding.

Question for 2i2c: Beyond simply removing their access via the github group, what is the process for offboarding them and specifically purging their user data from storage so we don't continuously accumulate abandoned data?

Tiering of Access

There is a huge range of different types of participants in LEAP and users of LEAP-Pangeo: from high-schoolers who will participate in a hackathon for 1 day to senior faculty who will do cutting-edge research over many years. It seems inevitable that we will need different tiers of access. Specifically, we would like to limit certain profile-list options (e.g. GPUs) to certain user groups.

Question for 2i2c: is it possible to associate distinct profiles with different user sub-groups?

Metrics and Report

This one will be difficult I think, but I am stating it clearly here: it is important for LEAP to have user-level breakdowns of hub usage and costs. This is what PIs who work on HPC centers are used to and this is what they expect here. Specifically, I would like to do a query for a specific user (e.g. myself rabernat) and see, on a weekly, daily, or monthly basis:

Total CPU hours used and associated cost
Total GPU hours used and associated cost
Total storage used and associated cost

The sum of these individual user costs should roughly add up to the total hub cost.
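As a rough illustration of the report described above, here is a minimal sketch that rolls raw usage records up into per-user amounts and costs. The record layout, the `RATES` prices, and the `student1` username are all hypothetical; real numbers would have to come from cluster monitoring and cloud billing data, which this thread does not specify.

```python
from collections import defaultdict

# Hypothetical unit prices in USD, invented for this sketch only.
RATES = {"cpu_hours": 0.03, "gpu_hours": 0.35, "storage_gb_days": 0.001}


def per_user_costs(records):
    """Sum usage per user and resource, then price each resource with RATES."""
    usage = defaultdict(lambda: defaultdict(float))
    for rec in records:
        usage[rec["user"]][rec["resource"]] += rec["amount"]

    report = {}
    for user, resources in usage.items():
        lines = {
            res: {"amount": amount, "cost": round(amount * RATES[res], 2)}
            for res, amount in resources.items()
        }
        total = round(sum(line["cost"] for line in lines.values()), 2)
        report[user] = {"resources": lines, "total": total}
    return report


# Hypothetical usage records for one reporting period (e.g. one week).
records = [
    {"user": "rabernat", "resource": "cpu_hours", "amount": 100.0},
    {"user": "rabernat", "resource": "gpu_hours", "amount": 10.0},
    {"user": "student1", "resource": "cpu_hours", "amount": 20.0},
]

report = per_user_costs(records)
# The per-user totals should add up to (roughly) the hub-level cost.
hub_total = round(sum(entry["total"] for entry in report.values()), 2)
```

Running the same aggregation over daily, weekly, or monthly slices of the records would give the three reporting granularities requested.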

The reason for this is based on the PIs' years of experience on HPC, where a small number of users (sometimes maliciously) consume a disproportionate amount of resources. Identifying and diagnosing such situations is imperative.

Question for 2i2c: What technical developments are required to deliver this granularity of reporting? What is a reasonable time-frame for implementation?

yuvipanda · 2022-03-11T02:02:45Z

@rabernat Great questions! I opened 2i2c-org/features#8 to discuss offboarding. I'll let @choldgraf speak about some of the other questions. I also know we already have issues wrt reporting elsewhere...

choldgraf · 2022-03-14T22:25:34Z

Hey @rabernat - thanks for these follow-up questions and requests. For some of them there are plans in the works, and others will need more investigation and discussion before moving forward. I'll touch on each below:

> What is the timeframe and process for organizing these training sessions?

Right now, we have a job position open for the person that will spearhead these efforts: https://2i2c.org/jobs/2022/product-community-lead/ . We expect to start reviewing applications in a week or so, and will hire somebody on a rolling basis once we find the right candidate. I expect that process to take another month at least.

In the meantime, I wonder how we can have the most impact with low-hanging fruit for the LEAP community. Can we discuss the most important things to focus on in the issues linked below? If there are specific needs that LEAP has right now, we can create focused issues for them.

> Beyond simply removing their access via the github group, what is the process for offboarding them and specifically purging their user data from storage so we don't continuously accumulate abandoned data?

See the issue below where we're tracking this question. Semi-related: we have these offboarding docs but those are for an entire hub migrating off the service, not for the regular "churn" of users on a hub.

Design offboarding process that hub admins can use to offboard their users features#8

> is it possible to associate distinct profiles with different user sub-groups?

I don't believe this is currently possible in JupyterHub. I looked around in KubeSpawner but didn't find anything about this specifically, so I've opened up the issue below to track and discuss:

Restrict profile_list options depending on the user jupyterhub/kubespawner#589

> What technical developments are required to deliver this granularity of reporting? What is a reasonable time-frame for implementation?

We are tracking development efforts to improve reporting / monitoring in these two issues that are both actively under development. I'm not sure what the timeline is on them, but I think we'll be able to track hub-level usage/costs by the end of Q2 or so.

Formula for calculating hub-specific cloud costs on a shared cluster #730 - specifically for calculating monthly hub costs
Cloud usage monitoring and alerting infrastructure and process #328 - covers "usage monitoring and alerting"

Our current targets are to calculate "usage and costs" at the hub level, and at the user-level focus on "usage" (memory, CPU, etc) rather than calculate costs per-se. Let's discuss this one in those more specific issues?

sgibson91 · 2022-03-15T09:52:08Z

> I don't believe this is currently possible in JupyterHub. I looked around in KubeSpawner but didn't find anything about this specifically, so I've opened up the issue below to track and discuss:

I actually think this is possible; it's just not the default out of the box and requires custom logic. See @consideRatio's wonderful Discourse post on the topic here: https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449 (I will add this to the related issue too. Edit: Ah, I see it's already been mentioned over there!)

choldgraf · 2022-03-15T22:34:24Z

@sgibson91 good point! Indeed @consideRatio provided some helpful comments there as well. I've opened up a 2i2c issue to track this one, since it seems the change wouldn't be in KubeSpawner but instead would be in our config / deployment: #1120

I believe that we have all major parts of this hub worked out, so once #1074 is merged I think we can close this issue and spot-check more feature improvements or issues in support channels + dev issues. Anybody object to that?

rabernat · 2022-03-24T18:48:48Z

It was great to read jupyterhub/kubespawner#589 (comment) and @consideRatio's suggestion of how to implement custom spawner logic. It sounds like this is technically feasible for 2i2c today. Based on this I would like to request that 2i2c implement this sort of customized spawner for the LEAP hub.

To begin, we would like two tiers:

| Tier | Privileges |
| --- | --- |
| Public tier `leap-stc/leap-pangeo-users` | Access to "Small" and "Medium" machine types |
| Research tier `leap-stc/leap-pangeo-research` | Access to all machine types plus GPU option |
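To make the request concrete, the two tiers can be written down as data plus a small lookup. This is only a sketch of the policy, not of how it would be wired into the hub; the option names "Large", "Huge", and "GPU" are stand-ins for "all machine types plus GPU option", and `allowed_options` is a hypothetical helper.

```python
# Tier policy from the table above. Option names other than "Small" and
# "Medium" are placeholders chosen for this sketch.
TIERS = {
    "leap-stc/leap-pangeo-users": {"Small", "Medium"},
    "leap-stc/leap-pangeo-research": {"Small", "Medium", "Large", "Huge", "GPU"},
}


def allowed_options(user_teams):
    """Return the union of options granted by every tier the user belongs to."""
    allowed = set()
    for team in user_teams:
        # Teams that are not part of the policy grant nothing.
        allowed |= TIERS.get(team, set())
    return allowed


public_view = allowed_options(["leap-stc/leap-pangeo-users"])
research_view = allowed_options(
    ["leap-stc/leap-pangeo-users", "leap-stc/leap-pangeo-research"]
)
```

Taking the union means a user who belongs to both teams simply sees the larger set, which matches the parent/child relationship between the two groups.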

Having tiered access is very important to the LEAP executive committee. Delivering this feature quickly will be a win for 2i2c in terms of demonstrating ability to be responsive to feature requests, building trust from the LEAP PIs.

consideRatio · 2022-03-24T19:31:06Z

It makes me happy that you found what I've written helpful, @rabernat!

@rabernat are leap-stc/leap-pangeo-users and leap-stc/leap-pangeo-research teams defined in the leap-stc GitHub organization, and should membership in those teams determine which machine types are made available?

If so, I think the following issue is highly relevant: jupyterhub/oauthenticator#492. It is about retaining the GitHub org/team membership information captured during authentication for later use, for example when a user is about to be presented with spawn options. That happens at a separate time from login, even though the two can occur in quick succession.

choldgraf · 2022-03-24T21:34:59Z

I've opened up an issue to track this action, since it is complex enough that I think it warrants its own description / implementation discussion, etc:

[LEAP Hub] Allow for profile list options based on GitHub team membership #1146

Also added it to our project backlog so that we can consider it in the context of the other development efforts we're undertaking. Agreed that having a nice story for this will be impactful for many, and it would be extra useful since LEAP could use this feature right now.


rabernat · 2022-03-24T22:22:16Z

> are leap-stc/leap-pangeo-users and leap-stc/leap-pangeo-research teams defined in the leap-stc GitHub organization

yes, and both are public:

https://github.com/orgs/leap-stc/teams/leap-pangeo-users (parent group)
https://github.com/orgs/leap-stc/teams/leap-pangeo-research (child group)

There is also

https://github.com/orgs/leap-stc/teams/leap-pangeo-education (unused so far but may be part of a future configuration)

consideRatio · 2022-03-24T22:24:48Z

@rabernat note they don't look to be public to me, I get

rabernat · 2022-03-25T00:25:26Z

Ok I think I used the wrong term. These groups are "visible"

I believe the 2i2c oauth app has the scope to view them and see the membership. But clearly you're correct: they are not "public".

rabernat · 2022-04-07T15:38:52Z

I am checking in to see if there is any progress on the issue of the custom spawner for the LEAP hub? I would like to be able to share an update with the LEAP executive committee.

yuvipanda · 2022-04-08T03:53:58Z

@rabernat I am going to start actively working on it this week, and should have an update on how long this might take soon.

yuvipanda · 2022-04-08T07:25:04Z

@rabernat moving conversations about the github teams based profiles to #1146

damianavila · 2022-04-13T19:56:03Z

We agreed at the planning meeting this is completed and any follow-ups already have dedicated issues.

profile_list is now dynamically generated, based on the GitHub teams the user is a part of.

This list of teams is refreshed only during login, so a user needs to log out and log back in to see new teams! This also means that users removed from teams on GitHub will still have access to the profiles until they are logged out from the admin panel too (to be fixed).

This approach is taken over customizing options_form to protect against users just bypassing the options form and using the API directly to spawn servers.

Deployed to the leap hub; 'large' & 'huge' are only available to leap-stc:leap-pangeo-research members, not to leap-stc:leap-pangeo-users members, based on 2i2c-org#1050 (comment).

Fixes 2i2c-org#1146

choldgraf added the type: hub label Mar 2, 2022

choldgraf added this to DEPRECATED Engineering and Product Backlog Mar 2, 2022

choldgraf moved this to Needs Shaping / Refinement in DEPRECATED Engineering and Product Backlog Mar 2, 2022

yuvipanda self-assigned this Mar 9, 2022

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Mar 9, 2022

Add LEAP hub

04e7f44

Ref 2i2c-org#1050

yuvipanda mentioned this issue Mar 9, 2022

Add LEAP hub #1074

Merged

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Mar 9, 2022

Add LEAP hub

93aba4e

Ref 2i2c-org#1050

yuvipanda mentioned this issue Mar 11, 2022

Design offboarding process that hub admins can use to offboard their users 2i2c-org/features#8

Open

3 tasks

choldgraf mentioned this issue Mar 14, 2022

Create ongoing training and learning opportunities for users 2i2c-org/docs#128

Open

choldgraf mentioned this issue Mar 24, 2022

[LEAP Hub] Allow for profile list options based on GitHub team membership #1146

Closed

3 tasks

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Mar 24, 2022

Add LEAP hub

3749d30

Ref 2i2c-org#1050

damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Mar 30, 2022

rabernat mentioned this issue Apr 4, 2022

[New Hub] M2LiNES Pangeo hub #1168

Closed

9 tasks

damianavila moved this from In progress to Complete in DEPRECATED Engineering and Product Backlog Apr 13, 2022

damianavila closed this as completed Apr 13, 2022

yuvipanda mentioned this issue Apr 25, 2022

Restrict access to profiles based on GH team membership #1239

Merged

2 tasks

rabernat mentioned this issue May 5, 2022

Metrics and Reporting for LEAP Hub #1279

Closed
