Allow JupyterHub admins different cloud permissions than standard users #9

Open · 1 of 4 tasks
yuvipanda opened this issue Apr 23, 2022 · 11 comments

@yuvipanda
Member

yuvipanda commented Apr 23, 2022

Context

@rabernat brought up the point that it's important for hubs to be able to create cloud buckets whenever they want, without having to rely entirely on 2i2c. This can be accomplished by giving hub admin accounts a different set of cloud credentials than regular users when they're logged in to the hub - that way, we can scope the credentials to just the extra permissions admins need (probably full GCS / S3 access) without having to give them full ownership of the cloud project.

Proposal

We already provide cloud credentials via workload identity on GCP and IRSA on AWS. Both work by matching a Kubernetes service account (KSA) to a GCP / AWS service account. We can have a different Kubernetes service account for admins and thus grant it different rights (see the sketch after the task list below).

  • Create a different KSA that is attached to hub admins' user pods
  • Write terraform config that optionally provisions an extra GCP Service Account for this admin KSA. Its permissions should be a superset of the regular permissions
  • Optionally give extra rights to admins
  • Write documentation on how to create additional storage buckets
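
For illustration, a rough sketch of the first task using the Kubernetes Python client; the namespace, KSA name, and GCP service account email are placeholders, and the `iam.gke.io/gcp-service-account` annotation is the standard GKE workload identity binding:

```python
# Sketch: create a separate Kubernetes ServiceAccount for hub admins and bind
# it to a GCP service account via GKE workload identity. Names are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

NAMESPACE = "my-hub"                        # hypothetical hub namespace
ADMIN_KSA = "admin-user-sa"                 # hypothetical admin-only KSA
GCP_SA = "hub-admin@my-project.iam.gserviceaccount.com"  # hypothetical GCP SA

admin_service_account = client.V1ServiceAccount(
    metadata=client.V1ObjectMeta(
        name=ADMIN_KSA,
        namespace=NAMESPACE,
        # Pods running with this KSA inherit the GCP SA's permissions
        annotations={"iam.gke.io/gcp-service-account": GCP_SA},
    )
)

client.CoreV1Api().create_namespaced_service_account(NAMESPACE, admin_service_account)
```

In practice the terraform config from the second task would manage both the GCP service account and the IAM binding; the snippet above only shows the Kubernetes side.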

Updates and actions

No response

@rabernat

How would the UI side of this work? Would they just run aws s3 commands from the terminal? I rely heavily on the aws / gcp console for this currently.

@rabernat

Credentials for cloud storage use the cloud-provider IAM system. In my ideal world, credentials for these buckets would be automatically populated based on hub identity. However, since hub identity is different from cloud-provider identity, that's not trivial to do, and would require some kind of database mapping hub users to projects and project storage buckets. The concept of "groups" in JupyterHub could be very helpful here. Developing a general solution to this problem as part of z2jh would have a huge impact.
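
To make this concrete, one possible hook point is a JupyterHub `pre_spawn_hook` that chooses a Kubernetes service account (and therefore the cloud identity bound to it) per user. This is only a sketch assuming KubeSpawner; the service account names are hypothetical, and a similar check on group membership could cover the per-group case later:

```python
# In jupyterhub_config.py (or z2jh hub.extraConfig). Sketch only:
# route hub admins to a KSA with broader cloud permissions, and
# everyone else to the default, narrowly-scoped KSA.

def assign_service_account(spawner):
    if spawner.user.admin:
        spawner.service_account = "admin-user-sa"  # hypothetical admin KSA
    else:
        spawner.service_account = "user-sa"        # hypothetical default KSA

c.KubeSpawner.pre_spawn_hook = assign_service_account
```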

@yuvipanda
Member Author

There are two separate parts here:

  1. Different cloud credentials just for JupyterHub admins,
  2. Different cloud credentials per-group

(1) is easier to do than (2) now, since we already have code that has special overrides for hub admins (that's how we do the shared dir). I want to focus this issue on (1).

And yes, any AWS command / tool should 'just work' - aws on the terminal would work with all the permissions granted.
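
As a rough illustration of the "just works" behaviour: with IRSA (or workload identity on GCP) the standard SDK credential chain picks up the pod's identity automatically, so no keys are configured anywhere. The bucket name and region below are placeholders:

```python
# Sketch: under IRSA, boto3 resolves credentials from the pod's web identity
# token automatically; no access keys appear in the hub config.
import boto3

s3 = boto3.client("s3")

# An admin whose role allows bucket creation could do, e.g.:
s3.create_bucket(
    Bucket="my-hub-persistent-data",  # hypothetical bucket name
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},  # needed outside us-east-1
)

# Regular users with read/write access would simply use existing buckets:
for obj in s3.list_objects_v2(Bucket="my-hub-persistent-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```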

@scottyhq

scottyhq commented Jun 7, 2022

Just wanted to chime in here to say this would be really useful! I can think of a couple cases that (might?) be relatively straightforward to implement before tackling group-based permissions.

  1. admin creates a bucket without a lifecycle policy that everyone automatically has read-only access to (similar to current ~/shared folder)

  2. admin modifies the base service account policy to add additional buckets everyone can access. For example, in AWS you have to explicitly list buckets that are in other accounts but "requester pays". It seems many public datasets have the requester-pays configuration, and it would be nice to access those in addition to the scratch bucket (a small read example follows below): https://registry.opendata.aws/usgs-landsat/
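
For case 2, once the role policy lists the bucket, reads only need the requester-pays flag passed through. A hedged sketch with s3fs (the prefix shown is illustrative; charges accrue to our account):

```python
# Sketch: read from a requester-pays bucket such as usgs-landsat.
# The IAM policy attached to the hub's role must also allow access
# to this bucket; the prefix below is illustrative.
import s3fs

fs = s3fs.S3FileSystem(requester_pays=True)
print(fs.ls("usgs-landsat/collection02/")[:5])
```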

@rabernat

As we begin the new semester, I am pinging this issue to remind the team that this is an extremely high-value feature that would really accelerate the use of data on our hubs.

@yuvipanda
Member Author

@rabernat ok, so to be more specific, we want to allow admins to create buckets, right? And implement that in a way that generalizes?

@rabernat

Correct. This will empower the hub communities to manage their own cloud storage, rather than relying on 2i2c admins. Using object storage (rather than an NFS mount) is key for more cloud-native-style workflows.

@rabernat

I'm checking in on this issue. We continue to have requests from M2LInES and LEAP users to have a non-scratch bucket in which to store their data and share it with the hub team (but not the public).

@yuvipanda
Member Author

I've dealt with the specific issue here in 2i2c-org/infrastructure#1776 by making PERSISTENT_BUCKET a feature. That PR will enable it for LEAP and m2lines. How do we make sure that it doesn't balloon costs by users unexpectedly leaving data there?
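
One low-tech way to keep an eye on that: periodically sum usage per top-level prefix (the per-user directories) in the persistent bucket and report it to hub admins. A rough sketch with boto3; the bucket name is a placeholder:

```python
# Sketch: report storage used per top-level prefix so admins can see
# who is leaving data in the persistent bucket. Bucket name is a placeholder.
from collections import defaultdict
import boto3

BUCKET = "my-hub-persistent"  # hypothetical PERSISTENT_BUCKET name

s3 = boto3.client("s3")
usage = defaultdict(int)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        top_level = obj["Key"].split("/", 1)[0]  # per-user prefix
        usage[top_level] += obj["Size"]

for prefix, nbytes in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{prefix}: {nbytes / 1e9:.1f} GB")
```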

@jmunroe

jmunroe commented Oct 14, 2022

I am caught between two ways of solving this issue of letting admins create cloud storage on hubs.

  1. It is "easy" to create buckets using the Google Console or the command line, assuming you have the right permissions. We could set it up with instructions to default to "requester-pays" and then give guidance on how to set lifecycle rules (a sketch of creating such a bucket in code follows this list). It would solve the immediate problem of letting admins create whatever storage buckets they want. I think this would only be an option on a "dedicated" cluster where the community partner is paying (either directly or via 2i2c) the entire cloud costs. It would then be the community partner's responsibility to manage the costs and lifecycle rules associated with that cloud storage.

  2. But I think that is not the "right" way to set it up (the way I would expect 2i2c cloud engineers to create and manage cloud storage on a hub). I assume we would modify the correct terraform configuration files so that we are practicing infrastructure-as-code and other devops goodness. I see this as being especially important in cases where we are asked to migrate a hub to another availability zone, decommission a hub, or facilitate the right to replicate: if the entire infrastructure is not managed, we run the risk of "forgetting" some resource at some point down the road. Is there potential to automate this process with a UI so that hub admins could deploy cloud storage in a managed way?
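
For reference on option 1: creating a requester-pays bucket with a lifecycle rule is only a few lines with the google-cloud-storage client, whether run by an admin directly or wrapped in tooling. Project, bucket name, location, and retention period are placeholders:

```python
# Sketch: create a requester-pays bucket with a delete-after-30-days
# lifecycle rule. All names and values are placeholders.
from google.cloud import storage

client = storage.Client(project="my-gcp-project")    # hypothetical project

bucket = client.bucket("my-hub-extra-bucket")         # hypothetical bucket name
bucket.requester_pays = True
bucket.add_lifecycle_delete_rule(age=30)              # delete objects older than 30 days

client.create_bucket(bucket, location="us-central1")
```

Option 2 would express the same thing as terraform resources instead, so the bucket stays tracked alongside the rest of the hub's infrastructure.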

Are cloud buckets something that needs to be created/destroyed frequently? What is the true "cost" to having 2i2c create this resource on behalf of users?

As a research user, waiting for "I.T." to deploy a resource like extra storage was frustrating when I knew it would be "easy" to do if I just had admin access to my own infrastructure. But thinking about it from the sustainability side, I am more hesitant to bypass any recommended cloud engineering best practices.

To be clear, it may be that for M2LInES and LEAP we just create the hubs for them so they can proceed with their work. My comments here are about the more general question of what 2i2c is providing in a "research hub" and how that should be represented on our product roadmap.

@yuvipanda
Member Author

This is currently being done in 2i2c-org/infrastructure#3932 for AWS
