[EVENT] ICESat-2 Hackweek 2023 #2889

Closed
4 of 9 tasks
scottyhq opened this issue Jul 28, 2023 · 9 comments · Fixed by #3021
Comments

@scottyhq
Contributor

scottyhq commented Jul 28, 2023

Summary

ICESat-2 Hackweek 2023 is focused on Cloud computing with NASA ICESat-2 data (https://icesat-2-2023.hackweek.io)

Event Info

Hub info

Task List

Before the event

  • Dates confirmed with the community representative and added to Hub Events Calendar.
  • Quotas from the cloud provider are high enough to handle expected usage.
  • One week before the event, the Hub is running.
  • Confirm with Community Representative that their workflows function as expected.
    • 👉Template message to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, the date of your event is getting close!
      
      Could you please confirm that your hub environment is ready to go and matches your hub's infrastructure setup by checking the following:
      - [ ] Confirm that the "Event Info" above is correct
      - [ ] On your hub: log-in and authentication work as expected
      - [ ] `nbgitpuller` links you intend to use resolve properly
      - [ ] Your notebooks and content run as expected
      
  • One day before the event, either a separate nodegroup is provisioned for the event or the cluster is scaled up.

During and after event

  • Confirm event is finished.
  • Nodegroup created for the hub is decommissioned / cluster is scaled down.
  • Hub decommissioned (if needed).
  • Debrief with community representative.
    • 👉Template debrief to send to community representative
      Hey {{ COMMUNITY REPRESENTATIVE }}, your event appears to be over 🎉
      
      We hope that your hub worked out well for you! We are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!
      
      - Did the infrastructure behave as expected?
      - Anything that was confusing or could be improved?
      - Any extra functionality you wish you would have had?
      - Could you share a story about how you used the hub?
      
      - Any other feedback that you'd like to share?
      
      
@damianavila
Contributor

@consideRatio, feel free to ask @scottyhq for any further questions/details needed to fulfill the above checklist.
Thanks!!

@consideRatio
Contributor

@scottyhq I have some questions:

Authorization and machine types

  • Will workshop attendees be part of the github organization teams CryoInTheCloud:cryoclouduser, CryoInTheCloud:cryocloudadvanced, or another group?
  • How many resources should they request when starting a server?
    I recommend they start on a machine type that can fit at least 10 users; if they each request, for example, ~8 GB of memory, only ~4 users would fit on the 32 GB machine available to CryoInTheCloud:cryoclouduser.

Guessing at what may be practical, I suggest creating a new GitHub team and allowing it to start on a 16 CPU / 128 GB machine by default, with perhaps 4, 8, or 16 GB of memory requested per user by default.
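
For a rough sense of the packing arithmetic behind this recommendation, here is a back-of-the-envelope sketch (not 2i2c code; the node sizes and per-user memory guarantees are just the example numbers from this thread, and real capacity is slightly lower once memory reserved for system daemons is accounted for):

```python
# Rough capacity check: how many user servers fit on one node when each
# server is guaranteed a fixed amount of memory. Ignores the small slice
# of node memory reserved for system daemons, so real counts are a bit lower.
def users_per_node(node_mem_gb: int, mem_guarantee_gb: int) -> int:
    return node_mem_gb // mem_guarantee_gb

for node, guarantee in [(32, 8), (128, 16), (512, 16), (512, 32)]:
    print(f"{node} GB node / {guarantee} GB per user -> ~{users_per_node(node, guarantee)} users")
```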

Pre-started nodes and pre-downloaded images

If you want, we can optimize the startup time by ensuring nodes are already started and images are already downloaded on them. This would have machines on standby, incurring more cost. Do you wish for this @scottyhq, and if so, for how many users should we guarantee a fast startup time by having capacity pre-allocated?

Also for this, it's relevant that we know what image will be used ahead of time, since pre-downloading images must be specified via config rather than through the configurator at https://hub.cryointhecloud.com/services/configurator/. Let me know what image(s) will be used if you want pre-started nodes.

@scottyhq
Contributor Author

scottyhq commented Aug 4, 2023

Thanks for your assistance @consideRatio !

Will workshop attendees be part of the github organization teams CryoInTheCloud:cryoclouduser

Yes. All CryoCloud JupyterHub access goes through a Google Form as described here: https://book.cryointhecloud.com/content/Getting_Started.html#getting-started. I don't know who has access to transfer form responses to the GitHub Team (@tsnow03, can you clarify)?

We also have these two GitHub Teams in another org (https://github.com/orgs/ICESAT-2HackWeek/teams/2023-participants, https://github.com/orgs/ICESAT-2HackWeek/teams/2023_orgteam). I don't know if cross-org permissions are possible, but I added you to the 2023 organizers team in that org.

How much resources are they to request when starting a server?

I personally think a default of 2 CPU and 16 GB would be good for this event, and it's nice to have the 4 CPU / 32 GB option. The current default of 0.5 CPU / 4 GB feels low. Is that easy to adjust for the event?

If you want, we can optimize the startup time

I think this would be nice (having a +1 spot ready to go at all times next week) but not necessary. I think people are willing to wait 5 minutes and check email, etc., while things spin up. My understanding is that the configurator is not currently being used on CryoCloud, so whatever image is specified in the 2i2c config is what is being used.

@consideRatio
Contributor

Decision on machine type

With users starting 16 GB or possibly 32 GB servers, a 128 GB machine fits 8 or 4 users, and a 512 GB machine fits 32 or 16 users.

Waiting for startup typically happens once per machine, so fitting more users on a machine improves the startup experience in general. My current guesstimate for a good compromise is to aim for around 10-40 users per node, so I'm planning to use a 512 GB machine for the attendees.

Decision on profile list config

I've added a new entry that is shown as the first and default entry for users of the team cryoclouduser (and cryocloudadvanced) in the GitHub org CryoInTheCloud. It uses the same images and provides two options: starting with a guarantee of 16 GB / 2 CPU or 32 GB / 4 CPU, limited to 20 GB / 4 CPU or 40 GB / 8 CPU respectively.
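
For readers unfamiliar with how such an entry is expressed, below is a minimal, hypothetical sketch of a KubeSpawner profile_list entry with the guarantees and limits described above. It is not the actual 2i2c configuration: the image name and node selector are placeholders, and 2i2c hubs set this through their deployment config rather than a hand-written jupyterhub_config.py.

```python
# Hypothetical jupyterhub_config.py snippet (not the real 2i2c config):
# a profile_list entry offering two resource choices, mirroring the
# guarantees/limits described above.
c.KubeSpawner.profile_list = [
    {
        "display_name": "ICESat-2 Hackweek 2023",
        "default": True,
        "kubespawner_override": {
            "image": "example/hackweek-image:2023",  # placeholder image
            # placeholder selector for the large (512 GB) node pool
            "node_selector": {"hub.jupyter.org/node-purpose": "user"},
        },
        "profile_options": {
            "resources": {
                "display_name": "Resource allocation",
                "choices": {
                    "small": {
                        "display_name": "16 GB RAM / 2 CPU (guaranteed)",
                        "default": True,
                        "kubespawner_override": {
                            "mem_guarantee": "16G",
                            "mem_limit": "20G",
                            "cpu_guarantee": 2,
                            "cpu_limit": 4,
                        },
                    },
                    "large": {
                        "display_name": "32 GB RAM / 4 CPU (guaranteed)",
                        "kubespawner_override": {
                            "mem_guarantee": "32G",
                            "mem_limit": "40G",
                            "cpu_guarantee": 4,
                            "cpu_limit": 8,
                        },
                    },
                },
            },
        },
    },
]
```

Roughly speaking, the guarantees drive scheduling (how many servers a node admits), while the limits cap what a single user can burst up to.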

Expected outcomes

  • I expect that all users should see the ICESAT-2 HackWeek entry, selected by default.
  • I expect that startup time should be slow for the first user (such as an instructor starting a server ahead of time), fast for the following 31 users, slow again for the 33rd user, and fast for everyone else once two nodes are running.


@scottyhq
Contributor Author

The event is done and it was awesome :) Thank you 2i2c for providing such a reliable and useful service!

@damianavila
Contributor

@consideRatio, can you take care of the after-event task, please? Thanks!

@consideRatio
Contributor

Great to hear @scottyhq, thanks for the followup!

I'll remove the "ICESAT-2 Hackweek" user server choice at this point, right @scottyhq?


We hope that your hub worked out well for you! We are trying to understand where we can improve our hub infrastructure and setup around events, and would love any feedback that you're willing to give. Would you mind answering the following questions? If not, just let us know and that is no problem!

  • Did the infrastructure behave as expected?
  • Anything that was confusing or could be improved?
  • Any extra functionality you wish you would have had?
  • Could you share a story about how you used the hub?
  • Any other feedback that you'd like to share?

@consideRatio
Contributor

Following a support request from @tsnow03 to remove the option, I went ahead and did it, @scottyhq, by merging Yuvi's PR #3021.

@scottyhq
Contributor Author

scottyhq commented Sep 8, 2023

Sorry for the delay, just getting back from vacation. Some answers below:

Did the infrastructure behave as expected?

Yes! It was great. We had about 50 people using the Hub every day for a week, startup times were fast, and we didn't encounter any issues.

Anything that was confusing or could be improved?

This hub has ~/shared, ~/shared-public, and ~/shared-readwrite; it took a while to realize that ~/shared-public is fully accessible to everyone. Also, it is still unclear whether user home directory storage quotas are strictly enforced (EFS drive). I believe there is still no strict enforcement of this quota, so we just strongly encourage people to keep home directory storage under 10 GB.

Any extra functionality you wish you would have had?

Real-time collaboration. GPU access. Reporting Docker image info (2i2c-org/features#16).

Could you share a story about how you used the hub?

We relied on this JupyterHub to introduce ~60 scientists to cloud computing and data-proximate computing with public datasets in AWS us-west-2 (in particular, NASA's ICESat-2 archive). Teams of 4-8 scientists were able to hit the ground running with a curated Python environment and easily configurable computing resources (CPU, RAM) to run interactive tutorials and sprint on projects for one week. More here: https://github.com/ICESAT-2HackWeek/ICESat-2-Hackweek-2023

Any other feedback that you'd like to share?

The JupyterHub was really fantastic. Thanks again 2i2c!
