Objective JH-8: Support workshop users using Hubs #23
I think we had a good test run of workshop support this quarter, supporting the Summer School on Inverse Modeling of Greenhouse Gases 2024 (SSIM-GHG) workshop. We were in touch with Sourish Basu early on and were able to address specific needs of the workshop, and I think this experience can greatly help streamline our processes for supporting future workshops. Let's find a good place to document the specific bits of the workshop setup, but I'll broadly outline here the things we did, as well as things we could do better in the future. It was extremely helpful that Sourish started testing the hub infrastructure well in advance and was able to articulate their specific computing needs, so we had sufficient time to test the specific profiles we created for the workshop.

Shared Folder / File Access and EFS Speed

One of the initial concerns was that there would be many students concurrently reading large files over the EFS share. We discussed the various options, including having students download files from S3. In the end, we settled on having a shared folder over EFS, and having students copy files to a local /tmp/ folder when they needed faster access. It was helpful to know things like:
We determined that students would have enough space in their local /tmp/ folders to fit the files needed, and since they only needed to read files from the shared folder, we could set up a system where admins put files into a shared folder and students copied those files into their local /tmp/ directories for faster access. It does seem like the speed of the EFS share was a bit of a bottleneck during the workshop, and we received this feedback:
This is a known problem with EFS. What we could have done better here is more thorough testing before the workshop, so that expectations around data transfer speeds were clearer. In the future, we want to explore alternatives to EFS, and this use-case would be good to keep in mind.

Custom Profiles Based on Specific Computational Needs

It was very helpful to have Sourish think deeply about the type of operations they wanted students to perform during the workshop, and run some tests to help determine the sizing of the containers with regard to RAM and CPU allocations. It was determined that the heavier operations for the workshop were CPU-constrained more than RAM-constrained, while the existing underlying node pool was using instances that were memory-optimized rather than CPU-optimized. Based on these specific needs, we were able to configure a custom node pool with compute-optimized instances, and create specific profiles just for the workshop which set the default resource requirements. By creating custom profiles specifically for the workshop, and restricting access to these default profile options to users in the workshop group, we were able to reduce confusion in environment setup for the workshop students.

Custom Images for Custom Environment Requirements

Sourish had some requests for custom packages for both the Python and R images used for the workshop. As part of #16, we had worked on simplifying our image build and publishing setup, and the workshop was a good real-world use-case to create custom images and provide a good template for similar use-cases in the future. We created https://github.com/NASA-IMPACT/ssim-ghg-workshop-2024-python-image/ and https://github.com/NASA-IMPACT/ssim-ghg-workshop-2024-r-image for the Python and R images, respectively.
Both use the same base images that we use for the default Python and R images on VEDA, but with custom packages added. With the custom permission-scoped profile options, we were able to offer these custom images by default to workshop users, without affecting the default profiles for other users.

Permissions and Access

The model of adding users to particular GitHub teams to identify them as "workshop users" worked well. We were then able to use permission-scoping in the infrastructure configuration to restrict access to certain profiles based on GitHub team membership. This is the PR that set up the configuration for the workshop-specific profile options: https://github.com/2i2c-org/infrastructure/pull/4100/files This also helped us test "tiered access" for specific user groups, toward #19.

Learnings and Future

Supporting the SSIM-GHG workshop was a great learning experience, and early, direct communication with the workshop organizers was hugely useful for getting into details and digging into solutions that were feasible to implement and solved real end-user problems. This did end up taking a fair bit of our time, but a big part of that was because we were doing things like configuring custom node pools and profile options and creating custom images for the first time (for some of us). I feel good about streamlining this process greatly down the line. In the coming quarter, I'd like to see us better formalize the workshop process so we can scale our support for workshops. We can use the experience from the SSIM-GHG workshop to come up with a list of questions to be answered in advance, and document examples of setting up groups, profile permissions, and custom images. @freitagb @wildintellect it would be nice to discuss where we should collate documentation related to running workshops.
I imagine the infrastructure stuff above is one part of it, but there are probably other things to think about, and it might be nice to collate a "workshop handbook" or similar somewhere?
Pasting below feedback from Sourish Basu, the workshop conductor for the SSIM-GHG workshop. Overall, things seemed to have gone well, but there is some very useful feedback on the experience, and I'll work on figuring out how best to ticket it / incorporate it into our future work-plans:
Thanks much to @yuvipanda @slesaad and @sunu for all your work on this, and many, many thanks to Sourish for all the detailed coordination and feedback!
cc @wildintellect - please let me know if any of the feedback points mentioned above especially resonate and are things that we should definitely ticket. Thanks!
@batpad most of that feedback is highly valuable and actionable in the next PI.
+1
We need to revisit how this is implemented and compare it to MAAP, which uses a mix of EFS home directories and S3Fuse-mounted shared directories.
I'm very curious about this. Do we know how much RAM was selected? I would never start an R session with less than 4 GB of RAM, and if doing anything geospatial I'd move to a minimum of 16 GB. When I've run RStudio servers in the past, I tried to give users 32-64 GB on a regular basis. This is very different from Python.
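For reference, the kind of workshop profile discussed above (CPU-heavy, modest RAM, pinned to a compute-optimized node pool) can be expressed as a KubeSpawner `profile_list` entry. On the actual hubs this is configured through 2i2c's Helm values rather than raw Python, and the instance type, node label, and sizes below are illustrative assumptions, not the real workshop configuration:

```python
# Sketch of a workshop-specific profile as a KubeSpawner profile_list
# entry. All names and sizes here are illustrative assumptions.
workshop_profiles = [
    {
        "display_name": "SSIM-GHG workshop (CPU-optimized)",
        "description": "Defaults sized for the workshop's CPU-bound jobs",
        "default": True,
        "kubespawner_override": {
            "cpu_guarantee": 3.5,       # CPU-bound workloads: guarantee cores
            "mem_guarantee": "7G",      # modest RAM, per the sizing tests
            "node_selector": {          # pin to the compute-optimized pool
                "node.kubernetes.io/instance-type": "c5.xlarge",
            },
        },
    },
]

# In a jupyterhub_config.py this would be wired up as:
# c.KubeSpawner.profile_list = workshop_profiles
```

Setting the resource guarantees in the profile itself is what let the workshop change defaults for its users without touching the hub-wide defaults.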
++ Need to document this and look for easier ways to switch between instances. One of the few perks of Eclipse Che.
MAAP has this; the R workspace in MAAP is actually
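To make the permission-scoping idea from the thread concrete: in effect, each profile carries a list of allowed groups, and a user only sees profiles whose groups intersect their GitHub team memberships. The real implementation is declarative Helm configuration in the PR linked above; the `allowed_groups` key and helper function below are hypothetical illustrations of the same logic:

```python
# Hypothetical sketch of permission-scoped profiles: a profile with
# allowed_groups set is only visible to users in one of those groups;
# a profile without it is visible to everyone.
def profiles_for(user_groups, all_profiles):
    """Return the profiles a user with these group memberships may see."""
    visible = []
    for profile in all_profiles:
        allowed = profile.get("allowed_groups")  # None means open to all
        if allowed is None or set(allowed) & set(user_groups):
            visible.append(profile)
    return visible
```

For example, a user on a hypothetical `nasa-impact:ssim-ghg-workshop` team would see the workshop profile in addition to the defaults, while everyone else would see only the defaults.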
Motivation
Owner(s)
@batpad @yuvipanda
Success criteria