Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constrain use of hook prePuller #3341

Closed

Conversation

consideRatio
Copy link
Contributor

@consideRatio consideRatio commented Oct 30, 2023

This is a followup on #3313 (comment) to still retain use of hook prePuller, but to constrain its use.

The use of hook image puller is constrained to:

  • Not using the jupyterhub-configurator at the same time
    This is enforced by the chart schema.
  • Not using one or more images set in any other way than by singleuser.image
    This isn't enforced as its hard to enforce surgically enough to not constrain use of profileList beyond what we want. Due to that, a comment is made next to prePuller.hook.enabled to help us keep track of this. I'm quite strongly opinionated that this comment is retained inline because I think its too hard for us to enforce this policy if its only declared in external docs.

Opt-out of jupyterhub-configurator

With jupyterhub.custom.jupyterhubConfigurator.enabled individual hubs can opt-out of using it. When disabled, it won't present itself to users via the JupyterHub menu under services, and any options set historically when enabled are ignored.

Hub specific changes

Cloudbank

No impact on the communities are expected except for the jupyterhub-configurator no longer being usable on the staging hub.

Hook prePuller was enabled for a few hubs but not all. Since they put users on the the same nodes, and all hubs configure the same singleuser.image, it doesn't help to have more than one hook prePuller as long as it does its work before hubs are upgraded. By enabling hook prePuller on the staging hub, we accomplish this.

With this, the jupyterhub-configurator is disabled for the staging hub though which in this case is an undesired consequence as its really pulling the image for other hubs, possibly using the configurator.

UToronto

The hook prePuller was enabled on staging and prod hubs. Staging hubs benefit from the configurator, and the prod hubs doesn't. As long as hook prePuller is enabled on either staging or prod, they should be fine. This led to:

- Staging Prod
hook prePuller now disabled remains enabled
jupyterhub-configurator remains enabled now disabled

@github-actions

This comment was marked as resolved.

@consideRatio consideRatio marked this pull request as ready for review October 31, 2023 11:07
@consideRatio consideRatio requested a review from a team as a code owner October 31, 2023 11:07
@consideRatio
Copy link
Contributor Author

@sean-morris note that this PR is going to disable the jupyterhub-configurator on staging.cloudbank.2i2c.cloud, but you could test new images in another hub still I've presumed.

@yuvipanda
Copy link
Member

What would the guidelines be for when one of the cloudbank hubs needs to be on a different image? Or if we're testing a different image on the utoronto staging hub before deploying to prod for an extended time (as we just did with ucmerced)?

@yuvipanda
Copy link
Member

And wouldn't testing images that can possibly break on a non-staging hub be disruptive to those users, or require specially timing these tests to make sure they are not in use at that time?

@consideRatio
Copy link
Contributor Author

What would the guidelines be for when one of the cloudbank hubs needs to be on a different image? Or if we're testing a different image on the utoronto staging hub before deploying to prod for an extended time (as we just did with ucmerced)?

Cloudbank is a multi-community cluster with a shared node pool where one image is made special - the one many defaults to and defined in common.values.yaml. If one of the cloudbank hubs wants to use another image, I think they should do it using the configurator, and they don't have it updated by the hook prePuller as requires it to be set via chart config.

I think hook prePuller shouldn't be enabled for multi-community clusters using shared node pools, but cloudbank is an exception because it has a single image for many hubs making that stand out - its a multi-similar-community cluster using shared node pools.

If jupyterhub-configurator and hook prePuller is considered mandatory on the staging hub, then I think the resolution is to add the additional complexity of adding an override jupyterhub.custom.jupyterhubConfigurator.prePullerCanAlsoBeEnabled or similar.

And wouldn't testing images that can possibly break on a non-staging hub be disruptive to those users, or require specially timing these tests to make sure they are not in use at that time?

Yes it would, but that is the situation independent of using hook prePuller for all shared clusters community hubs right - unless they have the "other" option as well in their hub.

@yuvipanda
Copy link
Member

Would you be able to describe what the policy for 'should hook pre-puller be enabled for this hub' be? What would someone setting up a new hub do to determine if this should be enabled or not? I think having that be written down and possibly be documented will help me, rather than discussing purely from examples.

@GeorgianaElena
Copy link
Member

In the context of #3095 I would really like to to move towards empowering communities to use their staging hubs to be able to test images before deploying them to their users and without our intervention every time.

That said, I wouldn't like us to compromise this for cloudbank.

I think having that be written down and possibly be documented will help me, rather than discussing purely from examples.

I agree with @yuvipanda that documenting this somewhere together with this PR will make more sense and will close the circle. Without docs, we just move from an undocumented use of prePuller to an undocumented not use of it if that makes sense.

Because I see this as a "feature", I would document it in the howto of features https://infrastructure.2i2c.org/howto/features/

@yuvipanda
Copy link
Member

I also just learnt about https://z2jh.jupyter.org/en/stable/resources/reference.html#prepuller-pullprofilelistimages, which seems to indicate that the pre puller also picks up images from profile_list! Is that true? If so, does this mean that the only time the hook prePuller is completely useless when we are using the configurator?

@consideRatio
Copy link
Contributor Author

If so, does this mean that the only time the hook prePuller is completely useless when we are using the configurator?

The hook prePuller can be both helpful and pointless, and both the cause of issues and not, depending on situations. My main concern isn't about it being pointless sometimes, but that its the cause of more issues than its helpful. Besides that, use of prePuller is adding complexity as a guaranteed added negative.

This is a very complicated topic I think. I'm not sure I can find a clear motivation for when to enable it, or clarify the motivation well enough without a significant time investment. @yuvipanda are you okay with letting the scope of this PR to be to just constraining the use of hook prePuller away from some situations, as compared to also defining exactly what situations to use it in?

If there is a week of effort available for me to focus on that, I can get it done with some z2jh docs improvements I think, but I expect the balancing act of specifying exactly when to use it would take a lot of additional work for me.

@consideRatio
Copy link
Contributor Author

consideRatio commented Nov 3, 2023

I also just learnt about https://z2jh.jupyter.org/en/stable/resources/reference.html#prepuller-pullprofilelistimages, which seems to indicate that the pre puller also picks up images from profile_list! Is that true?

Yes, both variants pull the same set of images, and it are images from the chart config under various sources including singleuser.profileList and the kubespawner_override it defines. This excludes those under profile_options currently, but that is considered by jupyterhub/zero-to-jupyterhub-k8s#3217 that isn't yet merged.

There is also the functionality to not start the hook prePuller unless one image to pull has changed, making the hook prePuller only pull conditionally by default.

@yuvipanda
Copy link
Member

@yuvipanda are you okay with letting the scope of this PR to be to just constraining the use of hook prePuller away from some situations

I think this PR as is adds a lot of assumptions into our infrastructure - such as staging and production hubs having the same image always, cloudbank not able to use staging to test images, etc. I am not sure what constraining it to be simpler would look like. I am happy to review it again if you are able to constrain it more.

I do have a suggested policy i am happy to share if so desired.

@yuvipanda
Copy link
Member

If you are able to document why the prePuller should not be used in some specific situation, and constrain the pr to that documentation + implementing that, that is fine too.

@yuvipanda
Copy link
Member

If there is a week of effort available for me to focus on that, I can get it done with some z2jh docs improvements I think, but I expect the balancing act of specifying exactly when to use it would take a lot of additional work for me.

I definitely don’t think this is a good use of your time right now. Let’s try to find something that works for the 90% use case than burn ourselves out in search for perfection

@yuvipanda
Copy link
Member

If you think this is too complex to work on right now, i think it is also ok to park it and come back to it later! Please don’t burn yourself out!

It can be useful to let the staging hubs still provide the configurator,
so we enable the hook image puller on the prod hubs specifically
instead where the configurator is disabled.
@consideRatio
Copy link
Contributor Author

I do have a suggested policy i am happy to share if so desired.

@yuvipanda I'd like to not work this further async due to the complexity of the topic and my struggles communicating async about complex topics. If you want, I'd really appreciate listening to and chatting with you about your policy idea in a meet!

I'll go for a close on this PR for now, it could always be re-opened or stripped for parts later anyhow.

@damianavila
Copy link
Contributor

@consideRatio @yuvipanda and @GeorgianaElena, I am happy about how this discussion evolved.
I know it might be a little bit frustrating to close a PR (instead of merging it), but I also think we got a LOT more from the exchanges we had here even more with the additional complexity of an async conversation about a non-trivial topic.
Congrats to all on the exchange and the achieved resolution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done 🎉
Development

Successfully merging this pull request may close these issues.

4 participants