Stop using continuous prePuller #3313

consideRatio · 2023-10-24T12:55:47Z

Update 2023-10-30

Based on feedback from Yuvi in #3313 (comment), I've pivoted to only disable the continuous prePuller in this PR, and not also the hook prePuller. I'm opening a followup PR to this to constrain the use of hook prePuller in another PR to help us avoid issues with it.

Original PR

Fixes #2344 by inlining comments in basehub's default config that already disables prePuller logic by default, and by disabling both continous and hook prePuller where it is enabled for individual hubs.

prePuller is a somewhat complex machinery to reduce startup times in favorable conditions - conditions that also could change outside our chart configuration via the configurator for example. In unfavorable conditions, use of prePuller could instead increase startup times. This fragility and complexity has caused me some stress and cognitive load that I don't want to introduce to others, but instead help us all avoid.

This make me think the right call is to systematically disable both use of continious prePuller and hook prePuller.

Continue reading for some additional details

Disabling continuous pulling

The continuous prePuller could be beneficial if nodes are pre-started before a user pod arrives to them. It was initially designed with intent to be used with user-placeholder / node-placeholder pods, something we are not using in any of our clusters.

We have successfully made use of the continuous prePuller in the past when pre-starting many nodes for events. With the introduction of the practice to schedule multiple users per node, this is not as relevant. I think if we want to use it for that time, we can do a one off image pull just after pre-starting nodes instead of having it enabled by default - this is to be clarified by #2541.

Disabling hook pulling

The hook puller is active during helm ugprade, blocking the actual upgrade until pulling has been completed. I see some value in it compared to the current use of the continuous prePuller, but I think we are better off avoiding its complexity we introduce.

Read the inlined config notes about disabling it for some additional details.

After disabling, these are no longer relevant caveats

When I've heard "slow startup" be mentioned in support tickets etc, I've often ended up considering caveats below to ensure use of prePuller wasn't causing it. If we disable prePulling systematically with this PR, none of us need to consider caveats like below going onwards to be the cause of slow startups! We could focus on fixing issues caused by other things without even first ruling out its the prePuller causing it.

prePuller (continuous mainly) used for a hub scheduling user pods on nodes used by other hubs is likely to disturb other hubs, using a dedicated node pool has other caveats
prePuller doesn't pull all images we configure for use - not those via the configurator or profile_options in some profile lists
continuous prePuller pulling multiple images can even when pulling the right images delay startup times for non-pre-started nodes
hook prePuller requires use of ClusterRoleBinding resources, something that some k8s admins isn't granted if they exercise the right to replicate in a cluster they only control a namespace within

yuvipanda · 2023-10-24T14:02:51Z

Would you consider letting the hook prepuller be for what i consider the utoronto case? The specifics are:

configurator is not used
No profile list
The images are fairly large

In particular, without this, if we change the image tag here on the repo, the next user who starts the server after a deploy will be responsible for pulling the image and get a very slow startup. With this enabled, that cost is rolled into the helm deploy.

This was very poor UX for end users at berkeley, given that it happens “randomly” from their perspective, which is what led to the creation of the hook prePuller.

consideRatio · 2023-10-30T00:05:14Z

I've updated this PR to only disable the continuous prePuller for now, and I've opened a separate PR (#3341, currently in draft) to constrain the use of hook prePuller. This PR is now ready for review!

consideRatio · 2023-10-30T09:51:36Z

@GeorgianaElena @sgibson91 do you agree on this change?

GeorgianaElena · 2023-10-30T10:57:41Z

helm-charts/basehub/values.yaml

+  # prePuller is about pulling a one or more images identified via chart
+  # configuration, including singleuser.image, singleuser.profileList entries
+  # with a dedicated image, but not profileList entries with images' specified
+  # via profile_options.
  prePuller:
+    # continuous prePuller leads to the creation of a DaemonSet that starts a
+    # pod on each node to pull images.
+    #
+    # It is disabled as its only relevant for nodes started before user pods
+    # gets scheduled on them, in other cases it could delay startup and isn't
+    # expected to reduce startup times.
+    #


@consideRatio, what if we add these comments as a descriptions in the schema file?

I figure, the schema is where we currently "document" basehub defaults and having these in there kind of makes sense? Maybe in the future we can use that to generate some sort of API-like docs?

Maybe in the future we can use that to generate some sort of API-like docs?

I agree on letting basehub generate config reference docs based on descriptions for basehub added config in values.schema.yaml like done by z2jh!

I favor the approach of inlining a comment next to the changed default value in this case though.

All charts require a values.yaml file, so helm chart users likely look in them to learn about default values. If we separate remarks about default values into a schema file / generated docs, they may be less findable from the values.yaml file.

We don't yet have generated docs from the basehub's values.schema.yaml

If/when we generate docs from the schema, should the schema really redefine parts of the schema of dependency charts to provide a description that the default value is changed?

yuvipanda · 2023-10-31T10:16:41Z

@consideRatio thoughts on how this would interact with how we 'pre-warm' nodes for an event? In those cases, I believe we increase the size of the nodepool to ensure faster startup, and the continuous prepuller would be required there (subject to the same constraints outlined in #3313 (comment)).

consideRatio · 2023-10-31T10:21:17Z

@consideRatio thoughts on how this would interact with how we 'pre-warm' nodes for an event?

@yuvipanda I've opened #2541 about this, my idea is that we do a single one-off pre-pull to pre-started nodes if we pre-start nodes.

helm-charts/basehub/values.yaml

Co-authored-by: Yuvi Panda <[email protected]>

consideRatio · 2023-10-31T10:33:54Z

Thank you for reviewing @GeorgianaElena and @yuvipanda!!!

github-actions · 2023-10-31T10:34:41Z

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/6705278176

github-actions bot assigned consideRatio Oct 24, 2023

This comment was marked as resolved.

Sign in to view

consideRatio marked this pull request as ready for review October 24, 2023 13:46

consideRatio requested a review from a team as a code owner October 24, 2023 13:46

consideRatio changed the title ~~Stop using prePuller, both continuous and hook~~ [Engineering agreement needed] Stop using prePuller, both continuous and hook Oct 24, 2023

consideRatio marked this pull request as draft October 24, 2023 14:16

consideRatio mentioned this pull request Oct 24, 2023

Q4 Reduced workload goal - Oct 18 Sprint 1 tracking issue #3318

Closed

consideRatio changed the title ~~[Engineering agreement needed] Stop using prePuller, both continuous and hook~~ [Engineering agreement needed] Stop using continuous prePuller and constrain use of hook prePuller Oct 29, 2023

consideRatio changed the title ~~[Engineering agreement needed] Stop using continuous prePuller and constrain use of hook prePuller~~ Stop using continuous prePuller and constrain use of hook prePuller Oct 29, 2023

Stop using continuous prePuller

d403b4c

consideRatio force-pushed the pr/stop-pre-pulling branch from 347be22 to d403b4c Compare October 29, 2023 23:51

consideRatio changed the title ~~Stop using continuous prePuller and constrain use of hook prePuller~~ Stop using continuous prePuller Oct 29, 2023

consideRatio marked this pull request as ready for review October 29, 2023 23:56

consideRatio mentioned this pull request Oct 30, 2023

Constrain use of hook prePuller #3341

Closed

consideRatio changed the title ~~Stop using continuous prePuller~~ [Agreement needed] Stop using continuous prePuller Oct 30, 2023

GeorgianaElena reviewed Oct 30, 2023

View reviewed changes

GeorgianaElena approved these changes Oct 30, 2023

View reviewed changes

consideRatio changed the title ~~[Agreement needed] Stop using continuous prePuller~~ Stop using continuous prePuller Oct 31, 2023

yuvipanda approved these changes Oct 31, 2023

View reviewed changes

helm-charts/basehub/values.yaml Outdated Show resolved Hide resolved

Apply suggestions from code review

a23c0da

Co-authored-by: Yuvi Panda <[email protected]>

consideRatio merged commit 7f73364 into 2i2c-org:master Oct 31, 2023
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop using continuous prePuller #3313

Stop using continuous prePuller #3313

consideRatio commented Oct 24, 2023 •

edited

Loading

This comment was marked as resolved.

yuvipanda commented Oct 24, 2023

consideRatio commented Oct 30, 2023 •

edited

Loading

consideRatio commented Oct 30, 2023

GeorgianaElena Oct 30, 2023

consideRatio Oct 30, 2023

yuvipanda commented Oct 31, 2023

consideRatio commented Oct 31, 2023 •

edited

Loading

consideRatio commented Oct 31, 2023

github-actions bot commented Oct 31, 2023

Stop using continuous prePuller #3313

Stop using continuous prePuller #3313

Conversation

consideRatio commented Oct 24, 2023 • edited Loading

Update 2023-10-30

Original PR

Disabling continuous pulling

Disabling hook pulling

After disabling, these are no longer relevant caveats

This comment was marked as resolved.

yuvipanda commented Oct 24, 2023

consideRatio commented Oct 30, 2023 • edited Loading

consideRatio commented Oct 30, 2023

GeorgianaElena Oct 30, 2023

Choose a reason for hiding this comment

consideRatio Oct 30, 2023

Choose a reason for hiding this comment

yuvipanda commented Oct 31, 2023

consideRatio commented Oct 31, 2023 • edited Loading

consideRatio commented Oct 31, 2023

github-actions bot commented Oct 31, 2023

consideRatio commented Oct 24, 2023 •

edited

Loading

consideRatio commented Oct 30, 2023 •

edited

Loading

consideRatio commented Oct 31, 2023 •

edited

Loading