-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stateful Notebooks #9298
Stateful Notebooks #9298
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
fix notebooks
- move env getters to new file to prevent circular imports - move create admin method to user module
- add a stash method to filter by unique attrs - overwrite objs that are init at startup with the incoming migrated obj
- restore and spin woker image and pools from checkpoint - fix generation of UID from seed of worker name for inmem workers Co-authored-by: rasswanth-s <[email protected]>
- fix imports in 0.9.1_helpers
modified purge_worker_pool to purge_workers Co-authored-by: Shubham Gupta <[email protected]>
Co-authored-by: Shubham Gupta <[email protected]>
[WIP] Reset Stateful Migrations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we planning on adding the checkpoint code to all the notebooks in this PR?
packages/grid/devspace.yaml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using dev-latest creates issues with latest image not getting pulled during k8s development.. can we avoid adding this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are issues we face ? Right now the problem is that we want a way to spin up a cluster with the same tag as checkpoint is created, but since the DEVSPACE_TIMESTAMP is dynamic we cannot create images with a static tag. Also, Currently we're tagging an image with both DEVSPACE_TIMSTAMP tag and tag-latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that image:dev-latest
does not always get updated if you re-deploy after any code changes, breaking development experience and creating confusion. I previously had to revert this exact change from Rasswanth as well - because it just never worked.
dev-latest
might work in CI because there's only one image, but let's not add dev-latest
anywhere as it ends up creating confusion, and just causes divergence in expectation while debugging locally, in notebooks and CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the image tag is constantly changing, then the info about the image tag in the db is stale right ? Means the pools cannot use the redeployed versions anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per discussion:
- set the tags such that timestamp is always on top.
- dev-${DEVSPACE_TIMESTAMP}
- dev-latest
- don't change global.version value - keep it dev-${DEVSPACE_TIMESTAMP},
- use dev-latest for worker pool
Resolved in: 9e98cf4
notebooks/scenarios/bigquery/001-scale-delete-worker-pools.ipynb
Outdated
Show resolved
Hide resolved
Yes planning to do that as a new PR. We will create checkpoint at end of each notebook and then upload them as artifacts in CI. |
update bigquery notebook to filter big query image
Let's not use artifacts; reason being 1) issues with conflicting caches across multiple version/branches/nightlies/pr like we have with current pip cache 2) goals was so that local testing can leverage these checkpoint files directly. I'd recommend keeping it simple and checking them into the repo itself. That way everyone can run it locally almost instantly and we don't have to worry about wrong checkpoint versions being loaded for some other syft version on CIs. |
But don't we do that for K8s logs ? For each k8s run in CI the logs are available for us to download at end of tests. We were thinking of a similar behaviour for checkpoints as well, atleast when tests fail. I also agree local testing is the priority but I still think we can leverage it in CI atleast for notebooks and maybe just scenario notebooks. Also caching uses a different plugin and we don't want to persist checkpoints across PRs, instead have a copy available if things break in CI for local debugging. |
- use dev-latest tag for default worker pool
[WIP] Notebooks State - Misc Improvements
ah i must've confused cache with artifacts. okay this looks cool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome work @shubham3121 @rasswanth-s =)
Description
Please include a summary of the change, the motivation, and any additional context that will help others understand your PR. If it closes one or more open issues, please tag them as described here.
Affected Dependencies
List any dependencies that are required for this change.
How has this been tested?
Checklist