Add a guide to pinning dependencies #161
Comments
An issue to find and link from here is the one about adding a |
Illustrative example that is coming up for folks right now:
Notebook 5.7.4 requires tornado>=4.3, but tornado 6 has since been released with changes that break notebook 5.7.4. Notebook 5.7.5 was released with the fix for tornado 6 compatibility. By pinning notebook but not tornado, you guarantee future breakage: your env allows a package's dependencies to be upgraded, but does not allow the package itself to receive the upgrades it needs to stay compatible with those dependencies. Two general approaches:
Specific things that should generally be avoided:
|
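To make the tornado example concrete, here is a minimal sketch of why an open-ended lower bound lets a breaking release in. The list of available versions is illustrative, and the dotted-number comparison is a simplification for this example, not the full version-matching logic that pip or conda apply.

```python
def parse_version(text):
    """Turn '5.1.1' into (5, 1, 1). Simplified: numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def newest_allowed(available, minimum):
    """Pick the newest version that satisfies a '>=minimum' constraint."""
    allowed = [v for v in available if parse_version(v) >= parse_version(minimum)]
    return max(allowed, key=parse_version)


# notebook 5.7.4 declares tornado>=4.3 (from the comment above).
# The list of releases is an illustrative assumption.
available_tornado = ["4.3", "5.1.1", "6.0"]
chosen = newest_allowed(available_tornado, "4.3")
print(chosen)
```

With notebook pinned and tornado left open, a fresh install resolves to the newest tornado that satisfies the bound, which is exactly the release that notebook 5.7.4 was never tested against.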
So it sounds like we should recommend an "all or none" approach, no? |
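One way to check a requirements file against the "all or none" idea is sketched below. It assumes a pip-style requirements format and treats any line containing `==` as pinned; the function names are hypothetical, and the check does not cover the Python version itself, which is pinned elsewhere.

```python
def requirement_lines(text):
    """Return non-empty requirement lines with comments stripped."""
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def pinning_status(text):
    """Classify a requirements file as 'all', 'none', or 'partial' pinning."""
    reqs = requirement_lines(text)
    pinned = [r for r in reqs if "==" in r]
    if not pinned:
        return "none"
    if len(pinned) == len(reqs):
        return "all"
    return "partial"


partial = "notebook==5.7.4\ntornado>=4.3\n"
everything = "notebook==5.7.5\ntornado==6.0\n"
nothing = "notebook\ntornado\n"

for name, text in [("partial", partial), ("everything", everything), ("nothing", nothing)]:
    print(name, "->", pinning_status(text))
```

A result of "partial" is the case the discussion warns about: some packages are frozen while their dependencies can still move.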
+1 on Chris' suggestion, and I like Min's example. Action to take for someone who wants to help out on this issue: take Min's comment and include it in the guide to reproducibility, whose source is at https://github.com/jupyterhub/binder/blob/master/doc/tutorials/reproducibility.rst |
I like the "all or none" recommendation from @minrk.
I think what's missing is "best practices" on how to achieve "pin none" and "pin all", and when to choose which. (I ran into the future-reproducibility issue myself by forgetting to pin the Python version.) Things to keep in mind:
Dependency management and reproducibility are really hard. Surely people have thought about these issues before. But where? |
I think this is the "repo2docker freeze" command that's been discussed a few times. Essentially, it would run repo2docker to install everything and then run A first version of this is to use
We'll then want to figure out what to do about "lockfiles" since this freeze pattern generally means there are two files: one that specifies the loose requirements, and one that records an actual working installation (Pipfile.lock, etc.). To use this right now, you would have to clobber the environment.yml, or use top-level environment.yml for loose and binder/environment.yml for frozen or something similar. |
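The freeze pattern described above (loose requirements in one file, a record of a working installation in another) can be sketched with the standard library alone. This is not how repo2docker or any proposed "freeze" command works; it only illustrates generating exact pins from whatever is installed in the current Python environment, using `importlib.metadata` (Python 3.8+). The function name is hypothetical.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """List 'name==version' pins for every distribution in this environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with unreadable metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirecting this output to a file gives the "frozen"
    # counterpart to a hand-written file of loose requirements.
    for line in frozen_requirements():
        print(line)
```

The loose file stays the one people edit, and the frozen output is regenerated whenever the environment is rebuilt and confirmed to work.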
I just learnt about |
One thing to note here is that because binder apparently uses a specific This is because I think for now the documentation should be updated to mention that the |
Thanks for the useful discussion! I haven't been including pinned versions of all dependencies, so I need to rethink what I'm doing. @mdeff asks above:
I'm wondering the same thing. When I include a pinned version of jupyter I've found that the images build ok, but don't launch, so I've been leaving it out. More generally, are there packages that shouldn't be pinned? For example, I just tried generating a new |
In a recent debugging session, @minrk pointed out that if you only partially pin your repository (e.g. pin numpy versions but don't pin Python), you are likely to break future reproducibility. This is because newer releases of whatever you left unpinned may stop supporting the versions you did pin.
We have a short guide to reproducibility here, and this would make a nice addition!
Edit from Tim: Want to help out with this? Suggested steps on what needs doing are here.