Remove pathogen-specific tools from base runtimes #7
Comments
Generally, I would love to have a way to use pathogen-specific Docker images in our workflows! That's been my dream since we added pango-learn to the base image back in the early pandemic. For the specific candidates you mentioned for removal, I can make some specific notes:
I would also recommend removing the pango-learn packages and their binary dependencies, gofasta and minimap2, since all of our pango annotations come from Nextclade now.
This is the other half of workflows as programs, namely "the artifacts/bundling (keyword: buildpacks) side of things", no? (And yes, we should totally do this if it's at all feasible -- there are a number of times I haven't done something because I knew it was going to be such a hassle / burden to make the needed dependency available to our runtimes.)
Yes, precisely. The whole idea there is that instead of having runtimes and pathogens separately, we have pathogens that are (or contain) their runtimes. We want to avoid having N pathogens and N×M pathogen-runtimes and making the user match them. The implementation examples Victor gave (and things like ncov-ingest's image) are coming at this from what I'd call a more ad-hoc approach, and I do not think we should go down that path as a way to get to custom runtimes per pathogen. That way lies ecosystem fragmentation, and it incurs significant usability costs (to both users and developers, us and others). There are lots of considerations in this work. For example, our runtimes are not small when installed on disk, so we're going to want to be able to share a concrete, installed base across pathogens (not just a conceptual base). We'll also want to weigh the costs vs. benefits of moving something out of the base runtimes; it will have non-trivial overhead (both conceptual and actual) and we should only do it when it's worth it. I'm not convinced many of the candidates given above meet that threshold. What concretely are we gaining with the removal of each?
Do you have examples? They would be very helpful, both to guide eventual work on this topic and to suggest pain points we might be able to alleviate now with the current base runtimes.
The one I was reminded of with this week's avian-flu work is nextstrain/avian-flu#80. There have been a bunch of others along the lines of "can't use this pip dependency, not in our runtimes" but I managed to find an alternate solution so it wasn't a dealbreaker.
This applies to docker-base and conda-base.
Context
Our base image has accumulated various pathogen-specific tools over time, some of which significantly contribute to build time and image size. By removing these pathogen-specific tools, we can ensure the base image/environment reflects a continually updated version of Nextstrain tools and their dependencies. Using fauna as an example, more detailed reasoning is in nextstrain/fauna#170.
Candidates
Note
This seems like the right move for Fauna, but I'm not sure how far we want to take it. As we expand the number of core pathogens that rely on runtimes, the common base will only get smaller.
Tasks
For each pathogen/project that relies on tools that may be removed, create and use a custom runtime that installs the tools. Right now the process may be more involved than it should be, and we should provide a good path for extending the base runtimes (examples: docker, conda).