In the near future, we would like to support region-specific flu builds and also enable other researchers to use this repository to define their own custom flu builds in the same way the ncov repository allows custom workflows.
Description
Seasonal flu builds should follow the same general pattern as ncov builds, where a standard workflow selects an initial set of sequences (based on some subsampling logic), infers and annotates phylogenies, and then runs build-specific steps to finalize the analysis.
Users should be able to configure their own workflows by modifying configuration files (e.g., YAML or JSON) and/or defining custom Snakemake profiles.
Existing parallel Nextstrain builds ("live" and "WHO") should be executable through this framework, such that we are users of our own workflow configuration system.
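As a rough illustration of what "configure by modifying configuration files" could mean in practice, here is a minimal Python sketch of layering user-supplied settings over repository defaults. Every key and value below (`subsample`, `max_sequences`, `builds`, the `h3n2_europe` build) is a hypothetical placeholder, not an existing option of this workflow, and the merge helper is an assumption about one reasonable approach rather than how the workflow does it.

```python
from copy import deepcopy

# Hypothetical defaults that could ship with the repository.
DEFAULTS = {
    "subsample": {"max_sequences": 2000, "group_by": ["region", "year"]},
    "builds": {},
}


def merge_config(base, override):
    """Return a copy of `base` with `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical user settings, as they might look after parsing a YAML file.
user_config = {
    "subsample": {"max_sequences": 500},
    "builds": {
        "h3n2_europe": {"lineage": "h3n2", "segment": "ha", "region": "europe"},
    },
}

config = merge_config(DEFAULTS, user_config)
```

With this layering, a user only states what differs from the defaults (here, a smaller `max_sequences` and one named build), and the defaults themselves are left unmodified.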
Specific solutions will require more discussion, but the basic first steps toward addressing this issue would be similar to what we had to do for ncov back in March/April:
Move all hardcoded parameters out of Snakefiles and into a configuration file.
Define a single top-level Snakefile that references component Snakefiles in workflow/*.smk files including a common.smk for shared functions and builds.smk for the main build logic.
Rewrite the WHO builds logic using pre- and post-main workflow rules, using dependencies and a custom profile to manage which rules are executed instead of running a separate Snakefile.
Organize outputs by named builds with custom parameters in a configuration file instead of using wildcards.
Add support for running the workflow on a SLURM cluster and through AWS Batch by annotating rule-specific resources (memory, disk, and threads). (4d2cd21)
Define per-rule conda environments to allow custom dependency definitions outside of the canonical Nextstrain environment or Docker image. (4d2cd21)
Document workflow configuration parameters and basic usage through a tutorial (similar to the ncov tutorial).
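The steps above about named builds and rule-specific resources can be sketched as the kind of plain-Python helpers a `common.smk` might hold. This is only a sketch under stated assumptions: the config layout, the build names, the `auspice/{name}.json` output pattern, and the fallback resource values are all hypothetical and would need to be agreed on in the discussion.

```python
# Hypothetical parsed configuration; none of these keys exist in the workflow yet.
config = {
    "builds": {
        "h3n2_ha_global": {"lineage": "h3n2", "segment": "ha", "resolution": "2y"},
        "vic_na_europe": {
            "lineage": "vic",
            "segment": "na",
            "resolution": "6y",
            "region": "europe",
        },
    },
    "resources": {"tree": {"threads": 4, "mem_mb": 16000}},
}


def all_build_targets(config):
    """List one final output per named build (the path pattern is an assumption)."""
    return [f"auspice/{name}.json" for name in sorted(config["builds"])]


def build_param(config, build_name, key, default=None):
    """Look up a parameter by build name instead of parsing it from wildcards."""
    return config["builds"][build_name].get(key, default)


def rule_resources(config, rule_name):
    """Return per-rule resources, falling back to small hypothetical defaults."""
    fallback = {"threads": 1, "mem_mb": 4000}
    return {**fallback, **config.get("resources", {}).get(rule_name, {})}
```

In a Snakefile, helpers like these would typically be called from a target rule's `input` and from `params` or `resources` functions, so that adding a build means adding a config entry rather than a new wildcard combination.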
Bumping this issue because we were asked during Nextstrain office hours (Oct 7, 2021) if we have plans to create a flu workflow similar to the ncov template/tutorial.
Do we want to rewrite the whole workflow from scratch or refactor the existing workflow?
Should we continue to consider the WHO builds as a separate workflow or will we eventually be able to run those builds with the “live” workflow?
Do we need to create separate trees for each combination of collaborating center and assay type, or can we build a single tree and run the titer models once for each combination of data against that same tree?
Trees would need to prioritize strains based on all available titer measurements.
How do we want to deploy "private" trees for collaborating centers?
Private Nextstrain Groups seem like an obvious solution, but should we have one group per CC or a single group for all?
Flagging this as something that could potentially benefit from/synergize with the currently active "workflows as programs" / "composable configuration" discussions.
This issue was resolved by #76. We should still discuss how to standardize Nextstrain workflows for segmented viruses, especially since the avian flu and Lassa workflows have taken different approaches. It's possible seasonal flu will remain an exception to whatever rule we decide on, though...
Examples
See Nextstrain's ncov regional builds definitions for a real example of what we would like to support for flu eventually.