-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bsweger/get clades list #18
Conversation
There are a few changes (for example, sorting by date and clade before slicing on the "n" clades that create the final list). This changeset also adds test cases for various permutations of threshold, threshold_weeks, and the maximum number of clades allowed in the list being returned.
# FIXME: provide ability to instantiate Config for the get_clade_list function and get the data_path from there | ||
def main( | ||
genome_metadata_path: AnyPath = Config.nextstrain_latest_genome_metadata, | ||
data_dir: AnyPath = AnyPath(".").home() / "covid_variant", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with @rogersbw original code, we're saving the Nextstrain metadata to disk so we can work with it locally. I'm assuming that this is an internal concern (for example, we don't need to surface the download to whatever process runs get_clade_list
and could change this implementation detail if needed).
Please let me know if that assumption is incorrect!
def main( | ||
genome_metadata_path: AnyPath = Config.nextstrain_latest_genome_metadata, | ||
data_dir: AnyPath = AnyPath(".").home() / "covid_variant", | ||
threshold: float = 0.01, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These defaults are from the original clades_to_model
code: https://github.com/rogersbw/clade_data_utils/blob/main/utility/data_utility.py#L79
This all looks good to me. Merging! |
Resolves #13
Also see comments here about why I deviated from the original plan of putting this function in the hub itself.
This PR continues the work to port @rogersbw's "clades to model" process into
virus-clade-utils
:sequence.py
get_clade_list.py
, which returns a list of clades to model, based on 3 criteria:Note that
get_clade_list.py
returns "clades to model" as a Python list, with the assumption that it will be serialized and written to disk by the calling process (e.g., the variant nowcast hub). We'll need to think about that more, since that hub's "make round config" code is written in R.