Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add measles dataset #202

Merged
merged 7 commits into from
Jun 4, 2024
Merged

Add measles dataset #202

merged 7 commits into from
Jun 4, 2024

Conversation

kimandrews
Copy link
Contributor

Add a measles dataset to Nextclade.

@kimandrews kimandrews temporarily deployed to refs/pull/202/merge May 24, 2024 15:53 — with GitHub Actions Inactive
@ivan-aksamentov
Copy link
Member

ivan-aksamentov commented May 24, 2024

Nice!

https://clades.nextstrain.org/?dataset-server=gh:@add-measles-dataset@&dataset-name=nextstrain/measles

Seems to be working technically, but I haven't checked science of things (I am not a scientist :))

I know nothing about measles, but if makes sense, please consider creating subdirectories, in case there will be more dataset flavors in the future. For example you could distinguish datasets by ref accession: nextstrain/measles/NC-001498-1, nomenclature: nextstrain/measles/who/clade-A/subtype-1, or some prominent feature(s): nextstrain/measles/host-elephants/flavor-vanilla, or something like that. Path choice considerations are described in the docs/ directory here.

By contrast to nextstrain.org urls, Nextclade datasets cannot be nested one inside another (in other words, only leaf directories can contain dataset files), so if we include just nextstrain/measles directory this time, this means nextstrain/measles/2 and nextstrain/measles/another will not be possible, so we'll have to invent something like nextstrain/measles-2, nextstrain/measles-another, which will not be nice to other organisms.

Additionally, it is possible to add shortcuts (aliases) (example), so you can for example alias nextstrain/measles/A/12/delta to nextstrain/measles and nextstrain/measles/A and even measles. Shortcuts must not have conflicts across all of the datasets.

@rneher
Copy link
Member

rneher commented May 24, 2024

Nice! As Ivan said, the would make sense to generate one additional level to the path (for example the reference, or an indication that this is just N). We might want to add a full genome build at some point.

Consider using --include-root-sequence-inline during the export step. This will allow coloring by sites that aren't variable.

Nextclade also has no use for time information on the trees. It doesn't really hurt, but leads to unexpected behavior if users start toggling between time and divergence. So you might consider removing it for the nextclade build.

Update dataset name and path in pathogen.json and README in anticipation of moving dataset to subdirectories.
Create a N450 subdirectory, since future measles datasets could use other parts of the genome; and also create a WHO-2012 subdirectory, since this dataset is based on genotyping guidelines provided by the WHO in 2012, and there could be future revisions to N450 genotyping.
@kimandrews kimandrews temporarily deployed to refs/pull/202/merge May 24, 2024 19:00 — with GitHub Actions Inactive
kimandrews added a commit to nextstrain/measles that referenced this pull request May 28, 2024
Removes commands for adding time information to the Nextclade dataset tree, since Nextclade doesn't use this information. This change was recommended here: nextstrain/nextclade_data#202 (comment)
@kimandrews kimandrews temporarily deployed to refs/pull/202/merge May 28, 2024 21:09 — with GitHub Actions Inactive
@kimandrews kimandrews temporarily deployed to refs/pull/202/merge May 28, 2024 23:06 — with GitHub Actions Inactive
kimandrews added a commit to nextstrain/measles that referenced this pull request May 28, 2024
Removes commands for adding time information to the Nextclade dataset tree, since Nextclade doesn't use this information. This change was recommended here: nextstrain/nextclade_data#202 (comment)
kimandrews added a commit to nextstrain/measles that referenced this pull request May 30, 2024
Removes temporal info from Nextclade dataset timetree, since Nextclade doesn't use this information. This change was recommended here: nextstrain/nextclade_data#202 (comment)
@rneher rneher self-requested a review June 1, 2024 21:19
Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, Kim. This looks very good to me.

@rneher rneher deployed to refs/pull/202/merge June 4, 2024 20:16 — with GitHub Actions Active
@rneher rneher merged commit d7774d1 into master Jun 4, 2024
1 check passed
@rneher rneher deleted the add-measles-dataset branch June 4, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants