Add CCHFV to nextclade_data. #199

Open
wants to merge 8 commits into
base: master


anna-parker
Contributor

Using the https://github.com/neherlab/CCHFV repository and NCBI Virus, I created datasets for CCHFV in nextclade_data, which can then be used by nextclade run.

Auspice trees for the three segments can be built:

    • independently of each other
    • dependent on each other (choosing only samples with all segments, which allows tanglegrams to be created)
    • dependent on each other, with additional recombination site inference (using TreeKnit to infer ARGs, which lets us estimate branch lengths better).

I chose the second option for now, but this can be changed.
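
As a rough illustration of the second option, the sketch below keeps only samples that have a sequence for every segment. It is not the code from the PR or from the neherlab/CCHFV workflow: the function name, the `(sample_name, segment)` input shape, and the sample names are all assumptions made for the example.

```python
from collections import defaultdict

# CCHFV has three genome segments.
SEGMENTS = ("S", "M", "L")


def samples_with_all_segments(records):
    """Return the sorted sample names that have every segment.

    `records` is an iterable of (sample_name, segment) pairs; this input
    shape is hypothetical and chosen only for illustration.
    """
    seen = defaultdict(set)
    for sample, segment in records:
        seen[sample].add(segment)
    # Keep a sample only if its set of segments covers S, M and L.
    return sorted(s for s, segs in seen.items() if segs.issuperset(SEGMENTS))


# Hypothetical input: sampleB lacks the M segment and is dropped.
records = [
    ("sampleA", "S"), ("sampleA", "M"), ("sampleA", "L"),
    ("sampleB", "S"), ("sampleB", "L"),
]
complete = samples_with_all_segments(records)
print(complete)
```

Restricting all three trees to the same set of samples is what makes it possible to draw tanglegrams between them.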

Additionally, I chose to name only 3 genes: RdRp (RNA-dependent RNA polymerase, product: putative polyprotein), GCP (product: glycoprotein precursor), and NP (product: nucleoprotein).

We may also want to name the non-structural S protein (NSS); see https://www.mdpi.com/1999-4915/8/4/106 for details.

Member

@corneliusroemer corneliusroemer left a comment


@ivan-aksamentov
Member

ivan-aksamentov commented May 15, 2024

@anna-parker I fixed a few bugs in file declarations and added changelogs. We disabled automated CI for forks due to security concerns, so I pushed the processed files (data_output/) myself.

The datasets can be accessed if you provide the --server CLI arg or the dataset-server URL param:

https://clades.nextstrain.org/?dataset-server=gh:anna-parker/nextclade_data@cchfv@data_output&dataset-name=nextstrain/cchfv/linked/S
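
For reference, here is a small sketch of how the link above appears to be assembled. Reading the `dataset-server` value as `gh:<owner>/<repo>@<branch>@<path>` is an interpretation of this one URL, and the helper name and its parameters are hypothetical.

```python
def dataset_server_url(owner, repo, branch, path, dataset_name,
                       base="https://clades.nextstrain.org/"):
    """Build a Nextclade Web link that points at a dataset server on GitHub.

    Assumption: the server value has the shape gh:<owner>/<repo>@<branch>@<path>,
    as inferred from the URL quoted in the comment above.
    """
    server = f"gh:{owner}/{repo}@{branch}@{path}"
    # Values are left unescaped to match the URL quoted in the comment.
    return f"{base}?dataset-server={server}&dataset-name={dataset_name}"


url = dataset_server_url(
    owner="anna-parker",
    repo="nextclade_data",
    branch="cchfv",
    path="data_output",
    dataset_name="nextstrain/cchfv/linked/S",
)
print(url)
```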

If you have access to the nextstrain org, then it makes sense to work directly in the nextstrain/nextclade_data repo. This way, checks will run automatically. If you don't have access, Richard can probably arrange it.

@ivan-aksamentov
Member

ivan-aksamentov commented May 15, 2024

Please thoroughly consider:

  • which collection/organization you want this dataset to be in. Right now it's in the nextstrain collection, even though you are pushing from a fork. For third parties we recommend using the community collection. This is mostly political, and meant to avoid dramas like: who will be allowed to make changes? Who will maintain it? Who decides what the clades/lineages are if there's no consensus?
  • the path of each dataset, in particular in relation to which clades/flavors/hosts are there now and which ones you want to add in the future. This is a technical and bioinformatics decision. Paths are immutable: you cannot change paths or delete datasets later. See the docs/ for more details.
  • if the dataset is not well tested and/or there are any concerns regarding quality or correctness, then it is appropriate to set .attributes.experimental = true in pathogen.json
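
A minimal sketch of the last point, assuming pathogen.json is a JSON object with a top-level `attributes` object, as the `.attributes.experimental` path suggests. The helper name and the example content are hypothetical, not taken from the PR.

```python
import json


def mark_experimental(pathogen):
    """Return a copy of a pathogen.json-like dict with attributes.experimental set.

    The input is not modified; an `attributes` object is created if missing.
    """
    updated = dict(pathogen)
    attributes = dict(updated.get("attributes", {}))
    attributes["experimental"] = True
    updated["attributes"] = attributes
    return updated


# Hypothetical, heavily abbreviated pathogen.json content.
example = {"attributes": {"name": "CCHFV S segment"}}
flagged = mark_experimental(example)
print(json.dumps(flagged, indent=2))
```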

@anna-parker
Contributor Author

Thanks so much @ivan-aksamentov! @rneher do you have any concerns about CCHFV being in the nextstrain collection? We can also discuss offline if that is easier.
