feat: use nightly nextclade tree
- [x] switch the Nextclade dataset to directory format (which allows replacing individual dataset files)
- [x] replace the reference tree in the dataset with the nightly tree from https://nextstrain.org/staging/nextclade/sars-cov-2

This bypasses the lagging Nextclade dataset updates and always uses the latest data, which may or may not be what we want.

This aims to be a workaround until the dataset updates are sorted out.

Potential problems:
- nightly trees are not systematically reviewed and can contain bugs
- do any other parts of the dataset (such as pathogen.json) need to be updated along with the tree?
- do any other repos (e.g. ncov-ingest) need to be updated to use the nightly tree? That is, is there an assumption that the exact same dataset is used in two or more places?
ivan-aksamentov committed Jan 28, 2025
1 parent 8beaf39 commit c3cd0a3
Showing 1 changed file with 9 additions and 3 deletions.
workflow/snakemake_rules/main_workflow.smk
@@ -455,14 +455,20 @@ rule prepare_nextclade:
         Downloading reference files for nextclade (used for alignment and qc).
         """
     output:
-        nextclade_dataset = "data/sars-cov-2-nextclade-defaults.zip",
+        nextclade_dataset = "data/sars-cov-2-nextclade-defaults",
     params:
         name = config["nextclade_dataset"],
     conda: config["conda_environment"]
     shell:
         r"""
         nextclade --version
-        nextclade dataset get --name {params.name} --output-zip {output.nextclade_dataset}
+        nextclade dataset get --name {params.name} --output-dir {output.nextclade_dataset}
+        # override tree.json with nightly tree
+        curl -fsSL \
+          -o {output.nextclade_dataset}/tree.json \
+          -H "Accept: application/vnd.nextstrain.dataset.main+json;q=1, application/json;q=0.9, text/plain;q=0.8, */*;q=0.1" \
+          "https://nextstrain.org/staging/nextclade/sars-cov-2"
         """
 
 rule build_align:
@@ -473,7 +479,7 @@ rule build_align:
         """
     input:
         sequences = rules.combine_samples.output.sequences,
-        nextclade_dataset = "data/sars-cov-2-nextclade-defaults.zip",
+        nextclade_dataset = "data/sars-cov-2-nextclade-defaults",
     output:
         alignment = "results/{build_name}/aligned.fasta",
         nextclade_qc = 'results/{build_name}/nextclade_qc.tsv',
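The override step in `prepare_nextclade` can be reproduced outside Snakemake with the Python standard library. The sketch below is not part of this commit. It reuses the URL and `Accept` header from the `curl` call above. Like `curl -f`, `urlopen` raises on HTTP 4xx/5xx responses. Two behaviours are additions of this sketch, prompted by the concern that nightly trees are not systematically reviewed. First, it runs a minimal sanity check: the payload must be a JSON object with a top-level `"tree"` key, as Auspice v2 JSON has. Second, it writes to a temporary file and renames it over `tree.json`, so a failed write never leaves a truncated tree. The function names `build_tree_request`, `override_tree` and `fetch_nightly_tree` are hypothetical.

```python
import json
import os
import tempfile
import urllib.request
from pathlib import Path
from typing import Union

# Same URL and Accept header as the curl call in prepare_nextclade.
NIGHTLY_TREE_URL = "https://nextstrain.org/staging/nextclade/sars-cov-2"
ACCEPT_HEADER = (
    "application/vnd.nextstrain.dataset.main+json;q=1, "
    "application/json;q=0.9, text/plain;q=0.8, */*;q=0.1"
)


def build_tree_request(url: str = NIGHTLY_TREE_URL) -> urllib.request.Request:
    """Build a GET request asking nextstrain.org for the main dataset JSON."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_HEADER})


def override_tree(dataset_dir: Union[str, os.PathLike], tree_bytes: bytes) -> Path:
    """Replace tree.json in an unpacked (directory-format) dataset.

    Assumption: checking for a JSON object with a top-level "tree" key is
    enough to reject an obviously broken download. It is not a full review.
    """
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        raise NotADirectoryError(f"not a dataset directory: {dataset_dir}")

    parsed = json.loads(tree_bytes)  # raises ValueError on invalid JSON
    if not isinstance(parsed, dict) or "tree" not in parsed:
        raise ValueError("downloaded JSON does not look like an Auspice tree")

    target = dataset_dir / "tree.json"
    # Write next to the target, then atomically rename over it.
    fd, tmp_name = tempfile.mkstemp(dir=dataset_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(tree_bytes)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial temp file, keep old tree
        raise
    return target


def fetch_nightly_tree(dataset_dir, url: str = NIGHTLY_TREE_URL, timeout: float = 60.0) -> Path:
    """Download the nightly tree and install it (requires network access)."""
    with urllib.request.urlopen(build_tree_request(url), timeout=timeout) as resp:
        return override_tree(dataset_dir, resp.read())
```

For example, `fetch_nightly_tree("data/sars-cov-2-nextclade-defaults")` would run after `nextclade dataset get --output-dir ...` has populated that directory. A rejected download leaves the dataset's original `tree.json` in place, whereas the `curl -o` in the rule writes to the target directly.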