Commit

Merge remote-tracking branch 'origin/master' into yellow-fever-dataset
ivan-aksamentov committed Aug 2, 2024
2 parents adfceaf + 6890959 commit ba20e00
Showing 44 changed files with 114,324 additions and 3,715 deletions.
1 change: 1 addition & 0 deletions data/nextstrain/collection.json
@@ -38,6 +38,7 @@
"nextstrain/rsv/a/EPI_ISL_412866",
"nextstrain/rsv/b/EPI_ISL_1653999",
"nextstrain/mpox/all-clades",
+"nextstrain/mpox/clade-i",
"nextstrain/mpox/clade-iib",
"nextstrain/mpox/lineage-b.1",
"nextstrain/flu/h3n2/pb1",
3 changes: 3 additions & 0 deletions data/nextstrain/mpox/clade-i/CHANGELOG.md
@@ -0,0 +1,3 @@
## 2024-08-01T22:31:31Z

Initial release of this dataset.
28 changes: 28 additions & 0 deletions data/nextstrain/mpox/clade-i/README.md
@@ -0,0 +1,28 @@
# Nextclade dataset for "Mpox virus (Clade I)"

| Key | Value |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| data source | Genbank |
| workflow | [github.com/nextstrain/mpox/nextclade](https://github.com/nextstrain/mpox/nextclade) |
| nextclade dataset path | nextstrain/mpox/clade-i |
| reference | [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), isolate `Zaire_1979-005`, an early complete clade I sequence |
| annotation | based on [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), but with genes called by modern names (OPGXXX) |
| clade definitions | [github.com/mpxv-lineages/lineage-designation](https://github.com/mpxv-lineages/lineage-designation) |
| related datasets | Mpox virus (All clades): `nextstrain/mpox/all-clades`<br>Mpox virus (clade IIb) `nextstrain/mpox/clade-iib`<br>Mpox virus (Lineage B.1 within clade IIb) `nextstrain/mpox/lineage-b.1` |

## Scope of this dataset

This dataset is for Mpox viruses of clade I (Ia and Ib). A broader dataset for all clades I, IIa and IIb is available under `nextstrain/mpox/all-clades`.

## Reference sequence and reference tree

The reference used in this dataset is [DQ011155.1](https://www.ncbi.nlm.nih.gov/nuccore/DQ011155.1), an early complete clade I sequence (Isolate `Zaire_1979-005`).

This is in contrast to the reference used in the other Nextclade mpox datasets, which use a clade IIb reference sequence.

The reference tree consists of all good quality clade I sequences available within Genbank at the time of dataset creation (with identical sequences deduplicated to 1), as well as 3 outgroup genomes (a reconstructed ancestor of all clades, and one sequence for each of clade IIa and clade IIb).

## Further reading

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
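
The README above says the reference tree keeps one copy of each set of identical sequences. A minimal Python sketch of that deduplication idea follows. The function name `deduplicate`, the sample names, and the case-insensitive comparison are illustrative assumptions; the actual logic lives in the linked workflow repository and may differ.

```python
from typing import Dict, Iterable, Tuple


def deduplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep the first record name seen for each distinct sequence.

    Hypothetical helper: sequences are compared case-insensitively,
    which is an assumption made for this sketch.
    """
    first_name_by_sequence: Dict[str, str] = {}
    for name, sequence in records:
        key = sequence.upper()
        # setdefault keeps the earliest name and ignores later duplicates
        first_name_by_sequence.setdefault(key, name)
    # Invert to name -> sequence, preserving first-seen order
    return {name: seq for seq, name in first_name_by_sequence.items()}


# Made-up example records (name, sequence)
records = [
    ("sample_a", "ACGTACGT"),
    ("sample_b", "acgtacgt"),  # identical to sample_a up to case
    ("sample_c", "ACGTTCGT"),
]
unique = deduplicate(records)
print(list(unique))
```

Keeping the first-seen name is one arbitrary choice; a real workflow might prefer the record with the most complete metadata instead.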
381 changes: 381 additions & 0 deletions data/nextstrain/mpox/clade-i/genome_annotation.gff3

Large diffs are not rendered by default.

75 changes: 75 additions & 0 deletions data/nextstrain/mpox/clade-i/pathogen.json
@@ -0,0 +1,75 @@
{
"alignmentParams": {
"excessBandwidth": 100,
"terminalBandwidth": 300,
"allowedMismatches": 8,
"windowSize": 40,
"minSeedCover": 0.1,
"gapAlignmentSide": "left"
},
"attributes": {
"name": "Mpox virus (Clade I)",
"reference accession": "DQ011155.1",
"reference name": "Zaire_1979-005"
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
},
"deprecated": false,
"enabled": true,
"experimental": false,
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"official": true,
"qc": {
"frameShifts": {
"enabled": true,
"ignoredFrameShifts": [
],
"scoreWeight": 20
},
"missingData": {
"enabled": true,
"missingDataThreshold": 20000,
"scoreBias": 1000
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 40
},
"privateMutations": {
"cutoff": 50,
"enabled": true,
"typical": 5,
"weightLabeledSubstitutions": 6,
"weightReversionSubstitutions": 6,
"weightUnlabeledSubstitutions": 1
},
"snpClusters": {
"clusterCutOff": 5,
"enabled": true,
"scoreWeight": 10,
"windowSize": 1000
},
"stopCodons": {
"enabled": true,
"ignoredStopCodons": [
],
"scoreWeight": 40
}
},
"schemaVersion": "3.0.0",
"shortcuts": [
],
"version": {
"tag": "unreleased"
}
}
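
When reviewing a new `pathogen.json` like the one above, it can help to confirm which QC rules are switched on and with which thresholds. The following is a minimal Python sketch that parses an excerpt copied from this diff. The helper `enabled_qc_rules` is hypothetical and is not part of Nextclade; it only reads the JSON structure shown here.

```python
import json
from typing import List

# Excerpt of the "qc" section, with values copied from the diff above
PATHOGEN_JSON_EXCERPT = """
{
  "qc": {
    "frameShifts": {"enabled": true, "ignoredFrameShifts": [], "scoreWeight": 20},
    "missingData": {"enabled": true, "missingDataThreshold": 20000, "scoreBias": 1000},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 40},
    "privateMutations": {
      "cutoff": 50, "enabled": true, "typical": 5,
      "weightLabeledSubstitutions": 6,
      "weightReversionSubstitutions": 6,
      "weightUnlabeledSubstitutions": 1
    },
    "snpClusters": {"clusterCutOff": 5, "enabled": true, "scoreWeight": 10, "windowSize": 1000},
    "stopCodons": {"enabled": true, "ignoredStopCodons": [], "scoreWeight": 40}
  }
}
"""


def enabled_qc_rules(pathogen: dict) -> List[str]:
    """Return the sorted names of QC rules whose "enabled" flag is true."""
    qc = pathogen.get("qc", {})
    return sorted(name for name, rule in qc.items() if rule.get("enabled"))


pathogen = json.loads(PATHOGEN_JSON_EXCERPT)
rules = enabled_qc_rules(pathogen)
print(rules)
print(pathogen["qc"]["missingData"]["missingDataThreshold"])
```

All six rules are enabled in this dataset, and both `ignoredFrameShifts` and `ignoredStopCodons` start out empty.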
2 changes: 2 additions & 0 deletions data/nextstrain/mpox/clade-i/reference.fasta

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions data/nextstrain/mpox/clade-i/sequences.fasta

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions data/nextstrain/mpox/clade-i/tree.json

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions data/nextstrain/rsv/a/EPI_ISL_412866/CHANGELOG.md
@@ -1,3 +1,10 @@
+## 2024-08-01T22:31:31Z
+
+- add subclades A.D.1.4-8
+- add subclades A.D.3.2-6, add representatives to A.D.3.1
+- add subclade A.D.5.4, adjust definition of A.D.5.3 to make it a clear sibling
+
+
## 2024-01-29T10:29:43Z

- fix definitions of G_clades (legacy) for RSV-A and RSV-B
2 changes: 1 addition & 1 deletion data/nextstrain/rsv/a/EPI_ISL_412866/README.md
@@ -4,7 +4,7 @@
| ---------------------- | --------------------------------------------------------------------------------------------------------------------|
| authors | [Richard Neher](https://neherlab.org), Laura Urbanska, [Nextstrain](https://nextstrain.org) |
| data source | Genbank + authorized other sequences |
-| workflow | [github.com/nextstrain/rsv/nextclade](https://github.com/nextstrain/rsv/nextclade) |
+| workflow | [github.com/nextstrain/rsv](https://github.com/nextstrain/rsv) |
| nextclade dataset path | nextstrain/rsv/a/EPI_ISL_412866 |
| reference | EPI_ISL_412866 |
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) |
14 changes: 12 additions & 2 deletions data/nextstrain/rsv/a/EPI_ISL_412866/pathogen.json
@@ -50,7 +50,12 @@
},
"stopCodons": {
"enabled": true,
-"ignoredStopCodons": []
+"ignoredStopCodons": [
+{
+"codon": 320,
+"cdsName": "G"
+}
+]
}
},
"cdsOrderPreference": [
@@ -88,5 +93,10 @@
"name": "RSV-A",
"reference accession": "EPI_ISL_412866",
"reference name": "hRSV/A/England/397/2017"
-}
+},
+"geneOrderPreference": [
+"F",
+"G",
+"L"
+]
}
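
The RSV-A change above adds one entry to `ignoredStopCodons`, identified by `cdsName` and `codon`. The following Python sketch shows how such a list could be used to separate stop codons that are exempt from QC from those that still count. The function `split_stop_codons` and the observed positions are hypothetical; how Nextclade applies the list internally, and whether `codon` is 0-based or 1-based, is not stated in this diff.

```python
from typing import Dict, List, Tuple, Union

StopCodon = Tuple[str, int]  # (CDS name, codon position)

# Entry copied from the diff above
IGNORED_STOP_CODONS: List[Dict[str, Union[str, int]]] = [
    {"codon": 320, "cdsName": "G"},
]


def split_stop_codons(
    observed: List[StopCodon],
    ignored: List[Dict[str, Union[str, int]]],
) -> Tuple[List[StopCodon], List[StopCodon]]:
    """Split observed stop codons into (counted, skipped) lists.

    A stop codon is skipped when its (cdsName, codon) pair appears
    in the ignore list; all others are counted.
    """
    ignored_keys = {(entry["cdsName"], entry["codon"]) for entry in ignored}
    counted = [stop for stop in observed if stop not in ignored_keys]
    skipped = [stop for stop in observed if stop in ignored_keys]
    return counted, skipped


# Made-up observations for illustration only
observed = [("G", 320), ("F", 101)]
counted, skipped = split_stop_codons(observed, IGNORED_STOP_CODONS)
print(counted, skipped)
```

Matching on the pair rather than on the position alone keeps a stop at the same codon number in a different CDS from being exempted by accident.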