Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
Showing 7 changed files with 12,052 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
## Unreleased | ||
|
||
Initial release of yellow fever virus dataset. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# Yellow fever virus dataset | ||
|
||
| Key | Value | | ||
| ----------------- | -----------------------------------------------------------------| | ||
| name | Yellow fever virus (YFV) prM-E region | | ||
| authors | [Nextstrain](https://nextstrain.org) | | ||
| reference | AY640589.1 | | ||
| workflow | <https://github.com/nextstrain/yellow-fever/tree/main/nextclade> | | ||
| path | `nextstrain/yellow-fever/prM-E` | | ||
|
||
## Scope of this dataset | ||
|
||
This dataset assigns genotypes to yellow fever virus samples based on | ||
strain and genotype information from [Mutebi et al.][] (J Virol. 2001 | ||
Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75). | ||
|
||
These two papers, collectively, define 7 distinct yellow fever virus | ||
genotypes based on a 670 nucleotide region of the yellow fever virus | ||
genome (bases 641-1310), called the prM-E region. This region | ||
comprises the 3' end of the pre-membrane protein (prM) gene, the | ||
entire membrane protein (M) gene, and the 5' end of the envelope | ||
protein (E) gene. | ||
|
||
(N.B., the reference sequence used in this dataset is actually 672 nt | ||
long, from bases 641-1312 of the genome reference. The 2 extra bases | ||
make the reference a complete open reading frame.) | ||
|
||
This dataset can be used to assign genotypes to any sequence that | ||
includes at least 500 bp of the prM-E region, including whole genome | ||
sequences. Sequence data beyond the prM-E region will be reported as an | ||
insertion in the Nextclade output. | ||
|
||
## Features | ||
|
||
This dataset supports: | ||
|
||
- Assignment of genotypes | ||
- Phylogenetic placement | ||
- Sequence quality control (QC) | ||
|
||
## What are Nextclade datasets | ||
|
||
Read more about Nextclade datasets in the Nextclade documentation: | ||
<https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html> | ||
|
||
[Mutebi et al.]: https://pubmed.ncbi.nlm.nih.gov/11435580/ | ||
[Bryant et al.]: https://pubmed.ncbi.nlm.nih.gov/17511518/ |
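
The coordinate arithmetic in the README's scope section can be checked with a short sketch. The helper name `span_length` is hypothetical, and the code assumes the coordinates are 1-based and inclusive, which is consistent with the lengths the README quotes.

```python
# Hypothetical helper; assumes 1-based, inclusive genome coordinates.
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


published_region = (641, 1310)   # prM-E region as defined in the two papers
dataset_reference = (641, 1312)  # range used for this dataset's reference

published_len = span_length(*published_region)
reference_len = span_length(*dataset_reference)

print(published_len)      # 670
print(reference_len)      # 672
print(published_len % 3)  # 1, so 670 nt is not a whole number of codons
print(reference_len % 3)  # 0, so 672 nt is exactly 224 codons
```

This shows why the two extra bases matter: 670 is not divisible by 3, while 672 is.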
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
##sequence-region prM-E 1 672 | ||
NC_002031.1 feature source 1 672 . + . gene=nuc | ||
NC_002031.1 feature gene 1 333 . + . gene_name=prM | ||
NC_002031.1 feature gene 109 333 . + . gene_name=M | ||
NC_002031.1 feature gene 334 672 . + . gene_name=E |
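
As a sanity check on the annotation above, the sketch below parses the feature lines and confirms that each gene spans a whole number of codons. It is illustrative only: the function name `parse_features` is hypothetical, and the parser splits on any whitespace because the columns appear space-separated in this rendering (GFF3 files are normally tab-delimited, and attribute values containing spaces would need a stricter parser).

```python
# Feature lines copied from the annotation shown above.
GFF3_TEXT = """\
##sequence-region prM-E 1 672
NC_002031.1 feature source 1 672 . + . gene=nuc
NC_002031.1 feature gene 1 333 . + . gene_name=prM
NC_002031.1 feature gene 109 333 . + . gene_name=M
NC_002031.1 feature gene 334 672 . + . gene_name=E
"""


def parse_features(text: str) -> list[dict]:
    """Parse nine-column GFF3 feature lines, skipping directives and comments."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split(None, 8)  # assumption: whitespace-separated columns
        if len(cols) != 9:
            continue
        attributes = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        features.append(
            {
                "type": cols[2],
                "start": int(cols[3]),
                "end": int(cols[4]),
                "attributes": attributes,
            }
        )
    return features


# Map each gene name to its length in nucleotides (coordinates are inclusive).
gene_lengths = {
    f["attributes"]["gene_name"]: f["end"] - f["start"] + 1
    for f in parse_features(GFF3_TEXT)
    if f["type"] == "gene"
}

print(gene_lengths)  # {'prM': 333, 'M': 225, 'E': 339}
print(all(length % 3 == 0 for length in gene_lengths.values()))  # True
```

Note that the `M` feature (109-333) lies inside the `prM` feature (1-333), and that `prM` plus `E` together cover all 672 positions.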
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
{ | ||
"files": { | ||
"reference": "reference.fasta", | ||
"pathogenJson": "pathogen.json", | ||
"genomeAnnotation": "genome_annotation.gff3", | ||
"treeJson": "tree.json", | ||
"examples": "sequences.fasta", | ||
"readme": "README.md", | ||
"changelog": "CHANGELOG.md" | ||
}, | ||
"attributes": { | ||
"name": "Yellow fever virus (YFV) prM-E region", | ||
"reference name": "Asibi", | ||
"reference accession": "AY640589.1" | ||
}, | ||
"schemaVersion": "3.0.0", | ||
"alignmentParams": { | ||
"minSeedCover": 0.01, | ||
"minLength": 500 | ||
}, | ||
"qc": { | ||
"missingData": { | ||
"enabled": true, | ||
"missingDataThreshold": 20, | ||
"scoreBias": 4 | ||
}, | ||
"mixedSites": { | ||
"enabled": true, | ||
"mixedSitesThreshold": 4 | ||
}, | ||
"frameShifts": { | ||
"enabled": true | ||
}, | ||
"stopCodons": { | ||
"enabled": true | ||
}, | ||
"privateMutations": { | ||
"enabled": true, | ||
"cutoff": 8, | ||
"typical": 2, | ||
"weightLabeledSubstitutions": 1, | ||
"weightReversionSubstitutions": 1, | ||
"weightUnlabeledSubstitutions": 1 | ||
}, | ||
"snpClusters": { | ||
"enabled": true, | ||
"clusterCutOff": 3, | ||
"scoreWeight": 50, | ||
"windowSize": 50 | ||
} | ||
} | ||
} |
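
To see at a glance what this configuration switches on, the following sketch loads an abridged copy of the JSON above and lists the enabled QC rules alongside the minimum alignment length. It only reads the settings; it does not reproduce how Nextclade computes QC scores. The variable names are illustrative.

```python
import json

# Abridged copy of the configuration shown above (weights omitted for brevity).
PATHOGEN_JSON = """
{
  "alignmentParams": {"minSeedCover": 0.01, "minLength": 500},
  "qc": {
    "missingData": {"enabled": true, "missingDataThreshold": 20, "scoreBias": 4},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 4},
    "frameShifts": {"enabled": true},
    "stopCodons": {"enabled": true},
    "privateMutations": {"enabled": true, "cutoff": 8, "typical": 2},
    "snpClusters": {"enabled": true, "clusterCutOff": 3, "scoreWeight": 50, "windowSize": 50}
  }
}
"""

config = json.loads(PATHOGEN_JSON)

# Collect the names of QC rules whose "enabled" flag is true, in file order.
enabled_rules = [
    name for name, rule in config["qc"].items() if rule.get("enabled", False)
]
min_length = config["alignmentParams"]["minLength"]

print(enabled_rules)
print(min_length)  # 500
```

The `minLength` value of 500 matches the README's statement that a sequence needs at least 500 bp of the prM-E region to be assigned a genotype.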
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
> prM-E region (genome 641-1312, 672 nt) | ||
CCAAGAGAGGAGCCAGATGACATTGATTGCTGGTGCTATGGGGTGGAAAACGTTAGAGTC | ||
GCATATGGTAAGTGTGACTCAGCAGGCAGGTCTAGGAGGTCAAGAAGGGCCATTGACTTG | ||
CCTACGCATGAAAACCATGGTTTGAAGACCCGGCAAGAAAAATGGATGACTGGAAGAATG | ||
GGTGAAAGGCAACTCCAAAAGATTGAGAGATGGCTCGTGAGGAACCCCTTTTTTGCAGTG | ||
ACAGCTCTGACCATTGCCTACCTTGTGGGAAGCAACATGACGCAACGAGTCGTGATTGCC | ||
CTACTGGTCTTGGCTGTTGGTCCGGCCTACTCAGCTCACTGCATTGGAATTACTGACAGG | ||
GATTTCATTGAGGGGGTGCATGGAGGAACTTGGGTTTCAGCTACCCTGGAGCAAGACAAG | ||
TGTGTCACTGTTATGGCCCCTGACAAGCCTTCATTGGACATCTCACTAGAGACAGTAGCC | ||
ATTGATGGACCTGCTGAGGCGAGGAAAGTGTGTTACAATGCAGTTCTCACTCATGTGAAG | ||
ATTAATGACAAGTGCCCCAGCACTGGAGAGGCCCACCTAGCTGAAGAGAACGAAGGGGAC | ||
AATGCGTGCAAGCGCACTTATTCTGATAGAGGCTGGGGCAATGGCTGTGGCCTATTTGGG | ||
AAAGGGAGCATT |
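
A reference like the one above can be checked against the figures given in its header (672 nt, a whole number of codons). The sketch below is a minimal example under stated assumptions: `read_single_fasta` and `check_reference` are hypothetical helper names, and the demonstration uses a short toy record rather than the real reference sequence.

```python
def read_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) from FASTA text holding exactly one record."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected exactly one FASTA record")
            header = line[1:].strip()
        else:
            if header is None:
                raise ValueError("sequence data found before a header line")
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def check_reference(seq: str, expected_length: int = 672) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if len(seq) != expected_length:
        problems.append(f"length {len(seq)} != expected {expected_length}")
    unexpected = sorted(set(seq) - set("ACGT"))
    if unexpected:
        problems.append(f"unexpected characters: {unexpected}")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    return problems


# Toy record for illustration only; not the real reference sequence.
toy_header, toy_seq = read_single_fasta(">toy record\nACGTAC\nGTA\n")
print(toy_header, len(toy_seq))
print(check_reference(toy_seq, expected_length=9))  # []
```

For the dataset's reference, the same two functions would be applied to the contents of `reference.fasta` with the default `expected_length` of 672.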
4,380 changes: 4,380 additions & 0 deletions
data/nextstrain/yellow-fever/prM-E/sequences.fasta
Large diffs are not rendered by default.