Add yellow fever virus dataset
Built from [nextclade workflow in yellow fever repo](nextstrain/yellow-fever#10)
genehack committed Aug 2, 2024
1 parent 67810f6 commit 5fe9274
Showing 7 changed files with 12,052 additions and 0 deletions.
3 changes: 3 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release of yellow fever virus dataset.
47 changes: 47 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/README.md
@@ -0,0 +1,47 @@
# Yellow fever virus dataset

| Key | Value |
| ----------------- | -----------------------------------------------------------------|
| name | Yellow fever virus (YFV) prM-E region |
| authors | [Nextstrain](https://nextstrain.org) |
| reference | AY640589.1 |
| workflow | <https://github.com/nextstrain/yellow-fever/tree/main/nextclade> |
| path | `nextstrain/yellow-fever/prM-E` |

## Scope of this dataset

This dataset assigns genotypes to yellow fever virus samples based on
strain and genotype information from [Mutebi et al.][] (J Virol. 2001
Aug;75(15):6999-7008) and [Bryant et al.][] (PLoS Pathog. 2007 May 18;3(5):e75).

These two papers, collectively, define 7 distinct yellow fever virus
genotypes based on a 670 nucleotide region of the yellow fever virus
genome (bases 641-1310), called the prM-E region. This region
comprises the 3' end of the pre-membrane protein (prM) gene, the
entire membrane protein (M) gene, and the 5' end of the envelope
protein (E) gene.

(N.b., the reference sequence used in this dataset is actually 672 nt
long, from bases 641-1312 of the genome reference. The 2 extra bases
make the reference a complete open reading frame.)
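
The coordinate arithmetic behind that note can be checked with a short sketch. The constant and function names below are illustrative only and are not part of the dataset or of Nextclade; the coordinates are the 1-based, inclusive values quoted above.

```python
# Coordinates quoted in the README (1-based, inclusive).
START = 641
PUBLISHED_END = 1310  # end of the region as defined in the two papers
DATASET_END = 1312    # end of the reference used by this dataset


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


published_len = region_length(START, PUBLISHED_END)
dataset_len = region_length(START, DATASET_END)

# A length that is a multiple of 3 holds a whole number of codons.
print(published_len, published_len % 3)  # 670 1
print(dataset_len, dataset_len % 3)      # 672 0
```

The 670 nt region leaves one base over after the last full codon, while the 672 nt reference divides evenly into 224 codons.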

This dataset can be used to assign genotypes to any sequence that
includes at least 500 bp of the prM-E region, including whole genome
sequences. Sequence data beyond the prM-E region will be reported as an
insertion in the Nextclade output.

## Features

This dataset supports:

- Assignment of genotypes
- Phylogenetic placement
- Sequence quality control (QC)

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation:
<https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html>

[Mutebi et al.]: https://pubmed.ncbi.nlm.nih.gov/11435580/
[Bryant et al.]: https://pubmed.ncbi.nlm.nih.gov/17511518/
5 changes: 5 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/genome_annotation.gff3
@@ -0,0 +1,5 @@
##sequence-region prM-E 1 672
NC_002031.1 feature source 1 672 . + . gene=nuc
NC_002031.1 feature gene 1 333 . + . gene_name=prM
NC_002031.1 feature gene 109 333 . + . gene_name=M
NC_002031.1 feature gene 334 672 . + . gene_name=E
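
As a rough sanity check on the annotation above, the sketch below parses the gene rows and confirms that each span covers a whole number of codons. It assumes the columns are tab-separated, as GFF3 requires (the diff view shows them with spaces), and `gene_spans` is a hypothetical helper, not a Nextclade function.

```python
# The annotation rows from this commit, rebuilt with tab separators (an assumption).
GFF3_TEXT = "\n".join([
    "##sequence-region prM-E 1 672",
    "NC_002031.1\tfeature\tsource\t1\t672\t.\t+\t.\tgene=nuc",
    "NC_002031.1\tfeature\tgene\t1\t333\t.\t+\t.\tgene_name=prM",
    "NC_002031.1\tfeature\tgene\t109\t333\t.\t+\t.\tgene_name=M",
    "NC_002031.1\tfeature\tgene\t334\t672\t.\t+\t.\tgene_name=E",
])


def gene_spans(gff3_text: str) -> dict[str, tuple[int, int]]:
    """Map gene_name to (start, end) for every row whose type column is 'gene'."""
    spans = {}
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # skip malformed rows and non-gene features
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        spans[attrs["gene_name"]] = (int(cols[3]), int(cols[4]))
    return spans


for name, (start, end) in gene_spans(GFF3_TEXT).items():
    length = end - start + 1
    print(name, length, length % 3 == 0)
# prM 333 True
# M 225 True
# E 339 True
```
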
52 changes: 52 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/pathogen.json
@@ -0,0 +1,52 @@
{
"files": {
"reference": "reference.fasta",
"pathogenJson": "pathogen.json",
"genomeAnnotation": "genome_annotation.gff3",
"treeJson": "tree.json",
"examples": "sequences.fasta",
"readme": "README.md",
"changelog": "CHANGELOG.md"
},
"attributes": {
"name": "Yellow fever virus (YFV) prM-E region",
"reference name": "Asibi",
"reference accession": "AY640589.1"
},
"schemaVersion": "3.0.0",
"alignmentParams": {
"minSeedCover": 0.01,
"minLength": 500
},
"qc": {
"missingData": {
"enabled": true,
"missingDataThreshold": 20,
"scoreBias": 4
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 4
},
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true
},
"privateMutations": {
"enabled": true,
"cutoff": 8,
"typical": 2,
"weightLabeledSubstitutions": 1,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"snpClusters": {
"enabled": true,
"clusterCutOff": 3,
"scoreWeight": 50,
"windowSize": 50
}
}
}
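
The `minLength` value of 500 in `alignmentParams` lines up with the README's statement that a sequence needs at least 500 bp of the prM-E region. The sketch below only illustrates that threshold as a plain length check on toy sequences. It is not Nextclade's alignment logic, and `long_enough` and the toy data are made up for the example.

```python
import json

# Only the alignment parameters from the pathogen.json in this commit.
PATHOGEN_JSON_FRAGMENT = """
{
  "alignmentParams": {
    "minSeedCover": 0.01,
    "minLength": 500
  }
}
"""


def long_enough(sequence: str, min_length: int) -> bool:
    """True if the sequence has at least min_length characters."""
    return len(sequence) >= min_length


params = json.loads(PATHOGEN_JSON_FRAGMENT)["alignmentParams"]
min_length = params["minLength"]

# Toy sequences (not real yellow fever data), chosen only for their lengths.
toy_sequences = {
    "short_fragment": "ACGT" * 100,  # 400 nt
    "full_region": "ACGT" * 168,     # 672 nt
}

for name, seq in toy_sequences.items():
    print(name, len(seq), long_enough(seq, min_length))
# short_fragment 400 False
# full_region 672 True
```
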
13 changes: 13 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/reference.fasta
@@ -0,0 +1,13 @@
> prM-E region (genome 641-1312, 672 nt)
CCAAGAGAGGAGCCAGATGACATTGATTGCTGGTGCTATGGGGTGGAAAACGTTAGAGTC
GCATATGGTAAGTGTGACTCAGCAGGCAGGTCTAGGAGGTCAAGAAGGGCCATTGACTTG
CCTACGCATGAAAACCATGGTTTGAAGACCCGGCAAGAAAAATGGATGACTGGAAGAATG
GGTGAAAGGCAACTCCAAAAGATTGAGAGATGGCTCGTGAGGAACCCCTTTTTTGCAGTG
ACAGCTCTGACCATTGCCTACCTTGTGGGAAGCAACATGACGCAACGAGTCGTGATTGCC
CTACTGGTCTTGGCTGTTGGTCCGGCCTACTCAGCTCACTGCATTGGAATTACTGACAGG
GATTTCATTGAGGGGGTGCATGGAGGAACTTGGGTTTCAGCTACCCTGGAGCAAGACAAG
TGTGTCACTGTTATGGCCCCTGACAAGCCTTCATTGGACATCTCACTAGAGACAGTAGCC
ATTGATGGACCTGCTGAGGCGAGGAAAGTGTGTTACAATGCAGTTCTCACTCATGTGAAG
ATTAATGACAAGTGCCCCAGCACTGGAGAGGCCCACCTAGCTGAAGAGAACGAAGGGGAC
AATGCGTGCAAGCGCACTTATTCTGATAGAGGCTGGGGCAATGGCTGTGGCCTATTTGGG
AAAGGGAGCATT
4,380 changes: 4,380 additions & 0 deletions data/nextstrain/yellow-fever/prM-E/sequences.fasta

Large diffs are not rendered by default.
