Merge pull request #245 from nextstrain/2024-11-23_rsv_update
rsv: update rsv datasets with more recent data and the B.D.E.1.1 clad…
Showing 24 changed files with 11,225 additions and 4,650 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,8 @@ | ||
## Unreleased | ||
|
||
- update reference tree with more recent data | ||
|
||
|
||
## 2024-08-01T22:31:31Z | ||
|
||
- add subclades A.D.1.4-8 | ||
|
5,234 changes: 2,851 additions & 2,383 deletions
data/nextstrain/rsv/a/EPI_ISL_412866/sequences.fasta
Large diffs are not rendered by default.
4,549 changes: 2,291 additions & 2,258 deletions
data/nextstrain/rsv/b/EPI_ISL_1653999/sequences.fasta
Large diffs are not rendered by default.
22 changes: 22 additions & 0 deletions
data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/CHANGELOG.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
## Unreleased | ||
|
||
- update reference tree with more recent data | ||
|
||
|
||
## 2024-08-01T22:31:31Z | ||
|
||
- add subclades A.D.1.4-8 | ||
- add subclades A.D.3.2-6, add representatives to A.D.3.1 | ||
- add subclade A.D.5.4, adjust definition of A.D.5.3 to make it a clear sibling | ||
|
||
|
||
## 2024-01-29T10:29:43Z | ||
|
||
- fix definitions of G_clades (legacy) for RSV-A and RSV-B | ||
|
||
|
||
## 2024-01-16T20:31:02Z | ||
|
||
**first release of v3 dataset.** | ||
|
||
Updated consortium nomenclature. |
21 changes: 21 additions & 0 deletions
data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# RSV-A dataset with reference genome A/England/397/2017 | ||
|
||
| Key | Value | | ||
| ---------------------- | --------------------------------------------------------------------------------------------------------------------| | ||
| authors | [Richard Neher](https://neherlab.org), Laura Urbanska, [Nextstrain](https://nextstrain.org) | | ||
| data source | Genbank + authorized other sequences | | ||
| workflow | [github.com/nextstrain/rsv](https://github.com/nextstrain/rsv) | | ||
| nextclade dataset path | nextstrain/rsv/a/EPI_ISL_412866 | | ||
| reference | EPI_ISL_412866 | | ||
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) | | ||
|
||
## Scope of this dataset | ||
This dataset for RSV-A uses the reference sequence A/England/397/2017, which is available in GISAID under accession number EPI_ISL_412866. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737). | ||
This sequence has the duplication in the G-protein shared by all currently circulating variants. | ||
The reference tree covers the diversity of RSV-A since the first sequenced samples. | ||
|
||
|
||
## Nomenclature | ||
The dataset follows the consortium nomenclature established in 2023 that uses a combination of letters and numbers to designate lineages in a hierarchical fashion. | ||
Definitions of individual lineages are available on GitHub in the repository [rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A). | ||
Legacy clade definitions for the nomenclature defined by Goya et al (`G_clade`) are included for orientation. These clade definitions will not be updated and are incomplete. We encourage users to use the new consortium nomenclature. |
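
The hierarchical naming described in the README can be illustrated with a small sketch. It assumes that lineage names are plain dot-separated paths with no aliasing (an assumption, not something the README states), and that changelog shorthand such as `A.D.1.4-8` means the consecutive subclades `A.D.1.4` through `A.D.1.8`. The helper names `parent`, `is_descendant` and `expand_range` are hypothetical and not part of any Nextstrain tool.

```python
import re
from typing import List, Optional


def parent(lineage: str) -> Optional[str]:
    """Return the immediate parent of a dotted lineage name, or None at the root."""
    head, sep, _ = lineage.rpartition(".")
    return head if sep else None


def is_descendant(lineage: str, ancestor: str) -> bool:
    """True if `lineage` sits strictly below `ancestor` in the dotted hierarchy."""
    return lineage.startswith(ancestor + ".")


def expand_range(shorthand: str) -> List[str]:
    """Expand shorthand like 'A.D.1.4-8' into individual lineage names.

    Names without a trailing 'N-M' range are returned unchanged.
    """
    match = re.fullmatch(r"(.+\.)(\d+)-(\d+)", shorthand)
    if match is None:
        return [shorthand]
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(first, last + 1)]


if __name__ == "__main__":
    print(parent("A.D.1.4"))
    print(is_descendant("A.D.5.4", "A.D"))
    print(expand_range("A.D.3.2-6"))
```

If the consortium nomenclature uses aliases for deep lineages, the prefix test in `is_descendant` would need a lookup table in addition to the string comparison.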
Binary file not shown.
16 changes: 16 additions & 0 deletions
data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/genome_annotation.gff3
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##gff-version 3 | ||
##sequence-region EPI_ISL_412866 1 15225 | ||
EPI_ISL_412866 annotation remark 1 15225 . . . molecule_type=cRNA;organism=Human orthopneumovirus;taxonomy=Viruses,Riboviria,Orthornavirae,Negarnaviricota,Haploviricotina,Monjiviricetes,Mononegavirales,Pneumoviridae,Orthopneumovirus,Orthopneumovirus hominis | ||
EPI_ISL_412866 feature source 1 15225 . . . mol_type=viral cRNA;organism=Human orthopneumovirus | ||
EPI_ISL_412866 feature 5'UTR 1 15 . . . citation=%5B1%5D;function=Leader region 5%27UTR | ||
EPI_ISL_412866 feature CDS 70 489 . . 0 codon_start=1;db_xref=GeneID:37607636;gene=NS1;gene_name=NS1;Name=NS1;product=nonstructural protein 1;protein_id=YP_009518850.1;translation=MGSNSLSMIKVRLQNLFDNDEVALLKITCYTDKLIQLTNALAKAVIHTIKLNGIVFVHVITSSDICPNNNIVVKSNFTTMPVLQNGGYIWEMMELTHCSQPNGLIDDNCEIKFSKKLSDSTMTNYMNQLSELLGFDLNP%2A | ||
EPI_ISL_412866 feature CDS 599 973 . . 0 codon_start=1;db_xref=GeneID:37607637;gene=NS2;gene_name=NS2;Name=NS2;product=nonstructural protein 2;protein_id=YP_009518851.1;translation=MDTTHNDTTPQRLMITDMRPLSLETIITSLTRDIITHKFIYLINHECIVRKLDERQATFTFLVNYEMKLLHKVGSTKYKKYTEYNTKYGTFPMPIFINHDGFLECIGIKPTKHTPIIYKYDLNP%2A | ||
EPI_ISL_412866 feature CDS 1111 2286 . . 0 codon_start=1;db_xref=GeneID:37607638;gene=N;gene_name=N;Name=N;product=nucleoprotein;protein_id=YP_009518852.1;translation=MALSKVKLNDTLNKDQLLSSSKYTIQRSTGDSIDTPNYDVQKHINKLCGMLLITEDANHKFTGLIGMLYAMSRLGREDTIKILKDAGYHVKANGVDVTTHRQDINGKEMKFEVLTLASLTTEIQINIEIESRKSYKKMLKEMGEVAPEYRHDSPDCGMIILCIAALVITKLAAGDRSGLTAVIRRANNVLKNEMKRYKGLLPKDIANSFYEVFEKYPHFIDVFVHFGIAQSSTRGGSRVEGIFAGLFMNAYGAGQVMLRWGVLAKSVKNIMLGHASVQAEMEQVVEVYEYAQKLGGEAGFYHILNNPKASLLSLTQFPHFSSVVLGNAAGLGIMGEYRGTPRNQDLYDAAKVYAEQLKENGVINYSVLDLTAEELEAIKHQLNPKDNDVEL%2A | ||
EPI_ISL_412866 feature CDS 2318 3043 . . 0 codon_start=1;db_xref=GeneID:37607639;gene=P;gene_name=P;Name=P;product=phosphoprotein;protein_id=YP_009518853.1;translation=MEKFAPEFHGEDANNRATKFLESIKGKFTSPKDPKKKDSIISVNSIDIEVTKESLITSNSTIINPINETDDTVGNKPNYQRKPLVSFKEDPTPSDNPFSKLYKETIETFDNNEEESSYSYEEINDQTNDNITARLDRIDEKLSEILGMLHTLVVASAGPTSARDGIRDAMVGLREEMIEKIRTEALMTNDRLEAMARLRNEESEKMAKDTSDEVSLNPTSEKLNNLLEGNDSDNDLSLEDF%2A | ||
EPI_ISL_412866 feature CDS 3226 3996 . . 0 codon_start=1;db_xref=GeneID:37607640;gene=M;gene_name=M;Name=M;product=matrix protein;protein_id=YP_009518854.1;translation=METYVNKLHEGSTYTAAVQYNVLEKDDDPASLTIWVPMFQSSMPADLLIKELANVNILVKQISTPKGPSLRVMINSRSAVLAQMPSKFTICANVSLDERSKLAYDVTTPCEIKACSLTCLKSKNMLTTVKDLTMKTLNPTHDIIALCEFENIVTSKKVIIPTYLRSISVRNKDLNTLENITTTEFKNAITNAKIIPYSGLLLVITVTDNKGAFKYIKPQSQFIVDLGAYLEKESIYYVTTNWKHTATRFAIKPMED%2A | ||
EPI_ISL_412866 feature CDS 4266 4460 . . 0 codon_start=1;db_xref=GeneID:37607641;gene=SH;gene_name=SH;Name=SH;product=small hydrophobic protein;protein_id=YP_009518855.1;translation=MENTSITIEFSSKFWPYFTLIHMITTIISLIIIISIMIAILNKLCEYNVFHNKTFELPRARVNT%2A | ||
EPI_ISL_412866 feature CDS 4652 5617 . . 0 codon_start=1;db_xref=GeneID:37607642;gene=G;gene_name=G;Name=G;product=attachment glycoprotein;protein_id=YP_009518856.1;translation=MSKTKDQRTAKTLERTWDTLNHLLFISSCLYKLNLKSIAQITLSILAMIISTSLIIAAIIFIASANHKVTPTTAIIQDATNQIKNTTPTHLTQNPQLGISLSNLSGTTSQSTTILASTTPSAESTPQSTTVKIINTTTTQILPSKPTTKQRQNKPQNKPNNDFHFEVFNFVPCSICSNNPTCWAICKRIPNKKPGKKTTTKPTKKPTLKTTKKDPKPQTTKPKGVLTTKPTGKPTINTTKTNSRTTLLTSNTKGNPEHTSQKETIHSTTSEGYPSPSQVYTTSDQEETLHSTTSEGYPSPSQVYTTSEYLSQSLSSSNTTK%2A | ||
EPI_ISL_412866 feature CDS 5697 7421 . . 0 codon_start=1;db_xref=GeneID:37607643;gene=F;gene_name=F;Name=F;product=fusion glycoprotein;protein_id=YP_009518857.1;translation=MELPILKTNAITTILAAVTLCFASSQNITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKENKCNGTDAKVKLIKQELDKYKNAVTELQLLMQSTPAANSRARRELPRFMNYTLNNTKNTNVTLSKKRKRRFLGFLLGVGSAIASGIAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSISNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSSNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNIDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDEFDASISQVNEKINQSLAFIRKSDELLHNVNAGKSTTNIMITTIIIVIIVILLALIAVGLLLYCKARSTPVTLSKDQLSGINNIAFSN%2A | ||
EPI_ISL_412866 feature CDS 7640 8224 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-1;Name=M2-1;note=ORF 1%2C matrix protein 2;product=M2-1 protein;protein_id=YP_009518858.1;translation=MSRRNPCKFEIRGHCLNGKRCHFSHNYFEWPPHALLVRQNFMLNRILKSMDKSIDTLSEISGAAELDRTEEYALGVVGVLESYIGSINNITKQSACVAMSKLLTELNSDDIKKLRDNEEPNSPKVRVYNTVISYIESNRKNNKQTIHLLKRLPADVLKKTIKNTLDIHKSITINNSKESTVSDTNDHAKNNDTT%2A | ||
EPI_ISL_412866 feature CDS 8193 8465 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-2;Name=M2-2;note=ORF 2%2C RNA processivity factor;product=M2-2 protein;protein_id=YP_009518859.1;translation=TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS%2A | ||
EPI_ISL_412866 feature CDS 8532 15029 . . 0 codon_start=1;db_xref=GeneID:37607645;gene=L;gene_name=L;Name=L;note=RNA dependant RNA polymerase%3B RdRp;product=polymerase protein;protein_id=YP_009518860.1;translation=MDPIISGNSANVYLTDSYLKGVISFSECNALGSYIFNGPYLKNDYTNLISRQNPLIEHINLKKLNITQSLISKYHKGEIKIEEPTYFQSLLMTYKSMTSSEQTTTTNLLKKIIRRAIEISDVKVYAILNKLGLKEKDKIKSNNGQDEDNSVITTIIKDDILLAVKDNQSHPKADKNQSTKQKDTIKTTLLKKLMCSMQHPPSWLIHWFNLYTKLNSILTQYRSSEVKNHGFILIDNHTLSGFQFILNQYGCIVYHRELKRITVTTYNQFLTWKDISLSRLNVCLITWISNCLNTLNKSLGLRCGFNNVILTQLFLYGDCILKLFHNEGFYIIKEVEGFIMSLILNITEEDQFRKRFYNSMLNNITDAANKAQKNLLSRVCHTLLDKTISDNIINGRWIILLSKFLKLIKLAGDNNLNNLSELYFLFRIFGHPMVDERQAMDAVKVNCNETKFYLLSSLSMLRGAFIYRIIKGFVNNYNRWPTLRNAIVLPLRWLTYYKLNTYPSLLELTERDLIVLSGLRFYREFRLPKKVDLEMIINDKAISPPKNLIWTSFPRNYMPSHIQNYIEHEKLKFSDSDKSRRVLEYYLRDNKFNECDLYNCVVNQSYLNNPNHVVSLTGKERELSVGRMFAMQPGMFRQVQILAEKMIAENILQFFPESLTRYGDLELQKILELKAGISNKSNRYNDNYNNYISKCSIITDLSKFNQAFRYETSCICSDVLDELHGVQSLFSWLHLTIPHVTIICTYRHAPPYIKDHIVDLNNVDEQSGLYRYHMGGIEGWCQKLWTIEAISLLDLISLKGKFSITALINGDNQSIDISKPVRLMEGQTHAQADYLLALNSLKLLYKEYAGIGHKLKGTETYISRDMQFMSKTIQHNGVYYPASIKKVLRVGPWINTILDDFKVSLESIGSLTQELEYRGESLLCSLIFRNVWLYNQIALQLKNHALCNNKLYLDILKVLKHLKTFFNLDNIDTALTLYMNLPMLFGGGDPNLLYRSFYRRTPDFLTEAIVHSVFILSYYTNHDLKDKLQDLSDDRLNKFLTCIITFDKNPNAEFVTLMRDPQALGSERQAKITSEINRLAVTEVLSTAPNKIFSKSAQHYTTTEIDLNDIMQNIEPTYPHGLRVVYESLPFYKAEKIVNLISGTKSITNILEKTSAIDLTDIDRATEMMRKNITLLIRILPLDCNRDKREILSMENLSITELSKYVRERSWSLSNIVGVTSPSIMYTMDIKYTTSTIASGIIIEKYNVNSLTRGERGPTKPWVGSSTQEKKTMPVYNRQVLTKKQRDQIDLLAKLDWVYASIDNKDEFMEELSIGTLGLTYEKAKKLFPQYLSVNYLHRLTVSSRPCEFPASIPAYRTTNYHFDTSPINRILTEKYGDEDIDIVFQNCISFGLSLMSVVEQFTNVCPNRIILIPKLNEIHLMKPPIFTGDVDIHKLKLVIQKQHMFLPDKISLTQYVELFLSNKTLKSGSNVNSNLILAHKISDYFHNTYILSTNLAGHWILIIQLMKDSKGIFEKDWGEGYITDHMFINLKVFFNAYKTYLLCFHKGYGRAKLECDMNTSDLLCVLELIDSSYWKSMSKVFLEQKVIKYILSQDASLHRVKGCHSFKLWFLKRLNVAEFTVCPWVVNIDYHPTHMKAILTYIDLVRMGLINIDRIYIKNKHKFNDEFYTSNLFYINYNFSDNTHLLTKHIRIANSELESNYNKLYHPTPETLENILTNPVKNNEKKTLSGYCIGKNVDSIMLPSLSNKKLIKSSTMIRTNYSRQDLYNLFPTVVIDKIIDHSGN
TAKSNQLYTTTSHQISLVHNSTSLYCMLPWHHINRFNFVFSSTGCKISIEYILKDLKIKDPNCIAFIGEGAGNLLLRTVVELHPDIRYIYRSLKDCNDHSLPIEFLRLYNGHINIDYGENLTIPATDATNNIHWSYLHIKFAEPISLFVCDAELPVTVNWSKIIIEWSKHVRKCKYCSSVNKCTLIVKYHAQDDIDFKLDNITILKTYVCLGSKLKGSEVYLVLTIGPANVFPVFNVVQNAKLILSRTKNFIMPKKADKESIDANIKSLIPFLCYPITKKGINTALSKLKSVVSGDILSYSIAGRNEVFSNKLINHKHMNILKWFNHVLNFRSTELNYNHLYMVESTYPHLSELLNSLTTNELKKLIKITGSLLYNFYNE%2A |
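
As a rough sanity check on an annotation like the one above, the following sketch reads CDS rows from GFF3 text, reports each CDS length and whether it is a multiple of three, and lists overlapping CDS pairs. It assumes standard tab-separated, nine-column GFF3; the `SAMPLE` rows reuse three coordinate pairs from the annotation with shortened attribute columns, and the names `Cds`, `parse_cds` and `overlaps` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Cds(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_cds(gff_text: str) -> List[Cds]:
    """Collect CDS rows from tab-separated GFF3 text, skipping directives."""
    records = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        records.append(Cds(attrs.get("Name", "?"), int(cols[3]), int(cols[4])))
    return records


def overlaps(records: List[Cds]) -> List[Tuple[str, str, int]]:
    """Return (name, name, shared nucleotides) for every overlapping CDS pair."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            shared = min(a.end, b.end) - max(a.start, b.start) + 1
            if shared > 0:
                pairs.append((a.name, b.name, shared))
    return pairs


# Abbreviated rows; attribute columns are shortened for illustration.
SAMPLE = "\n".join([
    "##gff-version 3",
    "EPI_ISL_412866\tfeature\tCDS\t70\t489\t.\t.\t0\tgene=NS1;Name=NS1",
    "EPI_ISL_412866\tfeature\tCDS\t7640\t8224\t.\t.\t0\tgene=M2;Name=M2-1",
    "EPI_ISL_412866\tfeature\tCDS\t8193\t8465\t.\t.\t0\tgene=M2;Name=M2-2",
])

cds_records = parse_cds(SAMPLE)
for cds in cds_records:
    print(cds.name, cds.length, cds.length % 3 == 0)
print(overlaps(cds_records))
```

With these coordinates the M2-1 and M2-2 reading frames share 32 nucleotides, which is why a per-CDS view is more useful here than a flat list of intervals.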
109 changes: 109 additions & 0 deletions
data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/pathogen.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
{ | ||
"schemaVersion": "3.0.0", | ||
"alignmentParams": { | ||
"excessBandwidth": 9, | ||
"terminalBandwidth": 100, | ||
"allowedMismatches": 4, | ||
"gapAlignmentSide": "left", | ||
"minSeedCover": 0.1 | ||
}, | ||
"compatibility": { | ||
"cli": "3.0.0-alpha.0", | ||
"web": "3.0.0-alpha.0" | ||
}, | ||
"defaultCds": "F", | ||
"files": { | ||
"changelog": "CHANGELOG.md", | ||
"examples": "sequences.fasta", | ||
"genomeAnnotation": "genome_annotation.gff3", | ||
"pathogenJson": "pathogen.json", | ||
"readme": "README.md", | ||
"reference": "reference.fasta", | ||
"treeJson": "tree.json" | ||
}, | ||
"qc": { | ||
"privateMutations": { | ||
"enabled": true, | ||
"typical": 50, | ||
"cutoff": 150, | ||
"weightLabeledSubstitutions": 2, | ||
"weightReversionSubstitutions": 1, | ||
"weightUnlabeledSubstitutions": 1 | ||
}, | ||
"missingData": { | ||
"enabled": false, | ||
"missingDataThreshold": 2000, | ||
"scoreBias": 500 | ||
}, | ||
"snpClusters": { | ||
"enabled": false, | ||
"windowSize": 100, | ||
"clusterCutOff": 10, | ||
"scoreWeight": 50 | ||
}, | ||
"mixedSites": { | ||
"enabled": true, | ||
"mixedSitesThreshold": 8 | ||
}, | ||
"frameShifts": { | ||
"enabled": true | ||
}, | ||
"stopCodons": { | ||
"enabled": true, | ||
"ignoredStopCodons": [ | ||
{ | ||
"codon": 320, | ||
"cdsName": "G" | ||
} | ||
] | ||
} | ||
}, | ||
"cdsOrderPreference": [ | ||
"F", | ||
"G", | ||
"L" | ||
], | ||
"maintenance": { | ||
"website": [ | ||
"https://nextstrain.org", | ||
"https://clades.nextstrain.org" | ||
], | ||
"documentation": [ | ||
"https://github.com/nextstrain/rsv" | ||
], | ||
"source code": [ | ||
"https://github.com/nextstrain/rsv" | ||
], | ||
"issues": [ | ||
"https://github.com/nextstrain/rsv/issues" | ||
], | ||
"organizations": [ | ||
"Nextstrain" | ||
], | ||
"authors": [ | ||
"Nextstrain team <https://nextstrain.org>" | ||
] | ||
}, | ||
"shortcuts": [ | ||
"rsv_a", | ||
"nextstrain/rsv/a", | ||
"nextstrain/rsv/a/hRSV-A-England-397-2017" | ||
], | ||
"attributes": { | ||
"name": "RSV-A", | ||
"reference accession": "EPI_ISL_412866", | ||
"reference name": "hRSV/A/England/397/2017" | ||
}, | ||
"geneOrderPreference": [ | ||
"F", | ||
"G", | ||
"L" | ||
], | ||
"version": { | ||
"tag": "unreleased", | ||
"compatibility": { | ||
"cli": "3.0.0-alpha.0", | ||
"web": "3.0.0-alpha.0" | ||
} | ||
} | ||
} |
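
To review a configuration like this one quickly, a short script can list which QC rules are switched on and flag fields that are stated twice but disagree. The sketch below embeds a trimmed copy of the values shown above; the checks it performs (matching `cdsOrderPreference` and `geneOrderPreference`, matching the two `compatibility` blocks, `defaultCds` appearing in the order list) are illustrative assumptions about what is worth verifying, not rules enforced by Nextclade. The function names are hypothetical.

```python
import json
from typing import List

# Trimmed copy of the fields shown in pathogen.json above.
PATHOGEN_JSON = """
{
  "qc": {
    "privateMutations": {"enabled": true, "typical": 50, "cutoff": 150},
    "missingData": {"enabled": false, "missingDataThreshold": 2000, "scoreBias": 500},
    "snpClusters": {"enabled": false, "windowSize": 100, "clusterCutOff": 10, "scoreWeight": 50},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 8},
    "frameShifts": {"enabled": true},
    "stopCodons": {"enabled": true, "ignoredStopCodons": [{"codon": 320, "cdsName": "G"}]}
  },
  "defaultCds": "F",
  "cdsOrderPreference": ["F", "G", "L"],
  "geneOrderPreference": ["F", "G", "L"],
  "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
  "version": {
    "tag": "unreleased",
    "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"}
  }
}
"""


def enabled_rules(config: dict) -> List[str]:
    """Names of QC rules whose `enabled` flag is true, sorted alphabetically."""
    return sorted(name for name, rule in config["qc"].items() if rule.get("enabled"))


def consistency_problems(config: dict) -> List[str]:
    """Describe duplicated fields that disagree; an empty list means none found."""
    problems = []
    if config["cdsOrderPreference"] != config["geneOrderPreference"]:
        problems.append("cdsOrderPreference and geneOrderPreference differ")
    if config["compatibility"] != config["version"]["compatibility"]:
        problems.append("top-level and version compatibility differ")
    if config["defaultCds"] not in config["cdsOrderPreference"]:
        problems.append("defaultCds is missing from cdsOrderPreference")
    return problems


config = json.loads(PATHOGEN_JSON)
print("enabled QC rules:", enabled_rules(config))
print("problems:", consistency_problems(config))
```

For the values in this commit, the missing-data and SNP-cluster rules are disabled while the other four are enabled, and the duplicated fields agree.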