Merge pull request #245 from nextstrain/2024-11-23_rsv_update
rsv: update rsv datasets with more recent data and the B.D.E.1.1 clad…
rneher authored Nov 23, 2024
2 parents c38d427 + cb3be92 commit 6e21f72
Showing 24 changed files with 11,225 additions and 4,650 deletions.
5 changes: 5 additions & 0 deletions data/nextstrain/rsv/a/EPI_ISL_412866/CHANGELOG.md
@@ -1,3 +1,8 @@
+## Unreleased
+
+- update reference tree with more recent data
+
+
## 2024-08-01T22:31:31Z

- add subclades A.D.1.4-8
2 changes: 1 addition & 1 deletion data/nextstrain/rsv/a/EPI_ISL_412866/README.md
@@ -10,7 +10,7 @@
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) |

## Scope of this dataset
-This dataset for RSV-B uses reference sequence A/England/397/2017 with is available at under accession number EPI_ISL_412866 in GISAID. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737).
+This dataset for RSV-A uses reference sequence A/England/397/2017 with is available at under accession number EPI_ISL_412866 in GISAID. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737).
This sequence has the duplication in the G-protein shared by all currently circulating variants.
The reference tree covers the diversity of RSV-A since the first sequenced samples.

5,234 changes: 2,851 additions & 2,383 deletions data/nextstrain/rsv/a/EPI_ISL_412866/sequences.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/nextstrain/rsv/a/EPI_ISL_412866/tree.json

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/CHANGELOG.md
@@ -1,3 +1,8 @@
+## Unreleased
+
+- update reference tree with more recent data.
+- include designation of B.D.E.1.1
+
## 2024-08-01T22:31:31Z

- update of reference tree with additional data. No new clades.
4,549 changes: 2,291 additions & 2,258 deletions data/nextstrain/rsv/b/EPI_ISL_1653999/sequences.fasta

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/nextstrain/rsv/b/EPI_ISL_1653999/tree.json

Large diffs are not rendered by default.

24 changes: 18 additions & 6 deletions data_output/index.json
@@ -1836,6 +1836,13 @@
]
},
"versions": [
+{
+"tag": "unreleased",
+"compatibility": {
+"cli": "3.0.0-alpha.0",
+"web": "3.0.0-alpha.0"
+}
+},
{
"updatedAt": "2024-08-01T22:31:31Z",
"tag": "2024-08-01--22-31-31Z",
@@ -1862,8 +1869,7 @@
}
],
"version": {
-"updatedAt": "2024-08-01T22:31:31Z",
-"tag": "2024-08-01--22-31-31Z",
+"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1914,9 +1920,9 @@
"treeJson": "tree.json"
},
"capabilities": {
-"clades": 17,
+"clades": 18,
 "customClades": {
-"G_clade": 9
+"G_clade": 10
},
"qc": [
"privateMutations",
@@ -1926,6 +1932,13 @@
]
},
"versions": [
+{
+"tag": "unreleased",
+"compatibility": {
+"cli": "3.0.0-alpha.0",
+"web": "3.0.0-alpha.0"
+}
+},
{
"updatedAt": "2024-08-01T22:31:31Z",
"tag": "2024-08-01--22-31-31Z",
@@ -1952,8 +1965,7 @@
}
],
"version": {
-"updatedAt": "2024-08-01T22:31:31Z",
-"tag": "2024-08-01--22-31-31Z",
+"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -0,0 +1,22 @@
## Unreleased

- update reference tree with more recent data


## 2024-08-01T22:31:31Z

- add subclades A.D.1.4-8
- add subclades A.D.3.2-6, add representatives to A.D.3.1
- add subclade A.D.5.4, adjust definition of A.D.5.3 to make it a clear sibling


## 2024-01-29T10:29:43Z

- fix definitions of G_clades (legacy) for RSV-A and RSV-B


## 2024-01-16T20:31:02Z

**first release of v3 dataset.**

Updated consortium nomenclature.
21 changes: 21 additions & 0 deletions data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/README.md
@@ -0,0 +1,21 @@
# RSV-A dataset with reference genome A/England/397/2017

| Key | Value |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------|
| authors | [Richard Neher](https://neherlab.org), Laura Urbanska, [Nextstrain](https://nextstrain.org) |
| data source | Genbank + authorized other sequences |
| workflow | [github.com/nextstrain/rsv](https://github.com/nextstrain/rsv) |
| nextclade dataset path | nextstrain/rsv/a/EPI_ISL_412866 |
| reference | EPI_ISL_412866 |
| clade definitions | [github.com/rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A) |

## Scope of this dataset
This dataset for RSV-A uses reference sequence A/England/397/2017 with is available at under accession number EPI_ISL_412866 in GISAID. An almost identical sequence (slightly longer, 12 mutations, no gaps or indels) is available in NCBI as [LR699737](https://www.ncbi.nlm.nih.gov/nuccore/LR699737).
This sequence has the duplication in the G-protein shared by all currently circulating variants.
The reference tree covers the diversity of RSV-A since the first sequenced samples.


## Nomenclature
The dataset follows the consortium nomenclature established in 2023 that uses a combination of letters and numbers to designate lineages in a hierarchical fashion.
Definitions of individuals lineages are available on github in the repository [rsv-lineages/lineage-designation-A](https://github.com/rsv-lineages/lineage-designation-A).
Legacy clade definitions for the nomenclature defined by Goya et al (`G_clade`) are included for orientation. These clade definitions will not be updated and are incomplete. We encourage users to use the new consortium nomenclature.
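
Review note: the hierarchical naming described in the Nomenclature section can be illustrated with a short sketch. It assumes that every dot-separated prefix of a lineage name is an ancestor of that lineage. This is an assumption made for illustration only: letter components may abbreviate longer paths, so intermediate lineages can be missing from the result. The function name `lineage_ancestors` is hypothetical and is not part of the dataset or of Nextclade.

```python
def lineage_ancestors(name: str) -> list[str]:
    """Return the textual prefixes of a dotted lineage name, shortest first.

    Assumption (not confirmed by the dataset): each prefix is an ancestor.
    """
    parts = name.split(".")
    # Build every proper prefix: "B", "B.D", ... up to but excluding the name itself.
    return [".".join(parts[:i]) for i in range(1, len(parts))]


# B.D.E.1.1 is the lineage designated in this update of the RSV-B dataset.
print(lineage_ancestors("B.D.E.1.1"))
```

A prefix check like this is only a rough orientation aid; the authoritative parent/child relationships are the ones in the lineage-designation repositories linked above.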
Binary file not shown.
@@ -0,0 +1,16 @@
##gff-version 3
##sequence-region EPI_ISL_412866 1 15225
EPI_ISL_412866 annotation remark 1 15225 . . . molecule_type=cRNA;organism=Human orthopneumovirus;taxonomy=Viruses,Riboviria,Orthornavirae,Negarnaviricota,Haploviricotina,Monjiviricetes,Mononegavirales,Pneumoviridae,Orthopneumovirus,Orthopneumovirus hominis
EPI_ISL_412866 feature source 1 15225 . . . mol_type=viral cRNA;organism=Human orthopneumovirus
EPI_ISL_412866 feature 5'UTR 1 15 . . . citation=%5B1%5D;function=Leader region 5%27UTR
EPI_ISL_412866 feature CDS 70 489 . . 0 codon_start=1;db_xref=GeneID:37607636;gene=NS1;gene_name=NS1;Name=NS1;product=nonstructural protein 1;protein_id=YP_009518850.1;translation=MGSNSLSMIKVRLQNLFDNDEVALLKITCYTDKLIQLTNALAKAVIHTIKLNGIVFVHVITSSDICPNNNIVVKSNFTTMPVLQNGGYIWEMMELTHCSQPNGLIDDNCEIKFSKKLSDSTMTNYMNQLSELLGFDLNP%2A
EPI_ISL_412866 feature CDS 599 973 . . 0 codon_start=1;db_xref=GeneID:37607637;gene=NS2;gene_name=NS2;Name=NS2;product=nonstructural protein 2;protein_id=YP_009518851.1;translation=MDTTHNDTTPQRLMITDMRPLSLETIITSLTRDIITHKFIYLINHECIVRKLDERQATFTFLVNYEMKLLHKVGSTKYKKYTEYNTKYGTFPMPIFINHDGFLECIGIKPTKHTPIIYKYDLNP%2A
EPI_ISL_412866 feature CDS 1111 2286 . . 0 codon_start=1;db_xref=GeneID:37607638;gene=N;gene_name=N;Name=N;product=nucleoprotein;protein_id=YP_009518852.1;translation=MALSKVKLNDTLNKDQLLSSSKYTIQRSTGDSIDTPNYDVQKHINKLCGMLLITEDANHKFTGLIGMLYAMSRLGREDTIKILKDAGYHVKANGVDVTTHRQDINGKEMKFEVLTLASLTTEIQINIEIESRKSYKKMLKEMGEVAPEYRHDSPDCGMIILCIAALVITKLAAGDRSGLTAVIRRANNVLKNEMKRYKGLLPKDIANSFYEVFEKYPHFIDVFVHFGIAQSSTRGGSRVEGIFAGLFMNAYGAGQVMLRWGVLAKSVKNIMLGHASVQAEMEQVVEVYEYAQKLGGEAGFYHILNNPKASLLSLTQFPHFSSVVLGNAAGLGIMGEYRGTPRNQDLYDAAKVYAEQLKENGVINYSVLDLTAEELEAIKHQLNPKDNDVEL%2A
EPI_ISL_412866 feature CDS 2318 3043 . . 0 codon_start=1;db_xref=GeneID:37607639;gene=P;gene_name=P;Name=P;product=phosphoprotein;protein_id=YP_009518853.1;translation=MEKFAPEFHGEDANNRATKFLESIKGKFTSPKDPKKKDSIISVNSIDIEVTKESLITSNSTIINPINETDDTVGNKPNYQRKPLVSFKEDPTPSDNPFSKLYKETIETFDNNEEESSYSYEEINDQTNDNITARLDRIDEKLSEILGMLHTLVVASAGPTSARDGIRDAMVGLREEMIEKIRTEALMTNDRLEAMARLRNEESEKMAKDTSDEVSLNPTSEKLNNLLEGNDSDNDLSLEDF%2A
EPI_ISL_412866 feature CDS 3226 3996 . . 0 codon_start=1;db_xref=GeneID:37607640;gene=M;gene_name=M;Name=M;product=matrix protein;protein_id=YP_009518854.1;translation=METYVNKLHEGSTYTAAVQYNVLEKDDDPASLTIWVPMFQSSMPADLLIKELANVNILVKQISTPKGPSLRVMINSRSAVLAQMPSKFTICANVSLDERSKLAYDVTTPCEIKACSLTCLKSKNMLTTVKDLTMKTLNPTHDIIALCEFENIVTSKKVIIPTYLRSISVRNKDLNTLENITTTEFKNAITNAKIIPYSGLLLVITVTDNKGAFKYIKPQSQFIVDLGAYLEKESIYYVTTNWKHTATRFAIKPMED%2A
EPI_ISL_412866 feature CDS 4266 4460 . . 0 codon_start=1;db_xref=GeneID:37607641;gene=SH;gene_name=SH;Name=SH;product=small hydrophobic protein;protein_id=YP_009518855.1;translation=MENTSITIEFSSKFWPYFTLIHMITTIISLIIIISIMIAILNKLCEYNVFHNKTFELPRARVNT%2A
EPI_ISL_412866 feature CDS 4652 5617 . . 0 codon_start=1;db_xref=GeneID:37607642;gene=G;gene_name=G;Name=G;product=attachment glycoprotein;protein_id=YP_009518856.1;translation=MSKTKDQRTAKTLERTWDTLNHLLFISSCLYKLNLKSIAQITLSILAMIISTSLIIAAIIFIASANHKVTPTTAIIQDATNQIKNTTPTHLTQNPQLGISLSNLSGTTSQSTTILASTTPSAESTPQSTTVKIINTTTTQILPSKPTTKQRQNKPQNKPNNDFHFEVFNFVPCSICSNNPTCWAICKRIPNKKPGKKTTTKPTKKPTLKTTKKDPKPQTTKPKGVLTTKPTGKPTINTTKTNSRTTLLTSNTKGNPEHTSQKETIHSTTSEGYPSPSQVYTTSDQEETLHSTTSEGYPSPSQVYTTSEYLSQSLSSSNTTK%2A
EPI_ISL_412866 feature CDS 5697 7421 . . 0 codon_start=1;db_xref=GeneID:37607643;gene=F;gene_name=F;Name=F;product=fusion glycoprotein;protein_id=YP_009518857.1;translation=MELPILKTNAITTILAAVTLCFASSQNITEEFYQSTCSAVSKGYLSALRTGWYTSVITIELSNIKENKCNGTDAKVKLIKQELDKYKNAVTELQLLMQSTPAANSRARRELPRFMNYTLNNTKNTNVTLSKKRKRRFLGFLLGVGSAIASGIAVSKVLHLEGEVNKIKSALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDKQLLPIVNKQSCSISNIETVIEFQQKNNRLLEITREFSVNAGVTTPVSTYMLTNSELLSLINDMPITNDQKKLMSSNVQIVRQQSYSIMSIIKEEVLAYVVQLPLYGVIDTPCWKLHTSPLCTTNTKEGSNICLTRTDRGWYCDNAGSVSFFPQAETCKVQSNRVFCDTMNSLTLPSEVNLCNIDIFNPKYDCKIMTSKTDVSSSVITSLGAIVSCYGKTKCTASNKNRGIIKTFSNGCDYVSNKGVDTVSVGNTLYYVNKQEGKSLYVKGEPIINFYDPLVFPSDEFDASISQVNEKINQSLAFIRKSDELLHNVNAGKSTTNIMITTIIIVIIVILLALIAVGLLLYCKARSTPVTLSKDQLSGINNIAFSN%2A
EPI_ISL_412866 feature CDS 7640 8224 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-1;Name=M2-1;note=ORF 1%2C matrix protein 2;product=M2-1 protein;protein_id=YP_009518858.1;translation=MSRRNPCKFEIRGHCLNGKRCHFSHNYFEWPPHALLVRQNFMLNRILKSMDKSIDTLSEISGAAELDRTEEYALGVVGVLESYIGSINNITKQSACVAMSKLLTELNSDDIKKLRDNEEPNSPKVRVYNTVISYIESNRKNNKQTIHLLKRLPADVLKKTIKNTLDIHKSITINNSKESTVSDTNDHAKNNDTT%2A
EPI_ISL_412866 feature CDS 8193 8465 . . 0 codon_start=1;db_xref=GeneID:37607644;gene=M2;gene_name=M2-2;Name=M2-2;note=ORF 2%2C RNA processivity factor;product=M2-2 protein;protein_id=YP_009518859.1;translation=TTMPKIMILPDKYPCSINSILITSNYRVTMYNQKNTLYINQNNQNSHIYPPDQPFNEIHWTSQDLIDATQNFLQHLGITDDIYTIYILVS%2A
EPI_ISL_412866 feature CDS 8532 15029 . . 0 codon_start=1;db_xref=GeneID:37607645;gene=L;gene_name=L;Name=L;note=RNA dependant RNA polymerase%3B RdRp;product=polymerase protein;protein_id=YP_009518860.1;translation=MDPIISGNSANVYLTDSYLKGVISFSECNALGSYIFNGPYLKNDYTNLISRQNPLIEHINLKKLNITQSLISKYHKGEIKIEEPTYFQSLLMTYKSMTSSEQTTTTNLLKKIIRRAIEISDVKVYAILNKLGLKEKDKIKSNNGQDEDNSVITTIIKDDILLAVKDNQSHPKADKNQSTKQKDTIKTTLLKKLMCSMQHPPSWLIHWFNLYTKLNSILTQYRSSEVKNHGFILIDNHTLSGFQFILNQYGCIVYHRELKRITVTTYNQFLTWKDISLSRLNVCLITWISNCLNTLNKSLGLRCGFNNVILTQLFLYGDCILKLFHNEGFYIIKEVEGFIMSLILNITEEDQFRKRFYNSMLNNITDAANKAQKNLLSRVCHTLLDKTISDNIINGRWIILLSKFLKLIKLAGDNNLNNLSELYFLFRIFGHPMVDERQAMDAVKVNCNETKFYLLSSLSMLRGAFIYRIIKGFVNNYNRWPTLRNAIVLPLRWLTYYKLNTYPSLLELTERDLIVLSGLRFYREFRLPKKVDLEMIINDKAISPPKNLIWTSFPRNYMPSHIQNYIEHEKLKFSDSDKSRRVLEYYLRDNKFNECDLYNCVVNQSYLNNPNHVVSLTGKERELSVGRMFAMQPGMFRQVQILAEKMIAENILQFFPESLTRYGDLELQKILELKAGISNKSNRYNDNYNNYISKCSIITDLSKFNQAFRYETSCICSDVLDELHGVQSLFSWLHLTIPHVTIICTYRHAPPYIKDHIVDLNNVDEQSGLYRYHMGGIEGWCQKLWTIEAISLLDLISLKGKFSITALINGDNQSIDISKPVRLMEGQTHAQADYLLALNSLKLLYKEYAGIGHKLKGTETYISRDMQFMSKTIQHNGVYYPASIKKVLRVGPWINTILDDFKVSLESIGSLTQELEYRGESLLCSLIFRNVWLYNQIALQLKNHALCNNKLYLDILKVLKHLKTFFNLDNIDTALTLYMNLPMLFGGGDPNLLYRSFYRRTPDFLTEAIVHSVFILSYYTNHDLKDKLQDLSDDRLNKFLTCIITFDKNPNAEFVTLMRDPQALGSERQAKITSEINRLAVTEVLSTAPNKIFSKSAQHYTTTEIDLNDIMQNIEPTYPHGLRVVYESLPFYKAEKIVNLISGTKSITNILEKTSAIDLTDIDRATEMMRKNITLLIRILPLDCNRDKREILSMENLSITELSKYVRERSWSLSNIVGVTSPSIMYTMDIKYTTSTIASGIIIEKYNVNSLTRGERGPTKPWVGSSTQEKKTMPVYNRQVLTKKQRDQIDLLAKLDWVYASIDNKDEFMEELSIGTLGLTYEKAKKLFPQYLSVNYLHRLTVSSRPCEFPASIPAYRTTNYHFDTSPINRILTEKYGDEDIDIVFQNCISFGLSLMSVVEQFTNVCPNRIILIPKLNEIHLMKPPIFTGDVDIHKLKLVIQKQHMFLPDKISLTQYVELFLSNKTLKSGSNVNSNLILAHKISDYFHNTYILSTNLAGHWILIIQLMKDSKGIFEKDWGEGYITDHMFINLKVFFNAYKTYLLCFHKGYGRAKLECDMNTSDLLCVLELIDSSYWKSMSKVFLEQKVIKYILSQDASLHRVKGCHSFKLWFLKRLNVAEFTVCPWVVNIDYHPTHMKAILTYIDLVRMGLINIDRIYIKNKHKFNDEFYTSNLFYINYNFSDNTHLLTKHIRIANSELESNYNKLYHPTPETLENILTNPVKNNEKKTLSGYCIGKNVDSIMLPSLSNKKLIKSSTMIRTNYSRQDLYNLFPTVVIDKIIDHSGN
TAKSNQLYTTTSHQISLVHNSTSLYCMLPWHHINRFNFVFSSTGCKISIEYILKDLKIKDPNCIAFIGEGAGNLLLRTVVELHPDIRYIYRSLKDCNDHSLPIEFLRLYNGHINIDYGENLTIPATDATNNIHWSYLHIKFAEPISLFVCDAELPVTVNWSKIIIEWSKHVRKCKYCSSVNKCTLIVKYHAQDDIDFKLDNITILKTYVCLGSKLKGSEVYLVLTIGPANVFPVFNVVQNAKLILSRTKNFIMPKKADKESIDANIKSLIPFLCYPITKKGINTALSKLKSVVSGDILSYSIAGRNEVFSNKLINHKHMNILKWFNHVLNFRSTELNYNHLYMVESTYPHLSELLNSLTTNELKKLIKITGSLLYNFYNE%2A
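
Review note: the annotation above uses percent-encoding inside the attribute column (for example `%2C` for a comma and `%2A` for the terminal `*` of a translation). The sketch below shows one way to decode such a record. It assumes tab-separated GFF3 columns, as in the GFF3 specification, and uses a shortened copy of the M2-1 record with the `translation` attribute left out. The helper name `parse_gff3_attributes` is hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column into a dict and undo percent-encoding."""
    attrs = {}
    for field in column9.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        key, _, value = field.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


# Shortened M2-1 record from the annotation (translation attribute omitted).
line = "\t".join([
    "EPI_ISL_412866", "feature", "CDS", "7640", "8224", ".", ".", "0",
    "codon_start=1;gene=M2;Name=M2-1;note=ORF 1%2C matrix protein 2;product=M2-1 protein",
])

cols = line.split("\t")
start, end = int(cols[3]), int(cols[4])
cds_length = end - start + 1  # GFF3 coordinates are 1-based and inclusive
attrs = parse_gff3_attributes(cols[8])

print(attrs["Name"], cds_length, cds_length % 3 == 0)
print(attrs["note"])
```

Checking that a CDS length is a multiple of three is a cheap sanity test for coordinates before running an alignment against the annotation.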
109 changes: 109 additions & 0 deletions data_output/nextstrain/rsv/a/EPI_ISL_412866/unreleased/pathogen.json
@@ -0,0 +1,109 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"excessBandwidth": 9,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "left",
"minSeedCover": 0.1
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
},
"defaultCds": "F",
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"qc": {
"privateMutations": {
"enabled": true,
"typical": 50,
"cutoff": 150,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"missingData": {
"enabled": false,
"missingDataThreshold": 2000,
"scoreBias": 500
},
"snpClusters": {
"enabled": false,
"windowSize": 100,
"clusterCutOff": 10,
"scoreWeight": 50
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 8
},
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true,
"ignoredStopCodons": [
{
"codon": 320,
"cdsName": "G"
}
]
}
},
"cdsOrderPreference": [
"F",
"G",
"L"
],
"maintenance": {
"website": [
"https://nextstrain.org",
"https://clades.nextstrain.org"
],
"documentation": [
"https://github.com/nextstrain/rsv"
],
"source code": [
"https://github.com/nextstrain/rsv"
],
"issues": [
"https://github.com/nextstrain/rsv/issues"
],
"organizations": [
"Nextstrain"
],
"authors": [
"Nextstrain team <https://nextstrain.org>"
]
},
"shortcuts": [
"rsv_a",
"nextstrain/rsv/a",
"nextstrain/rsv/a/hRSV-A-England-397-2017"
],
"attributes": {
"name": "RSV-A",
"reference accession": "EPI_ISL_412866",
"reference name": "hRSV/A/England/397/2017"
},
"geneOrderPreference": [
"F",
"G",
"L"
],
"version": {
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
}
}
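
Review note: to see at a glance which QC rules this `pathogen.json` switches on, a small script can read the `qc` block and list the entries whose `enabled` flag is true. The sketch below embeds a trimmed copy of the `qc` values from the file above so that it runs on its own. The function name `enabled_qc_rules` is hypothetical and is not part of Nextclade.

```python
import json

# Trimmed copy of the "qc" block from the pathogen.json shown above.
PATHOGEN_JSON = """
{
  "qc": {
    "privateMutations": {"enabled": true, "typical": 50, "cutoff": 150},
    "missingData": {"enabled": false, "missingDataThreshold": 2000, "scoreBias": 500},
    "snpClusters": {"enabled": false, "windowSize": 100, "clusterCutOff": 10, "scoreWeight": 50},
    "mixedSites": {"enabled": true, "mixedSitesThreshold": 8},
    "frameShifts": {"enabled": true},
    "stopCodons": {"enabled": true, "ignoredStopCodons": [{"codon": 320, "cdsName": "G"}]}
  }
}
"""


def enabled_qc_rules(pathogen: dict) -> list[str]:
    """Return the names of QC rules whose "enabled" flag is true, sorted by name."""
    qc = pathogen.get("qc", {})
    return sorted(name for name, rule in qc.items() if rule.get("enabled", False))


pathogen = json.loads(PATHOGEN_JSON)
print(enabled_qc_rules(pathogen))
```

To run it against a real dataset, replace the embedded string with the contents of the downloaded `pathogen.json`.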