flu: update IAV HA datasets after promotion of subclades #240

Closed
5 changes: 5 additions & 0 deletions data/nextstrain/flu/h1n1pdm/ha/CY121680/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,13 @@
## Unreleased

This release adds the subclade D.5 as specified in the [influenza-clade-nomenclature repository](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/blob/main/CHANGELOG.md#2024-11-12).

## 2024-11-05T09:19:52Z

- update reference trees
- include subclade proposals


## 2024-07-03T08:29:55Z

- add representative samples from early pandemic-era clades including 1, 2, 3, 4, 6C, 7, and 8 to improve clade label annotations for older sequences
2 changes: 1 addition & 1 deletion data/nextstrain/flu/h1n1pdm/ha/CY121680/tree.json

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion data/nextstrain/flu/h1n1pdm/ha/MW626062/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,14 @@
## Unreleased

This release adds the subclade D.5 as specified in the [influenza-clade-nomenclature repository](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/blob/main/CHANGELOG.md#2024-11-12).

## 2024-11-05T09:19:52Z

- update reference trees
- include subclade proposals

## 2024-07-03T08:29:55Z

## 2024-07-03T08:29:55Z

Added configuration of current and recent vaccine strains as 'reference nodes' on the reference tree, against which query sequences can be compared. This feature complements the new 'compare to clade founder' feature, which compares each query sequence to the most ancestral node of its clade or lineage.

2 changes: 1 addition & 1 deletion data/nextstrain/flu/h1n1pdm/ha/MW626062/tree.json

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion data/nextstrain/flu/h3n2/ha/CY163680/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
## 2024-11-05T09:19:52Z
## Unreleased
This release adds the subclades J.1.1, J.2.1, and J.2.2 as specified in the [influenza-clade-nomenclature repository](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/blob/main/CHANGELOG.md#2024-11-12).

## 2024-11-05T09:19:52Z
- update reference trees
- include subclade proposals

2 changes: 1 addition & 1 deletion data/nextstrain/flu/h3n2/ha/CY163680/tree.json

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion data/nextstrain/flu/h3n2/ha/EPI1857216/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
## 2024-11-05T09:19:52Z
## Unreleased
This release adds the subclades J.1.1, J.2.1, and J.2.2 as specified in the [influenza-clade-nomenclature repository](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/blob/main/CHANGELOG.md#2024-11-12).

## 2024-11-05T09:19:52Z
- update reference trees
- include subclade proposals


## 2024-08-08T05:08:21Z

Fix numbering of RBD sites in the `pathogen.json`. The relevant positions were indexed 1-based, when they should have been indexed 0-based.
2 changes: 1 addition & 1 deletion data/nextstrain/flu/h3n2/ha/EPI1857216/tree.json

Large diffs are not rendered by default.

48 changes: 36 additions & 12 deletions data_output/index.json
Original file line number Diff line number Diff line change
@@ -804,7 +804,7 @@
"clades": 25,
"customClades": {
"short-clade": 16,
"subclade": 21,
"subclade": 22,
"proposedSubclade": 22
},
"qc": [
@@ -818,6 +818,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
@@ -852,8 +859,7 @@
}
],
"version": {
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -909,7 +915,7 @@
"clades": 25,
"customClades": {
"short-clade": 16,
"subclade": 21,
"subclade": 22,
"proposedSubclade": 22
},
"qc": [
@@ -923,6 +929,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
@@ -957,8 +970,7 @@
}
],
"version": {
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1113,7 +1125,7 @@
"capabilities": {
"clades": 37,
"customClades": {
"subclade": 42,
"subclade": 45,
"short-clade": 37,
"proposedSubclade": 45
},
@@ -1128,6 +1140,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
@@ -1170,8 +1189,7 @@
}
],
"version": {
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -1226,7 +1244,7 @@
"capabilities": {
"clades": 34,
"customClades": {
"subclade": 39,
"subclade": 42,
"short-clade": 34,
"proposedSubclade": 42
},
@@ -1242,6 +1260,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
@@ -1292,8 +1317,7 @@
}
],
"version": {
"updatedAt": "2024-11-05T09:19:52Z",
"tag": "2024-11-05--09-19-52Z",
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
## Unreleased

This release adds the subclade D.5 as specified in the [influenza-clade-nomenclature repository](https://github.com/influenza-clade-nomenclature/seasonal_A-H3N2_HA/blob/main/CHANGELOG.md#2024-11-12).

## 2024-11-05T09:19:52Z

- update reference trees
- include subclade proposals


## 2024-07-03T08:29:55Z

- add representative samples from early pandemic-era clades including 1, 2, 3, 4, 6C, 7, and 8 to improve clade label annotations for older sequences

- added configuration of current and recent vaccine strains as 'reference nodes' on the reference tree, against which query sequences can be compared. This feature complements the new 'compare to clade founder' feature, which compares each query sequence to the most ancestral node of its clade or lineage. See the Nextclade documentation for more details about the 'relative mutations' functionality.

## 2024-04-19T07:50:39Z

- aliasing of C.1.1.1 as D
- addition of subclades D.1 - D.4: [D.1](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/D.1.yml), [D.2](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/D.2.yml), [D.3](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/D.3.yml), [D.4](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/D.4.yml)
- addition of subclades [C.1.8](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.8.yml) and [C.1.9](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.9.yml)
- addition of subclades [C.1.7.1](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.7.1.yml) and [C.1.7.2](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.7.2.yml)


## 2024-01-16T20:31:02Z

Initial release for Nextclade v3!

- addition of subclade [C.1.7](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/blob/main/subclades/C.1.7.yml)

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Influenza A(H1N1pdm) HA based on reference "A/California/07/2009"

| Key | Value |
| -------------------- | -------------------- |
| authors | [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| name | Influenza A(H1N1pdm) HA |
| reference | A/California/07/2009 |
| dataset path | flu/h1n1pdm/ha/CY121680 |
| reference accession | CY121680 |
| clade definitions | [github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/) |


## Scope of this dataset
This dataset uses an older reference sequence (A/California/07/2009) and recent sequences will differ at a large number of positions from this reference.
For the analysis of currently circulating viruses, the dataset using A/Wisconsin/588/2019 as reference might be more appropriate.

## Features
This dataset supports

* Assignment to clades and subclades based on the nomenclature defined in [github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/)
* Identification of glycosylation motifs
* Sequence QC
* Phylogenetic placement

## Clades of seasonal influenza viruses

The WHO Collaborating centers define "clades" as genetic groups of viruses with signature mutations to facilitate discussion of circulating diversity of the viruses.
Clade demarcations do not always coincide with significantly different antigenic properties of the viruses.
Clade names are structured as _Number-Letter_ binomials separated by periods as in `6B.1A.5a.2a.1`. These sometimes get shortened by omission of leading binomials like `5a.2a.1`.

In addition to these clades, "subclades" are defined to break down diversity at higher resolution and allow following the spread of different viral groups.
These follow a Pango-like nomenclature consisting of a letter followed by numbers separated by periods, as in `C.1.2`.
The leading letter is an alias of a previous name.
Details of the nomenclature system can be found at [github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/](https://github.com/influenza-clade-nomenclature/seasonal_A-H1N1pdm_HA/).
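
As a rough illustration of the subclade naming scheme described above, the following Python sketch checks whether a name has the "letter followed by period-separated numbers" shape and expands a leading alias letter. The function names and the regular expression are hypothetical and not part of Nextclade or the nomenclature repository. The only alias listed is `D` for `C.1.1.1`, taken from the changelog entry for 2024-04-19; other aliases would need to be looked up in the nomenclature repository.

```python
import re

# Assumed shape of a subclade name: one capital letter, then zero or more ".number" parts.
SUBCLADE_RE = re.compile(r"[A-Z](?:\.\d+)*")

# Alias table: only the alias stated in the changelog (C.1.1.1 aliased as D).
ALIASES = {"D": "C.1.1.1"}


def is_subclade_name(name):
    """Return True if the whole string matches the assumed subclade pattern."""
    return SUBCLADE_RE.fullmatch(name) is not None


def expand_alias(name, aliases=ALIASES):
    """Replace a known leading alias letter with the name it stands for."""
    letter, dot, rest = name.partition(".")
    if letter in aliases:
        return aliases[letter] + dot + rest
    return name


print(is_subclade_name("C.1.2"))          # subclade-style name
print(is_subclade_name("6B.1A.5a.2a.1"))  # clade-style name, different scheme
print(expand_alias("D.5"))
```

Under these assumptions, `D.5` expands to `C.1.1.1.5`, while names whose leading letter is not in the table are returned unchanged.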



## What is a Nextclade dataset?

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
##gff-version 3
##sequence-region CY121680.1 1 1752
CY121680.1 feature gene 21 71 . + . gene_name="SigPep"
CY121680.1 feature gene 72 1052 . + . gene_name="HA1"
CY121680.1 feature gene 1053 1718 . + . gene_name="HA2"
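
The annotation above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: it assumes the standard GFF3 convention of 1-based, inclusive start and end coordinates, splits columns on whitespace (the real file is tab-separated), and uses hypothetical helper names. It computes the nucleotide length of each annotated gene so that one can confirm each length is a multiple of three.

```python
GFF3_TEXT = """\
##gff-version 3
##sequence-region CY121680.1 1 1752
CY121680.1 feature gene 21 71 . + . gene_name="SigPep"
CY121680.1 feature gene 72 1052 . + . gene_name="HA1"
CY121680.1 feature gene 1053 1718 . + . gene_name="HA2"
"""


def parse_genes(text):
    """Map gene_name -> (start, end), skipping directives and blank lines."""
    genes = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        start, end = int(fields[3]), int(fields[4])
        # The attribute column looks like gene_name="HA1"
        name = fields[8].split("=", 1)[1].strip('"')
        genes[name] = (start, end)
    return genes


def length_nt(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


genes = parse_genes(GFF3_TEXT)
lengths = {name: length_nt(*span) for name, span in genes.items()}
for name, n in lengths.items():
    print(name, n, "nt,", n // 3, "codons, in frame:", n % 3 == 0)
```

With the coordinates shown, the three features are contiguous (each starts one position after the previous one ends), which is what one would expect for a signal peptide followed by HA1 and HA2.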
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
{
"schemaVersion": "3.0.0",
"alignmentParams": {
"excessBandwidth": 9,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
"minSeedCover": 0.1
},
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
},
"defaultCds": "HA1",
"files": {
"changelog": "CHANGELOG.md",
"examples": "sequences.fasta",
"genomeAnnotation": "genome_annotation.gff3",
"pathogenJson": "pathogen.json",
"readme": "README.md",
"reference": "reference.fasta",
"treeJson": "tree.json"
},
"qc": {
"privateMutations": {
"enabled": true,
"typical": 5,
"cutoff": 15,
"weightLabeledSubstitutions": 2,
"weightReversionSubstitutions": 1,
"weightUnlabeledSubstitutions": 1
},
"missingData": {
"enabled": false,
"missingDataThreshold": 100,
"scoreBias": 10
},
"snpClusters": {
"enabled": false,
"windowSize": 100,
"clusterCutOff": 5,
"scoreWeight": 50
},
"mixedSites": {
"enabled": true,
"mixedSitesThreshold": 4
},
"frameShifts": {
"enabled": true
},
"stopCodons": {
"enabled": true,
"ignoredStopCodons": []
}
},
"cdsOrderPreference": [
"HA1",
"HA2"
],
"maintenance": {
"website": [
"https://nextstrain.org",
"https://clades.nextstrain.org"
],
"documentation": [
"https://github.com/nextstrain/seasonal-flu"
],
"source code": [
"https://github.com/nextstrain/seasonal-flu"
],
"issues": [
"https://github.com/nextstrain/seasonal-flu/issues"
],
"organizations": [
"Nextstrain"
],
"authors": [
"Nextstrain team <https://nextstrain.org>"
]
},
"nucMutLabelMap": {},
"nucMutLabelMapReverse": {},
"shortcuts": [
"flu_h1n1pdm_ha_broad",
"nextstrain/flu/h1n1pdm/ha/california-7-2009"
],
"aaMotifs": [
{
"name": "glycosylation",
"nameShort": "Glyc.",
"nameFriendly": "Glycosylation",
"description": "N-linked glycosylation motifs (N-X-S/T with X any amino acid other than P)",
"includeCdses": [
{
"cds": "HA1",
"ranges": []
},
{
"cds": "HA2",
"ranges": [
{
"begin": 0,
"end": 186
}
]
}
],
"motifs": [
"N[^P][ST]"
]
}
],
"attributes": {
"name": "Influenza A H1N1pdm HA",
"segment": "ha",
"reference accession": "CY121680",
"reference name": "A/California/7/2009-egg"
},
"version": {
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
}
}
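
The `aaMotifs` entry in the configuration above searches for N-linked glycosylation motifs with the pattern `N[^P][ST]`. The Python sketch below shows how such a pattern could be applied to an amino-acid sequence. It is an illustration only, not Nextclade's implementation: the function name and the example peptide are made up, and it assumes that `begin`/`end` are 0-based, half-open positions and that an empty `ranges` list means the whole CDS is searched.

```python
import re

MOTIF = "N[^P][ST]"  # pattern taken from the "motifs" list in the config


def find_motifs(aa_seq, motif=MOTIF, ranges=()):
    """Return (position, matched_text) pairs, including overlapping matches.

    ranges: iterable of (begin, end) pairs, assumed 0-based and half-open.
    An empty value is treated as "search the whole sequence" (assumption).
    """
    # A lookahead with a capture group reports overlapping occurrences.
    pattern = re.compile(f"(?=({motif}))")
    spans = list(ranges) or [(0, len(aa_seq))]
    hits = []
    for begin, end in spans:
        window = aa_seq[begin:end]  # a motif must lie fully inside the window
        for m in pattern.finditer(window):
            hits.append((begin + m.start(), m.group(1)))
    return hits


peptide = "MNGSANPTNNSS"  # made-up peptide, not from the dataset
print(find_motifs(peptide))
print(find_motifs(peptide, ranges=[(0, 9)]))
```

In this made-up peptide, `NPT` is skipped because proline occupies the middle position, and restricting the search to the first nine residues drops the two overlapping motifs near the end.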
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
>CY121680.1 Influenza A virus (A/California/07/2009(H1N1)) hemagglutinin (HA) gene, complete cds
GGAAAACAAAAGCAACAAAAATGAAGGCAATACTAGTAGTTCTGCTATATACATTTGCAACCGCAAATGCAGACACATTATGTATAGGTTATCATGCGAACAATTCAACAGACACTGTAGACACAGTACTAGAAAAGAATGTAACAGTAACACACTCTGTTAACCTTCTAGAAGACAAGCATAACGGGAAACTATGCAAACTAAGAGGGGTAGCCCCATTGCATTTGGGTAAATGTAACATTGCTGGCTGGATCCTGGGAAATCCAGAGTGTGAATCACTCTCCACAGCAAGCTCATGGTCCTACATTGTGGAAACACCTAGTTCAGACAATGGAACGTGTTACCCAGGAGATTTCATCGATTATGAGGAGCTAAGAGAGCAATTGAGCTCAGTGTCATCATTTGAAAGGTTTGAGATATTCCCCAAGACAAGTTCATGGCCCAATCATGACTCGAACAAAGGTGTAACGGCAGCATGTCCTCATGCTGGAGCAAAAAGCTTCTACAAAAATTTAATATGGCTAGTTAAAAAAGGAAATTCATACCCAAAGCTCAGCAAATCCTACATTAATGATAAAGGGAAAGAAGTCCTCGTGCTATGGGGCATTCACCATCCATCTACTAGTGCTGACCAACAAAGTCTCTATCAGAATGCAGATGCATATGTTTTTGTGGGGTCATCAAGATACAGCAAGAAGTTCAAGCCGGAAATAGCAATAAGACCCAAAGTGAGGGATCGAGAAGGGAGAATGAACTATTACTGGACACTAGTAGAGCCGGGAGACAAAATAACATTCGAAGCAACTGGAAATCTAGTGGTACCGAGATATGCATTCGCAATGGAAAGAAATGCTGGATCTGGTATTATCATTTCAGATACACCAGTCCACGATTGCAATACAACTTGTCAAACACCCAAGGGTGCTATAAACACCAGCCTCCCATTTCAGAATATACATCCGATCACAATTGGAAAATGTCCAAAATATGTAAAAAGCACAAAATTGAGACTGGCCACAGGATTGAGGAATATCCCGTCTATTCAATCTAGAGGCCTATTTGGGGCCATTGCCGGTTTCATTGAAGGGGGGTGGACAGGGATGGTAGATGGATGGTACGGTTATCACCATCAAAATGAGCAGGGGTCAGGATATGCAGCCGACCTGAAGAGCACACAGAATGCCATTGACGAGATTACTAACAAAGTAAATTCTGTTATTGAAAAGATGAATACACAGTTCACAGCAGTAGGTAAAGAGTTCAACCACCTGGAAAAAAGAATAGAGAATTTAAATAAAAAAGTTGATGATGGTTTCCTGGACATTTGGACTTACAATGCCGAACTGTTGGTTCTATTGGAAAATGAAAGAACTTTGGACTACCACGATTCAAATGTGAAGAACTTATATGAAAAGGTAAGAAGCCAGCTAAAAAACAATGCCAAGGAAATTGGAAACGGCTGCTTTGAATTTTACCACAAATGCGATAACACGTGCATGGAAAGTGTCAAAAATGGGACTTATGACTACCCAAAATACTCAGAGGAAGCAAAATTAAACAGAGAAGAAATAGATGGGGTAAAGCTGGAATCAACAAGGATTTACCAGATTTTGGCGATCTATTCAACTGTCGCCAGTTCATTGGTACTGGTAGTCTCCCTGGGGGCAATCAGTTTCTGGATGTGCTCTAATGGGTCTCTACAGTGTAGAATATGTATTTAACATTAGGATTTCAGAAGCATGAGAAAAACAC