Merge pull request #193 from nextstrain/sc2-bandwidth
Increase excessBandwidth for SC2 datasets
corneliusroemer authored Apr 24, 2024
2 parents 30b4ee2 + aa57e47 commit 024d977
Showing 51 changed files with 522,102 additions and 15 deletions.
4 changes: 4 additions & 0 deletions data/nextstrain/sars-cov-2/BA.2.86/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
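
The trade-off described in the changelog entry (a narrow band is fast, but too narrow a band cannot represent large indels) can be illustrated with a toy banded edit distance. This is a minimal sketch under stated assumptions, not Nextclade's algorithm: the band here is a fixed strip around the main diagonal, whereas Nextclade positions its band from seed matches, and the function name `banded_edit_distance`, the toy sequences, and the indel size are all made up for illustration. The values 9 and 12 are borrowed from the changelog only to make the contrast visible.

```python
from typing import Optional


def banded_edit_distance(ref: str, qry: str, band: int) -> Optional[int]:
    """Edit distance using only cells with |i - j| <= band.

    Returns None when the final cell lies outside the band, i.e. when
    no alignment of the two full sequences fits inside the band.
    """
    n, m = len(ref), len(qry)
    # Row 0: aligning an empty reference prefix against j query bases.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            candidates = []
            if j == 0:
                candidates.append(i)  # delete the first i reference bases
            if j > 0 and (j - 1) in prev:
                # match or substitution
                candidates.append(prev[j - 1] + (ref[i - 1] != qry[j - 1]))
            if j > 0 and (j - 1) in cur:
                candidates.append(cur[j - 1] + 1)  # insertion in the query
            if j in prev:
                candidates.append(prev[j] + 1)  # deletion from the reference
            cur[j] = min(candidates)
        prev = cur
    return prev.get(m)


toy_ref = "ACGT" * 5                 # 20 bases, toy stand-in
toy_qry = toy_ref[:4] + toy_ref[16:]  # same sequence with 12 bases deleted

narrow = banded_edit_distance(toy_ref, toy_qry, band=9)
wide = banded_edit_distance(toy_ref, toy_qry, band=12)
print(narrow, wide)
```

In this toy setup the 12-base deletion cannot be represented inside a band of 9, so the narrow call finds no alignment, while the band of 12 recovers it. The cost of the wider band is that more cells are filled per row, which is the runtime concern the changelog mentions.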
2 changes: 1 addition & 1 deletion data/nextstrain/sars-cov-2/BA.2.86/pathogen.json
@@ -1,6 +1,6 @@
{
"alignmentParams": {
- "excessBandwidth": 9,
+ "excessBandwidth": 12,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
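
Because the same one-line change is repeated in each SARS-CoV-2 dataset's `pathogen.json`, it can be helpful to see the edit as a small transformation on the parsed JSON. The following is a hedged sketch: the helper name `set_excess_bandwidth` is hypothetical, and only the keys visible in the hunk above are used. It does not reflect how the maintainers actually made the change.

```python
import copy
import json


def set_excess_bandwidth(pathogen: dict, value: int) -> dict:
    """Return a copy of a parsed pathogen.json with excessBandwidth set to `value`."""
    updated = copy.deepcopy(pathogen)
    updated.setdefault("alignmentParams", {})["excessBandwidth"] = value
    return updated


# Fragment reconstructed from the keys shown in the diff; the real file has more fields.
fragment = json.loads(
    """
    {
      "alignmentParams": {
        "excessBandwidth": 9,
        "terminalBandwidth": 100,
        "allowedMismatches": 4,
        "gapAlignmentSide": "right"
      }
    }
    """
)

bumped = set_excess_bandwidth(fragment, 12)
print(json.dumps(bumped["alignmentParams"], indent=2))
```

Returning a copy rather than mutating the input keeps the original values available for comparison, which is convenient when checking that nothing other than `excessBandwidth` changed.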
4 changes: 4 additions & 0 deletions data/nextstrain/sars-cov-2/BA.2/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
2 changes: 1 addition & 1 deletion data/nextstrain/sars-cov-2/BA.2/pathogen.json
@@ -1,6 +1,6 @@
{
"alignmentParams": {
- "excessBandwidth": 9,
+ "excessBandwidth": 12,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
4 changes: 4 additions & 0 deletions data/nextstrain/sars-cov-2/XBB/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
2 changes: 1 addition & 1 deletion data/nextstrain/sars-cov-2/XBB/pathogen.json
@@ -1,6 +1,6 @@
{
"alignmentParams": {
- "excessBandwidth": 9,
+ "excessBandwidth": 12,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
4 changes: 4 additions & 0 deletions data/nextstrain/sars-cov-2/wuhan-hu-1/orfs/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
2 changes: 1 addition & 1 deletion data/nextstrain/sars-cov-2/wuhan-hu-1/orfs/pathogen.json
@@ -1,6 +1,6 @@
{
"alignmentParams": {
- "excessBandwidth": 9,
+ "excessBandwidth": 12,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
4 changes: 4 additions & 0 deletions data/nextstrain/sars-cov-2/wuhan-hu-1/proteins/CHANGELOG.md
@@ -1,3 +1,7 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
@@ -1,6 +1,6 @@
{
"alignmentParams": {
- "excessBandwidth": 9,
+ "excessBandwidth": 12,
"terminalBandwidth": 100,
"allowedMismatches": 4,
"gapAlignmentSide": "right",
50 changes: 40 additions & 10 deletions data_output/index.json
@@ -84,6 +84,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-04-15T15:08:22Z",
"tag": "2024-04-15--15-08-22Z",
@@ -110,8 +117,7 @@
}
],
"version": {
- "updatedAt": "2024-04-15T15:08:22Z",
- "tag": "2024-04-15--15-08-22Z",
+ "tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -173,6 +179,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-04-15T15:08:22Z",
"tag": "2024-04-15--15-08-22Z",
@@ -199,8 +212,7 @@
}
],
"version": {
- "updatedAt": "2024-04-15T15:08:22Z",
- "tag": "2024-04-15--15-08-22Z",
+ "tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -265,6 +277,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-04-15T15:08:22Z",
"tag": "2024-04-15--15-08-22Z",
@@ -291,8 +310,7 @@
}
],
"version": {
- "updatedAt": "2024-04-15T15:08:22Z",
- "tag": "2024-04-15--15-08-22Z",
+ "tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -354,6 +372,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-04-15T15:08:22Z",
"tag": "2024-04-15--15-08-22Z",
@@ -380,8 +405,7 @@
}
],
"version": {
- "updatedAt": "2024-04-15T15:08:22Z",
- "tag": "2024-04-15--15-08-22Z",
+ "tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
@@ -443,6 +467,13 @@
]
},
"versions": [
{
"tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
}
},
{
"updatedAt": "2024-04-15T15:08:22Z",
"tag": "2024-04-15--15-08-22Z",
@@ -469,8 +500,7 @@
}
],
"version": {
- "updatedAt": "2024-04-15T15:08:22Z",
- "tag": "2024-04-15--15-08-22Z",
+ "tag": "unreleased",
"compatibility": {
"cli": "3.0.0-alpha.0",
"web": "3.0.0-alpha.0"
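
In the `index.json` hunks above, each dataset gains an `"unreleased"` entry at the front of its `versions` list, and its `version` field now points at that entry. A consumer that only wants tagged releases would need to skip it. The sketch below assumes that `versions` is ordered newest first, as the hunks suggest, and uses only the fields visible in the diff. The function name `latest_released` and the sample entry are hypothetical, and the enclosing structure of `index.json` is not shown here.

```python
from typing import Optional


def latest_released(entry: dict) -> Optional[dict]:
    """Return the first version whose tag is not "unreleased", or None."""
    for version in entry.get("versions", []):
        if version.get("tag") != "unreleased":
            return version
    return None


# Sample entry built from the fields shown in the diff.
sample_entry = {
    "versions": [
        {
            "tag": "unreleased",
            "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
        },
        {
            "updatedAt": "2024-04-15T15:08:22Z",
            "tag": "2024-04-15--15-08-22Z",
        },
    ],
    "version": {
        "tag": "unreleased",
        "compatibility": {"cli": "3.0.0-alpha.0", "web": "3.0.0-alpha.0"},
    },
}

released = latest_released(sample_entry)
print(released["tag"] if released else "no released version")
```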
148 changes: 148 additions & 0 deletions data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/CHANGELOG.md
@@ -0,0 +1,148 @@
## Unreleased

- fix: Increase the Nextclade alignment parameter "excessBandwidth" from 9 to 12 to correctly align a complex series of indels that has arisen in Spike NTD with the occurrence of S:31- in some JN.1. The bandwidth is chosen to be as small as possible (to ensure fast runtime) but as large as necessary for correct alignment, so occasional adjustments like this one are required. Nothing else is changed in this SARS-CoV-2 dataset update.

## 2024-04-15T15:08:22Z

- Nextstrain clades 24A (JN.1) and 24B (JN.1.11.1) are now included.
- All 83 Pango lineages designated between 2024-02-13 and 2024-04-12 are now included. Unfold the section below for a list of all newly included lineages with their designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- JN.1.23 (2024-02-20)
- KP.1 (2024-02-22)
- JN.1.7.1 (2024-03-01)
- JN.1.7.2 (2024-03-01)
- KQ.1 (2024-03-01)
- JN.1.13.1 (2024-03-01)
- JN.1.1.5 (2024-03-01)
- KR.1 (2024-03-04)
- KP.1.1 (2024-03-04)
- KP.2 (2024-03-04)
- JN.1.24 (2024-03-04)
- XDK.1 (2024-03-04)
- JN.1.25 (2024-03-04)
- JN.1.25.1 (2024-03-04)
- XDQ.1 (2024-03-04)
- JQ.2 (2024-03-05)
- JN.13 (2024-03-06)
- JN.13.1 (2024-03-06)
- JN.1.1.6 (2024-03-09)
- JN.1.26 (2024-03-09)
- JN.1.27 (2024-03-09)
- JN.1.4.4 (2024-03-09)
- JN.1.8.2 (2024-03-09)
- JN.1.28 (2024-03-09)
- JN.1.1.7 (2024-03-09)
- JN.1.29 (2024-03-09)
- JN.1.4.5 (2024-03-14)
- JN.1.18.1 (2024-03-14)
- JN.1.30 (2024-03-14)
- JN.1.31 (2024-03-14)
- JN.1.16.1 (2024-03-14)
- KS.1 (2024-03-14)
- JN.1.4.6 (2024-03-14)
- JN.1.32 (2024-03-14)
- GE.1.2.2 (2024-03-17)
- KT.1 (2024-03-17)
- KT.1.1 (2024-03-17)
- KT.1.2 (2024-03-17)
- KP.3 (2024-03-17)
- JN.1.33 (2024-03-17)
- JN.1.34 (2024-03-17)
- JN.1.35 (2024-03-17)
- JN.1.36 (2024-03-17)
- KP.2.1 (2024-03-19)
- KP.2.2 (2024-03-19)
- JN.1.30.1 (2024-03-19)
- JQ.2.1 (2024-03-20)
- XDD.1.1.1 (2024-03-25)
- KU.1 (2024-03-25)
- KU.2 (2024-03-25)
- JN.1.18.2 (2024-03-25)
- JN.1.36.1 (2024-03-25)
- KV.1 (2024-03-25)
- JN.1.37 (2024-03-25)
- JN.1.28.1 (2024-03-25)
- KW.1 (2024-03-25)
- JN.1.38 (2024-03-25)
- JN.1.39 (2024-03-25)
- JN.1.40 (2024-03-25)
- JN.1.41 (2024-03-25)
- JN.1.42 (2024-03-25)
- JN.1.43 (2024-03-25)
- JN.1.43.1 (2024-03-25)
- KP.1.1.1 (2024-03-25)
- JN.1.44 (2024-03-25)
- XDU (2024-03-26)
- XDP.1 (2024-03-26)
- JN.2.2.1 (2024-03-26)
- JN.1.8.3 (2024-03-29)
- KY.1 (2024-03-29)
- JN.14 (2024-03-29)
- JN.1.45 (2024-03-29)
- XDV (2024-04-02)
- XDV.1 (2024-04-02)
- JN.1.42.1 (2024-04-02)
- JN.1.46 (2024-04-02)
- JN.1.4.7 (2024-04-02)
- KV.2 (2024-04-02)
- KW.1.1 (2024-04-03)
- KZ.1 (2024-04-04)
- KZ.1.1 (2024-04-04)
- KZ.1.1.1 (2024-04-04)
- JN.1.7.3 (2024-04-04)

</details>

## 2024-02-16T04:00:32Z

- All 35 Pango lineages designated between 2024-01-16 and 2024-02-12 are now included. Unfold the section below for a list of all newly included lineages with their designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- XDN (2024-01-16)
- XDP (2024-01-16)
- JN.3.2 (2024-01-16)
- JN.3.2.1 (2024-01-16)
- JN.11 (2024-01-21)
- JN.1.12 (2024-01-21)
- JN.12 (2024-01-22)
- JN.1.11.1 (2024-01-22)
- KM.1 (2024-01-31)
- BA.2.87 (2024-02-01)
- BA.2.87.1 (2024-02-01)
- XDQ (2024-02-01)
- GK.1.10.1 (2024-02-01)
- XDR (2024-02-01)
- JN.1.13 (2024-02-01)
- JN.1.14 (2024-02-01)
- JN.1.15 (2024-02-01)
- JN.1.4.1 (2024-02-01)
- JN.1.2.1 (2024-02-01)
- JN.1.16 (2024-02-01)
- JN.1.17 (2024-02-01)
- JN.1.4.2 (2024-02-02)
- JN.1.18 (2024-02-04)
- JN.1.19 (2024-02-04)
- JN.1.20 (2024-02-04)
- XDS (2024-02-04)
- JN.1.9.1 (2024-02-04)
- JN.1.21 (2024-02-04)
- JN.1.22 (2024-02-04)
- JN.1.4.3 (2024-02-12)
- XDT (2024-02-12)
- HK.13.2.1 (2024-02-12)
- KN.1 (2024-02-12)
- KN.1.1 (2024-02-12)
- JN.1.1.4 (2024-02-12)

</details>

## 2024-01-16T20:31:02Z

Initial release of this v3 dataset.

This dataset is new to v3 and does not have a v2 equivalent.
49 changes: 49 additions & 0 deletions data_output/nextstrain/sars-cov-2/BA.2.86/unreleased/README.md
@@ -0,0 +1,49 @@
# SARS-CoV-2 dataset with mutations relative to BA.2.86

| Key | Value |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| authors | [Cornelius Roemer](https://neherlab.org), [Richard Neher](https://neherlab.org), [Nextstrain](https://nextstrain.org) |
| reference | `Wuhan-Hu-1/2019` with BA.2.86 SNPs added |
| workflow | https://github.com/neherlab/nextclade_data_workflows/tree/v3-sc2/sars-cov-2 |
| path | `nextstrain/sars-cov-2/BA.2.86` |
| clade definitions | [Nextstrain clades](https://nextstrain.org/blog/2022-04-29-SARS-CoV-2-clade-naming-2022) and [Pango lineages](https://www.nature.com/articles/s41564-020-0770-5) |

## Scope of this dataset

This dataset shows mutations relative to the prototypical BA.2.86 sequence and is particularly useful for the analysis of SARS-CoV-2 sequences that are descended from BA.2.86.

For the analysis of non-BA.2.86 sequences, other Nextclade datasets for SARS-CoV-2 may be more appropriate. In addition, the `wuhan-hu-1/proteins` dataset shows amino acid mutations in coordinates of mature proteins (nsp1-16) instead of ORF1a and ORF1b coordinates.

## Reference sequence and reference tree

The reference sequence in this dataset is `Wuhan-Hu-1/2019` but with BA.2.86 SNPs added. SNPs (but not indels) are thus shown with respect to BA.2.86 while the mutation positions remain within the familiar `Wuhan-Hu-1` coordinate system.
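
The idea of adding SNPs to a reference while keeping its coordinate system can be sketched as follows. This is an illustration only: the helper `apply_snps`, the toy sequence, and the substitutions are made up and do not correspond to actual BA.2.86 mutations. The point is that substitutions leave the sequence length unchanged, so positions still refer to the same `Wuhan-Hu-1` coordinates.

```python
from typing import Dict


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with single-base substitutions applied.

    Positions are 1-based. Only substitutions are applied (no indels),
    so the length and all coordinates stay the same.
    """
    bases = list(reference)
    for pos, base in snps.items():
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the reference")
        bases[pos - 1] = base
    return "".join(bases)


toy_reference = "ACGTACGTAC"   # toy stand-in, not Wuhan-Hu-1
toy_snps = {3: "T", 8: "A"}    # made-up substitutions

derived = apply_snps(toy_reference, toy_snps)
print(derived)
```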

The reference tree contains one sequence for each Pango lineage descended from BA.2 (including recombinants such as XBB) and is rooted on BA.2.

## Features

This dataset supports:

- Assignment of Nextstrain clades
- Assignment of Pango lineages
- Sequence QC
- Phylogenetic placement
- Calculation of ACE2 binding scores relative to BA.2.86 as described in [Starr et al. 2022](https://doi.org/10.1371/journal.ppat.1010951)

## Nextstrain clades

Since its emergence in late 2019, SARS-CoV-2 has evolved into many co-circulating variants. To facilitate discussion of these variants, we have grouped them into clades which are defined by specific signature mutations.

Nextstrain clade names consist of two numbers representing a year and then a single letter representing the clade within that year. For example, 21A is the first clade we named in 2021.
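
As a small illustration of this naming scheme, a clade label can be split into its year and letter. This is a sketch under assumptions: the helper `parse_clade` is hypothetical, the two-digit year is assumed to belong to the 2000s, and the optional parenthesised lineage follows the "24A (JN.1)" form used in the changelog rather than any formal specification.

```python
import re
from typing import Optional, Tuple

# Two digits, one uppercase letter, optionally followed by "(lineage)".
_CLADE = re.compile(r"^(\d{2})([A-Z])(?:\s+\(([^)]+)\))?$")


def parse_clade(name: str) -> Tuple[int, str, Optional[str]]:
    """Split a clade label such as "21A" into (year, letter, lineage)."""
    match = _CLADE.match(name.strip())
    if match is None:
        raise ValueError(f"not a Nextstrain-style clade name: {name!r}")
    yy, letter, lineage = match.groups()
    return 2000 + int(yy), letter, lineage


print(parse_clade("21A"))
print(parse_clade("24B (JN.1.11.1)"))
```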

We define each clade by a combination of signature mutations. You can find the exact clade definitions on GitHub in this [file](https://github.com/nextstrain/ncov/blob/master/defaults/clades.tsv).

Below is an illustration of the phylogenetic relationships of Nextstrain clades ([source](https://github.com/nextstrain/ncov-clades-schema/)):

![Illustration of phylogenetic relationships of SARS-CoV-2 clades, as defined by Nextstrain](https://raw.githubusercontent.com/nextstrain/ncov-clades-schema/master/clades.svg)

Learn more about how Nextclade assigns clades in the [documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/algorithm/).

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
Binary file not shown.