
Releases: nextstrain/nextclade_data

2022-03-24

24 Mar 23:35

New dataset version (tag 2022-03-24T12:00:00Z)

SARS-CoV-2

  • Recombinants: Recombinant Pango lineages are now included in the reference tree. Each recombinant is attached directly to the root node so that it does not create false internal nodes that would attract bad sequences. Until a recombinant qualifies for a Nextstrain clade, it receives the placeholder clade name recombinant. Pango lineages are provided if present. Beware that new, unnamed recombinants with similar donors but slightly different breakpoints will attach to existing recombinants in the reference tree and thus get the wrong Pango lineage. A high number of reversions and labeled mutations is a sign that you may have a similar but different recombinant.
  • Pango lineages: In this release, Nextclade can assign Pango lineages up to pango-designation release v1.2.133, featuring Omicron recombinants like XD, XE and XF.
  • QC: qc.json was updated with the most common stop codons and frameshifts that appear to be real and not artefacts (in ORFs 3a, 6, 7a, 7b, 8, 9b).
  • QC: virus_properties.json was updated and now contains more mutations that are common in 21K, which should help identify recombinants.
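
The warning about recombinants attaching to the wrong branch can be turned into a rough screening step. The following is a minimal sketch, not part of Nextclade: it assumes results have already been loaded into dictionaries, and the keys `reversions` and `labeled_mutations`, the helper `flag_possible_new_recombinants` and the threshold are all hypothetical. Map the keys to whichever columns your Nextclade version emits. The release notes give no numeric cut-off, so the value below is only a placeholder.

```python
from typing import Dict, List

# Hypothetical threshold: the release notes give no number, so tune it on your own data.
SUSPICIOUS_TOTAL = 5


def flag_possible_new_recombinants(rows: List[Dict[str, str]],
                                   threshold: int = SUSPICIOUS_TOTAL) -> List[str]:
    """Return names of sequences placed on a recombinant whose reversions and
    labeled mutations suggest a similar but different recombinant."""
    flagged = []
    for row in rows:
        if row["clade"] != "recombinant":
            continue  # only sequences with the placeholder clade are screened
        total = int(row["reversions"]) + int(row["labeled_mutations"])
        if total >= threshold:
            flagged.append(row["seqName"])
    return flagged


# Made-up rows for illustration only.
rows = [
    {"seqName": "sample_A", "clade": "recombinant", "reversions": "1", "labeled_mutations": "0"},
    {"seqName": "sample_B", "clade": "recombinant", "reversions": "4", "labeled_mutations": "6"},
    {"seqName": "sample_C", "clade": "21K (Omicron)", "reversions": "7", "labeled_mutations": "3"},
]
print(flag_possible_new_recombinants(rows))  # ['sample_B']
```

Flagged sequences are candidates for a closer look with the "SARS-CoV-2 without recombinants" dataset, not confirmed new recombinants.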

SARS-CoV-2 without recombinants

  • New dataset: Now that recombinants are included in the default SARS-CoV-2 tree, it is no longer easy to identify the breakpoints and donors of new recombinants if they attach to existing recombinants on the tree. To facilitate the analysis of potential new recombinants, we have added a dataset named "SARS-CoV-2 without recombinants". It does not include recombinants and can thus be used for recombinant analysis in the same way as before recombinants were added. This dataset should only be used for recombinant analysis; it will receive less attention than the main (default) SARS-CoV-2 dataset.
  • Pango lineages: In this release, Nextclade can assign Pango lineages up to pango-designation release v1.2.133, except recombinants (lineages starting with X).
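
The rule that recombinant lineages start with X is easy to apply when post-processing results. A minimal sketch, assuming lineage names are available as plain strings; the helper name `is_recombinant_lineage` is hypothetical and not part of Nextclade.

```python
def is_recombinant_lineage(lineage: str) -> bool:
    """True if a Pango lineage name denotes a recombinant (name starts with 'X')."""
    return lineage.strip().upper().startswith("X")


lineages = ["BA.2", "XE", "BA.1.1", "XD", "B.1.1.529"]
recombinants = [name for name in lineages if is_recombinant_lineage(name)]
print(recombinants)  # ['XE', 'XD']
```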

2022-03-14

24 Mar 23:34

New dataset version (tag 2022-03-14T12:00:00Z)

SARS-CoV-2

  • Pango lineages: Nextclade now assigns each sequence a Pango lineage, similar to how clades are assigned. The result is shown in both the web interface and the TSV/CSV output (column Nextclade_pango). The classifier is about 98% accurate for sequences from the past 12 months. Older lineages are deprioritised, so accuracy is worse for them. Read more about the method and validation against pangoLEARN and UShER in this report: Nextclade as pango lineage classifier: Methods and Validation.
  • Pango lineages: In this release, Nextclade can assign Pango lineages up to pango-designation release v1.2.132, featuring lineages like BA.2.3, BA.1.17 and BA.1.1.16.
  • Reference tree: Every Pango lineage that is sampled into the tree gets a synthetic sequence chosen to represent a hypothetical common ancestor of the lineage, based on the sequences listed as its members in the pango-designation repo.
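
The Nextclade_pango column can be read with any tab-separated parser. The sketch below is an illustration under assumptions: the sample rows are made up, the other two columns (`seqName`, `clade`) are assumed to be present, and real output files contain many more columns. The helper name `count_pango_lineages` is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up excerpt of a Nextclade TSV file, for illustration only.
SAMPLE_TSV = (
    "seqName\tclade\tNextclade_pango\n"
    "sample_1\t21K (Omicron)\tBA.1.17\n"
    "sample_2\t21L (Omicron)\tBA.2.3\n"
    "sample_3\t21L (Omicron)\tBA.2.3\n"
)


def count_pango_lineages(tsv_text: str) -> Counter:
    """Count how many sequences were assigned to each Pango lineage."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["Nextclade_pango"] for row in reader)


counts = count_pango_lineages(SAMPLE_TSV)
for lineage, n in counts.most_common():
    print(lineage, n)
```

To process a real file, read its text with `open(path, newline="")` and pass the contents to the same function.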

2022-02-07

07 Feb 13:08

New dataset version (tag 2022-02-07T12:00:00Z)

SARS-CoV-2

  • Reference tree: Updated with new data. A new algorithm for choosing how many sequences of each Pango lineage to include improves coverage of common and recent lineages. Every Pango lineage that is included gets one relatively basal (early) sequence to keep the number of false positive reversions down.

2022-01-18

24 Jan 21:32

New dataset version (tag 2022-01-18T12:00:00Z)

  • Backwards incompatibility(!): New datasets no longer work with Nextclade versions before 1.10.0. To use new datasets, you must update Nextclade.
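
A script that pins datasets may want to check the installed version before downloading one. The following is a minimal sketch, assuming the version is already available as a string such as "1.10.0"; the helper names are hypothetical and pre-release suffixes are not handled. Versions are compared as integer tuples because plain string comparison would wrongly order "1.9.0" after "1.10.0".

```python
from typing import Tuple

MIN_NEXTCLADE_VERSION = (1, 10, 0)  # minimum version stated in the release note


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' or 'v1.10' into a three-part integer tuple."""
    parts = [int(part) for part in version.strip().lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '1.10'
    return tuple(parts)


def supports_new_datasets(nextclade_version: str) -> bool:
    """True if this Nextclade version can use datasets from this release onwards."""
    return parse_version(nextclade_version) >= MIN_NEXTCLADE_VERSION


print(supports_new_datasets("1.9.0"))   # False
print(supports_new_datasets("1.10.0"))  # True
```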

SARS-CoV-2

  • Files: added virus_properties.json containing common mutations per clade
  • QC: higher penalty for private mutations that are reversions or common in other clades

Influenza

  • Files: Stub virus_properties.json added to be compatible with new Nextclade version 1.10.0

2022-01-05

06 Jan 15:18

SARS-CoV-2

New dataset version (tag 2022-01-05T19:54:31Z)

  • Reference tree: Added more Omicron sequences, from all of BA.1/BA.2/BA.3
  • Reference tree: General data update with new pango lineages
  • Sample sequences: Added BA.2 and BA.3 to sample sequences

2021-12-16

17 Dec 15:49
09629c7

Influenza

New dataset version (tag 2021-12-16T20:15:53Z)

  • Clades: New WHO clade names are used
  • Reference tree: Data source is now GISAID, which means better global coverage

SARS-CoV-2

New dataset version (tag 2021-12-16T20:57:35Z)

  • Clades: 21M (Omicron) added as an Omicron catch-all, equivalent to pango B.1.1.529
  • Clades: 21L elevated to 21L (Omicron) in line with WHO practice
  • QC: Fixed known frameshift ORF7b:3 (was erroneously ORF7a:3)

2021-12-09

09 Dec 21:00

SARS-CoV-2

New dataset version (tag 2021-12-09T18:09:18Z)

  • Clades: Omicron is split into 21K (Omicron) (pango BA.1) and 21L (pango BA.2). The minor clade 21L is at this point not called Omicron by WHO so it does not get the Omicron label for now.
  • Reference tree: Data has been updated to early December
  • Pango lineages designated until early December have been sampled in

2021-12-03--00-14-37--UTC

03 Dec 00:31

General

  • Added explicit cache-control headers

SARS-CoV-2

  • Sample sequences: Added two 21K (Omicron) sequences

2021-11-27

27 Nov 12:09
bf9a021

SARS-CoV-2

New dataset version (tag 2021-11-27T11:53:22Z)

Changes
  • Clades: 21K is renamed 21K (Omicron) in line with WHO elevation to VOC status

2021-11-26

27 Nov 12:06

SARS-CoV-2

New dataset version (tag 2021-11-26T14:02:45Z)

Changes
  • Data source: GISAID data is now used to generate the reference tree. This switch is necessary because the new clade 21K (B.1.1.529) is so far only present in GISAID data.
Updates
  • New clade: 21K (B.1.1.529) has been added to the reference tree
  • Reference tree: Data has been updated to sequences submitted to GISAID by 2021-11-24
  • Reference tree: Pango lineages designated until 2021-11-24 have been sampled into the tree