Commit

Merge pull request #202 from nextstrain/add-measles-dataset
Add measles dataset
rneher authored Jun 4, 2024
2 parents c0149e7 + fd8c47b commit d7774d1
Showing 18 changed files with 72,209 additions and 29,713 deletions.
4 changes: 3 additions & 1 deletion data/nextstrain/collection.json
@@ -51,6 +51,8 @@
"nextstrain/flu/h3n2/pa",
"nextstrain/flu/h1n1pdm/pb2",
"nextstrain/flu/h1n1pdm/pb1",
-"nextstrain/flu/h3n2/pb2"
+"nextstrain/flu/h3n2/pb2",
+"nextstrain/measles",
+"nextstrain/measles/N450/WHO-2012"
]
}
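
The single deletion in this hunk is the old last entry, `"nextstrain/flu/h3n2/pb2"`, which is re-added with a trailing comma so the two measles paths can follow it; JSON does not allow adjacent array elements without a separating comma. A minimal Python sketch of that point follows. The bare arrays are illustrative fragments, not the full `collection.json`, whose enclosing keys are not visible in this hunk, and `parses` is a hypothetical helper:

```python
import json

# Illustrative fragments only: the real collection.json has more entries and
# an enclosing object whose keys are not shown in this hunk.
missing_comma = '''[
  "nextstrain/flu/h3n2/pb2"
  "nextstrain/measles",
  "nextstrain/measles/N450/WHO-2012"
]'''

with_comma = '''[
  "nextstrain/flu/h3n2/pb2",
  "nextstrain/measles",
  "nextstrain/measles/N450/WHO-2012"
]'''


def parses(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(missing_comma))  # False: no comma after the first element
print(parses(with_comma))     # True
```

A check along these lines could catch a forgotten comma when a new dataset path is appended to the list.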
3 changes: 3 additions & 0 deletions data/nextstrain/measles/N450/WHO-2012/CHANGELOG.md
@@ -0,0 +1,3 @@
## Unreleased

Initial release.
32 changes: 32 additions & 0 deletions data/nextstrain/measles/N450/WHO-2012/README.md
@@ -0,0 +1,32 @@
# Measles dataset

| Key | Value |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| name | Measles N450 (WHO-2012) |
| authors | [Nextstrain](https://nextstrain.org) |
| reference | NC_001498.1 |
| workflow | https://github.com/nextstrain/measles/tree/main/nextclade |
| path | `nextstrain/measles/N450/WHO-2012` |


## Scope of this dataset

This dataset assigns genotypes to measles samples based on [criteria outlined by the WHO](https://www.who.int/publications/i/item/WER8709).

The WHO has defined 24 measles genotypes based on N gene and H gene sequences from 28 reference strains. For new measles samples, genotypes can be assigned based on genetic similarity to the reference strains at the "N450" region (a 450 bp region of the N gene).

The reference tree used in this dataset includes N450 sequences for the 28 reference strains, along with other representative strains for each genotype.

This dataset can be used to assign genotypes to any sequence that includes at least 400 bp of the N450 region, including whole genome sequences. Sequence data beyond the N450 region will be reported as an insertion in the Nextclade output.

## Features

This dataset supports:

- Assignment of genotypes
- Phylogenetic placement
- Sequence quality control (QC)

## What are Nextclade datasets

Read more about Nextclade datasets in the Nextclade documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html