Merge pull request #90 from nextstrain/update-sc2
Update SC2 datasets
corneliusroemer authored Sep 21, 2023
2 parents 67fcbeb + e883ffe commit ada2e5a
Showing 18 changed files with 47,043 additions and 11 deletions.
111 changes: 111 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,116 @@
# CHANGELOG

## 2023-09-21

### New SARS-CoV-2 dataset version (tag `2023-09-21T12:00:00Z`)

- Add extra example sequences (GL.1, FL.1.5.1, EG.5.1, HV.1, DV.7.1, GK.2, BA.2.86)
- Add new frame shifts and stop codons to be ignored by QC
- Pango lineages designated between 2023-08-09 and 2023-09-20 are now included. Unfold below to see a list with designation dates:

<details>
<summary> Newly included lineages, with designation date in parentheses</summary>

- JA.1 (2023-08-10)
- JB.1 (2023-08-10)
- JB.2 (2023-08-10)
- FW.1.1 (2023-08-22)
- JC.1 (2023-08-22)
- XBB.1.41.2 (2023-08-22)
- GA.4.1 (2023-08-22)
- GE.1.1 (2023-08-22)
- GE.1.2 (2023-08-22)
- XBB.1.5.102 (2023-08-22)
- JD.1 (2023-08-22)
- FL.1.5.2 (2023-08-23)
- JB.2.1 (2023-08-24)
- HK.6 (2023-08-24)
- GS.2 (2023-08-24)
- GS.3 (2023-08-24)
- GS.4 (2023-08-24)
- GS.4.1 (2023-08-24)
- GE.1.3 (2023-08-24)
- JE.1 (2023-08-24)
- HG.2 (2023-08-24)
- JF.1 (2023-08-24)
- JF.2 (2023-08-24)
- GK.1.4 (2023-08-24)
- GK.2.1 (2023-08-24)
- HK.3.1 (2023-08-24)
- XCH (2023-08-24)
- XCJ (2023-08-24)
- GM.3 (2023-08-24)
- GM.3.1 (2023-08-24)
- HC.2 (2023-08-25)
- EG.5.1.7 (2023-08-25)
- JD.1.1 (2023-08-25)
- JD.1.2 (2023-08-25)
- XBB.1.5.103 (2023-08-30)
- XCH.1 (2023-08-30)
- JG.1 (2023-08-30)
- HK.7 (2023-08-30)
- EG.2.2 (2023-08-30)
- EG.2.3 (2023-08-30)
- EG.2.4 (2023-08-30)
- EG.2.5 (2023-08-30)
- EG.13 (2023-08-31)
- FL.4.8 (2023-08-31)
- FL.4.9 (2023-08-31)
- FL.4.10 (2023-08-31)
- FY.3.2 (2023-08-31)
- FY.3.3 (2023-08-31)
- JH.1 (2023-08-31)
- JH.2 (2023-08-31)
- FL.2.6 (2023-08-31)
- XCK (2023-08-31)
- FL.10.2 (2023-08-31)
- XCL (2023-09-01)
- FL.30 (2023-09-01)
- GW.1.1 (2023-09-01)
- GK.2.2 (2023-09-01)
- HF.1.1 (2023-09-01)
- HF.1.2 (2023-09-01)
- XBB.1.5.104 (2023-09-01)
- XBB.1.16.23 (2023-09-01)
- XBB.1.5.105 (2023-09-01)
- HK.8 (2023-09-01)
- HK.9 (2023-09-01)
- JJ.1 (2023-09-01)
- HK.10 (2023-09-01)
- JK.1 (2023-09-01)
- HS.1.1 (2023-09-01)
- HK.11 (2023-09-04)
- JG.2 (2023-09-04)
- XBB.1.16.24 (2023-09-04)
- JL.1 (2023-09-04)
- BA.2.86.1 (2023-09-05)
- XCM (2023-09-05)
- FY.4.1.2 (2023-09-06)
- EG.6.1.1 (2023-09-07)
- GJ.1.2.2 (2023-09-10)
- GJ.1.2.3 (2023-09-10)
- GJ.1.2.4 (2023-09-10)
- GJ.1.2.5 (2023-09-10)
- FL.4.11 (2023-09-10)
- HK.3.2 (2023-09-10)
- HK.3.3 (2023-09-10)
- EG.14 (2023-09-10)
- GK.3.2 (2023-09-10)
- CK.1.1.2 (2023-09-10)
- EF.3 (2023-09-10)
- GW.5.1 (2023-09-11)
- GW.5.1.1 (2023-09-11)
- DV.7.1.1 (2023-09-11)
- DV.7.1.2 (2023-09-11)
- EG.5.1.8 (2023-09-11)
- GK.2.3 (2023-09-17)
- GK.4 (2023-09-17)
- EG.5.1.9 (2023-09-17)
- JG.3 (2023-09-17)
- XBB.1.5.106 (2023-09-17)

</details>

## 2023-08-22

### New influenza dataset version (tag `2023-08-10T12:00:00Z`)
@@ -0,0 +1,18 @@
##gff-version 3
##sequence-region MN908947 1 29903
# Gene map (genome annotation) of SARS-CoV-2 in GFF format.
# For gene map purposes we only need some of the columns. We substitute unused values with "." as per GFF spec.
# See GFF format reference at https://www.ensembl.org/info/website/upload/gff.html
# seqname source feature start end score strand frame attribute
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 25393 26220 . + . gene_name=ORF3a
MN908947 GenBank gene 21563 25384 . + . gene_name=S
MN908947 GenBank gene 26245 26472 . + . gene_name=E
MN908947 GenBank gene 26523 27191 . + . gene_name=M
MN908947 GenBank gene 27202 27387 . + . gene_name=ORF6
MN908947 GenBank gene 27394 27759 . + . gene_name=ORF7a
MN908947 GenBank gene 27756 27887 . + . gene_name=ORF7b
MN908947 GenBank gene 27894 28259 . + . gene_name=ORF8
MN908947 GenBank gene 28274 29533 . + . gene_name=N
MN908947 GenBank gene 28284 28577 . + . gene_name=ORF9b
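
The gene map above is small enough to parse by hand. The sketch below is one way to do that in Python, not the parser Nextclade itself uses; `Gene` and `parse_gene_map` are hypothetical names introduced here. It splits on whitespace because that is how the columns appear in this listing (GFF3 files are normally tab-separated), and it treats coordinates as 1-based and inclusive, following the GFF convention.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """One gene row from the gene map (hypothetical helper type)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def parse_gene_map(text: str) -> list[Gene]:
    """Parse GFF-style gene rows, skipping blank lines and '#' comments."""
    genes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: columns are whitespace-separated, as shown in this listing.
        fields = line.split()
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        _seq, _source, _feature, start, end, _score, strand, _frame, attribute = fields
        # The attribute column holds key=value pairs separated by ';'.
        attrs = dict(item.split("=", 1) for item in attribute.split(";") if item)
        genes.append(Gene(attrs["gene_name"], int(start), int(end), strand))
    return genes


# Three rows copied from the gene map above.
GENE_MAP = """\
##gff-version 3
##sequence-region MN908947 1 29903
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 21563 25384 . + . gene_name=S
"""

genes = parse_gene_map(GENE_MAP)
for gene in genes:
    length = gene.end - gene.start + 1  # inclusive coordinates
    print(gene.name, length, length % 3 == 0)
```

With inclusive coordinates the length of a gene is `end - start + 1`, which gives 3822 nucleotides for S. All three sample genes have lengths divisible by three.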
@@ -0,0 +1 @@
Country (Institute),Target,Oligonucleotide,Sequence
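
A file with this four-column header can be read with a standard CSV parser. The sketch below uses sample rows copied from the populated copy of this file further down in the commit. It also shows one way to turn a primer containing IUPAC ambiguity codes (such as the `R` in `Charité_RdRp_F`) into a regular expression. How Nextclade itself consumes this file is not shown here; `IUPAC` and `primer_regex` are hypothetical names, and the quoting matters because some institute names contain commas.

```python
import csv
import io
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def primer_regex(sequence: str) -> re.Pattern:
    """Compile a primer sequence into a regex that expands ambiguity codes."""
    return re.compile("".join(IUPAC[base] for base in sequence.upper()))


# Sample rows copied from the populated primers file in this commit.
PRIMERS_CSV = """\
Country (Institute),Target,Oligonucleotide,Sequence
Charité (Germany),RdRp,Charité_RdRp_F,GTGARATGGTCATGTGTGGCGG
HKU (Hong Kong),ORF1b-nsp14,HKU_ORF_F,TGGGGYTTTACRGGTAACCT
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP2_F,ATGAGCTTAGTCCTGTTG
"""

# DictReader handles the quoted field that contains a comma.
rows = list(csv.DictReader(io.StringIO(PRIMERS_CSV)))
for row in rows:
    pattern = primer_regex(row["Sequence"])
    print(row["Oligonucleotide"], pattern.pattern)
```

Splitting each line on `,` instead of using the `csv` module would break the Institut Pasteur rows, since the first field is quoted and contains a comma.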

Large diffs are not rendered by default.

@@ -0,0 +1,25 @@
{
"tag": "2023-09-21T12:00:00Z",
"comment": "Update to include lineage BA.2.86",
"compatibility": {
"nextcladeCli": {
"min": "1.10.0",
"max": null
},
"nextcladeWeb": {
"min": "1.13.0",
"max": null
}
},
"enabled": true,
"files": {
"geneMap": "genemap.gff",
"primers": "primers.csv",
"qc": "qc.json",
"reference": "reference.fasta",
"sequences": "sequences.fasta",
"tree": "tree.json",
"virusPropertiesJson": "virus_properties.json"
},
"metadata": {}
}
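
The `compatibility` block above gives a `min` and `max` version per client. How Nextclade evaluates these bounds is not shown in this commit; the sketch below is one plausible reading, assuming plain `MAJOR.MINOR.PATCH` version strings, inclusive bounds, and `null` meaning that no bound is set. `parse_version` and `is_compatible` are hypothetical helpers.

```python
import json
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(bounds: dict, version: str) -> bool:
    """Check a client version against a {'min': ..., 'max': ...} entry.

    Assumption: both bounds are inclusive and None (JSON null) means unbounded.
    """
    current = parse_version(version)
    lower: Optional[str] = bounds.get("min")
    upper: Optional[str] = bounds.get("max")
    if lower is not None and current < parse_version(lower):
        return False
    if upper is not None and current > parse_version(upper):
        return False
    return True


# The compatibility section copied from the tag.json above.
TAG_JSON = """
{
  "tag": "2023-09-21T12:00:00Z",
  "compatibility": {
    "nextcladeCli": {"min": "1.10.0", "max": null},
    "nextcladeWeb": {"min": "1.13.0", "max": null}
  }
}
"""

tag = json.loads(TAG_JSON)
cli_bounds = tag["compatibility"]["nextcladeCli"]
web_bounds = tag["compatibility"]["nextcladeWeb"]

print(is_compatible(cli_bounds, "1.9.3"))
print(is_compatible(cli_bounds, "1.10.0"))
print(is_compatible(web_bounds, "1.12.9"))
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.9.3"` sorts after `"1.10.0"`, which would wrongly accept a client older than the stated minimum.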

@@ -0,0 +1,18 @@
##gff-version 3
##sequence-region MN908947 1 29903
# Gene map (genome annotation) of SARS-CoV-2 in GFF format.
# For gene map purposes we only need some of the columns. We substitute unused values with "." as per GFF spec.
# See GFF format reference at https://www.ensembl.org/info/website/upload/gff.html
# seqname source feature start end score strand frame attribute
MN908947 GenBank gene 266 13468 . + . gene_name=ORF1a
MN908947 GenBank gene 13468 21555 . + . gene_name=ORF1b
MN908947 GenBank gene 25393 26220 . + . gene_name=ORF3a
MN908947 GenBank gene 21563 25384 . + . gene_name=S
MN908947 GenBank gene 26245 26472 . + . gene_name=E
MN908947 GenBank gene 26523 27191 . + . gene_name=M
MN908947 GenBank gene 27202 27387 . + . gene_name=ORF6
MN908947 GenBank gene 27394 27759 . + . gene_name=ORF7a
MN908947 GenBank gene 27756 27887 . + . gene_name=ORF7b
MN908947 GenBank gene 27894 28259 . + . gene_name=ORF8
MN908947 GenBank gene 28274 29533 . + . gene_name=N
MN908947 GenBank gene 28284 28577 . + . gene_name=ORF9b
@@ -0,0 +1,37 @@
Country (Institute),Target,Oligonucleotide,Sequence
Charité (Germany),RdRp,Charité_RdRp_F,GTGARATGGTCATGTGTGGCGG
Charité (Germany),RdRp,Charité_S_RdRp_P,CAGGTGGAACCTCATCAGGAGATGC
Charité (Germany),RdRp,Charité_RdRp_R,CARATGTTAAASACACTATTAGCATA
Charité (Germany),E,Charité_E_F,ACAGGTACGTTAATAGTTAATAGCGT
Charité (Germany),E,Charité_E_P,ACACTAGCCATCCTTACTGCGCTTCG
Charité (Germany),E,Charité_E_R,ATATTGCAGCAGTACGCACACA
Charité (Germany),N,Charité_N_F,CACATTGGCACCCGCAATC
Charité (Germany),N,Charité_N_P,ACTTCCTCAAGGAACAACATTGCCA
Charité (Germany),N,Charité_N_R,GAGGAACGAGAAGAGGCTTG
HKU (Hong Kong),ORF1b-nsp14,HKU_ORF_F,TGGGGYTTTACRGGTAACCT
HKU (Hong Kong),ORF1b-nsp14,HKU_ORF_P,TAGTTGTGATGCWATCATGACTAG
HKU (Hong Kong),ORF1b-nsp14,HKU_ORF_R,AACRCGCTTAACAAAGCACTC
HKU (Hong Kong),N,HKU_N_F,TAATCAGACAAGGAACTGATTA
HKU (Hong Kong),N,HKU_N_P,GCAAATTGTGCAATTTGCGG
HKU (Hong Kong),N,HKU_N_R,CGAAGGTGTGACTTCCATG
China CDC (China),N,ChinaCDC_N_F,GGGGAACTTCTCCTGCTAGAAT
China CDC (China),N,ChinaCDC_N_P,TTGCTGCTGCTTGACAGATT
China CDC (China),N,ChinaCDC_N_R,CAGACATTTTGCTCTCAAGCTG
China CDC (China),ORF1ab-nsp10,ChinaCDC_ORF_F,CCCTGTGGGTTTTACACTTAA
China CDC (China),ORF1ab-nsp10,ChinaCDC_ORF_P,CCGTCTGCGGTATGTGGAAAGGTTATGG
China CDC (China),ORF1ab-nsp10,ChinaCDC_ORF_R,ACGATTGTGCATCAGCTGA
US CDC (United States),N1,USCDC_N1_F,GACCCCAAAATCAGCGAAAT
US CDC (United States),N1,USCDC_N1_P,ACCCCGCATTACGTTTGGTGGACC
US CDC (United States),N1,USCDC_N1_R,TCTGGTTACTGCCAGTTGAATCTG
US CDC (United States),N2,USCDC_N2_F,TTACAAACATTGGCCGCAAA
US CDC (United States),N2,USCDC_N2_P,ACAATTTGCCCCCAGCGCTTCAG
US CDC (United States),N2,USCDC_N2_R,GCGCGACATTCCGAAGAA
US CDC (United States),N3,USCDC_N3_F,GGGAGCCTTGAATACACCAAAA
US CDC (United States),N3,USCDC_N3_P,AYCACATTGGCACCCGCAATCCTG
US CDC (United States),N3,USCDC_N3_R,TGTAGCACGATTGCAGCATTG
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP2_F,ATGAGCTTAGTCCTGTTG
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP2_P,AGATGTCTTGTGCTGCCGGTA
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP2_R,CTCCCTTTGTTGTGTTGT
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP4_F,GGTAACTGGTATGATTTCG
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP4_P,TCATACAAACCACGCCAGG
"Institut Pasteur, Paris (France)",RdRp,Pasteur_IP4_R,CTGGTCAAGGTTAATATAGG