add plot of natural diversity vs DMS (#16)
* add plot of natural diversity vs DMS

* write natural sequence diversity and escape to CSV
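The second bullet above writes per-site natural diversity and DMS escape together to a CSV. The repository's code for that step is not shown on this page, so the following is only a minimal stdlib sketch. It assumes both quantities are already held in dicts keyed by spike site. The function name `write_site_table`, the column names, and the numbers are hypothetical placeholders, not the repository's code or data.

```python
import csv
import io


def write_site_table(diversity, escape, handle):
    """Write a CSV with one row per site present in both inputs.

    `diversity` and `escape` map site number (int) to a float. The column
    names are illustrative placeholders, not the repository's real headers.
    """
    writer = csv.writer(handle)
    writer.writerow(["site", "n_effective", "escape"])
    # Inner join on site, written in ascending site order.
    for site in sorted(diversity.keys() & escape.keys()):
        writer.writerow([site, diversity[site], escape[site]])


# Toy inputs for illustration only; these are not measured values.
buffer = io.StringIO()
write_site_table({1: 2.0, 2: 1.5, 3: 1.0}, {1: 0.5, 2: 0.25, 4: 0.1}, buffer)
print(buffer.getvalue())
```

This sketch keeps only sites present in both sources. An outer join that writes empty cells for sites missing from one source would be an equally reasonable choice if unmeasured sites should still appear in the table.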
jbloom authored Dec 30, 2024
1 parent 21231fb commit 6e288b9
Showing 7 changed files with 1,953 additions and 0 deletions.
9 changes: 9 additions & 0 deletions natural_diversity/README.md
@@ -0,0 +1,9 @@
# Analysis of natural spike diversity performed outside the main pipeline
This subdirectory contains an analysis of the natural diversity of spike across sites. It is run manually with the Jupyter notebook in this subdirectory, so it sits outside the main pipeline.
It plots natural sequence diversity versus the deep mutational scanning (DMS) measurements.

The file [data/NC_045512.2.gp](data/NC_045512.2.gp) defines the domain structure of spike.

The file [data/ncov_open_global_all-time.json](data/ncov_open_global_all-time.json) is the Nextstrain JSON downloaded on Dec-28-2024 from [https://nextstrain.org/ncov/open/global/all-time](https://nextstrain.org/ncov/open/global/all-time). It contains all of the data for the subsampled Nextstrain global SARS-CoV-2 phylogeny built from open sequences.

The Jupyter notebook [spike_schematic.ipynb](spike_schematic.ipynb) is run manually to analyze this JSON and the other data files and to make schematics of spike showing the number of effective amino acids at each site; these schematics are placed in `.results/`.
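The README does not define the number of effective amino acids. A common definition, assumed here rather than confirmed from the notebook, is the exponential of the Shannon entropy (natural log) of the amino-acid frequencies at a site: a site where every sequence has the same residue scores 1, and a site split evenly between k residues scores k. Whether gaps, deletions, or ambiguous residues (`X`) are counted is a further assumption; this sketch counts every character it is given. The function name is hypothetical.

```python
import math
from collections import Counter


def n_effective_amino_acids(identities):
    """Effective number of amino acids at one site.

    Assumed definition: exp(H), where H = -sum_a p_a * ln(p_a) is the
    Shannon entropy (natural log) of the amino-acid frequencies p_a.
    `identities` is any iterable of one-letter codes, one per sequence.
    """
    counts = Counter(identities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences at this site")
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


print(n_effective_amino_acids("DDDDDDDD"))  # 1.0: the site is invariant
print(n_effective_amino_acids("DDDDGGGG"))  # about 2: two equally common residues
print(n_effective_amino_acids("DDDDDDDG"))  # between 1 and 2: one rare variant
```

Unlike a simple count of distinct residues, this measure is barely moved by a single rare variant, which is usually why it is preferred for noisy sequence collections.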
87 changes: 87 additions & 0 deletions natural_diversity/data/NC_045512.2.gp
@@ -0,0 +1,87 @@
LOCUS YP_009724390 1273 aa linear VRL 18-JUL-2020
DEFINITION surface glycoprotein [Severe acute respiratory syndrome coronavirus
2].
ACCESSION YP_009724390
VERSION YP_009724390.1
DBLINK BioProject: PRJNA485481
DBSOURCE REFSEQ: accession NC_045512.2
KEYWORDS RefSeq.
SOURCE Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
ORGANISM Severe acute respiratory syndrome coronavirus 2
Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes;
Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae;
Betacoronavirus; Sarbecovirus.
REFERENCE 1 (residues 1 to 1273)
AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.M., Wang,W., Song,Z.G., Hu,Y.,
Tao,Z.W., Tian,J.H., Pei,Y.Y., Yuan,M.L., Zhang,Y.L., Dai,F.H.,
Liu,Y., Wang,Q.M., Zheng,J.J., Xu,L., Holmes,E.C. and Zhang,Y.Z.
TITLE A new coronavirus associated with human respiratory disease in
China
JOURNAL Nature 579 (7798), 265-269 (2020)
PUBMED 32015508
REMARK Erratum:[Nature. 2020 Apr;580(7803):E7. PMID: 32296181]
REFERENCE 2 (residues 1 to 1273)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (17-JAN-2020) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (residues 1 to 1273)
AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.-M., Wang,W., Hu,Y., Song,Z.-G.,
Tao,Z.-W., Tian,J.-H., Pei,Y.-Y., Yuan,M.L., Zhang,Y.-L.,
Dai,F.-H., Liu,Y., Wang,Q.-M., Zheng,J.-J., Xu,L., Holmes,E.C. and
Zhang,Y.-Z.
TITLE Direct Submission
JOURNAL Submitted (05-JAN-2020) Shanghai Public Health Clinical Center &
School of Public Health, Fudan University, Shanghai, China
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence is identical to QHD43416.
Annotation was added using homology to SARSr-CoV NC_004718.3. ###
Formerly called 'Wuhan seafood market pneumonia virus.' If you have
questions or suggestions, please email us at info@ncbi.nlm.nih.gov
and include the accession number NC_045512.### Protein structures
can be found at
https://www.ncbi.nlm.nih.gov/structure/?term=sars-cov-2.### Find
all other Severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2) sequences at
https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/

##Assembly-Data-START##
Assembly Method :: Megahit v. V1.1.3
Sequencing Technology :: Illumina
##Assembly-Data-END##
COMPLETENESS: full length.
Method: conceptual translation.
FEATURES Location/Qualifiers
Spike 1..1273
S1 16..681
S2 682..1211
NTD 13..304
RBD 319..541
binding_loop_1 417..420
binding_loop_2 446..456
binding_loop_3 486..505
ORIGIN
1 mfvflvllpl vssqcvnltt rtqlppaytn sftrgvyypd kvfrssvlhs tqdlflpffs
61 nvtwfhaihv sgtngtkrfd npvlpfndgv yfasteksni irgwifgttl dsktqslliv
121 nnatnvvikv cefqfcndpf lgvyyhknnk swmesefrvy ssannctfey vsqpflmdle
181 gkqgnfknlr efvfknidgy fkiyskhtpi nlvrdlpqgf saleplvdlp iginitrfqt
241 llalhrsylt pgdsssgwta gaaayyvgyl qprtfllkyn engtitdavd caldplsetk
301 ctlksftvek giyqtsnfrv qptesivrfp nitnlcpfge vfnatrfasv yawnrkrisn
361 cvadysvlyn sasfstfkcy gvsptklndl cftnvyadsf virgdevrqi apgqtgkiad
421 ynyklpddft gcviawnsnn ldskvggnyn ylyrlfrksn lkpferdist eiyqagstpc
481 ngvegfncyf plqsygfqpt ngvgyqpyrv vvlsfellha patvcgpkks tnlvknkcvn
541 fnfngltgtg vltesnkkfl pfqqfgrdia dttdavrdpq tleilditpc sfggvsvitp
601 gtntsnqvav lyqdvnctev pvaihadqlt ptwrvystgs nvfqtragcl igaehvnnsy
661 ecdipigagi casyqtqtns prrarsvasq siiaytmslg aensvaysnn siaiptnfti
721 svtteilpvs mtktsvdctm yicgdstecs nlllqygsfc tqlnraltgi aveqdkntqe
781 vfaqvkqiyk tppikdfggf nfsqilpdps kpskrsfied llfnkvtlad agfikqygdc
841 lgdiaardli caqkfngltv lpplltdemi aqytsallag titsgwtfga gaalqipfam
901 qmayrfngig vtqnvlyenq klianqfnsa igkiqdslss tasalgklqd vvnqnaqaln
961 tlvkqlssnf gaissvlndi lsrldkveae vqidrlitgr lqslqtyvtq qliraaeira
1021 sanlaatkms ecvlgqskrv dfcgkgyhlm sfpqsaphgv vflhvtyvpa qeknfttapa
1081 ichdgkahfp regvfvsngt hwfvtqrnfy epqiittdnt fvsgncdvvi givnntvydp
1141 lqpeldsfke eldkyfknht spdvdlgdis ginasvvniq keidrlneva knlneslidl
1201 qelgkyeqyi kwpwyiwlgf iagliaivmv timlccmtsc csclkgccsc gscckfdedd
1261 sepvlkgvkl hyt
//
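The FEATURES block above uses a simplified layout, with one `NAME START..END` line per region, rather than the multi-line `/qualifier` layout of a standard GenPept record. Assuming that layout (column spacing may have been lost in this view, so leading whitespace is tolerated), a minimal plain-Python sketch of reading the regions and finding which ones contain a given site could be as follows. `parse_domains` and `domains_for_site` are hypothetical names, not functions from this repository. Coordinates are treated as 1-based and inclusive, as in GenBank locations.

```python
import re

# Assumed feature-line shape: a name, then a 1-based inclusive START..END range.
FEATURE_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_domains(gp_text):
    """Return {name: (start, end)} from the FEATURES block of the text."""
    domains = {}
    in_features = False
    for line in gp_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN"):
            break  # the sequence follows; no more features
        if in_features:
            match = FEATURE_LINE.match(line)
            if match:
                name, start, end = match.groups()
                domains[name] = (int(start), int(end))
    return domains


def domains_for_site(site, domains):
    """Names of every region containing `site` (regions may nest or overlap)."""
    return [name for name, (start, end) in domains.items() if start <= site <= end]


example = """FEATURES Location/Qualifiers
Spike 1..1273
S1 16..681
S2 682..1211
NTD 13..304
RBD 319..541
binding_loop_1 417..420
binding_loop_2 446..456
binding_loop_3 486..505
ORIGIN
"""
domains = parse_domains(example)
print(domains_for_site(484, domains))  # ['Spike', 'S1', 'RBD']
```

Returning every matching region, rather than a single label, keeps nested annotations such as `RBD` inside `S1` visible; choosing the innermost region is left to the caller.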

1 change: 1 addition & 0 deletions natural_diversity/data/ncov_open_global_all-time.json
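A minimal sketch of pulling spike amino-acid identities for each tip out of a Nextstrain JSON like this one follows. It assumes the Auspice v2 layout: a top-level `tree` whose nodes have a `name`, optional `children`, and optional `branch_attrs.mutations` keyed by gene (`S` for spike) with entries such as `D614G`. It also assumes that sites never mutated on a tip's path carry the root residue. Both assumptions should be checked against the actual file. `load_tree` and `tip_amino_acids` are hypothetical helpers, and the toy tree is invented for illustration. The traversal uses an explicit stack rather than recursion because deep phylogenies can exceed Python's default recursion limit.

```python
import json
import re

# Assumed mutation string format: original residue, 1-based site, new residue,
# e.g. "D614G"; "-" marks a deletion and "*" a stop codon.
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def load_tree(path):
    """Return the root node of an Auspice v2 JSON (assumed layout)."""
    with open(path) as handle:
        return json.load(handle)["tree"]


def tip_amino_acids(tree, gene="S"):
    """Yield (tip_name, {site: residue}) for every tip in `tree`.

    Each dict holds only sites mutated on the path from the root; any other
    site is assumed to carry the root (reference) residue.
    """
    stack = [(tree, {})]
    while stack:
        node, inherited = stack.pop()
        state = dict(inherited)  # copy so sibling branches do not share changes
        branch_mutations = node.get("branch_attrs", {}).get("mutations", {})
        for mutation in branch_mutations.get(gene, []):
            match = MUTATION.match(mutation)
            if match:
                state[int(match.group(2))] = match.group(3)
        children = node.get("children", [])
        if children:
            stack.extend((child, state) for child in children)
        else:
            yield node["name"], state


# Toy tree for illustration only; not taken from the real file.
toy = {
    "name": "root",
    "children": [
        {"name": "tipA", "branch_attrs": {"mutations": {"S": ["D614G"]}}},
        {
            "name": "clade",
            "branch_attrs": {"mutations": {"S": ["D614G", "N501Y"], "nuc": ["A23063T"]}},
            "children": [
                {"name": "tipB", "branch_attrs": {"mutations": {"S": ["Y501N"]}}},
                {"name": "tipC"},
            ],
        },
    ],
}
tips = dict(tip_amino_acids(toy))
print(tips)
```

Filling in the unmutated sites of each tip from the reference spike sequence in `NC_045512.2.gp` would give a full per-site set of identities across tips. Nucleotide mutations (the `nuc` key) are ignored here because only the spike protein is of interest.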

