Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add plot of natural diversity vs DMS (#16)
* add plot of natural diversity vs DMS
* write natural sequence diversity and escape to CSV
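The second commit item writes natural sequence diversity and escape to CSV. The sketch below is one minimal way to do that. It assumes both inputs are already plain dicts keyed by spike site. The function name `write_diversity_and_escape` and the column names `site`, `n_effective`, and `escape` are hypothetical and are not taken from the repository.

```python
import csv
import io
from typing import Dict, TextIO


def write_diversity_and_escape(
    diversity: Dict[int, float],
    escape: Dict[int, float],
    out: TextIO,
) -> int:
    """Write one CSV row per site that has both a diversity and an escape value.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    # Hypothetical column names; the repository's CSV may use different ones.
    writer.writerow(["site", "n_effective", "escape"])
    shared_sites = sorted(set(diversity) & set(escape))  # inner join on site
    for site in shared_sites:
        writer.writerow([site, diversity[site], escape[site]])
    return len(shared_sites)


if __name__ == "__main__":
    buf = io.StringIO()
    write_diversity_and_escape({484: 2.5, 501: 1.8}, {484: 0.9}, buf)
    print(buf.getvalue())
```

Only sites present in both inputs are written (an inner join), so sites without a DMS measurement are dropped. An outer join with empty cells would be the alternative if those sites should stay visible in the CSV.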
Showing 7 changed files with 1,953 additions and 0 deletions.
@@ -0,0 +1,9 @@
# Analysis of natural spike diversity performed outside the main pipeline
This subdirectory contains an analysis of the natural diversity of spike at different sites. It is run manually with the Jupyter notebook in this subdirectory, so it sits outside the main pipeline.
The analysis plots natural sequence diversity versus deep mutational scanning (DMS) data.

The file [data/NC_045512.2.gp](data/NC_045512.2.gp) defines the domain structure of spike.

The file [data/ncov_open_global_all-time.json](data/ncov_open_global_all-time.json) is the Nextstrain JSON, downloaded from [https://nextstrain.org/ncov/open/global/all-time](https://nextstrain.org/ncov/open/global/all-time) on Dec-28-2024, that contains all of the data for the subsampled Nextstrain global SARS-CoV-2 phylogeny built from open sequences.

The Jupyter notebook [spike_schematic.ipynb](spike_schematic.ipynb) is run manually to analyze this JSON and the other data files. It makes schematics of spike showing the number of effective amino acids at each site, which are placed in `.results/`.
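The README does not spell out how the number of effective amino acids is computed. A common definition is the exponential of the Shannon entropy of the amino-acid frequencies at a site, N_eff = exp(-Σ p_a ln p_a), so a site where two amino acids are equally common has N_eff = 2. The sketch below assumes that definition. It also assumes the Auspice v2 layout of the Nextstrain JSON (amino-acid mutations listed per branch under `node["branch_attrs"]["mutations"]["S"]` as strings such as `D614G`) and that the tree root carries the reference spike sequence. The helper names are hypothetical, and this is not necessarily what the notebook does.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def parse_aa_mutation(mut: str) -> Tuple[str, int, str]:
    """Split an amino-acid mutation such as 'D614G' into ('D', 614, 'G')."""
    return mut[0], int(mut[1:-1]), mut[-1]


def tip_differences(tree: dict, ref_seq: str, gene: str = "S") -> List[Dict[int, str]]:
    """For every tip, return a {site: amino_acid} dict of sites that differ from ref_seq.

    Assumptions (not confirmed by the README): the root carries ref_seq, and each
    branch lists its amino-acid mutations under
    node["branch_attrs"]["mutations"][gene] as strings with 1-based sites.
    """
    tips: List[Dict[int, str]] = []
    stack = [(tree, {})]  # iterative walk avoids Python's recursion limit on deep trees
    while stack:
        node, parent_diffs = stack.pop()
        diffs = dict(parent_diffs)  # copy so sibling subtrees do not share state
        mutations = node.get("branch_attrs", {}).get("mutations", {}).get(gene, [])
        for mut in mutations:
            _, site, to_aa = parse_aa_mutation(mut)
            if to_aa == ref_seq[site - 1]:
                diffs.pop(site, None)  # reversion back to the reference residue
            else:
                diffs[site] = to_aa
        children = node.get("children", [])
        if children:
            stack.extend((child, diffs) for child in children)
        else:
            tips.append(diffs)
    return tips


def n_effective(counts: Counter) -> float:
    """Effective number of amino acids: exp of the Shannon entropy (natural log)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations at this site")
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return math.exp(entropy)


def site_n_effective(tips: List[Dict[int, str]], ref_seq: str, site: int) -> float:
    """Tally the amino acid each tip carries at `site` and return n_effective."""
    counts = Counter(diffs.get(site, ref_seq[site - 1]) for diffs in tips)
    return n_effective(counts)
```

Usage would be to load the JSON with `json.load`, call `tip_differences(auspice["tree"], ref_seq)` once, and then call `site_n_effective(tips, ref_seq, site)` for each site, where `ref_seq` is the uppercase spike sequence from the ORIGIN block of `data/NC_045512.2.gp`. Storing only per-tip differences from the reference keeps memory small compared with materializing a full 1273-residue sequence for every tip.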
@@ -0,0 +1,87 @@
LOCUS YP_009724390 1273 aa linear VRL 18-JUL-2020
DEFINITION surface glycoprotein [Severe acute respiratory syndrome coronavirus
2].
ACCESSION YP_009724390
VERSION YP_009724390.1
DBLINK BioProject: PRJNA485481
DBSOURCE REFSEQ: accession NC_045512.2
KEYWORDS RefSeq.
SOURCE Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
ORGANISM Severe acute respiratory syndrome coronavirus 2
Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes;
Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae;
Betacoronavirus; Sarbecovirus.
REFERENCE 1 (residues 1 to 1273)
AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.M., Wang,W., Song,Z.G., Hu,Y.,
Tao,Z.W., Tian,J.H., Pei,Y.Y., Yuan,M.L., Zhang,Y.L., Dai,F.H.,
Liu,Y., Wang,Q.M., Zheng,J.J., Xu,L., Holmes,E.C. and Zhang,Y.Z.
TITLE A new coronavirus associated with human respiratory disease in
China
JOURNAL Nature 579 (7798), 265-269 (2020)
PUBMED 32015508
REMARK Erratum:[Nature. 2020 Apr;580(7803):E7. PMID: 32296181]
REFERENCE 2 (residues 1 to 1273)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (17-JAN-2020) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (residues 1 to 1273)
AUTHORS Wu,F., Zhao,S., Yu,B., Chen,Y.-M., Wang,W., Hu,Y., Song,Z.-G.,
Tao,Z.-W., Tian,J.-H., Pei,Y.-Y., Yuan,M.L., Zhang,Y.-L.,
Dai,F.-H., Liu,Y., Wang,Q.-M., Zheng,J.-J., Xu,L., Holmes,E.C. and
Zhang,Y.-Z.
TITLE Direct Submission
JOURNAL Submitted (05-JAN-2020) Shanghai Public Health Clinical Center &
School of Public Health, Fudan University, Shanghai, China
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence is identical to QHD43416.
Annotation was added using homology to SARSr-CoV NC_004718.3. ###
Formerly called 'Wuhan seafood market pneumonia virus.' If you have
questions or suggestions, please email us at info@ncbi.nlm.nih.gov
and include the accession number NC_045512.### Protein structures
can be found at
https://www.ncbi.nlm.nih.gov/structure/?term=sars-cov-2.### Find
all other Severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2) sequences at
https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/

##Assembly-Data-START##
Assembly Method :: Megahit v. V1.1.3
Sequencing Technology :: Illumina
##Assembly-Data-END##
COMPLETENESS: full length.
Method: conceptual translation.
FEATURES Location/Qualifiers
Spike 1..1273
S1 16..681
S2 682..1211
NTD 13..304
RBD 319..541
binding_loop_1 417..420
binding_loop_2 446..456
binding_loop_3 486..505
ORIGIN
1 mfvflvllpl vssqcvnltt rtqlppaytn sftrgvyypd kvfrssvlhs tqdlflpffs
61 nvtwfhaihv sgtngtkrfd npvlpfndgv yfasteksni irgwifgttl dsktqslliv
121 nnatnvvikv cefqfcndpf lgvyyhknnk swmesefrvy ssannctfey vsqpflmdle
181 gkqgnfknlr efvfknidgy fkiyskhtpi nlvrdlpqgf saleplvdlp iginitrfqt
241 llalhrsylt pgdsssgwta gaaayyvgyl qprtfllkyn engtitdavd caldplsetk
301 ctlksftvek giyqtsnfrv qptesivrfp nitnlcpfge vfnatrfasv yawnrkrisn
361 cvadysvlyn sasfstfkcy gvsptklndl cftnvyadsf virgdevrqi apgqtgkiad
421 ynyklpddft gcviawnsnn ldskvggnyn ylyrlfrksn lkpferdist eiyqagstpc
481 ngvegfncyf plqsygfqpt ngvgyqpyrv vvlsfellha patvcgpkks tnlvknkcvn
541 fnfngltgtg vltesnkkfl pfqqfgrdia dttdavrdpq tleilditpc sfggvsvitp
601 gtntsnqvav lyqdvnctev pvaihadqlt ptwrvystgs nvfqtragcl igaehvnnsy
661 ecdipigagi casyqtqtns prrarsvasq siiaytmslg aensvaysnn siaiptnfti
721 svtteilpvs mtktsvdctm yicgdstecs nlllqygsfc tqlnraltgi aveqdkntqe
781 vfaqvkqiyk tppikdfggf nfsqilpdps kpskrsfied llfnkvtlad agfikqygdc
841 lgdiaardli caqkfngltv lpplltdemi aqytsallag titsgwtfga gaalqipfam
901 qmayrfngig vtqnvlyenq klianqfnsa igkiqdslss tasalgklqd vvnqnaqaln
961 tlvkqlssnf gaissvlndi lsrldkveae vqidrlitgr lqslqtyvtq qliraaeira
1021 sanlaatkms ecvlgqskrv dfcgkgyhlm sfpqsaphgv vflhvtyvpa qeknfttapa
1081 ichdgkahfp regvfvsngt hwfvtqrnfy epqiittdnt fvsgncdvvi givnntvydp
1141 lqpeldsfke eldkyfknht spdvdlgdis ginasvvniq keidrlneva knlneslidl
1201 qelgkyeqyi kwpwyiwlgf iagliaivmv timlccmtsc csclkgccsc gscckfdedd
1261 sepvlkgvkl hyt
//
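The `.gp` file above is a GenPept record whose FEATURES table lists spike domains as simple `start..end` ranges, followed by the protein sequence in the ORIGIN block. A minimal standard-library sketch for reading those ranges and the sequence, and for asking which domains cover a given site, might look like the following. It assumes every feature location is a plain `start..end` range (no `join(...)` and no `<`/`>` partial markers), and `parse_gp` and `domains_at` are hypothetical names. A full parser such as Biopython's `SeqIO.read(path, "genbank")` handles the general format.

```python
import re
from typing import Dict, List, Tuple

# A feature line is assumed to be "<name> <start>..<end>", possibly indented.
FEATURE_RE = re.compile(r"^\s*(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_gp(text: str) -> Tuple[Dict[str, Tuple[int, int]], str]:
    """Return ({feature_name: (start, end)}, uppercase protein sequence)."""
    features: Dict[str, Tuple[int, int]] = {}
    seq_chunks: List[str] = []
    section = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
            continue
        if line.startswith("ORIGIN"):
            section = "origin"
            continue
        if line.startswith("//"):
            break  # end of record
        if section == "features":
            match = FEATURE_RE.match(line)
            if match:  # qualifier lines such as /note="..." do not match
                features[match.group(1)] = (int(match.group(2)), int(match.group(3)))
        elif section == "origin":
            # Drop the leading position number and keep the 10-residue blocks.
            seq_chunks.extend(line.split()[1:])
    return features, "".join(seq_chunks).upper()


def domains_at(site: int, features: Dict[str, Tuple[int, int]]) -> List[str]:
    """List every feature whose inclusive 1-based range contains `site`."""
    return [name for name, (start, end) in features.items() if start <= site <= end]
```

With the FEATURES table shown above, `domains_at(484, features)` would return `["Spike", "S1", "RBD"]`, since site 484 lies inside 1..1273, 16..681, and 319..541 but outside the three binding loops.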