Abstract: Genome sequences constitute the primary evidence on the origin and spread of the 2019-2020 Covid-19 pandemic. Rapid comparative analysis of SARS-CoV-2 genomes is critical for disease control, outbreak forecasting, and developing clinical interventions. With CoV Genome Tracker we trace viral genomic changes in real time using a haplotype network, an accurate and scalable representation of evolutionary changes at the micro-evolutionary time scale. We resolve the direction of mutations by using a bat-associated genome as the outgroup. At the macro-evolutionary time scale, the Genome Tracker provides gene-by-gene and codon-by-codon evolutionary rates to facilitate the search for molecular targets of clinical interventions.
This is the repository for the SARS-CoV-2 Genome Tracker: http://cov.genometracker.org
- Download hCoV-19 genomic sequences from GISAID
- Parse sequences and meta-data using parse-metadata.ipynb
- Discard haplotypes missing 10% or more of their bases, then identify haplotypes, imputing interior missing bases from the closest haplotype
- Build minimum spanning tree for unique haplotypes using hapnet.pl
- Create bootstrap alignments and infer networks using hapnet-boot.pl
- Get consensus network using net-consense.pl
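The bootstrap and consensus steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the logic of hapnet-boot.pl or net-consense.pl: the function names `bootstrap_alignment` and `edge_support` are hypothetical, the alignment is assumed to be a list of equal-length strings, and each inferred network is assumed to be a collection of undirected edges between haplotype names.

```python
import random
from collections import Counter


def bootstrap_alignment(seqs, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are applied to every sequence,
    # so each replicate column is a real column of the original alignment.
    return ["".join(s[c] for c in cols) for s in seqs]


def edge_support(networks):
    """Return the fraction of replicate networks that contain each undirected edge."""
    counts = Counter()
    for edges in networks:
        # frozenset makes (a, b) and (b, a) the same edge; the set removes duplicates.
        counts.update({frozenset(e) for e in edges})
    return {tuple(sorted(e)): n / len(networks) for e, n in counts.items()}
```

Edges with high support across replicates would be the ones retained in a consensus network; the threshold to apply is a design choice not specified here.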
Read the GISAID hCoV-19 sequence metadata and add geolocation.
parse-metadata.ipynb
It writes the following output files:
covid-19-[current_date].tsv
covid-[current_date].fasta
Align each SARS-CoV-2 isolate from GISAID to the NCBI reference genome Wuhan-Hu-1 (GenBank accession ID: NC_045512)
Input: a FASTA file containing hCoV-19 genomes, the Wuhan-Hu-1 reference sequence, a ploidy file, the name of the folder containing BAM files for non-human sequences, and an outgroup (sorted BAM file).
infer-genome.bash [covid-[file_date].fasta] [Wuhan-Hu-1 reference sequence] [vcf ploidy file] [folder name that contains non-human cov genome bam files] [outgroup bam file]
Estimate missing genotypes from a haplotype or genotype reference panel.
- input:
- SAM file containing haplotypes, where each sequence has < 10% gaps (a gap is any non-ATGC character: "n", "." or "-")
impute-hap.pl [input] []
- output:
- log file listing the removed sequences (those with 10% gaps or more)
- sequence type (ST) alignment
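The 10% gap filter and the closest-haplotype imputation described above can be illustrated with a short Python sketch. It is not the code of impute-hap.pl: the names `gap_fraction`, `filter_by_gaps`, and `impute_from_closest` are hypothetical, sequences are assumed to be aligned strings of equal length, and for simplicity the sketch fills every missing position rather than only interior ones.

```python
VALID = set("ACGT")


def gap_fraction(seq):
    """Fraction of positions that are not A, T, G, or C (e.g. "n", "." or "-")."""
    return sum(base.upper() not in VALID for base in seq) / len(seq)


def filter_by_gaps(seqs, max_fraction=0.10):
    """Split {name: sequence} into kept (< max_fraction gaps) and removed (the rest)."""
    kept, removed = {}, {}
    for name, seq in seqs.items():
        target = kept if gap_fraction(seq) < max_fraction else removed
        target[name] = seq
    return kept, removed


def impute_from_closest(seq, references):
    """Fill missing bases in seq from the reference with the fewest mismatches."""
    def mismatches(ref):
        # Only positions resolved in seq contribute to the distance.
        return sum(
            a.upper() in VALID and a.upper() != b.upper()
            for a, b in zip(seq, ref)
        )

    closest = min(references, key=mismatches)
    return "".join(a if a.upper() in VALID else b for a, b in zip(seq, closest))
```

For example, `impute_from_closest("ACNT", ["ACGT", "TTTT"])` picks `ACGT` as the closest reference and fills the `N` with `G`.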
Reconstruct a network of unique haplotypes. This program calculates the minimum spanning tree of a set of haplotypes.
hapnet.pl
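As an illustration of the idea, a minimum spanning tree over haplotypes can be built with Prim's algorithm on pairwise Hamming distances. The sketch below is an assumption about one reasonable approach, not the algorithm hapnet.pl is known to use; the names `hamming` and `minimum_spanning_tree` are hypothetical, and haplotypes are assumed to be equal-length strings keyed by name.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haps):
    """Prim's algorithm over {name: sequence}; returns a list of (u, v, distance)."""
    names = list(haps)
    if not names:
        return []
    tree = [names[0]]  # nodes already connected, in insertion order
    edges = []
    while len(tree) < len(names):
        # Pick the cheapest edge joining a tree node to a node outside the tree.
        u, v, d = min(
            (
                (u, v, hamming(haps[u], haps[v]))
                for u in tree
                for v in names
                if v not in tree
            ),
            key=lambda e: e[2],
        )
        tree.append(v)
        edges.append((u, v, d))
    return edges
```

This version recomputes distances on every pass, which is fine for a small illustration; for thousands of haplotypes, caching the distance matrix or using a priority queue would be the usual refinement.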