Genome build
We have included a suite of tools, including genome size survey, genetic map and Hi-C heatmap, to check the quality of a genome build.
Tip
Download the test dataset here.
The raw sequencing data provides a way to estimate the size, ploidy, heterozygosity and repeat content of a genome, similar to GenomeScope. Let's say that you have a kmer count histogram (commonly generated by Jellyfish or another kmer counter) in a file reads.histo:
1 1281576854
2 89292133
3 21588481
4 9347716
5 5569400
6 4705214
The 1st column is the frequency of a kmer in the sequencing data, and the 2nd column is the number of distinct kmers observed at that frequency. From this histogram, it is easy to infer all the genome statistics and annotate them directly on the kmer histogram plot.
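To make the two-column format concrete, here is a minimal Python sketch that reads the rows shown above and sums them up. The helper name `parse_histo` is hypothetical and is not part of jcvi; only the six rows listed here are used, so the totals cover just the low-frequency end of the histogram.

```python
from io import StringIO


def parse_histo(handle):
    """Parse 'frequency count' pairs from a kmer histogram (hypothetical helper)."""
    histo = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        freq, count = int(fields[0]), int(fields[1])
        histo[freq] = count
    return histo


# The first six rows of reads.histo, as listed above
sample = StringIO(
    "1 1281576854\n"
    "2 89292133\n"
    "3 21588481\n"
    "4 9347716\n"
    "5 5569400\n"
    "6 4705214\n"
)
histo = parse_histo(sample)

# Number of distinct kmers: sum of the 2nd column
distinct_kmers = sum(histo.values())
# Total kmer occurrences: frequency times count, summed over rows
total_kmers = sum(freq * count for freq, count in histo.items())
```

This only summarizes the input. The genome statistics themselves come from the jcvi command that follows.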
python -m jcvi.assembly.kmer histogram reads.histo "*S. species* ‘Variety 1’" 21
This takes the kmer counts, followed by the species name that goes in the title of the plot, and finally the kmer size K that was used to generate the kmer histogram (21 in this example). Behind the scenes, a negative binomial mixture model is applied to approximate the various genome statistics, including the ploidy of the genome.
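For intuition about what a negative binomial mixture looks like, the sketch below builds a toy kmer depth profile in which component i has mean (i + 1) times a base depth. This is an illustration only and not jcvi's implementation: the function names, the base depth of 20, the dispersion of 50 and the four weights are all assumed values chosen for the example.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial pmf, parameterised by mean and dispersion 'size'."""
    p = size / (size + mean)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1 - p)
    )
    return math.exp(log_pmf)


def mixture_pmf(k, base_depth, size, weights):
    """Mixture whose i-th component has mean (i + 1) * base_depth."""
    return sum(
        w * nb_pmf(k, (i + 1) * base_depth, size)
        for i, w in enumerate(weights)
    )


# Assumed, illustrative parameters (not fitted to any data):
# four components, for kmers present in 1, 2, 3 or 4 copies
weights = [0.4, 0.3, 0.2, 0.1]
profile = [mixture_pmf(k, 20.0, 50.0, weights) for k in range(500)]

# The profile is a probability distribution over kmer depth
total_mass = sum(profile)
mean_depth = sum(k * prob for k, prob in enumerate(profile))
```

Fitting a model of this shape to an observed histogram means choosing the base depth, dispersion and weights so that the profile matches the observed counts. How jcvi does that fit is not shown here.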
You can then read the various genome statistics directly from the plot, including the inference that this genome is a tetraploid.
© Haibao Tang, 2010-2024