
Genome build

Haibao Tang edited this page May 4, 2024 · 10 revisions

We include a suite of tools for checking the quality of a genome build: genome size survey, genetic map concordance, and Hi-C heatmap.

Tip

Download the test dataset here.

Genome size survey

The raw sequencing data provide a way to estimate the size, ploidy, heterozygosity and repeat content of a genome, similar to GenomeScope. Suppose you have a kmer count histogram (commonly generated by Jellyfish or another kmer counter) in a file reads.histo:

1 1281576854
2 89292133
3 21588481
4 9347716
5 5569400
6 4705214

The 1st column is the frequency of a kmer in the sequencing data, and the 2nd column is the number of distinct kmers observed at that frequency. From this histogram, all the genome statistics can be inferred and annotated directly on the plot.
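As a quick sanity check before plotting, the two-column format can be parsed in a few lines of plain Python. This is a minimal sketch and not part of JCVI: the names `parse_histogram` and `summarize` are hypothetical, and the sample rows are the six lines shown above.

```python
# Minimal sketch (not part of JCVI): parse a two-column kmer histogram
# and report basic totals. Function names here are hypothetical.

SAMPLE = """\
1 1281576854
2 89292133
3 21588481
4 9347716
5 5569400
6 4705214
"""


def parse_histogram(lines):
    """Return a list of (frequency, count) tuples from whitespace-separated lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    return rows


def summarize(rows):
    """Return (distinct_kmers, total_kmers) for the parsed histogram."""
    distinct_kmers = sum(count for _, count in rows)
    total_kmers = sum(freq * count for freq, count in rows)
    return distinct_kmers, total_kmers


rows = parse_histogram(SAMPLE.splitlines())
distinct_kmers, total_kmers = summarize(rows)
print(f"{len(rows)} bins, {distinct_kmers} distinct kmers, {total_kmers} total kmers")
```

To run the same check on a real file, pass an open file handle instead of the sample text, for example `parse_histogram(open("reads.histo"))`.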

python -m jcvi.assembly.kmer histogram reads.histo "*S. species* ‘Variety 1’" 21

This takes the kmer histogram file, the species name that goes in the plot title, and finally the kmer size K that was used to generate the histogram. Behind the scenes, a negative binomial mixture model is applied to approximate the various genome statistics, including the ploidy of the genome.

reads.png

You can then read the various genome statistics directly from the plot, including, in this example, that the genome is a tetraploid.
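For a rough cross-check of the genome size reported on the plot, a much simpler peak-based estimate can be computed by hand: divide the total number of kmers by the kmer depth at the main peak. The sketch below is not the negative binomial mixture model that JCVI fits, only a naive approximation. The function name `naive_genome_size`, the `min_freq` cutoff for discarding likely error kmers, and the toy histogram are all assumptions made for illustration. For heterozygous or polyploid genomes the tallest peak may not correspond to the single-copy depth, so the result can be off by a multiple.

```python
# Naive peak-based genome size estimate (illustrative only; this is NOT
# the mixture model used by JCVI). All names and data are hypothetical.


def naive_genome_size(hist, min_freq=3):
    """Estimate genome size as total kmers / peak depth.

    hist     : list of (frequency, count) tuples
    min_freq : bins below this frequency are ignored as likely sequencing errors
    Returns (peak_freq, estimated_size).
    """
    solid = [(freq, count) for freq, count in hist if freq >= min_freq]
    if not solid:
        raise ValueError("no bins at or above min_freq")
    # The peak is the frequency with the largest number of distinct kmers.
    peak_freq, _ = max(solid, key=lambda fc: fc[1])
    total_kmers = sum(freq * count for freq, count in solid)
    return peak_freq, total_kmers / peak_freq


# Toy histogram, made up for this example: an error tail at low frequency
# followed by a single peak at depth 6.
toy_hist = [(1, 1000), (2, 100), (3, 20), (4, 30), (5, 60), (6, 100), (7, 60), (8, 30)]

peak_freq, size = naive_genome_size(toy_hist)
print(f"peak depth: {peak_freq}, estimated genome size: {size:.0f}")
```

If this figure disagrees strongly with the value on the plot, the first things to check are the choice of peak and the error cutoff.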

Genetic map concordance

Hi-C heatmap