Compute label density across FASTA records

Applying the labeldensity.pl script to the human genome (hg19)

This script looks for label occurrences across a reference genome and computes the label density in bins of a user-provided size. It also produces BED and IGV files for visualisation in IGV.
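The label search can be pictured with a minimal Python sketch. It assumes a plain exact-match scan: the nicking motif (here GCTCTTC, as passed with `-n` in the command further down) and its reverse complement GAAGAGC are both searched, so sites on either strand are found. The names `revcomp` and `find_sites` are hypothetical and do not describe how labeldensity.pl is actually written. Positions are 0-based leftmost coordinates of each match.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (N is kept as N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def find_sites(seq: str, motif: str) -> list:
    """Hypothetical sketch, not the labeldensity.pl implementation.

    Returns sorted 0-based start positions of `motif` on either strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    # A set avoids counting a palindromic motif twice.
    patterns = {motif, revcomp(motif)}
    hits = set()
    for pat in patterns:
        # Zero-width lookahead so overlapping occurrences are also reported.
        for m in re.finditer(f"(?={re.escape(pat)})", seq):
            hits.add(m.start())
    return sorted(hits)


if __name__ == "__main__":
    demo = "AAGCTCTTCAAAAGAAGAGCTT"
    print(find_sites(demo, "GCTCTTC"))  # [2, 13]
```

Here the forward site starts at position 2 and the reverse-strand site (GAAGAGC) at position 13.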

In the example below, the human reference genome was searched for Nt.BspQI sites (on both strands) and the density was computed in 100 kb bins for visualization in IGV. A random region (chr4:1,486,940-16,191,114) is shown zoomed in, and the colors were chosen from the quantile distribution of the densities so that regions of under-labelling (0-25% quantile, below Q1) and over-labelling (75-100% quantile, above Q3) are easy to identify. A second copy of the same track was added in points mode, with a line marking the median labelling density. An additional track of N-regions (aka gaps) was added, made with the script fastaFindGaps.pl.
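The gap track boils down to reporting runs of N as intervals. A minimal Python sketch of that idea follows. It is not fastaFindGaps.pl itself: the function name `find_gaps` and the `min_len` cut-off are hypothetical, and intervals are written 0-based and half-open as in BED.

```python
import re


def find_gaps(seq: str, min_len: int = 1) -> list:
    """Hypothetical sketch: return (start, end) of N/n runs, 0-based half-open."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len  # skip runs shorter than min_len
    ]


if __name__ == "__main__":
    print(find_gaps("ACGNNNNTTNAC"))             # [(3, 7), (9, 10)]
    print(find_gaps("ACGNNNNTTNAC", min_len=2))  # [(3, 7)]
```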

The command used to create the Nt.BspQI track was:

labeldensity.pl -i hg19.fa -t BspQI-density -l 20000 -b 100000 -n 'GCTCTTC'
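The binning step controlled by `-b 100000` can be illustrated with the following sketch. It assumes that site positions are already known for one chromosome and that density is reported as sites per 100 kb, so that a shorter last bin stays comparable. The function names, the density unit and the BED-like output layout are assumptions, not the script's documented output format, and the `-l` option is not modelled.

```python
from collections import Counter


def bin_density(positions, chrom_len, bin_size=100_000):
    """Hypothetical sketch: count sites per fixed-size bin.

    Returns (start, end, count, density) tuples; density is sites per 100 kb,
    normalised by the real bin length because the last bin may be shorter.
    """
    counts = Counter(p // bin_size for p in positions)
    n_bins = (chrom_len + bin_size - 1) // bin_size  # ceiling division
    rows = []
    for i in range(n_bins):
        start = i * bin_size
        end = min(start + bin_size, chrom_len)
        count = counts.get(i, 0)
        rows.append((start, end, count, count * 100_000 / (end - start)))
    return rows


def to_bed_lines(chrom, rows):
    """Format rows as tab-separated BED-like lines (0-based, half-open)."""
    return [f"{chrom}\t{s}\t{e}\t{d:.2f}" for s, e, _, d in rows]


if __name__ == "__main__":
    rows = bin_density([10, 50_000, 150_000], chrom_len=250_000)
    for line in to_bed_lines("chr4", rows):
        print(line)
```

With a 250 kb toy chromosome this yields three bins holding 2, 1 and 0 sites.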

The script also computes and plots the distribution of BspQI densities across the human genome (hg19). Note the values of the 25%, 50% and 75% quantiles, which are used later on for the heatmap colors and the median line in IGV.
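The three cut-offs can be reproduced from a list of per-bin densities with Python's standard `statistics.quantiles`. This is a sketch only: the function name is hypothetical, and the script's own plot may use a different quantile definition, so its printed values can differ slightly from what this returns.

```python
import statistics


def density_quartiles(densities):
    """Return (Q1, median, Q3) of per-bin densities.

    method="inclusive" interpolates linearly between order statistics; the
    script may define quantiles differently.
    """
    q1, median, q3 = statistics.quantiles(densities, n=4, method="inclusive")
    return q1, median, q3


if __name__ == "__main__":
    print(density_quartiles([1, 2, 3, 4, 5]))  # (2.0, 3.0, 4.0)
```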

[Figure: density distribution]

The density track created by labeldensity.pl was loaded twice in IGV, alongside the standard human genes track and the human gaps (N-regions) track. One copy of the track was displayed as a heatmap and the other as points.

[Figure: density heatmap]

The colors for the heatmap view were taken from the quantiles in the above plot. The horizontal line in the lower plot corresponds to the median value of the distribution.
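The quartile-based coloring amounts to a three-way classification of each bin. A minimal sketch follows, assuming strict comparisons (a bin exactly at Q1 or Q3 counts as typical). The class names, the RGB strings and the example cut-offs are hypothetical; they are not the colors or values used in the IGV view above.

```python
# Hypothetical sketch: class names and RGB colors are assumptions.
COLORS = {
    "under": "0,0,255",        # blue for bins below Q1
    "typical": "160,160,160",  # grey for bins between Q1 and Q3
    "over": "255,0,0",         # red for bins above Q3
}


def label_class(density: float, q1: float, q3: float) -> str:
    """Assign a bin to under/typical/over labelling using quartile cut-offs."""
    if density < q1:
        return "under"
    if density > q3:
        return "over"
    return "typical"


if __name__ == "__main__":
    q1, q3 = 8.0, 12.0  # example cut-offs, not real hg19 values
    for d in (5.0, 10.0, 15.0):
        cls = label_class(d, q1, q3)
        print(d, cls, COLORS[cls])
```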