compute label density across fasta records
This script looks for label occurrences across a reference genome and computes the label density in bins of a user-provided size. It also produces BED and IGV files for visualisation in IGV.
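The binning idea can be sketched in a few lines of Python. This is only an illustration and not the code of labeldensity.pl: the function names (`find_sites`, `bin_counts`), the use of the motif start as the site position, and the toy sequence are all assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_sites(sequence, motif):
    """Return sorted 0-based start positions of the motif on either strand.

    Assumption: a site is reported at the start of the match on the
    forward strand; the real script may report the nick position instead.
    """
    sequence = sequence.upper()
    motifs = {motif.upper(), reverse_complement(motif.upper())}
    positions = set()
    for m in motifs:
        start = sequence.find(m)
        while start != -1:
            positions.add(start)
            start = sequence.find(m, start + 1)  # allow overlapping matches
    return sorted(positions)


def bin_counts(positions, seq_length, bin_size):
    """Count sites per fixed-size bin; the last bin may be shorter."""
    n_bins = (seq_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


# Toy record: two forward sites (GCTCTTC) and one reverse-strand site (GAAGAGC).
toy = "AA" + "GCTCTTC" + "AAAAA" + "GAAGAGC" + "A" * 9 + "GCTCTTC" + "A"
sites = find_sites(toy, "GCTCTTC")
counts = bin_counts(sites, len(toy), bin_size=10)
print(sites)   # [2, 14, 30]
print(counts)  # [1, 1, 0, 1]
```

On a real genome the same counting would be applied to each FASTA record in turn, with a much larger bin size.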
In the example below, the human reference genome was searched for Nt.BspQI sites (both strands) and the density in 100 kb bins was computed for visualization in IGV. A random region was selected to show a zoomed view (chr4:1,486,940-16,191,114), and colors were chosen from the quantile distribution of the densities to make it easy to identify regions of under-labelling (0-25% quantile, up to Q1) or over-labelling (75-100% quantile, above Q3). A second copy of the same track was added in point mode with a line showing the median labelling. An additional track for N-regions (aka gaps) was added (made with the script fastaFindGaps.pl).
The command used to create the Nt.BspQI track was:
labeldensity.pl -i hg19.fa -t BspQI-density -l 20000 -b 100000 -n 'GCTCTTC'
The distribution of BspQI densities across the human genome (hg19) is computed and plotted by the script. Please note the values for 25%, 50%, and 75% which will be used later on for the colors and median line in IGV.
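For readers who want to recompute the three values from their own per-bin densities, a minimal sketch using Python's standard library follows. The density values are made up for the example, and the interpolation method (`inclusive`) is an assumption; the script may compute its quantiles differently.

```python
import statistics

# Hypothetical per-bin label counts (not real hg19 values).
densities = [12, 15, 9, 20, 18, 11, 14, 16, 13]

# n=4 returns the three cut points: 25% (Q1), 50% (median), 75% (Q3).
q1, median, q3 = statistics.quantiles(densities, n=4, method="inclusive")
print(q1, median, q3)  # 12.0 14.0 16.0
```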
The density track created by labeldensity.pl was added twice in IGV, alongside the standard human genes track and the human gaps (N-regions) track. One copy of the density track was shown as a heatmap and the second copy as dots.
The colors for the heatmap view were taken from the quantiles in the above plot. The horizontal line in the lower plot corresponds to the median value of the distribution.
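The quantile-based coloring amounts to sorting each bin into one of three classes. The sketch below shows that logic only; the class names, the function name `classify`, and the handling of values exactly equal to Q1 or Q3 are assumptions for illustration, and the actual colors are set by hand in IGV.

```python
def classify(density, q1, q3):
    """Label a bin as under-labelled, over-labelled, or in between.

    Assumption: values exactly equal to Q1 or Q3 count as 'mid'.
    """
    if density < q1:
        return "under"
    if density > q3:
        return "over"
    return "mid"


# Hypothetical quartile values and bin densities.
q1, q3 = 12.0, 16.0
labels = [classify(d, q1, q3) for d in [9, 12, 14, 16, 20]]
print(labels)  # ['under', 'mid', 'mid', 'mid', 'over']
```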
A zoomed region in chr4 shows a concentration of over-labelling (red color).
Users can estimate biases and locate potential fragile sites when the additional raw label data track is added and zoom is set high enough.
Please send comments and feedback to [email protected]
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.