NucleR is an R/Bioconductor package for working with next generation sequencing and tilling arrays. It uses a novel aproach in this field which comprises a deep profile cleaning using Fourier Transform and peak scoring for a quick and flexible nucleosome calling.
The aim of this package is not providing an all-in-one data analysis pipeline but complement those existing specialized libraries for low-level data importation and pre-processment into R/Bioconductor framework.
NucleR works with data from high-troughput technologies MNase-seq and ChIP-seq, and Tiling Microarrays (ChIP-on-Chip).
This is a brief summary of the main functions:
- Data import:
readBAM
,processReads
,processTilingArray
- Data transformation:
coverage.rpm
,filterFFT
,controlCorrection
- Nucleosome calling:
peakDetection
,peakScoring
- Visualization:
plotPeaks
- Data generation:
syntheticNucMap
This software was published in Bioinformatics Journal: Flores, O., and Orozco, M. (2011). nucleR: a package for non-parametric nucleosome positioning. Bioinformatics 27, 2149–2150.
Follow these instructions to install 'nucleR' from Bioconductor repository in a linux system:
- Install curl-devel system library (required for Rcurl).
# i.e., for Ubuntu distributions:
apt-get install libcurl4-gnutls-dev
# i.e., for openSuse distributions:
yast2 -i libcurl-devel
- Make sure in the R shell you have an updated BiocManager manager:
(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
- nucleR depends on several R packages. To install them run:
BiocManager::install('dplyr')
BiocManager::install('IRanges')
BiocManager::install('GenomicRanges')
BiocManager::install('ShortRead',ask = FALSE)
BiocManager::install('doParallel')
BiocManager::install('ggplot2')
BiocManager::install('magrittr')
- Install nucleR in R:
BiocManager::install('nucleR')
Alternatively, build the R package from the source code deposited in this repository:
git clone https://github.com/nucleosome-dynamics/nucleR.git
tar -czvf nucleR.tar.gz nucleR
install.packages("nucleR.tar.gz", repos = NULL)
This is an example of the main steps followed to analyse nuclesome positioning data with nucleR. For more details about the functions, description of the example data, how to upload your data and additional analyses refer to nucleR manual and vignette.
1- Load the package in R
library(nucleR)
2- Load the example data provided with the package, containing position of MNase-seq reads mapped to S. cerevisiae genome
data(nucleosome_htseq)
class(nucleosome_htseq)
nucleosome_htseq
3- Filter reads and remove noise: discard the reads longer than 200bp (threshold given to only keep mononucleosomes), remove noise due to MNase efficiency by trimming reads to use only its central part (50bp around the dyad)
reads_trim <- processReads(nucleosome_htseq, type="paired", fragmentLen=200, trim=50)
4- Obtain the normalized coverage (the count of how many reads are mapped to each position, divided by the total number of reads and multiplied by one milion)
cover_trim <- coverage.rpm(reads_trim)
5- Smooth the coverage signal using the Fast Fourier Transformation
fft_ta <- filterFFT(cover_trim, pcKeepComp=0.01, showPowerSpec=TRUE)
6- Detect peaks in the smoothed coverage which correspond to nucleosome dyads and score them according to their fuzziness level
peaks <- peakDetection(fft_ta, threshold="25%", score=TRUE, width=147)
This repository builds on the original nucleR package, written by Oscar Flores.
TODO
- Add other old functions
- Add tests
- Test on windows