This software is part of a larger pipeline to call structural variants in single-cell Strand-seq data.
For optimal integration with pipeline version 1.0, please use version 0.3.1-dev.
Mosaicatcher can be built using CMake (v3.0) on Linux and macOS.
It relies on two external dependencies, both of which should be installed on your system:
- Boost libraries >= 1.50
- HTSlib >= 1.3.1
To build Mosaicatcher and check that the binary runs:
git clone https://github.com/friendsofstrandseq/mosaicatcher.git
cd mosaicatcher
mkdir build
cd build
cmake ../src
make
./mosaic --version
Mosaicatcher counts Strand-seq reads and classifies strand states of each chromosome in each cell using a Hidden Markov Model.
Choose between bins of fixed width (-w) or predefined bins (-b).
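The idea behind fixed-width binning (-w) can be illustrated with a minimal Python sketch: each read is assigned to the bin containing its start position, and plus- and minus-strand reads are tallied separately per bin. The tuple input format and the function name count_fixed_bins are hypothetical. How strands map onto Watson and Crick (including the flipped orientation of the second mate in paired-end data) is left out, and mosaic count itself works on BAM records and may filter reads in ways this sketch does not reproduce.

```python
from collections import defaultdict


def count_fixed_bins(reads, bin_width):
    """Count plus- and minus-strand reads in fixed-width bins (illustrative).

    reads: iterable of (chrom, pos, strand) tuples with a 0-based position
    and strand "+" or "-" (a hypothetical input format).
    Returns {(chrom, bin_start, bin_end): [plus_count, minus_count]};
    bins without reads are simply absent.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, strand in reads:
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r}")
        # Integer division maps every position to the start of its bin.
        start = (pos // bin_width) * bin_width
        counts[(chrom, start, start + bin_width)][0 if strand == "+" else 1] += 1
    return dict(counts)


reads = [("chr1", 10, "+"), ("chr1", 199_999, "-"), ("chr1", 200_000, "+")]
print(count_fixed_bins(reads, 200_000))
# {('chr1', 0, 200000): [1, 1], ('chr1', 200000, 400000): [1, 0]}
```

Bins without reads are omitted here, and the last bin of a chromosome is not clipped to the chromosome length; a full implementation would handle both.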
Here is an example using fixed-width bins of 200 kb:
./build/mosaic count \
-o counts.txt.gz \
-i counts.info \
-x data/exclude/GRCh38_full_analysis_set_plus_decoy_hla.exclude \
-w 200000 \
cell1.bam cell2.bam [...]
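Mosaicatcher's own HMM is not specified in this README, so the following is only a generic illustration of HMM-based strand-state classification: a three-state model (WW, WC, CC) with binomial emissions on the Watson/Crick counts of each bin, decoded with the Viterbi algorithm. The Watson fractions per state, the switch probability and all names are hypothetical choices for the example, not Mosaicatcher's parameters.

```python
import math

# Hypothetical fraction of Watson reads expected in each strand state. The
# offsets from 0 and 1 stand in for background reads; Mosaicatcher's actual
# emission model and parameters are not reproduced here.
STATE_W_FRACTION = {"CC": 0.05, "WC": 0.5, "WW": 0.95}


def viterbi_strand_states(bins, switch_prob=1e-3):
    """Return the most likely strand state for each bin of one chromosome.

    bins: list of (watson, crick) read counts in genomic order.
    switch_prob: probability of leaving the current state between bins.
    """
    if not bins:
        return []
    states = list(STATE_W_FRACTION)
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (len(states) - 1))

    def log_emit(state, watson, crick):
        # Binomial log-likelihood, dropping the n-choose-k term because it is
        # identical for all states and does not change which state wins.
        p = STATE_W_FRACTION[state]
        return watson * math.log(p) + crick * math.log(1.0 - p)

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Start from a uniform prior over the three states.
    log_prior = math.log(1.0 / len(states))
    scores = {s: log_prior + log_emit(s, *bins[0]) for s in states}
    backpointers = []

    for watson, crick in bins[1:]:
        new_scores, pointers = {}, {}
        for cur in states:
            # Best predecessor for the current state at this bin.
            best_prev = max(states, key=lambda prev: scores[prev] + log_trans(prev, cur))
            new_scores[cur] = (scores[best_prev] + log_trans(best_prev, cur)
                               + log_emit(cur, watson, crick))
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the highest-scoring final state.
    path = [max(states, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


bins = [(20, 1), (18, 2), (22, 0), (10, 11), (9, 12), (11, 10)]
print(viterbi_strand_states(bins))
# ['WW', 'WW', 'WW', 'WC', 'WC', 'WC']
```

A low switch probability makes the decoder prefer long runs of the same state, so a single noisy bin does not flip the call, while a sustained change in the Watson/Crick ratio (as from the fourth bin onward above) still produces a state change.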
To generate QC plots from these tables, run:
Rscript R/qc.R \
counts.txt.gz \
counts.info \
counts.pdf
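To look at the count table directly instead of through the QC plots, a short Python sketch like the one below can total the Watson and Crick counts per cell. It assumes the table is gzipped and tab-separated with a header naming the columns sample, cell, c (Crick) and w (Watson); this layout is an assumption, so check the header of your own counts.txt.gz before relying on it. The function name per_cell_totals is hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def per_cell_totals(path):
    """Sum Crick ('c') and Watson ('w') counts per (sample, cell).

    Assumes a gzipped, tab-separated table whose header includes the columns
    'sample', 'cell', 'c' and 'w' (an assumption about the file layout).
    Values are parsed as floats so that non-integer counts do not fail.
    """
    totals = defaultdict(lambda: {"c": 0, "w": 0})
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row["sample"], row["cell"])
            totals[key]["c"] += float(row["c"])
            totals[key]["w"] += float(row["w"])
    return dict(totals)
```

For example, per_cell_totals("counts.txt.gz") returns a dictionary keyed by (sample, cell), which makes it easy to spot cells with very low coverage.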
- Sequencing reads should be supplied in exactly one BAM file per single cell.
- Each BAM file must contain a single read group (@RG). Cells that share the same SM tag are grouped into one sample.
- BAM files must be sorted and indexed.
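The read-group requirements above can be checked before running mosaic count. The sketch below parses SAM header text, as printed for example by samtools view -H cell1.bam (assuming samtools is installed), verifies that each BAM has exactly one @RG line with an SM tag, and groups the files by sample. The function names are hypothetical, and sorting and indexing are not checked here.

```python
from collections import defaultdict


def read_groups(header_text):
    """Return the tag dictionaries of all @RG lines in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            # Each field after the record type has the form TAG:value.
            fields = line.split("\t")[1:]
            groups.append(dict(field.split(":", 1) for field in fields))
    return groups


def group_cells_by_sample(headers):
    """Map each SM value to the BAM files (cells) that carry it.

    headers: dict of BAM file name -> SAM header text.
    Raises ValueError if a BAM has no @RG line, several @RG lines,
    or an @RG line without an SM tag.
    """
    samples = defaultdict(list)
    for bam, text in headers.items():
        groups = read_groups(text)
        if len(groups) != 1:
            raise ValueError(f"{bam}: expected exactly one @RG line, found {len(groups)}")
        sample = groups[0].get("SM")
        if sample is None:
            raise ValueError(f"{bam}: @RG line has no SM tag")
        samples[sample].append(bam)
    return dict(samples)
```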
Mosaicatcher can also simulate Strand-seq data and SVs at the level of binned counts. This requires an SV config file; see data/simulation/example.txt for an example.
Then run:
./build/mosaic simulate \
-o counts.txt.gz \
svconfig.txt
Rscript R/qc.R counts.txt.gz counts.pdf
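mosaic simulate reads its own SV config format and uses its own statistical model, neither of which is reproduced here. As a rough illustration of what simulating SVs at the level of binned counts means, the sketch below describes each bin by how many Watson and Crick template copies it carries and draws Poisson-distributed read counts for each strand. The Poisson model, the copy-number encoding and the function names are assumptions chosen for brevity.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cell(bin_copies, reads_per_copy, seed=None):
    """Simulate (watson, crick) counts for the bins of one cell.

    bin_copies: per-bin (watson_copies, crick_copies), for example (1, 1) for
    a normal WC bin, (2, 0) for a WW bin, (0, 1) for a heterozygous deletion
    of the Watson homologue in a WC cell, or (0, 2) for an inversion on that
    homologue (a hypothetical encoding).
    reads_per_copy: expected reads per bin contributed by one template copy.
    """
    rng = random.Random(seed)
    return [
        (poisson(rng, reads_per_copy * w), poisson(rng, reads_per_copy * c))
        for w, c in bin_copies
    ]


# Hypothetical layout: 3 normal WC bins, a 2-bin inversion, 3 normal WC bins.
layout = [(1, 1)] * 3 + [(0, 2)] * 2 + [(1, 1)] * 3
print(simulate_cell(layout, reads_per_copy=30, seed=42))
```

In this encoding an inversion on one homologue of a WC cell turns the affected bins into a CC-like signal with unchanged total coverage, whereas a deletion lowers the total, which is why the two events look different in binned Strand-seq counts.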
For information on Strand-seq, see
Falconer E et al., 2012 (doi: 10.1038/nmeth.2206)