As described in the README, a small dataset (30 samples by 300 targets) is distributed as part of the XHMM tutorial. Execute the following commands to call CNVs in that dataset with DECA. A video of this workflow running on OSX is available. The README covers the individual steps, parameters, etc. in more detail.
Install DECA as described in the README:
git clone
Assuming that Maven and Spark are already installed (with the corresponding environment variable defined), build DECA from the deca directory with the native-lgpl profile:
export MAVEN_OPTS="-Xmx512m"
mvn package -P native-lgpl
Add the deca-submit script to PATH and create an environment variable pointing to the DECA jar:
export PATH=$PATH:$PWD/bin
export DECA_JAR=$PWD/deca-cli/target/deca-cli_2.11-0.2.1-SNAPSHOT.jar
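Before moving on, it can help to confirm that the two export lines above took effect. The following is a minimal sketch, not part of DECA: check_deca_env is a hypothetical helper name, and it assumes only what this step states, namely that deca-submit should be reachable through PATH and that DECA_JAR should name an existing jar file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DECA): sanity-check the environment
# set up by the two export lines in the install step.
check_deca_env() {
    rc=0
    # deca-submit should be found through PATH
    if ! command -v deca-submit >/dev/null 2>&1; then
        echo "deca-submit not found on PATH" >&2
        rc=1
    fi
    # DECA_JAR should be set and point at a regular file
    if [ -z "${DECA_JAR:-}" ] || [ ! -f "$DECA_JAR" ]; then
        echo "DECA_JAR is unset or does not point to a file" >&2
        rc=1
    fi
    return "$rc"
}
```

Running check_deca_env && echo "environment looks OK" right after the exports reports which of the two settings, if any, is missing.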
Download and prepare XHMM tutorial data:
cd ..
wget
unzip
cat low_complexity_targets.txt extreme_gc_targets.txt | sort -u > exclude_targets.txt
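The cat and sort -u pipeline above concatenates the two target lists and keeps each distinct line once, so a target flagged by both filters appears only once in exclude_targets.txt. A minimal sketch of that behavior, using invented interval names in a temporary directory rather than the tutorial files:

```shell
#!/bin/sh
# Illustrative only: the interval names below are invented, not taken from
# the XHMM tutorial data.
workdir=$(mktemp -d)

printf '%s\n' '1:100-200' '2:300-400' > "$workdir/low_complexity_targets.txt"
printf '%s\n' '2:300-400' '3:500-600' > "$workdir/extreme_gc_targets.txt"

# Same pipeline as in the tutorial step: concatenate, sort, drop duplicates
cat "$workdir/low_complexity_targets.txt" "$workdir/extreme_gc_targets.txt" \
    | sort -u > "$workdir/exclude_targets.txt"

# Number of distinct targets to exclude (tr strips the padding some wc
# implementations add)
merged_count=$(wc -l < "$workdir/exclude_targets.txt" | tr -d ' ')
echo "$merged_count distinct targets"

rm -rf "$workdir"
```

Four input lines collapse to three because the shared interval is written once.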
Run DECA on a workstation:
Note that you will need to set spark.local.dir to a suitable temporary directory for your system, and you will likely need to change the number of cores, executor memory, and driver memory to values suitable for your system. The resulting CNVs will be written to DECA.gff3.
deca-submit \
    --master local[16] \
    --driver-class-path $DECA_JAR \
    --conf spark.local.dir=/path/to/temp/directory \
    --conf spark.driver.maxResultSize=0 \
    --conf spark.kryo.registrationRequired=true \
    --executor-memory 96G --driver-memory 16G \
    -- normalize_and_discover \
    -min_some_quality 29.5 \
    -exclude_targets exclude_targets.txt \
    -I DATA.RD.txt \
    -o DECA.gff3
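Two small helpers can make the workstation run easier to adapt and to check. This is a sketch under stated assumptions, not part of DECA: local_master and count_gff3_features are hypothetical names. The first builds a local[N] master string from the number of online processors reported by getconf, and the second counts the lines of a GFF3 file that are neither blank nor comments or directives (lines starting with #), which in GFF3 are the feature records.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DECA.

# Print a Spark master string that uses every online processor.
# Falls back to a single core if getconf cannot report the count.
local_master() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "local[$cores]"
}

# Count feature records in a GFF3 file: lines that are neither blank
# nor comments/directives starting with '#'.
count_gff3_features() {
    grep -v -e '^#' -e '^[[:space:]]*$' "$1" | wc -l | tr -d ' '
}
```

For example, --master "$(local_master)" could stand in for the hard-coded local[16], and count_gff3_features DECA.gff3 reports how many records the run wrote. Memory settings still need to be chosen by hand for the machine in question.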