This pipeline calls somatic structural variants and small variants from long-read sequencing data (e.g. PacBio HiFi and ONT).
It runs on the LSF job scheduler and can run multiple jobs in parallel.
This pipeline is composed of two steps:
for structural variants, and
for small variants.
Usage:
a. perl config4StrucVar.tsv
b. perl config4SmallVar.tsv
Option 1: set up an environment for LSF jobs on compute1 by adding the following to your ~/.bashrc file:
Do NOT forget to run source ~/.bashrc afterwards so that the changes take effect.
Option 2: modify the configuration file (config.tsv) to specify the locations of the software (minimap2 & Sniffles2) and of the minimap2 index (if available). For example: MINIMAP2 ${your_path}/minimap2
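To catch a missing entry before submitting jobs, the configuration file can be checked with a few lines of Python. This is only a sketch under assumptions: it supposes that config.tsv holds one whitespace-separated `KEY value` pair per line and that lines starting with `#` are comments. Only the keys named in this README (OUTDIR, SAMPLE, MINIMAP2) are checked, the paths in the example are placeholders, and the function names are hypothetical and not part of the pipeline.

```python
from typing import Dict, List

# Keys named in this README; any other key your config uses is not checked here.
REQUIRED_KEYS = ("OUTDIR", "SAMPLE", "MINIMAP2")


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'KEY value' lines (assumed layout); skip blank and '#' lines."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"expected 'KEY value', got: {raw!r}")
        key, value = parts
        config[key] = value.strip()
    return config


def missing_keys(config: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent from the parsed config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder paths for illustration only.
example = (
    "OUTDIR\t/path/to/output\n"
    "SAMPLE\t/path/to/sample.lst\n"
    "MINIMAP2\t/path/to/minimap2\n"
)
config = parse_config(example)
print(missing_keys(config))
```

In practice you would read the text from your own config.tsv and stop if `missing_keys` returns anything.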
prepare your sample list file (sample.lst)
format: #id sequencing_platform path_of_fastq
Note: specify the sequencing platform as the parameter -x used in minimap2 (
modify the parameter configuration file (config.tsv), including the paths of the output directory ("OUTDIR"), the sample list ("SAMPLE"), the software and the index, as well as the bsub settings.
run perl config4StrucVar.tsv
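A malformed sample list is an easy way to lose a batch of jobs, so it can help to validate sample.lst first. The sketch below assumes the three columns are separated by whitespace, that lines starting with `#` (such as the `#id ...` header) are skipped, and that paths contain no spaces. The preset set covers only three long-read values accepted by minimap2 `-x` and is not exhaustive; the names `Sample` and `parse_sample_list` are hypothetical.

```python
from typing import List, NamedTuple

# Long-read presets accepted by minimap2 -x; this set is not exhaustive.
KNOWN_PRESETS = {"map-hifi", "map-ont", "map-pb"}


class Sample(NamedTuple):
    sample_id: str
    platform: str  # value passed to minimap2 -x
    fastq: str


def parse_sample_list(text: str) -> List[Sample]:
    """Parse 'id platform fastq' rows (assumed whitespace-separated)."""
    samples: List[Sample] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 columns, got {len(fields)}"
            )
        sample = Sample(*fields)
        if sample.platform not in KNOWN_PRESETS:
            raise ValueError(
                f"line {lineno}: unrecognized minimap2 preset {sample.platform!r}"
            )
        samples.append(sample)
    return samples
```

If you use a preset outside this set, extend `KNOWN_PRESETS` rather than removing the check.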
Please make sure the output directory is writable and all software can be invoked.
Pay attention to the Sniffles mode! By default, this pipeline runs both the basic and the mosaic (for low-frequency/non-germline SVs) modes.
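The output directory and software checks can be automated before any job is submitted. The following is a minimal sketch and not part of the pipeline: the function name `preflight` is hypothetical, and it only tests that the directory exists and is writable and that each tool resolves to an executable, either on `PATH` or as an explicit path. It does not run the tools.

```python
import os
import shutil
from typing import Iterable, List


def preflight(outdir: str, tools: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if not os.path.isdir(outdir):
        problems.append(f"output directory does not exist: {outdir}")
    elif not os.access(outdir, os.W_OK | os.X_OK):
        # W_OK to create files, X_OK to enter the directory
        problems.append(f"output directory is not writable: {outdir}")
    for tool in tools:
        # shutil.which accepts a bare command name or a path to an executable
        if shutil.which(tool) is None:
            problems.append(f"cannot find executable: {tool}")
    return problems
```

For example, you could pass the OUTDIR value and the software paths from your configuration file and abort if the returned list is not empty.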
Small variant calling is based on ClairS ( This step requires the output BAM file from minimap2!
a. load new environment:
b. set up a configuration file and a sample list.
sample list format: #id tumor_bam_path normal_bam_path (if no normal BAM is given, the sample is run in tumor-only mode)
c. run perl config4SmallVar.tsv
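To see which samples will run in tumor-only mode, the small-variant sample list can be inspected ahead of time. The sketch below rests on assumptions: columns are whitespace-separated, lines starting with `#` are skipped, and a missing third column is how "no normal BAM" is expressed. The pipeline's own parser may treat this differently, and the names used here are hypothetical.

```python
from typing import List, NamedTuple, Optional


class TumorNormalPair(NamedTuple):
    sample_id: str
    tumor_bam: str
    normal_bam: Optional[str]  # None means no matched normal was listed

    @property
    def tumor_only(self) -> bool:
        return self.normal_bam is None


def parse_pairs(text: str) -> List[TumorNormalPair]:
    """Parse 'id tumor_bam [normal_bam]' rows (assumed layout)."""
    pairs: List[TumorNormalPair] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment
        fields = line.split()
        if len(fields) == 2:
            pairs.append(TumorNormalPair(fields[0], fields[1], None))
        elif len(fields) == 3:
            pairs.append(TumorNormalPair(*fields))
        else:
            raise ValueError(
                f"line {lineno}: expected 2 or 3 columns, got {len(fields)}"
            )
    return pairs
```

Listing the samples for which `tumor_only` is true before submission makes it easy to spot a normal BAM path that was left out by mistake.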
