⚠️ This pipeline is in its early stages. Please use with caution.
Minghao Jiang, [email protected]
-
SV callers
cuteSV, Sniffles, SVIM, SVision, Severus, NanoSV, NanoVar, Delly, Debreak
-
Annotation tools
-
R packages
vroom, tibble, glue, dplyr, tidyr, purrr, GenomicRanges, stringr, BiocParallel, parallel
-
Other tools
Minimap2, SAMtools, BCFtools, SURVIVOR, vcf2maf (1.6.21), SnpSift, duphold
%%{
init: {
'theme': 'base',
'themeVariables': {
'fontFamily': 'Comic Sans MS',
'primaryColor': '#ACD98DFF',
'primaryTextColor': 'black',
'lineColor': 'black',
'secondaryColor': 'grey',
'tertiaryColor': '#EEEEEE'
}
}
}%%
flowchart TD
classDef norm stroke-width: 1px, stroke: black;
classDef tool fill: #FFB977FF, padding: 0px, margin: 0px, stroke-width: 1px, stroke: black;
classDef other_file stroke-dasharray: 5 5, stroke-width: 1px, stroke: black;
classDef other_tool fill: #FFB977FF, padding: 0px, margin: 0px, stroke-dasharray: 5 5, stroke-width: 1px, stroke: black;
classDef output fill: #98D9E4FF, stroke-width: 1px, stroke: black;
fastq@{ shape: lean-r, label: "FASTQ" }
bam([BAM])
cutesv_vcf([VCF])
sniffles2_vcf([VCF])
cutesv_filtered_vcf([filtered
VCF])
sniffles2_filtered_vcf([filtered
VCF])
merged_vcf([merged VCF])
annotated_vcf([annotated VCF/TSV])
somatic_vcf@{ shape: lean-l, label: "somatic SVs" }
germline_vcf@{ shape: lean-l, label: "germline SVs" }
other_vcfs([VCFs])
other_filtered_vcf([filtered
VCFs])
minimap2@{ shape: tag-rect, label: "Minimap2" }
cutesv@{ shape: tag-rect, label: "cuteSV" }
sniffles2@{ shape: tag-rect, label: " Sniffles2 " }
other_callers@{ shape: tag-rect, label: "other\ncallers" }
bcftools@{ shape: tag-rect, label: "BCFtools" }
survivor@{ shape: tag-rect, label: "SURVIVOR" }
anno_tools@{ shape: tag-rect, label: "VEP, AnnotSV, SnpEff" }
fastq:::norm
bam:::norm
cutesv_vcf:::norm
sniffles2_vcf:::norm
other_vcfs:::other_file
cutesv_filtered_vcf:::norm
sniffles2_filtered_vcf:::norm
other_filtered_vcf:::other_file
merged_vcf:::norm
annotated_vcf:::norm
somatic_vcf:::output
germline_vcf:::output
minimap2:::tool
cutesv:::tool
sniffles2:::tool
other_callers:::other_tool
bcftools:::tool
survivor:::tool
anno_tools:::tool
fastq --> minimap2 --> bam
bam --> cutesv --> cutesv_vcf --> bcftools --> cutesv_filtered_vcf --> survivor
bam --> sniffles2 --> sniffles2_vcf --> bcftools --> sniffles2_filtered_vcf --> survivor
bam -.-> other_callers -.-> other_vcfs -.-> bcftools -.-> other_filtered_vcf -.-> survivor
survivor --> merged_vcf --> anno_tools --> annotated_vcf
annotated_vcf --> somatic_vcf
annotated_vcf --> germline_vcf
-
Clone this repo and navigate into it:
git clone https://github.com/jasonwong-lab/smk_sv.git cd smk_sv
- Follow all steps below after you are in the top dir of this repo.
- Uncomment all rules in the
Snakefile
. - Check the predefined
wildcards_constraints
in theSnakefile
and modify/delete it if necessary. - Using a JSON schema to validate the configuration file might prevent Snakemake from monitoring changes to the parameters. You can comment the
validate(config, "config/config.schema.json")
in theSnakefile
.
-
Install AnnotSV manually.
- AnnotSV is not included in the image due to its large annotation resources (~ 20GB) that cannot be specified elsewhere.
- Creating a lock file for each combination of sample and type_sv has been implemented. However, AnnotSV might still encounter errors since it doesn’t support processing multiple files within the same directory. To address this, an additional resource parameter
constraint_annotsv=1
has been added to the ruleannotate_sv_annotsv
to ensure that only one instance of AnnotSV runs at a time. You can modify this parameter inworkflow/profile/default/config.yaml
where its default is1
.
-
Prepare config files:
- Copy
config/config-test.yaml
toconfig/config.yaml
.- Adjust the configuration settings according to your project's needs.
- Specification of important elements:
dir_run
: working directory where all results will be stored.mapper
: dict whose keys are names of mappers and values (boolean) indicate whether perform mapping or not. Only the first mapper will be used. When a mapper is specified and its value isfalse
, no mapping by this mapper will be performed, but its results will be used in the following steps.callers
: dict whose keys are names of callers and values (boolean) indicate whether perform SV calling using this caller or not. When a caller is specified and its value isfalse
, no SV calling by this caller will be performed, but its results will be used in the following steps.types_sv
: SV types to be called. BND indicates translocations.threads
: number of CPUs of each rule to be used.- ...
- Copy
workflow/profiles/default/config-test.yaml
toworkflow/profiles/default/config.yaml
.- Bind directories you need in the container.
- Change the number of CPUs you prefer.
- Modify/add/delete other parameters of this snakemake pipeline.
- Copy
-
Prepare sample data:
- Copy
config/pep/samples-test.csv
toconfig/pep/samples.csv
, and updatesample_name
in the csv. - Copy
config/pep/config-test.yaml
andconfig/pep/config.yaml
. More information please see Portable Encapsulated Projects (PEP).
- Copy
-
Set up Conda environments:
snakemake --conda-create-envs-only
-
Run the pipeline locally:
snakemake
-
Run the pipeline on a cluster: If you want to run this pipeline on a cluster (e.g., SLURM, or PBS), you should customise your own profile and place it into
~/.config/snakemake/
, and then run the pipeline with the profile you have set as a parameter:snakemake --profile <your_profile_name>
Or run the pipeline with the profile you have set as an environment variable:
export SNAKEMAKE_PROFILE=<your_profile_name> snakemake
You can refer to the profile I have been using at
workflow/profiles/mycluster
, or turn to snakemake websites.
If you are using a cluster that does not support Singularity well, please switch to the without_docker
branch. This branch is tailored for environments where containers might not be the best option.
git checkout without_docker
Codes here are licensed under the GNU General Public License v3.