This repo contains the Snakemake rules used to carry out FastQ modification, BAM generation, and variant calling as described in our article "Clonal hematopoiesis in metastatic urothelial and kidney cancer".
Our analysis pipeline is written in Python using the Snakemake workflow management system. Please follow these instructions for setup:
- All dependencies needed to run the pipeline are provided in the `envs/snakemake.yaml` file. You can create the snakemake environment by running `conda env create -f envs/snakemake.yaml` from the repository root.
- GATK's base quality score recalibration tool, which we use, requires the following three files (`resources_broad_hg38_v0_Homo_sapiens_assembly38.known_indels.vcf`, `resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf`, `resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf`), which can be downloaded from this Google Cloud link.
- Mutect2 requires a panel of normals, which can be obtained from this link.
- Edit the `config.yaml` file so that it points to the locations of the relevant files and directories.
- The workflow expects raw FastQ files to be placed in the `results/data/fastq` directory. All files placed here will be processed by the pipeline.
- The pipeline can be invoked with the `snakemake` command. Running `snakemake -n` performs a dry run.
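
Before launching the pipeline, it can help to confirm that the inputs listed above are in place. The following is a minimal sketch of such a pre-flight check and is not part of the repository. The three resource file names and the `results/data/fastq` directory come from the list above; the helper names (`missing_resources`, `list_fastq_files`), the idea that all three resource files sit in a single directory, and the set of accepted FastQ suffixes are assumptions made for illustration.

```python
import sys
from pathlib import Path

# Resource file names as given in the setup instructions.
REQUIRED_RESOURCES = [
    "resources_broad_hg38_v0_Homo_sapiens_assembly38.known_indels.vcf",
    "resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf",
    "resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf",
]

# Assumption: FastQ inputs use one of these common suffixes.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def missing_resources(resource_dir):
    """Return the required resource file names not found in resource_dir."""
    resource_dir = Path(resource_dir)
    return [
        name for name in REQUIRED_RESOURCES
        if not (resource_dir / name).is_file()
    ]


def list_fastq_files(fastq_dir="results/data/fastq"):
    """Return the sorted names of FastQ files located directly in fastq_dir."""
    fastq_dir = Path(fastq_dir)
    if not fastq_dir.is_dir():
        return []
    return sorted(
        p.name for p in fastq_dir.iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )


if __name__ == "__main__":
    # Hypothetical usage: pass the directory holding the downloaded resources.
    resource_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_resources(resource_dir)
    fastqs = list_fastq_files()
    for name in missing:
        print(f"missing resource: {name}")
    print(f"{len(fastqs)} FastQ file(s) found in results/data/fastq")
    # Exit non-zero if anything needed for a run is absent.
    sys.exit(1 if missing or not fastqs else 0)
```

If the script is saved under a name of your choosing and run from the repository root, a zero exit status suggests the resource files and FastQ inputs are present; `snakemake -n` remains the way to confirm what the workflow itself will do.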