Feature-Extraction-Workflow-for-Target-Mutation-Selection

Bioinformatics pipeline for extracting all tumor features needed to select clonal mutations from whole-exome sequencing (WES) data in a tumor-normal setting. The pipeline uses two publicly available tools: Mutect2, which detects single nucleotide variants (SNVs) and insertion and deletion (indel) variants (available at https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2), and PureCN, which estimates tumor features such as cancer cell fraction (CCF), mutation multiplicity, tumor sample purity, and ploidy (available at https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html).

A link to the paper will be added once it is available.

Requirements

A working conda installation (installation instructions: https://conda.io/projects/conda/en/latest/user-guide/install/index.html)

Input files

Tumor and normal BAM files for the queried sample

Provided here

create_interval_and_pon.sh: a bash script, run once, that creates the files (NormalDB and interval.file) needed to run workflow.py on samples
workflow.py: the gwf pipeline that creates the desired output files for the queried sample (a gwf tutorial is available at https://gwf.app/guide/tutorial)
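
To give a sense of what a tumor-normal Mutect2 call inside a pipeline step can look like, the sketch below assembles the command line in Python. It is an illustration only and does not reproduce workflow.py: the helper name build_mutect2_command, the file paths, and the sample names are hypothetical, and only standard Mutect2 arguments (-R, -I, -normal, -L, --panel-of-normals, -O) are used.

```python
import shlex


def build_mutect2_command(reference, tumor_bam, normal_bam, normal_sample,
                          output_vcf, intervals=None, panel_of_normals=None):
    """Assemble a tumor-normal Mutect2 command as a list of arguments.

    Hypothetical helper for illustration; the real workflow.py may differ.
    """
    command = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-normal", normal_sample,  # sample name (SM tag) of the normal BAM
    ]
    if intervals is not None:
        command += ["-L", intervals]
    if panel_of_normals is not None:
        command += ["--panel-of-normals", panel_of_normals]
    command += ["-O", output_vcf]
    return command


# Placeholder paths and sample names, not files shipped with this repository.
example = build_mutect2_command(
    reference="GRCh38.fna",
    tumor_bam="sample_tumor.bam",
    normal_bam="sample_normal.bam",
    normal_sample="sample_normal",
    output_vcf="sample.mutect2.vcf.gz",
    intervals="targets.bed",
)
print(shlex.join(example))
```

Returning a list rather than a single string keeps each argument separate, so paths containing spaces stay intact until the command is quoted for a shell.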

Quick start

1. Create the conda environment from the provided gatk-purecn.yml

conda env create -f gatk-purecn.yml

2. In the project folder, create a sub-folder in which to generate NormalDB and the interval file needed to run the pipeline

Copy the provided one-time-run bash script (create_interval_and_pon.sh) into the sub-folder
Download the GRCh38 reference file (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz)
Download the mappability file for GRCh38 (GCA_000001405.15_GRCh38_no_alt_analysis_set_76.bw, linked from https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html)
Adjust the paths in the script, then run it

./create_interval_and_pon.sh

3. To set up the Slurm backend, run the following command (described at https://gwf.app/guide/tutorial)

gwf config set backend slurm

4. Once the necessary files have been generated, adjust the paths in workflow.py (under config), then run the gwf workflow for the queried sample(s)

gwf run
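
Since step 4 depends on the paths set in workflow.py, checking that every configured file exists before submitting may save a failed run. The sketch below shows one way to do that; the config keys and paths are hypothetical placeholders and are not taken from workflow.py.

```python
from pathlib import Path


def find_missing_paths(config):
    """Return the config keys whose paths do not exist on disk, sorted by name."""
    return sorted(key for key, path in config.items() if not Path(path).exists())


# Hypothetical keys and paths; replace them with the entries used in workflow.py.
config = {
    "tumor_bam": "data/sample_tumor.bam",
    "normal_bam": "data/sample_normal.bam",
    "normal_db": "setup/normal_db.rds",
    "interval_file": "setup/interval.file",
}

missing = find_missing_paths(config)
if missing:
    print("Missing inputs:", ", ".join(missing))
else:
    print("All configured paths exist.")
```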
