-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: (#405) somatic variant qc #407
Open
giacuong171
wants to merge
20
commits into
main
Choose a base branch
from
405-somatic-variant-qc
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 11 commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
ad07ebf
Adding wrapper for somatic variants QC
94cda27
Adding somatic variant qc pipeline
bd7b411
Update somatic_variant_checking
de36370
adding test for somatic_variant_checking
09d8088
format checking
73416b0
fixing format
c30f22e
Finalizing somatic variant checking
956904f
finalize somatic variants checking
4ce1159
Fix typo
1f5b0e0
fixing typo
e99ef3c
fixing small bugs
62b6e6f
fix: json format doesn't accept floats starting with .
ericblanc20 5d5246a
Finalizing somatic variant checking
64e7b61
Merge branch '405-somatic-variant-qc' of github.com:bihealth/snappy-p…
4750e75
update mantis enviroment
e9cb75b
finializing somatic_variant_checking
e9dbc5e
fixing vep environment
8d1d2c8
Merge branch 'main' into 405-somatic-variant-qc
giacuong171 70d5bb0
Merge branch '405-somatic-variant-qc' of github.com:bihealth/snappy-p…
fdbdfff
make isort happy
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
64 changes: 64 additions & 0 deletions
64
snappy_pipeline/workflows/somatic_variant_checking/Snakefile
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# -*- coding: utf-8 -*- | ||
"""CUBI Pipeline tumor mutational burden step Snakefile""" | ||
|
||
import os | ||
|
||
from snappy_pipeline import expand_ref | ||
from snappy_pipeline.workflows.somatic_variant_checking import ( | ||
SomaticVariantQCWorkflow, | ||
) | ||
|
||
__author__ = "Gia Cuong Pham" | ||
__email__ = "[email protected]" | ||
|
||
|
||
# Configuration =============================================================== | ||
|
||
|
||
configfile: "config.yaml" | ||
|
||
|
||
# Expand "$ref" JSON pointers in configuration (also works for YAML) | ||
config, lookup_paths, config_paths = expand_ref("config.yaml", config) | ||
|
||
# WorkflowImpl Object Setup =================================================== | ||
wf = SomaticVariantQCWorkflow(workflow, config, lookup_paths, config_paths, os.getcwd()) | ||
|
||
|
||
localrules: | ||
# Linking files from work/ to output/ should be done locally | ||
somatic_variant_qc_link_out_run, | ||
|
||
|
||
rule all: | ||
input: | ||
wf.get_result_files(), | ||
|
||
|
||
# Generic linking out --------------------------------------------------------- | ||
|
||
|
||
rule somatic_variant_qc_link_out_run: | ||
input: | ||
wf.get_input_files("link_out", "run"), | ||
output: | ||
wf.get_output_files("link_out", "run"), | ||
run: | ||
shell(wf.get_shell_cmd("link_out", "run", wildcards)) | ||
|
||
|
||
rule somatic_variant_qc: | ||
input: | ||
**wf.get_input_files("SVQC_step", "run"), | ||
output: | ||
**wf.get_output_files("SVQC_step", "run"), | ||
threads: wf.get_resource("SVQC_step", "run", "threads") | ||
resources: | ||
time=wf.get_resource("SVQC_step", "run", "time"), | ||
memory=wf.get_resource("SVQC_step", "run", "memory"), | ||
partition=wf.get_resource("SVQC_step", "run", "partition"), | ||
tmpdir=wf.get_resource("SVQC_step", "run", "tmpdir"), | ||
log: | ||
**wf.get_log_file("SVQC_step", "run"), | ||
wrapper: | ||
wf.wrapper_path("somatic_variants_checking") |
253 changes: 205 additions & 48 deletions
253
snappy_pipeline/workflows/somatic_variant_checking/__init__.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,56 +1,213 @@ | ||
# -*- coding: utf-8 -*- | ||
"""Implementation of the germline ``somatic_variant_checking`` step | ||
import os | ||
import sys | ||
|
||
The ``somatic_variant_checking`` step takes as the input the results of the | ||
``somatic_variant_annotation`` step. It then executes various tools computing statistics on the | ||
result files and consistency checks with the pedigrees. | ||
from biomedsheets.shortcuts import CancerCaseSheet, CancerCaseSheetOptions, is_not_background | ||
from snakemake.io import expand | ||
|
||
.. note:: | ||
from snappy_pipeline.utils import dictify, listify | ||
from snappy_pipeline.workflows.abstract import BaseStep, BaseStepPart, LinkOutStepPart | ||
from snappy_pipeline.workflows.ngs_mapping import NgsMappingWorkflow, ResourceUsage | ||
from snappy_pipeline.workflows.somatic_variant_calling import ( | ||
SOMATIC_VARIANT_CALLERS_MATCHED, | ||
SomaticVariantCallingWorkflow, | ||
) | ||
|
||
Status: not implemented yet | ||
#: Extensions of files to create as main payload | ||
EXT_VALUES = (".json", ".json.md5") | ||
|
||
========== | ||
Step Input | ||
========== | ||
#: Names of the files to create for the extension | ||
EXT_NAMES = ("json", "json_md5") | ||
|
||
The variant calling step uses Snakemake sub workflows for using the result of the | ||
``somatic_variant_annotation`` step. | ||
|
||
=========== | ||
Step Output | ||
=========== | ||
|
||
.. note:: TODO | ||
|
||
==================== | ||
Global Configuration | ||
==================== | ||
|
||
.. note:: TODO | ||
|
||
===================== | ||
Default Configuration | ||
===================== | ||
|
||
The default configuration is as follows. | ||
|
||
.. include:: DEFAULT_CONFIG_somatic_variant_checking.rst | ||
|
||
======= | ||
Reports | ||
======= | ||
|
||
Currently, no reports are generated. | ||
""" | ||
|
||
__author__ = "Manuel Holtgrewe <[email protected]>" | ||
|
||
#: Available tools for checking variants | ||
VARIANT_CHECKERS = ("bcftools_stats", "peddy") | ||
|
||
#: Default configuration for the somatic_gene_fusion_calling step | ||
#: Default configuration for the tmb calculation step | ||
DEFAULT_CONFIG = r""" | ||
step_config: | ||
somatic_variant_checking: | ||
path_somatic_variant_calling: ../somatic_variant_calling # REQUIRED | ||
somatic_variant_checking: | ||
path_somatic_variant_calling: ../somatic_variant_calling # REQUIRED | ||
tools_ngs_mapping: [] # default to those configured for ngs_mapping | ||
tools_somatic_variant_calling: [] # default to those configured for somatic_variant_calling | ||
target_regions: # REQUIRED | ||
padding: 0 # Used for count the number of variants outside of exom + padding | ||
variant_allele_frequency_id: 'AF' # REQUIRED ID of allele frequency field used in vcf file | ||
ignore_regions: "" # hard mapping regions | ||
minimal_support_read: 1 # threshold for defining a variant that has minimal support reads | ||
limited_support_read: 5 # threshold for defining a variant that has limited support reads | ||
""" | ||
|
||
|
||
class SomaticVariantQCStepPart(BaseStepPart): | ||
"""""" | ||
|
||
name = "SVQC_step" | ||
|
||
actions = ("run",) | ||
|
||
def __init__(self, parent): | ||
super().__init__(parent) | ||
|
||
@dictify | ||
def get_input_files(self, action): | ||
self._validate_action(action) | ||
tpl = ( | ||
"output/{mapper}.{var_caller}.{tumor_library}/out/" | ||
"{mapper}.{var_caller}.{tumor_library}" | ||
) | ||
key_ext = { | ||
"full_vcf": ".full.vcf.gz", | ||
"full_vcf_tbi": ".full.vcf.gz.tbi", | ||
"passed_vcf": ".vcf.gz", | ||
"passed_vcf_tbi": ".vcf.gz.tbi", | ||
} | ||
variant_calling = self.parent.sub_workflows["somatic_variant_calling"] | ||
for key, ext in key_ext.items(): | ||
yield key, variant_calling(tpl + ext) | ||
|
||
@dictify | ||
def get_output_files(self, action): | ||
# Validate action | ||
self._validate_action(action) | ||
prefix = ( | ||
"work/{mapper}.{var_caller}.variantsqc.{tumor_library}/out/" | ||
"{mapper}.{var_caller}.variantsqc.{tumor_library}" | ||
) | ||
key_ext = {"json": ".json"} | ||
for key, ext in key_ext.items(): | ||
yield key, prefix + ext | ||
yield key + "_md5", prefix + ext + ".md5" | ||
|
||
@dictify | ||
def _get_log_file(self, action): | ||
self._validate_action(action=action) | ||
prefix = ( | ||
"work/{mapper}.{var_caller}.variantsqc.{tumor_library}/log/" | ||
"{mapper}.{var_caller}.variantsqc.{tumor_library}" | ||
) | ||
|
||
key_ext = ( | ||
("log", ".log"), | ||
("conda_info", ".conda_info.txt"), | ||
("conda_list", ".conda_list.txt"), | ||
) | ||
for key, ext in key_ext: | ||
yield key, prefix + ext | ||
|
||
def get_resource_usage(self, action): | ||
"""Get Resource Usage | ||
:param action: Action (i.e., step) in the workflow, example: 'run'. | ||
:type action: str | ||
:return: Returns ResourceUsage for step. | ||
:raises UnsupportedActionException: if action not in class defined list of valid actions. | ||
""" | ||
self._validate_action(action=action) | ||
mem_mb = 4 * 1024 # 4GB | ||
return ResourceUsage( | ||
threads=2, | ||
time="1:00:00", # 1 hour | ||
memory=f"{mem_mb}M", | ||
) | ||
|
||
|
||
class SomaticVariantQCWorkflow(BaseStep): | ||
"""Perform gathering variant information""" | ||
|
||
name = "somaticvariantchecking" | ||
sheet_shortcut_class = CancerCaseSheet | ||
sheet_shortcut_kwargs = { | ||
"options": CancerCaseSheetOptions(allow_missing_normal=True, allow_missing_tumor=True) | ||
} | ||
|
||
@classmethod | ||
def default_config_yaml(cls): | ||
"""Return default config YAML, to be overwritten by project-specific one.""" | ||
return DEFAULT_CONFIG | ||
|
||
def __init__(self, workflow, config, config_lookup_paths, config_paths, workdir): | ||
super().__init__( | ||
workflow, | ||
config, | ||
config_lookup_paths, | ||
config_paths, | ||
workdir, | ||
(SomaticVariantCallingWorkflow, NgsMappingWorkflow), | ||
) | ||
# Register sub step classes so the sub steps are available | ||
self.register_sub_step_classes((SomaticVariantQCStepPart, LinkOutStepPart)) | ||
# Register sub workflows | ||
self.register_sub_workflow( | ||
"somatic_variant_calling", | ||
self.w_config["step_config"]["somatic_variant_checking"][ | ||
"path_somatic_variant_calling" | ||
], | ||
) | ||
# Copy over "tools" setting from somatic_variant_calling/ngs_mapping if not set here | ||
if not self.w_config["step_config"]["somatic_variant_checking"]["tools_ngs_mapping"]: | ||
self.w_config["step_config"]["somatic_variant_checking"][ | ||
"tools_ngs_mapping" | ||
] = self.w_config["step_config"]["ngs_mapping"]["tools"]["dna"] | ||
if not self.w_config["step_config"]["somatic_variant_checking"][ | ||
"tools_somatic_variant_calling" | ||
]: | ||
self.w_config["step_config"]["somatic_variant_checking"][ | ||
"tools_somatic_variant_calling" | ||
] = self.w_config["step_config"]["somatic_variant_calling"]["tools"] | ||
|
||
@listify | ||
def get_result_files(self): | ||
callers = set( | ||
self.w_config["step_config"]["somatic_variant_checking"][ | ||
"tools_somatic_variant_calling" | ||
] | ||
) | ||
name_pattern = "{mapper}.{caller}.variantsqc.{tumor_library.name}" | ||
yield from self._yield_result_files_matched( | ||
os.path.join("output", name_pattern, "out", name_pattern + "{ext}"), | ||
mapper=self.w_config["step_config"]["somatic_variant_checking"]["tools_ngs_mapping"], | ||
caller=callers & set(SOMATIC_VARIANT_CALLERS_MATCHED), | ||
ext=EXT_VALUES, | ||
) | ||
yield from self._yield_result_files_matched( | ||
os.path.join("output", name_pattern, "log", name_pattern + "{ext}"), | ||
mapper=self.w_config["step_config"]["somatic_variant_checking"]["tools_ngs_mapping"], | ||
caller=callers & set(SOMATIC_VARIANT_CALLERS_MATCHED), | ||
ext=( | ||
".log", | ||
".log.md5", | ||
".conda_info.txt", | ||
".conda_info.txt.md5", | ||
".conda_list.txt", | ||
".conda_list.txt.md5", | ||
), | ||
) | ||
|
||
def _yield_result_files_matched(self, tpl, **kwargs): | ||
"""Build output paths from path template and extension list. | ||
|
||
This function returns the results from the matched somatic variant callers such as | ||
Mutect. | ||
""" | ||
for sheet in filter(is_not_background, self.shortcut_sheets): | ||
for sample_pair in sheet.all_sample_pairs: | ||
if not sample_pair.tumor_sample.dna_ngs_library: | ||
msg = ( | ||
"INFO: sample pair for cancer bio sample {} has is missing primary" | ||
"normal or primary cancer NGS library" | ||
) | ||
print(msg.format(sample_pair.tumor_sample.name), file=sys.stderr) | ||
continue | ||
yield from expand( | ||
tpl, | ||
tumor_library=[sample_pair.tumor_sample.dna_ngs_library], | ||
**kwargs, | ||
) | ||
|
||
def check_config(self): | ||
"""Check that the path to the NGS mapping is present""" | ||
self.ensure_w_config( | ||
("step_config", "somatic_variant_checking", "path_somatic_variant_calling"), | ||
"Path to variant calling not configured but required for somatic variant qc", | ||
) | ||
|
||
self.ensure_w_config( | ||
("step_config", "somatic_variant_checking", "target_regions"), | ||
giacuong171 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"Path to target_regions file (bed format)" | ||
"not configured but required for somatic variant qc", | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,7 +8,8 @@ from snappy_pipeline.workflows.tumor_mutational_burden import ( | |
TumorMutationalBurdenCalculationWorkflow, | ||
) | ||
|
||
__author__ = "" | ||
__author__ = "Gia Cuong Pham" | ||
__email__ = "[email protected]" | ||
|
||
|
||
# Configuration =============================================================== | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tumor sample with DNA data should be used here.
Looping over tumor samples paired with a normal sample makes sense when calling somatic variants, because the method requires paired samples.
But we need to move away with the paired normal/tumor requirements is other somatic steps, because there will be panel data with only tumor samples