feat: add pindel artifact annotation and filter #71

Merged on Jul 5, 2024 (18 commits)
1 change: 1 addition & 0 deletions .tests/integration/config/config.yaml
@@ -7,6 +7,7 @@ reference:
design_bed: "data/bed/design.bed"
design_intervals_gatk_cnv: "data/bed/design.bed"
artifacts: "reference/artifact_panel.tsv"
artifacts_pindel: "reference/artifact_panel.tsv"
background: "reference/background_panel.tsv"

bwa_mem:
1 change: 1 addition & 0 deletions config/config.yaml
@@ -16,6 +16,7 @@ reference:
design_intervals: "/projects/wp2/nobackup/Twist_Myeloid/Bed_files/Twist_myeloid_v.1.1_padd6_201126.sorted.intervals"
design_intervals_gatk_cnv: ""
artifacts: "FILL_ME_IN.tsv"
artifacts_pindel: "FILL_ME_IN.tsv"
background: "FILL_ME_IN.tsv"
skip_chrs:
- "chrM"
15 changes: 10 additions & 5 deletions config/config_soft_filter_pindel.yaml
@@ -1,4 +1,9 @@
filters:
vaf:
description: "Soft filter variants with low vaf (AF lower than 0.01)"
expression: "(INFO:AF:0 < 0.01)"
soft_filter_flag: "AF_0.01"
soft_filter: "True"
depth:
description: "Soft filter on depth lower than 200"
expression: "FORMAT:DP < 200"
@@ -9,11 +14,6 @@ filters:
expression: "(FORMAT:AD:1 < 5)"
soft_filter_flag: "AD_5"
soft_filter: "True"
vaf:
description: "Soft filter variants with low vaf (AF lower than 0.01)"
expression: "(INFO:AF:0 < 0.01)"
soft_filter_flag: "AF_0.01"
soft_filter: "True"
intron:
description: "Soft filter intronic variants except if also splice, in cosmic, or in gata2 or tert genes"
expression: "(exist[intron_variant, VEP:Consequence] and !exist[splice, VEP:Consequence] and VEP:SYMBOL != TERT and VEP:SYMBOL != GATA2 and !exist[COSV[0-9]+, VEP:Existing_variation])"
@@ -24,3 +24,8 @@ filters:
expression: "(VEP:MAX_AF > 0.02)"
soft_filter_flag: "PopAF_0.02"
soft_filter: "True"
artifacts:
description: "Soft filter positions that occur in 4 or more normal samples and where AF is less than 5 SD from the median AF in the normal pool"
expression: "(INFO:Artifact > 3 and INFO:ArtifactNrSD < 5)"
soft_filter_flag: "Artifact_gt_3"
soft_filter: "True"
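
To make the new `artifacts` filter concrete, here is a minimal Python sketch of the decision it encodes. The expression itself comes from the config above; how `ArtifactNrSD` is derived is not shown in this diff, so the `number_of_sd` helper is a hypothetical illustration that assumes it is the distance of the sample AF from the panel median, measured in panel standard deviations.

```python
def number_of_sd(sample_af, panel_median_af, panel_sd):
    """Hypothetical helper: distance from the panel median in units of SD.

    Assumption: this is how ArtifactNrSD is derived; the diff does not show it.
    """
    if panel_sd == 0:
        # No spread in the panel: treat any call as infinitely far away.
        return float("inf")
    return (sample_af - panel_median_af) / panel_sd


def is_soft_filtered_artifact(artifact_count, nr_sd):
    """Mirror of the expression (INFO:Artifact > 3 and INFO:ArtifactNrSD < 5)."""
    return artifact_count > 3 and nr_sd < 5


# Seen in 4 normals and close to the panel median: flagged as Artifact_gt_3.
flagged = is_soft_filtered_artifact(4, 2.0)
# Seen in 10 normals but far above the panel median: kept.
kept = not is_soft_filtered_artifact(10, number_of_sd(0.10, 0.02, 0.01))
```

Both conditions must hold for the flag to be set, so a call that recurs in the normal pool still passes when its AF is at least 5 SD away from the panel median.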
4 changes: 4 additions & 0 deletions config/output_files_references.yaml
@@ -32,6 +32,10 @@ files:
input: references/create_artifact_file/artifact_panel.tsv
output: artifact_panel.tsv

- name: artifacts pindel tsv-file
input: references/create_artifact_file_pindel/artifact_panel.tsv
output: artifact_panel_pindel.tsv

- name: background tsv-file
input: "references/create_background_file/background_panel.tsv"
output: background_panel.tsv
101 changes: 95 additions & 6 deletions docs/softwares.md
@@ -1,23 +1,112 @@

# Software used in Poppy
Rules specific to Poppy are listed here.

## [annotation_vep_pindel](https://www.ensembl.org/info/docs/tools/vep/index.html)
Because pindel is run on a limited region, it does not always produce results. If an empty VCF file is passed to VEP, VEP fails and the entire pipeline stops, so a specific rule is needed to ensure that the pindel VCF contains variants before it is annotated. If no variants are found, the empty VCF file is simply copied to the output.
## pindel_processing.smk
[Pindel](http://gmt.genome.wustl.edu/packages/pindel/) creates an older type of VCF and therefore has to be processed slightly differently from more modern VCFs. Here we add the AF and DP fields to the VCF INFO column, annotate the calls using [vep](https://www.ensembl.org/info/docs/tools/vep/index.html), and add artifact annotation based on an artifact panel created with the reference pipeline.
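
As a rough illustration of the AF and DP step, the sketch below derives both values from a per-sample `AD` field. It assumes that `AD` holds `ref,alt` read counts (consistent with the `FORMAT:AD:1` alt-count filter in the soft-filter config) and that DP is taken as their sum. The function name is hypothetical and this is not the code of the `pindel_processing_fix_af` rule.

```python
def af_and_dp_from_ad(ad_field):
    """Return (AF, DP) from an AD string such as "90,10".

    Assumption: AD is "ref_count,alt_count" and DP is their sum.
    """
    ref_count, alt_count = (int(value) for value in ad_field.split(","))
    depth = ref_count + alt_count
    # Guard against division by zero for positions without supporting reads.
    allele_frequency = alt_count / depth if depth > 0 else 0.0
    return allele_frequency, depth


af, dp = af_and_dp_from_ad("90,10")
info_fields = f"AF={af};DP={dp}"
```

Writing the values into INFO is what lets expressions such as `INFO:AF:0 < 0.01` work on pindel calls in the same way as on calls from other callers.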

<!-- Because pindel is run on a limited region, it does not always produce results. If an empty VCF file is passed to VEP, VEP fails and the entire pipeline stops, so a specific rule is needed to ensure that the pindel VCF contains variants before it is annotated. If no variants are found, the empty VCF file is simply copied to the output. -->
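
The empty-VCF guard described in the note above can be sketched as a small check that looks for at least one non-header record. This is an illustrative helper with a hypothetical name, not the rule's actual implementation; it assumes a plain-text or gzip-compressed VCF.

```python
import gzip


def vcf_has_variants(path):
    """Return True if the VCF at `path` has at least one variant record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Header lines start with "#"; anything else that is non-blank is a record.
        return any(line.strip() and not line.startswith("#") for line in handle)
```

A rule could call such a check first and copy the input straight to the output when it returns `False`, so that VEP is only run on files that contain variants.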

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_annotation_vep#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_annotation_vep#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_annotation_vep#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_annotation_vep#


### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_fix_af#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_fix_af#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_fix_af#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_fix_af#


### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_artifact_annotation#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_artifact_annotation#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_artifact_annotation#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_artifact_annotation#


## [svdb](https://github.com/J35P312/SVDB).smk
When `svdb --merge` is run with the priority flag set, svdb cuts off the FORMAT column for cnvkit variants ([git issue]()). We therefore use a non-Hydra Genetics rule for the `svdb --merge` command.

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__annotation_vep_pindel__annotation_vep_pindel#
#SNAKEMAKE_RULE_SOURCE__svdb__svdb_merge_wo_priority#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__annotation_vep_pindel__annotation_vep_pindel#
#SNAKEMAKE_RULE_TABLE__svdb__svdb_merge_wo_priority#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__annotation_vep_pindel#
#CONFIGSCHEMA__svdb_merge#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__annotation_vep_pindel#
#RESOURCESSCHEMA__svdb_merge#


---

## reference_rules.smk
Software used specifically to create the reference files for Poppy.

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__reference_rules__reference_rules_create_artifact_file_pindel#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__reference_rules__reference_rules_create_artifact_file_pindel#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__reference_rules_create_artifact_file_pindel#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__reference_rules_create_artifact_file_pindel#


10 changes: 6 additions & 4 deletions workflow/Snakefile
@@ -7,12 +7,14 @@ __email__ = "[email protected]"
__license__ = "GPL-3"


ruleorder: annotation_vep_pindel > annotation_vep
ruleorder: pindel_processing_annotation_vep > annotation_vep
ruleorder: pindel_processing_artifact_annotation > annotation_artifact_annotation
ruleorder: svdb_merge_wo_priority > cnv_sv_svdb_merge


include: "rules/common.smk"
include: "rules/fix_af_pindel.smk"
include: "rules/annotation_vep_pindel.smk"
include: "rules/svdb.smk"
include: "rules/pindel_processing.smk"


report: "report/workflow.rst"
@@ -53,7 +55,7 @@ use rule * from annotation as annotation_*

module cnv_sv:
snakefile:
github("hydra-genetics/cnv_sv", path="workflow/Snakefile", tag="b549266")
github("hydra-genetics/cnv_sv", path="workflow/Snakefile", tag="1aa9a68")
config:
config

1 change: 1 addition & 0 deletions workflow/Snakefile_references.smk
@@ -1,4 +1,5 @@
include: "rules/common_references.smk"
include: "rules/reference_rules.smk"


rule all:
37 changes: 0 additions & 37 deletions workflow/rules/annotation_vep_pindel.smk

This file was deleted.

24 changes: 21 additions & 3 deletions workflow/rules/common.smk
@@ -86,10 +86,10 @@ config = load_resources(config, config["resources"])
validate(config, schema="../schemas/resources.schema.yaml")

### Read and validate samples file

samples = pd.read_table(config["samples"], comment="#").set_index("sample", drop=False)
validate(samples, schema="../schemas/samples.schema.yaml")


### Read and validate units file
units = (
pandas.read_table(config["units"], dtype=str, comment="#")
@@ -109,9 +109,8 @@ with open(config["output"], "r") as f:
output_spec = yaml.safe_load(f.read())
validate(output_spec, schema="../schemas/output_files.schema.yaml", set_default=True)

### Set wildcard constraints


### Set wildcard constraints
wildcard_constraints:
barcode="[A-Z+]+",
chr="[^_]+",
@@ -121,4 +120,23 @@ wildcard_constraints:
type="N|T|R",


def get_vcfs_for_svdb_merge(wildcards, add_suffix=False):
    vcf_dict = {}
    for v in config.get("svdb_merge", {}).get("tc_method"):
        tc_method = v["name"]
        callers = v["cnv_caller"]
        for caller in callers:
            if add_suffix:
                caller_suffix = f":{caller}"
            else:
                caller_suffix = ""
            if tc_method in vcf_dict:
                vcf_dict[tc_method].append(
                    f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"
                )
            else:
                vcf_dict[tc_method] = [f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"]
    return vcf_dict[wildcards.tc_method]


generate_copy_rules(output_spec)
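
To show what `get_vcfs_for_svdb_merge` returns, the sketch below runs a condensed restatement of the helper against a hypothetical config. The `pathology` method name, the `cnvkit` and `gatk` caller names, and the sample values are illustrative assumptions, and `SimpleNamespace` stands in for Snakemake's wildcards object.

```python
from types import SimpleNamespace

# Hypothetical config fragment; method and caller names are illustrative only.
config = {
    "svdb_merge": {
        "tc_method": [
            {"name": "pathology", "cnv_caller": ["cnvkit", "gatk"]},
        ]
    }
}


def get_vcfs_for_svdb_merge(wildcards, add_suffix=False):
    """Condensed restatement of the helper added to common.smk."""
    vcf_dict = {}
    for v in config.get("svdb_merge", {}).get("tc_method"):
        tc_method = v["name"]
        for caller in v["cnv_caller"]:
            # With add_suffix, each path is tagged with its caller name.
            caller_suffix = f":{caller}" if add_suffix else ""
            vcf_dict.setdefault(tc_method, []).append(
                f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"
            )
    return vcf_dict[wildcards.tc_method]


# Stand-in for the wildcards Snakemake would pass to an input function.
wildcards = SimpleNamespace(sample="sample1", type="T", tc_method="pathology")
input_paths = get_vcfs_for_svdb_merge(wildcards)
tagged_paths = get_vcfs_for_svdb_merge(wildcards, add_suffix=True)
```

Without the suffix the list is suitable as rule input, while the `path:caller` form appears to match the tagged file syntax that `svdb --merge` accepts for its `--vcf` arguments.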
2 changes: 2 additions & 0 deletions workflow/rules/common_references.smk
@@ -8,6 +8,8 @@ import yaml
from hydra_genetics.utils.resources import load_resources
from hydra_genetics import min_version as hydra_min_version
from hydra_genetics.utils.misc import replace_dict_variables
from hydra_genetics.utils.samples import *
from hydra_genetics.utils.units import *


include: "results.smk"
33 changes: 0 additions & 33 deletions workflow/rules/fix_af_pindel.smk

This file was deleted.
