feat: add pindel artifact annotation and filter (#71)
* refactor: add pipeline version and restructure software versions

* feat: hydra genetics 3.0.0 to include software versions into multiqc

* refactor: remove missed comment on min_version()

* style: make snakefmt happy

* feat: add background tsv file from references module

* feat: add background annotation to snv_indels vcf-file

* test: add missing background in test config

* chore: update cnv_sv module to include type in pindel vcf samplename

* refactor: move all pindel processing rules into one snakefile

* feat: create an artifact tsv-file in reference pipeline

* fix: add samplename in dict, e.g. more than one del same pos

* feat: add artifact annotation to pindel vcf

* feat: add artifact filtering to pindel vcf

* fix: broken svdb-merge-vcf when using priority flag

* docs: slight rewording and formatting

---------

Co-authored-by: Niklas Mähler <[email protected]>
elleira and maehler authored Jul 5, 2024
1 parent f5c1903 commit 8aaa171
Showing 22 changed files with 661 additions and 122 deletions.
1 change: 1 addition & 0 deletions .tests/integration/config/config.yaml
@@ -7,6 +7,7 @@ reference:
  design_bed: "data/bed/design.bed"
  design_intervals_gatk_cnv: "data/bed/design.bed"
  artifacts: "reference/artifact_panel.tsv"
  artifacts_pindel: "reference/artifact_panel.tsv"
  background: "reference/background_panel.tsv"

bwa_mem:
1 change: 1 addition & 0 deletions config/config.yaml
@@ -16,6 +16,7 @@ reference:
  design_intervals: "/projects/wp2/nobackup/Twist_Myeloid/Bed_files/Twist_myeloid_v.1.1_padd6_201126.sorted.intervals"
  design_intervals_gatk_cnv: ""
  artifacts: "FILL_ME_IN.tsv"
  artifacts_pindel: "FILL_ME_IN.tsv"
  background: "FILL_ME_IN.tsv"
  skip_chrs:
    - "chrM"
15 changes: 10 additions & 5 deletions config/config_soft_filter_pindel.yaml
@@ -1,4 +1,9 @@
filters:
  vaf:
    description: "Soft filter variants with low vaf (AF lower than 0.01)"
    expression: "(INFO:AF:0 < 0.01)"
    soft_filter_flag: "AF_0.01"
    soft_filter: "True"
  depth:
    description: "Soft filter on depth lower than 200"
    expression: "FORMAT:DP < 200"
@@ -9,11 +14,6 @@ filters:
    expression: "(FORMAT:AD:1 < 5)"
    soft_filter_flag: "AD_5"
    soft_filter: "True"
  vaf:
    description: "Soft filter variants with low vaf (AF lower than 0.01)"
    expression: "(INFO:AF:0 < 0.01)"
    soft_filter_flag: "AF_0.01"
    soft_filter: "True"
  intron:
    description: "Soft filter intronic variants except if also splice, in cosmic, or in gata2 or tert genes"
    expression: "(exist[intron_variant, VEP:Consequence] and !exist[splice, VEP:Consequence] and VEP:SYMBOL != TERT and VEP:SYMBOL != GATA2 and !exist[COSV[0-9]+, VEP:Existing_variation])"
@@ -24,3 +24,8 @@ filters:
    expression: "(VEP:MAX_AF > 0.02)"
    soft_filter_flag: "PopAF_0.02"
    soft_filter: "True"
  artifacts:
    description: "Soft filter position that occurs in 4 or more normal samples and AF is less than 5 sd from median af in normalpool"
    expression: "(INFO:Artifact > 3 and INFO:ArtifactNrSD < 5)"
    soft_filter_flag: "Artifact_gt_3"
    soft_filter: "True"
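The new `artifacts` filter combines two conditions, which is easy to misread. The following is a minimal Python sketch of the same boolean logic, not the filtering code used by the pipeline. The function name and its parameters are hypothetical; only the thresholds (more than 3 normal samples, fewer than 5 standard deviations) come from the expression above.

```python
def is_artifact_soft_filtered(artifact_count, nr_sd, min_count=3, max_sd=5.0):
    """Return True when a call would receive the Artifact_gt_3 flag.

    Mirrors the expression (INFO:Artifact > 3 and INFO:ArtifactNrSD < 5).
    The function and parameter names are illustrative assumptions.
    """
    return artifact_count > min_count and nr_sd < max_sd


# Seen in 6 normals and AF only 1.2 SD from the panel median: flagged.
print(is_artifact_soft_filtered(6, 1.2))
# Seen in 6 normals but AF is 8 SD from the panel median: not flagged.
print(is_artifact_soft_filtered(6, 8.0))
# Seen in exactly 3 normals: not flagged, because the comparison is strict.
print(is_artifact_soft_filtered(3, 1.2))
```

Both conditions must hold, so a call that is common in the normal pool still passes if its allele frequency stands far above what the normals show.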
4 changes: 4 additions & 0 deletions config/output_files_references.yaml
@@ -32,6 +32,10 @@ files:
    input: references/create_artifact_file/artifact_panel.tsv
    output: artifact_panel.tsv

  - name: artifacts pindel tsv-file
    input: references/create_artifact_file_pindel/artifact_panel.tsv
    output: artifact_panel_pindel.tsv

  - name: background tsv-file
    input: "references/create_background_file/background_panel.tsv"
    output: background_panel.tsv
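To illustrate how entries like these can be consumed, here is a minimal Python sketch that turns an output specification into (source, destination) copy pairs. The function name `copy_pairs` and the `output_dir` parameter are hypothetical and not taken from this commit; the pipeline's own `generate_copy_rules` may work differently.

```python
import posixpath


def copy_pairs(output_spec, output_dir="results"):
    """Return (source, destination) pairs for every entry under 'files'.

    `output_dir` is an assumed destination prefix used only for illustration.
    """
    pairs = []
    for entry in output_spec.get("files", []):
        destination = posixpath.join(output_dir, entry["output"])
        pairs.append((entry["input"], destination))
    return pairs


# The two entries below are copied from the hunk above.
spec = {
    "files": [
        {
            "name": "artifacts pindel tsv-file",
            "input": "references/create_artifact_file_pindel/artifact_panel.tsv",
            "output": "artifact_panel_pindel.tsv",
        },
        {
            "name": "background tsv-file",
            "input": "references/create_background_file/background_panel.tsv",
            "output": "background_panel.tsv",
        },
    ]
}

for source, destination in copy_pairs(spec):
    print(f"{source} -> {destination}")
```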
101 changes: 95 additions & 6 deletions docs/softwares.md
@@ -1,23 +1,112 @@

# Software used in Poppy
Rules specifically for Poppy listed here.

## [annotation_vep_pindel](https://www.ensembl.org/info/docs/tools/vep/index.html)
Since pindel is run on a limited region, it does not always produce results. If an empty VCF file is passed to VEP, VEP fails and the entire pipeline stops; therefore, a specific rule is needed to ensure that there are variants in the pindel VCF before annotating it. If no variants are found, the empty VCF file is simply copied to the output.
## pindel_processing.smk
[Pindel](http://gmt.genome.wustl.edu/packages/pindel/) creates an older type of VCF and therefore has to be processed slightly differently from more modern VCFs. Here we add the AF and DP fields to the VCF INFO column, annotate the calls using [vep](https://www.ensembl.org/info/docs/tools/vep/index.html), and add artifact annotation based on an artifact panel created with the reference pipeline.
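As a rough illustration of the AF and DP step, the sketch below derives both values from an `AD` field and appends them to the INFO column of a single record. It assumes a single-sample VCF whose FORMAT column carries `AD` as `ref,alt` read counts; the function names are hypothetical and this is not the script shipped with the pipeline. A real implementation would also need to add the matching `##INFO` header lines.

```python
def af_and_dp_from_ad(ad_field):
    """Derive (AF, DP) from a 'ref,alt' allelic depth string (assumed layout)."""
    ref_reads, alt_reads = (int(x) for x in ad_field.split(","))
    depth = ref_reads + alt_reads
    af = alt_reads / depth if depth > 0 else 0.0
    return af, depth


def add_af_dp_to_info(vcf_line):
    """Append AF and DP to the INFO column of one single-sample VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT, column 10 (index 9) is the sample.
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    af, depth = af_and_dp_from_ad(sample["AD"])
    cols[7] = f"{cols[7]};AF={af:.4f};DP={depth}"
    return "\t".join(cols)


# Made-up record for illustration only.
record = "chr13\t28034105\t.\tT\tTGAGATCA\t.\tPASS\tEND=28034105;SVTYPE=INS\tGT:AD\t0/1:180,20"
print(add_af_dp_to_info(record))
```

With AF and DP available in INFO, expressions such as `INFO:AF:0 < 0.01` in the soft-filter configuration have a value to work with.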

<!-- Since pindel is run on a limited region, it does not always produce results. If an empty VCF file is passed to VEP, VEP fails and the entire pipeline stops; therefore, a specific rule is needed to ensure that there are variants in the pindel VCF before annotating it. If no variants are found, the empty VCF file is simply copied to the output. -->
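The guard against empty pindel VCFs can be sketched as follows. This is an assumption-laden illustration, not the rule's actual shell command: the helper names are hypothetical, the `annotate` callable stands in for a real VEP invocation, and only uncompressed VCFs are handled.

```python
import os
import shutil
import tempfile


def vcf_has_variants(vcf_path):
    """Return True if the VCF holds at least one non-header record."""
    with open(vcf_path) as handle:
        return any(line.strip() and not line.startswith("#") for line in handle)


def annotate_or_copy(vcf_in, vcf_out, annotate):
    """Run `annotate` only when there is something to annotate."""
    if vcf_has_variants(vcf_in):
        annotate(vcf_in, vcf_out)
        return "annotated"
    # No variants: pass the header-only file through unchanged.
    shutil.copyfile(vcf_in, vcf_out)
    return "copied"


# Demo with a header-only VCF; the lambda is a placeholder for a VEP call.
workdir = tempfile.mkdtemp()
empty_vcf = os.path.join(workdir, "empty.vcf")
with open(empty_vcf, "w") as handle:
    handle.write("##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
print(annotate_or_copy(empty_vcf, os.path.join(workdir, "out.vcf"), lambda src, dst: None))
```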

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_annotation_vep#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_annotation_vep#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_annotation_vep#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_annotation_vep#


### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_fix_af#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_fix_af#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_fix_af#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_fix_af#


### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__pindel_processing__pindel_processing_artifact_annotation#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__pindel_processing__pindel_processing_artifact_annotation#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__pindel_processing_artifact_annotation#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__pindel_processing_artifact_annotation#


## [svdb](https://github.com/J35P312/SVDB).smk
When `svdb --merge` is run with the priority flag set, svdb cuts off the FORMAT column for cnvkit variants [git issue](). We therefore use a non-Hydra Genetics rule for the `svdb --merge` command.

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__annotation_vep_pindel__annotation_vep_pindel#
#SNAKEMAKE_RULE_SOURCE__svdb__svdb_merge_wo_priority#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__annotation_vep_pindel__annotation_vep_pindel#
#SNAKEMAKE_RULE_TABLE__svdb__svdb_merge_wo_priority#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__annotation_vep_pindel#
#CONFIGSCHEMA__svdb_merge#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__annotation_vep_pindel#
#RESOURCESSCHEMA__svdb_merge#


---

## reference_rules.smk
Software used specifically to create the reference-files for Poppy.

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__reference_rules__reference_rules_create_artifact_file_pindel#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__reference_rules__reference_rules_create_artifact_file_pindel#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__reference_rules_create_artifact_file_pindel#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__reference_rules_create_artifact_file_pindel#


10 changes: 6 additions & 4 deletions workflow/Snakefile
@@ -7,12 +7,14 @@ __email__ = "[email protected]"
__license__ = "GPL-3"


ruleorder: annotation_vep_pindel > annotation_vep
ruleorder: pindel_processing_annotation_vep > annotation_vep
ruleorder: pindel_processing_artifact_annotation > annotation_artifact_annotation
ruleorder: svdb_merge_wo_priority > cnv_sv_svdb_merge


include: "rules/common.smk"
include: "rules/fix_af_pindel.smk"
include: "rules/annotation_vep_pindel.smk"
include: "rules/svdb.smk"
include: "rules/pindel_processing.smk"


report: "report/workflow.rst"
@@ -53,7 +55,7 @@ use rule * from annotation as annotation_*

module cnv_sv:
    snakefile:
        github("hydra-genetics/cnv_sv", path="workflow/Snakefile", tag="b549266")
        github("hydra-genetics/cnv_sv", path="workflow/Snakefile", tag="1aa9a68")
    config:
        config

1 change: 1 addition & 0 deletions workflow/Snakefile_references.smk
@@ -1,4 +1,5 @@
include: "rules/common_references.smk"
include: "rules/reference_rules.smk"


rule all:
37 changes: 0 additions & 37 deletions workflow/rules/annotation_vep_pindel.smk

This file was deleted.

24 changes: 21 additions & 3 deletions workflow/rules/common.smk
@@ -86,10 +86,10 @@ config = load_resources(config, config["resources"])
validate(config, schema="../schemas/resources.schema.yaml")

### Read and validate samples file

samples = pd.read_table(config["samples"], comment="#").set_index("sample", drop=False)
validate(samples, schema="../schemas/samples.schema.yaml")


### Read and validate units file
units = (
    pandas.read_table(config["units"], dtype=str, comment="#")
@@ -109,9 +108,8 @@ with open(config["output"], "r") as f:
    output_spec = yaml.safe_load(f.read())
validate(output_spec, schema="../schemas/output_files.schema.yaml", set_default=True)

### Set wildcard constraints


### Set wildcard constraints
wildcard_constraints:
    barcode="[A-Z+]+",
    chr="[^_]+",
@@ -121,4 +120,23 @@ wildcard_constraints:
    type="N|T|R",


def get_vcfs_for_svdb_merge(wildcards, add_suffix=False):
    vcf_dict = {}
    for v in config.get("svdb_merge", {}).get("tc_method"):
        tc_method = v["name"]
        callers = v["cnv_caller"]
        for caller in callers:
            if add_suffix:
                caller_suffix = f":{caller}"
            else:
                caller_suffix = ""
            if tc_method in vcf_dict:
                vcf_dict[tc_method].append(
                    f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"
                )
            else:
                vcf_dict[tc_method] = [f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"]
    return vcf_dict[wildcards.tc_method]


generate_copy_rules(output_spec)
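To show what `get_vcfs_for_svdb_merge` returns, here is a standalone sketch of the same logic. It differs from the function in the diff in two labeled ways: `config` is passed as an argument instead of being read from the Snakemake global, and a missing `tc_method` list defaults to empty. The config values and wildcard values are invented for the example.

```python
from types import SimpleNamespace


def get_vcfs_for_svdb_merge(wildcards, config, add_suffix=False):
    """Collect per-caller VCF paths for the requested tumor-content method."""
    vcf_dict = {}
    # Assumption: default to an empty list when the key is absent.
    for v in config.get("svdb_merge", {}).get("tc_method", []):
        tc_method = v["name"]
        for caller in v["cnv_caller"]:
            caller_suffix = f":{caller}" if add_suffix else ""
            vcf_dict.setdefault(tc_method, []).append(
                f"cnv_sv/{caller}_vcf/{wildcards.sample}_{wildcards.type}.{tc_method}.vcf{caller_suffix}"
            )
    return vcf_dict[wildcards.tc_method]


# Hypothetical configuration and wildcards, for illustration only.
config = {"svdb_merge": {"tc_method": [{"name": "pathology", "cnv_caller": ["cnvkit", "gatk"]}]}}
wildcards = SimpleNamespace(sample="S1", type="T", tc_method="pathology")

print(get_vcfs_for_svdb_merge(wildcards, config))
print(get_vcfs_for_svdb_merge(wildcards, config, add_suffix=True))
```

With `add_suffix=True`, each path carries a `:caller` tag, the form svdb uses to label input files for its priority option.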
2 changes: 2 additions & 0 deletions workflow/rules/common_references.smk
@@ -8,6 +8,8 @@ import yaml
from hydra_genetics.utils.resources import load_resources
from hydra_genetics import min_version as hydra_min_version
from hydra_genetics.utils.misc import replace_dict_variables
from hydra_genetics.utils.samples import *
from hydra_genetics.utils.units import *


include: "results.smk"
33 changes: 0 additions & 33 deletions workflow/rules/fix_af_pindel.smk

This file was deleted.
