Module/gatk rnaseq/1.0 #184

Open
wants to merge 18 commits into
base: master
Changes from 9 commits
5 changes: 4 additions & 1 deletion demo/Snakefile
@@ -42,6 +42,7 @@ configfile: "../modules/starfish/2.0/config/default.yaml"
configfile: "../modules/sage/1.0/config/default.yaml"
configfile: "../modules/slms_3/1.0/config/default.yaml"
configfile: "../modules/ichorcna/1.0/config/default.yaml"
+configfile: "../modules/gatk_rnaseq/1.0/config/default.yaml"

# Load project-specific config, which includes the shared
# configuration and some module-specific config updates
@@ -77,6 +78,7 @@ include: "../modules/lofreq/1.0/lofreq.smk"
include: "../modules/starfish/2.0/starfish.smk"
include: "../modules/sage/1.0/sage.smk"
include: "../modules/ichorcna/1.0/ichorcna.smk"
+include: "../modules/gatk_rnaseq/1.0/gatk_rnaseq.smk"

##### TARGETS ######

@@ -99,5 +101,6 @@ rule all:
         rules._vcf2maf_all.input,
         rules._sage_all.input,
         rules._slms_3_all.input,
-        rules._ichorcna_all.input
+        rules._ichorcna_all.input,
+        rules._gatk_rnaseq_all.input

6 changes: 6 additions & 0 deletions demo/config.yaml
@@ -171,3 +171,9 @@ lcr-modules:
         # include here any additional flags to modify default parameters
         options:
             sage_run: ""
+
+
+    gatk_rnaseq:
+        inputs:
+            sample_bam: "data/{sample_id}.bam"
+            sample_bai: "data/{sample_id}.bam.bai"
Collaborator

This needs a newline added at the end of the file.

Member Author

Done

Collaborator

It still shows no newline here. Maybe I am looking at an outdated file?

Member Author

Sorry, I must have missed it. It's fixed
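The missing-newline question in this thread can also be checked locally instead of relying on the rendered diff. A minimal sketch, assuming a hypothetical helper `ends_with_newline` (not part of lcr-modules) that is pointed at files such as `demo/config.yaml`:

```python
import sys
from pathlib import Path


def ends_with_newline(path):
    """Return True if the file's last byte is a newline (False for empty files)."""
    # Hypothetical helper for illustration; reads the whole file as bytes.
    data = Path(path).read_bytes()
    return data.endswith(b"\n")


if __name__ == "__main__":
    # Usage: python check_newline.py demo/config.yaml [more files...]
    for name in sys.argv[1:]:
        status = "ok" if ends_with_newline(name) else "no newline at end of file"
        print(f"{name}: {status}")
```

Running it on the file after pushing the fix would confirm whether the reviewer is looking at an outdated version.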

66 changes: 66 additions & 0 deletions modules/gatk_rnaseq/1.0/config/default.yaml
@@ -0,0 +1,66 @@
lcr-modules:

    gatk_rnaseq:

        inputs:
            # Available wildcards: {seq_type} {genome_build} {sample_id}
            sample_bam: "__UPDATE__"
            sample_bai: "__UPDATE__"

        scratch_subdirectories: []

        options:
            gatk_variant_calling:
                min_conf_thres: 20.0
            gatk_variant_filtration:
                window: 35 # window size (in bases) in which to look for SNP clusters
                cluster_size: 3 # at least 3 SNPs in cluster
                # hard filtering (filters OUT) based on metrics:
                # FS (FisherStrand): Phred-scaled probability of strand bias from Fisher's exact test. (default FS > 30.0)
Collaborator

Thanks for the detailed documentation!

                # QD (QualByDepth): variant confidence divided by unfiltered depth (normalizes variant quality to avoid inflation from deep coverage) (default QD < 2.0)
                # DP (depth): minimum depth (default DP < 5.0)
                filter_expression: "-filter-expression \"FS > 30.0\" -filter-name FS -filter-expression \"QD < 2.0\" -filter-name QD -filter-expression \"DP < 5.0\" -filter-name DP"
            gatk_rnaseq_filter_passed:
                params: "-f '.,PASS' "
                # Can be modified to filter on additional criteria using bcftools view syntax
                # For example, to keep only variants with INFO/POPAF > 4 (-i includes matching records):
                #"-f '.,PASS' -i 'INFO/POPAF > 4'"


        conda_envs:
            gatk_rnaseq: "{MODSDIR}/envs/gatk_rnaseq.yaml"
            bcftools: "{MODSDIR}/envs/bcftools-1.10.2.yaml"

        threads:
            gatk_splitntrim: 12
Collaborator

Several rules use the same number of threads, so these keys could be combined and reused across rules: each rule would refer to the same key in the config. The same applies to resources. I think this would reduce the number of keys to specify or adjust, and simplify the config. What do you think?

Member Author

It's possible, but wouldn't it also cause confusion if the numbers ever need to change? Also, is there a unique name that could apply to a subset of the rules? Something like "thread_12" would be non-descriptive and wouldn't indicate which rules use the parameter.

Collaborator

You are right, let's keep the more detailed and informative names
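To make the trade-off in this thread concrete, here is a minimal sketch in plain Python. The `THREADS` mapping mirrors the `threads` block in this diff; the helpers `group_rules_by_threads` and `override_threads` are hypothetical and exist only for illustration, they are not part of lcr-modules or Snakemake.

```python
# Stand-in for the module's threads config after YAML loading; keys and
# values are copied from the threads block in this diff.
THREADS = {
    "gatk_splitntrim": 12,
    "gatk_base_recalibration": 12,
    "gatk_applybqsr": 12,
    "gatk_variant_calling": 24,
    "gatk_variant_filtration": 24,
    "merge_vcfs": 10,
    "gatk_rnaseq_passed": 10,
}


def group_rules_by_threads(threads):
    """Group rule keys by thread count: each group is what a shared key would merge."""
    groups = {}
    for rule, count in threads.items():
        groups.setdefault(count, []).append(rule)
    return groups


def override_threads(threads, rule, count):
    """Return a copy of the mapping with a single rule's thread count changed."""
    updated = dict(threads)
    updated[rule] = count
    return updated


if __name__ == "__main__":
    # A shared key would tie together every rule in a group.
    for count, rules in group_rules_by_threads(THREADS).items():
        print(count, rules)
    # With per-rule keys, one rule can be tuned without touching the others.
    print(override_threads(THREADS, "gatk_applybqsr", 8))
```

With per-rule keys, lowering `gatk_applybqsr` leaves `gatk_splitntrim` and `gatk_base_recalibration` untouched, whereas a shared key would change all three at once.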

            gatk_base_recalibration: 12
            gatk_applybqsr: 12
            gatk_variant_calling: 24
            gatk_variant_filtration: 24
            merge_vcfs: 10
            gatk_rnaseq_passed: 10

        resources:
            gatk_splitntrim:
                mem_mb: 48000
            gatk_base_recalibration:
                mem_mb: 12000
            gatk_applybqsr:
                mem_mb: 12000
            gatk_variant_calling:
                mem_mb: 48000
                bam: 1
            gatk_variant_filtration:
                mem_mb: 12000
            merge_vcfs:
                mem_mb: 10000
            gatk_rnaseq_passed:
                mem_mb: 10000

        pairing_config:
            mrna:
                run_paired_tumours: False
                run_unpaired_tumours_with: "no_normal"
                run_paired_tumours_as_unpaired: True


36 changes: 36 additions & 0 deletions modules/gatk_rnaseq/1.0/envs/bcftools-1.10.2.yaml
@@ -0,0 +1,36 @@
name: test-bcftools
Collaborator

This environment file (and the other one) should live in the lcr-modules/envs/ folder and be symlinked into the module. That makes it easier to reuse the same environments across modules and reduces the need to build multiple environments. There is already an environment for bcftools with the same version, so you can simply symlink lcr-modules/envs/bcftools/bcftools-1.10.2.yaml into this module.

Member Author

Done
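The symlink arrangement suggested above can be sketched as follows. This is an illustration under assumptions, not the project's tooling: `link_shared_env` is a hypothetical helper, and the two paths are taken from the review comment and the file path in this diff.

```python
import os
from pathlib import Path


def link_shared_env(repo_root, shared_rel, module_rel):
    """Create module_rel as a relative symlink pointing at shared_rel.

    Both paths are interpreted relative to repo_root. Any existing file or
    link at module_rel is replaced. Returns the relative link target.
    """
    repo_root = Path(repo_root)
    target = repo_root / shared_rel
    link = repo_root / module_rel
    link.parent.mkdir(parents=True, exist_ok=True)
    # A relative target keeps the link valid wherever the repository is cloned.
    relative_target = os.path.relpath(target, start=link.parent)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(relative_target)
    return relative_target


if __name__ == "__main__":
    # Run from the repository root; paths as named in the review comment.
    print(link_shared_env(
        ".",
        "envs/bcftools/bcftools-1.10.2.yaml",
        "modules/gatk_rnaseq/1.0/envs/bcftools-1.10.2.yaml",
    ))
```

Using a relative target rather than an absolute one means the module's `conda_envs` entry can keep pointing at `{MODSDIR}/envs/bcftools-1.10.2.yaml` while the file itself is maintained in one shared place.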

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - bcftools=1.10.2
  - bzip2=1.0.8
  - c-ares=1.11.0
  - ca-certificates=2020.6.20
  - gsl=2.6
  - htslib=1.10.2
  - krb5=1.17.1
  - libblas=3.8.0
  - libcblas=3.8.0
  - libcurl=7.71.1
  - libdeflate=1.6
  - libedit=3.1.20191231
  - libev=4.33
  - libgcc-ng=9.3.0
  - libgfortran-ng=7.5.0
  - libgomp=9.3.0
  - libnghttp2=1.41.0
  - libopenblas=0.3.10
  - libssh2=1.9.0
  - libstdcxx-ng=9.3.0
  - ncurses=6.2
  - openssl=1.1.1g
  - perl=5.26.2
  - tk=8.6.10
  - xz=5.2.5
  - zlib=1.2.11
prefix: /home/prasathp/miniconda3/envs/test-bcftools
