feat: revamp somatic cnv calling #558

Open: wants to merge 47 commits into main

ericblanc20 (Contributor)

CNV calling on WES & WGS data, for the cnvkit tool only at the moment.

The new somatic_cnv_calling step will ultimately replace both somatic_wgs_cnv_calling & somatic_targeted_seq_cnv_calling.

The panel_of_normals step had to be adapted to comply with the new cnvkit implementation.

Coding principles

The code is quite involved, due to the complexity of the cnvkit pipeline, but I tried to follow some guiding principles:

  • One wrapper per sub-step.
  • All files required by a wrapper are passed as snakemake.input objects, none as parameters
  • All boilerplate wrapper code in parent class CnvkitWrapper
  • In wrappers, optional parameters are added as "--param {args[param]}" if args.get("param", None) is not None else "" in the formatting statement itself. I want the command to stay as obvious as possible, near the top of the wrapper code. The formatting is tedious, but always the same in spirit, so I thought it best to put the formatting statements near the bottom of the wrapper code, where they can mostly be ignored.
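To make the optional-parameter pattern concrete, here is a minimal sketch assuming plain dicts in place of snakemake.input and snakemake.params. The function build_target_command and the argument names avg_size and split are hypothetical illustrations, not the PR's CnvkitWrapper code; the flags --output, --avg-size, --split and --annotate are believed to belong to cnvkit.py target, but should be checked against the cnvkit documentation.

```python
# Hypothetical sketch of the wrapper layout described above, not the code from this PR.
# Plain dicts stand in for snakemake.input and snakemake.params; argument names are illustrative.


def build_target_command(inputs: dict, output: str, args: dict) -> str:
    # The command sits near the top of the wrapper, so the cnvkit call stays easy to read.
    cmd = "cnvkit.py target --output {output} {avg_size} {split} {annotate} {baits}"

    # Boilerplate formatting near the bottom: every optional parameter is rendered as
    # "--param value" when set, or as an empty string otherwise.
    cmd = cmd.format(
        output=output,
        baits=inputs["baits"],
        # Files (here the optional annotation table) come from the inputs, never from the parameters.
        annotate=f"--annotate {inputs['annotate']}" if inputs.get("annotate", None) is not None else "",
        avg_size=f"--avg-size {args['avg_size']}" if args.get("avg_size", None) is not None else "",
        split="--split" if args.get("split", False) else "",
    )

    # Collapse the extra spaces left behind by unset optional parameters.
    return " ".join(cmd.split())


if __name__ == "__main__":
    print(
        build_target_command(
            {"baits": "baits.bed", "annotate": "refFlat.txt"},
            "targets.bed",
            {"avg_size": 5000, "split": True},
        )
    )
```

Run as a script, this prints cnvkit.py target --output targets.bed --avg-size 5000 --split --annotate refFlat.txt baits.bed; leaving avg_size unset (or None) simply drops --avg-size from the command, which keeps the command template itself free of conditionals.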

Overview of the logic

A bit of an overview of the logic might help when reviewing the code.

  • There are 4 panel of normals (p.o.n.) possibilities. 2 are real p.o.n.: one built from the cohort (which connects to the panel_of_normals step output) and one from user-defined files. The 2 others are a flat reference, or a reference built from the paired normal sample only.
  • The different regions are:
    • The genome, which is the full reference sequence except the loci masked with Ns.
    • access, which is the set of regions accessible to mapping, i.e. the genome minus some excluded region(s) (low mappability, extreme GC content, ...).
    • The baits, which are the regions enriched by the exome kit (valid only for WES).
    • The targets, which are the baits (WES) or access (WGS), post-processed to split large regions into smaller bins and to add annotations.
    • The antitargets, which are the regions accessible to mapping but with low coverage, because they are not enriched. They are only defined for WES data, where they are access minus targets.
  • In WGS mode, access is computed from the genome, unless provided by the user. Targets are then computed from access, with an automatically selected average bin width if access = genome, unless the bin width is provided by the user. Note that for the paired p.o.n., there is one target file per sample, because the optimal bin width is computed separately for each normal sample.
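The relationships between these regions boil down to interval arithmetic. Below is a simplified sketch of it, not cnvkit's implementation: intervals are assumed to be (chromosome, start, end) tuples in half-open BED coordinates, subtract and split_into_bins are hypothetical helpers, and the toy coordinates are made up. cnvkit's own access, target and antitarget commands apply additional rules (such as minimum region sizes) that are left out here.

```python
# Simplified, hypothetical sketch of the region logic; not cnvkit's own implementation.
from typing import Dict, List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def subtract(regions: List[Interval], removed: List[Interval]) -> List[Interval]:
    """Return the parts of `regions` not covered by any interval in `removed`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in removed:
        by_chrom.setdefault(chrom, []).append((start, end))
    result: List[Interval] = []
    for chrom, start, end in regions:
        cursor = start  # first position not yet emitted or removed
        for r_start, r_end in sorted(by_chrom.get(chrom, [])):
            if r_end <= cursor or r_start >= end:
                continue  # no overlap with what is left of this region
            if r_start > cursor:
                result.append((chrom, cursor, r_start))
            cursor = max(cursor, r_end)
        if cursor < end:
            result.append((chrom, cursor, end))
    return result


def split_into_bins(regions: List[Interval], avg_size: int) -> List[Interval]:
    """Split each region into roughly equal bins of about `avg_size` bp."""
    bins: List[Interval] = []
    for chrom, start, end in regions:
        length = end - start
        n = max(1, round(length / avg_size))
        for i in range(n):
            bins.append((chrom, start + i * length // n, start + (i + 1) * length // n))
    return bins


if __name__ == "__main__":
    genome = [("chr1", 0, 120)]
    excluded = [("chr1", 100, 120)]  # e.g. a low-mappability stretch
    access = subtract(genome, excluded)  # access = genome minus excluded regions

    # WES: targets are binned baits, antitargets are access minus targets.
    baits = [("chr1", 10, 30), ("chr1", 50, 60)]
    wes_targets = split_into_bins(baits, 10)
    print(wes_targets)
    print(subtract(access, wes_targets))

    # WGS: targets are binned access, so nothing is left for antitargets.
    wgs_targets = split_into_bins(access, 25)
    print(subtract(access, wgs_targets))
```

Run as a script, it prints the WES targets [('chr1', 10, 20), ('chr1', 20, 30), ('chr1', 50, 60)], the antitargets [('chr1', 0, 10), ('chr1', 30, 50), ('chr1', 60, 100)], and an empty list for WGS, mirroring the point that antitargets are only meaningful for WES data. The same subtract helper also expresses access as the genome minus the excluded regions.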

…, fix path in cnvkit wrappers, better handling of arguments & fix method to obtain input, parameters & output from snakemake
…cnvkit (as much as possible), added to documentation
ericblanc20 linked an issue on Dec 6, 2024 that may be closed by this pull request
ericblanc20 changed the title from "535 revamp somatic cnv calling" to "feat: revamp somatic cnv calling" on Dec 6, 2024
ericblanc20 requested a review from tedil on December 6, 2024, 13:29
ericblanc20 and others added 18 commits on December 6, 2024, 14:36

github-actions bot commented Dec 6, 2024

  • Please format your Python code with ruff: make fmt
  • Please check your Python code with ruff: make check
  • Please format your Snakemake code with snakefmt: make snakefmt

You can trigger all lints locally by running make lint

@coveralls

Coverage Status

coverage: 85.948% (+0.2%) from 85.77% when pulling 835acd8 on 535-revamp-somatic-cnv-calling into 7239fd0 on main.

Successfully merging this pull request may close these issues.

Revamp somatic CNV calling
3 participants