Merge pull request #50 from CenterForMedicalGeneticsGhent/leraman-patch-1

Manual fixes
matthdsm authored Oct 31, 2019
2 parents a018b18 + 303660b commit 0a8eb52
Showing 4 changed files with 29 additions and 25 deletions.
45 changes: 23 additions & 22 deletions README.md
@@ -6,7 +6,7 @@ including [WISECONDOR](https://github.com/VUmcCGP/wisecondor), [QDNAseq](https:/
[cn.MOPS](https://bioconductor.org/packages/release/bioc/html/cn.mops.html),
WISECONDOR appeared to normalize sequencing data in the most consistent way, as shown by
[our paper](https://www.ncbi.nlm.nih.gov/pubmed/30566647). Nevertheless, WISECONDOR has limitations:
- Stouffer's z-score approach is error-prone when dealing with large amounts of aberrations, the algorithm
+ Stouffer's Z-score approach is error-prone when dealing with large amounts of aberrations, the algorithm
is extremely slow (24h) when operating at small bin sizes (15 kb), and sex chromosomes are not part of the analysis.
Here, we present WisecondorX, an evolved WISECONDOR that aims at dealing with previous difficulties, resulting
in overall superior results and significantly lower computing times, allowing daily diagnostic use. WisecondorX is
@@ -24,18 +24,18 @@ requires low-quality reads to distinguish informative bins from non-informative

### Installation

- Stable releases can be installed using [Conda](https://conda.io/docs/). This option takes care of all necessary
- dependencies.
+ Stable releases can be installed through pip install. This option ascertains the latest version is
+ downloaded, however, it does not install R [dependencies](#dependencies).
```bash

- conda install -f -c conda-forge -c bioconda wisecondorx
+ pip install -U git+https://github.com/CenterForMedicalGeneticsGhent/WisecondorX
```

- Alternatively, WisecondorX can be installed through pip install. This option ascertains the latest version is
- downloaded, yet it does not install R dependencies.
+ Alternatively, [Conda](https://conda.io/docs/) additionally installs all necessary [depedencies](#dependencies),
+ however, the latest version might not be downloaded.
```bash

- pip install -U git+https://github.com/CenterForMedicalGeneticsGhent/WisecondorX
+ conda install -f -c conda-forge -c bioconda wisecondorx
```

### Running WisecondorX
@@ -66,6 +66,7 @@ WisecondorX convert input.bam output.npz [--optional arguments]
<br>Optional argument <br><br> | Function
:--- | :---
`--binsize x` | Size per bin in bp; the reference bin size should be a multiple of this value. Note that this parameter does not impact the resolution, yet it can be used to optimize processing speed (default: x=5e3)
+ `--normdup` | Use this flag to avoid duplicate removal


&rarr; Bash recipe at `./pipeline/convert.sh`
@@ -98,15 +99,15 @@ WisecondorX predict test_input.npz reference_input.npz output_id [--optional arg
:--- | :---
`--minrefbins x` | Minimum amount of sensible reference bins per target bin; should generally not be tweaked (default: x=150)
`--maskrepeats x` | Bins with distances > mean + sd * 3 in the reference will be masked. This parameter represents the number of masking cycles and defines the stringency of the blacklist (default: x=5)
- `--zscore x` | z-score cutoff to call segments as aberrations (default: x=5)
- `--alpha x` | p-value cutoff for calling a circular binary segmentation breakpoints (default: x=1e-4)
+ `--zscore x` | Z-score cutoff to call segments as aberrations (default: x=5)
+ `--alpha x` | P-value cutoff for calling circular binary segmentation breakpoints (default: x=1e-4)
`--beta x` | When beta is given, `--zscore` is ignored. Beta sets a ratio cutoff for aberration calling. It's a number between 0 (liberal) and 1 (conservative) and, when used, is optimally close to the purity (e.g. fetal/tumor fraction)
- `--blacklist x` | Blacklist that masks additional regions in output; requires headerless .bed file. This is particularly useful when the reference set is a too small to recognize some obvious loci (such as centromeres; example at `./example.blacklist/centromere.hg38.txt`) (no default)
- `--gender x` | Force WisecondorX to analyze this case as a male (M) or female (F). Useful when e.g. dealing with a loss of chromosome Y, which causes erroneous gender predictions (choices: x=F or x=M)
+ `--blacklist x` | Blacklist for masking additional regions; requires headerless .bed file. This is particularly useful when the reference set is too small to recognize some obvious loci (such as centromeres; example at `./example.blacklist/centromere.hg38.txt`) (no default)
+ `--gender x` | Force WisecondorX to analyze this case as male (M) or female (F). Useful when e.g. dealing with a loss of chromosome Y, which causes erroneous gender predictions (choices: x=F or x=M)
`--bed` | Outputs tab-delimited .bed files (trisomy 21 NIPT example at `./example.bed`), containing all necessary information **(\*)**
`--plot` | Outputs custom .png plots (trisomy 21 NIPT example at `./example.plot`), directly interpretable **(\*)**
`--ylim [a,b]` | Force WisecondorX to use y-axis interval [a,b] during plotting, e.g. [-2,2]
- `--ciaro` | Some operating systems require the cairo bitmap type to write plots
+ `--cairo` | Some operating systems require the cairo bitmap type to write plots

<sup>**(\*)** At least one of these output formats should be selected</sup>

@@ -119,7 +120,7 @@ WisecondorX predict test_input.npz reference_input.npz output_id [--optional arg
WisecondorX gender test_input.npz reference_input.npz
```

- Returns gender.
+ Returns gender according to the reference.

# Parameters

@@ -131,9 +132,9 @@ sizes ranging from 50 to 500 kb.
To understand the underlying algorithm, I highly recommend reading
[Straver et al (2014)](https://www.ncbi.nlm.nih.gov/pubmed/24170809); and its update shortly introduced in
[Huijsdens-van Amsterdam et al (2018)](https://www.nature.com/articles/gim201832.epdf). Numerous adaptations to this
- algorithm have been made, yet the central principles remain. Changes include e.g. the inclusion of a gender
+ algorithm have been made, yet the central normalization principles remain. Changes include e.g. the inclusion of a gender
prediction algorithm, gender handling prior to normalization (ultimately enabling X and Y predictions), between-sample
- z-scoring, inclusion of a weighted circular binary segmentation algorithm and improved codes for obtaining tables and
+ Z-scoring, inclusion of a weighted circular binary segmentation algorithm and improved codes for obtaining tables and
plots.

# Interpretation results
@@ -142,13 +143,13 @@ plots.

Every dot represents a bin. The dots range across the X-axis from chromosome 1 to X (or Y, in case of a male). The
vertical position of a dot represents the ratio between the observed number of reads and the expected number of reads,
- the latter being the 'healthy' state. As these values are log2-transformed, 'healthy dots' should be close-to 0.
+ the latter being the 'normal' state. As these values are log2-transformed, copy neutral dots should be close-to 0.
Importantly, notice that the dots are always subject to Gaussian noise. Therefore, segments, indicated by horizontal
- grey bars, cover bins of predicted equal copy number. The size of the dots represent the variability at the reference
- set. Thus, the size increases with the certainty of an observation. The same goes for the line width of segments.
- Vertical grey bars represent the blacklist, which will match hypervariable loci and repeats. Finally, the horizontal
+ white lines, cover bins of predicted equal copy number. The size of the dots represents the variability at the reference
+ set. Thus, the size increases with the certainty of an observation. The same goes for the line width of the segments.
+ Vertical grey bars represent the blacklist, which matches mostly hypervariable loci and repeats. Finally, the horizontal
colored dotted lines show where the constitutional 1n and 3n states are expected (when constitutional DNA is at 100%
- purity). Often, an aberration does not surpass these limits, which has several potential causes: depending on your type
+ purity). Often, an aberration does not reach these limits, which has several potential causes: depending on your type
of analysis, the sample could be subject to tumor fraction, fetal fraction, a mosaicism, ... etc. Sometimes, the
segments do surpass these limits: here it's likely you are dealing with 0n, 4n, 5n, 6n, ...
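
A hedged illustration of where the dotted 1n and 3n lines fall, and why diluted samples stay inside them. The mixing model below (a fraction `purity` of aberrant DNA on a diploid background) and the function name `expected_log2_ratio` are assumptions made for illustration; they are not taken from the WisecondorX source.

```python
import math


def expected_log2_ratio(copy_number, purity=1.0, baseline=2):
    """Expected log2(observed / expected) read ratio for one segment.

    Assumption: a fraction `purity` of the DNA carries `copy_number`
    copies, while the remainder carries the `baseline` (2 for autosomes).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    # Average copy number of the mixture seen by the sequencer.
    mixed = purity * copy_number + (1.0 - purity) * baseline
    if mixed <= 0:
        # A complete (0n) loss at 100% purity has no finite log2 ratio.
        return float("-inf")
    return math.log2(mixed / baseline)


# Constitutional states at 100% purity, then a 3n segment at 10% purity.
for cn in (1, 2, 3):
    print(cn, round(expected_log2_ratio(cn), 3))
print("3n at 10% purity:", round(expected_log2_ratio(3, purity=0.1), 3))
```

Under this model a 3n segment at low fetal or tumor fraction lands far below the 3n line, which matches the remark above that aberrations often do not reach these limits.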

@@ -161,7 +162,7 @@ The Z-scores are calculated as default using the within-sample reference bins as

### ID_segments.bed

- This file contains all segment-wise information. A combined Z-score is calculated using a between-sample z-scoring
+ This file contains all segment-wise information. A combined Z-score is calculated using a between-sample Z-scoring
technique (the test case vs the reference cases).
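
As a rough sketch of what between-sample Z-scoring means in general: the test case's segment value is compared against the same segment's values across the reference cases. The function name `between_sample_z` and the use of the sample standard deviation are assumptions; the exact statistic WisecondorX computes may differ.

```python
import statistics


def between_sample_z(test_value, reference_values):
    """Z-score of one test value against the same segment in reference cases.

    Assumption: a plain (x - mean) / sd with the sample standard deviation.
    """
    if len(reference_values) < 2:
        raise ValueError("need at least two reference values")
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    if sd == 0:
        raise ValueError("reference values show no variation")
    return (test_value - mean) / sd


# Hypothetical log2 ratios of one segment in five reference cases.
reference = [0.0, 0.1, -0.1, 0.05, -0.05]
z = between_sample_z(0.4, reference)
# 5 is the documented default of `--zscore`.
print(round(z, 2), "aberrant" if abs(z) >= 5 else "not called")
```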

### ID_aberrations.bed
@@ -171,7 +172,7 @@ This file contains aberrant segments, defined by the [`--beta`](#stage-3-predict

### ID_chr_statistics.bed

- This file contains some interesting statistics for each chromosome. The definition of the z-scores matches the one from
+ This file contains some interesting statistics for each chromosome. The definition of the Z-scores matches the one from
the 'ID_segments.bed'. Particularly interesting for NIPT.

# Dependencies
2 changes: 1 addition & 1 deletion setup.py
@@ -1,7 +1,7 @@
#! /usr/bin/env python
from setuptools import setup, find_packages

- version = '1.1.4'
+ version = '1.1.5'
dl_version = 'master' if 'dev' in version else '{}'.format(version)

setup(
4 changes: 2 additions & 2 deletions wisecondorX/convert_tools.py
@@ -53,7 +53,7 @@ def convert_bam(args):
if not read.is_proper_pair:
reads_pairf += 1
continue
- if larp == read.pos and larp2 == read.next_reference_start:
+ if not args.normdup and larp == read.pos and larp2 == read.next_reference_start:
reads_rmdup += 1
else:
if read.mapping_quality >= 1:
@@ -66,7 +66,7 @@ def convert_bam(args):
reads_seen += 1
larp = read.pos
else:
- if larp == read.pos:
+ if not args.normdup and larp == read.pos:
reads_rmdup += 1
else:
if read.mapping_quality >= 1:
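
The effect of the two changed conditions can be sketched without pysam. This is a simplified stand-in, not the project's code: `count_reads` is a hypothetical helper, reads are reduced to sorted start positions, and the proper-pair and mapping-quality checks of `convert_bam` are left out.

```python
def count_reads(positions, normdup=False):
    """Count kept and duplicate-flagged reads in a coordinate-sorted list.

    A read counts as a duplicate when it starts at the same position as
    the previous read, unless `normdup` switches that check off.
    """
    kept = 0
    removed = 0
    last_pos = None  # plays the role of `larp` in the hunk above
    for pos in positions:
        if not normdup and pos == last_pos:
            removed += 1
        else:
            kept += 1
        last_pos = pos
    return kept, removed


starts = [100, 100, 100, 250, 300, 300]
print(count_reads(starts))                # duplicate removal on (default)
print(count_reads(starts, normdup=True))  # every read is kept
```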
3 changes: 3 additions & 0 deletions wisecondorX/main.py
@@ -285,6 +285,9 @@ def main():
type=float,
default=5e3,
help='Bin size (bp)')
+ parser_convert.add_argument('--normdup',
+ action='store_true',
+ help='Do not remove duplicates')
parser_convert.set_defaults(func=tool_convert)

parser_newref = subparsers.add_parser('newref',
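
A minimal, self-contained sketch of how a `store_true` flag such as `--normdup` behaves in `argparse`. The reduced parser below, including the `build_parser` helper and the positional names `infile` and `outfile`, is assumed for illustration and omits most of what the real `main()` defines.

```python
import argparse


def build_parser():
    """Reduced stand-in for the WisecondorX parser (illustration only)."""
    parser = argparse.ArgumentParser(prog='WisecondorX')
    subparsers = parser.add_subparsers(dest='command')
    parser_convert = subparsers.add_parser('convert')
    parser_convert.add_argument('infile')    # hypothetical positional name
    parser_convert.add_argument('outfile')   # hypothetical positional name
    parser_convert.add_argument('--binsize',
                                type=float,
                                default=5e3,
                                help='Bin size (bp)')
    # store_true: the attribute is False unless the flag is present.
    parser_convert.add_argument('--normdup',
                                action='store_true',
                                help='Do not remove duplicates')
    return parser


default_args = build_parser().parse_args(['convert', 'input.bam', 'output.npz'])
flagged_args = build_parser().parse_args(
    ['convert', 'input.bam', 'output.npz', '--normdup'])
print(default_args.normdup, flagged_args.normdup)
```

Because the flag defaults to `False`, existing `convert` invocations keep removing duplicates unless `--normdup` is passed explicitly.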
