Skip to content

Commit

Permalink
Added ratio-metric examples to vignette
Browse files Browse the repository at this point in the history
  • Loading branch information
ctokheim committed May 17, 2016
1 parent d0c8384 commit 963bdb2
Showing 1 changed file with 73 additions and 8 deletions.
81 changes: 73 additions & 8 deletions vignettes/cancerSeqStudy.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -54,11 +54,13 @@ The `*RequiredSampleSize` functions (\*="smg" or "ratiometric" followed by "Bbd"

In the situation where there is additional unaccounted variability in the mutation rate not captured by the model, then it is expected there will be inflated false positives. To evaluate the expected number of false positives, a binomial model is compared with a beta-binomial with a certain level of residual uncertainty in the mutation rate. The beta-binomial represents the actual true variation, while the binomial model represents that utilized for a SMG analysis. In this scenario the critical value establishing the threshold for statistical significance is established by the binomial model, and the probability that a beta-binomial reaches this baseline is calculated. Expected false positives for ratio-metric methods are computed similarly, except the variable which has variablitity is `P` rather than the mutation rate. Assuming a total number of genes (18,500 by default), the expected number of false positive significantly mutated genes is simply the probability times the number of genes.

## Sample Size Calculation
## SMG method

### Sample Size Calculation

An important aspect of designing cancer exome seqeuncing studies is to determine how many cancer samples are required for sufficient power to detect driver genes present at a certain prevalence.

### Assuming an exact mutation rate
#### Assuming an exact mutation rate

In general the mutation rate is not precisely known, but could be assumed to be known for the sake of power calculations. This results in an overly optimistic assessment of the required number of cancer samples. In the known mutation rate scenario, an exact binomial power calculation is performed.

Expand All @@ -70,12 +72,12 @@ samp.sizes <- seq(100, 4000, by=100)
desired.power <- .9
eff.size <- .02 # fraction of samples above background
mut.rate <- 1e-5
alpha.level <- 5e-6 # roughly a bonferoni corrected significance level
signif.level <- 5e-6 # roughly a bonferoni corrected significance level
smgBinomRequiredSampleSize(desired.power, mut.rate, samp.sizes, eff.size, alpha.level)
smgBinomRequiredSampleSize(desired.power, mut.rate, samp.sizes, eff.size, signif.level)
```

### Accounting for uncertain mutation rate
#### Accounting for uncertain mutation rate

Adjusting for uncertainty in mutation rate better represents the actual required number of sequenced samples. To handle uncertain mutation rates (with a certain coefficient of variation), a beta-binomial power test is performed.

Expand All @@ -88,7 +90,7 @@ smgBbdRequiredSampleSize(desired.power, mut.rate, cv, samp.sizes, eff.size)

Notice the minimum required samples raised from 1,500 to 3,500 by accounting for uncertainty of mutation rate with a coefficient of variation of .2.

## Calculating powered effect size
### Calculating powered effect size

If you already have a certain number of samples, often it is helpful to understand the extent to which rare significantly mutated genes are characterized.

Expand All @@ -97,12 +99,12 @@ If you already have a certain number of samples, often it is helpful to understa
possible.eff.sizes <- seq(.01, .2, by=.01) # fraction of samples above background
num.samples <- 1000 # number of samples in study
smgBbdPoweredEffectSize(possible.eff.sizes, desired.power, mut.rate, cv, num.samples, alpha.level)
smgBbdPoweredEffectSize(possible.eff.sizes, desired.power, mut.rate, cv, num.samples, signif.level)
```

In this example, drivers present in 4% of samples above background mutation rate have sufficient power to be detected.

## Expected false positives
### Expected false positives

```{r, fig.show='hold'}
bbd.params <- rateCvToAlphaBeta(mut.rate, cv)
Expand All @@ -111,6 +113,69 @@ smg.binom.false.pos(bbd.params$alpha, bbd.params$beta, samp.sizes)

Here, the total number of genes (`num.genes`) was left at the default of 18,500, and likewise for the significance level (5e-6) and effective gene length (1500*3/4).

## Ratio-metric method

### Sample Size Calculation

Since ratio-metric methods represent an alternative computational approach, they may have differences in statistical power compared to SMG methods.

#### Assuming an exact proportion

The expected background proportion out of total mutations in a gene that fall into a specific type is generally more stable than mutation rate, e.g. presumed inactivating mutations (nonsense, splice site, lost stop/start, and frameshift indels). In the fixed mutation proportion scenario, an exact binomial power calculation is performed.

```{r, fig.show='hold'}
# setup parameters
samp.sizes <- seq(100, 1000, by=10)
desired.power <- .9
eff.size <- .02 # fraction of samples above background
mut.rate <- 1e-5
signif.level <- 5e-6 # roughly a bonferoni corrected significance level
p <- .107 # 10.7% is approximately the percentage of inactivating mutations
ratiometricBinomRequiredSampleSize(p, desired.power, samp.sizes, mut.rate,
eff.size, signif.lvl=signif.level)
```

#### Accounting for uncertainty

Adjusting for uncertainty in the proprotion `P` can represent heterogeneity in acquiring mutations. To handle uncertainty in `P` (with a certain coefficient of variation), a beta-binomial power test is performed.

```{r, fig.show='hold'}
# setup parameters
cv <- .2 # coefficient of variation for ratio-metric proportion
ratiometricBbdRequiredSampleSize(p, cv, desired.power, samp.sizes,
mut.rate, eff.size)
```

Notice the minimum required samples raised from 770 to 890 by accounting for uncertainty of proportion of inactivating mutations with a coefficient of variation of .2. Both numbers are considerably lower than an approach based on mutation rate (1500 and 3500, respectively).

### Calculating powered effect size

If you already have a certain number of samples, often it is helpful to understand the extent to which rare significantly mutated genes are characterized.

```{r, fig.show='hold'}
# setup parameters
possible.eff.sizes <- seq(.01, .1, by=.005) # fraction of samples above background
num.samples <- 600 # number of samples in study
ratiometricBbdPoweredEffectSize(possible.eff.sizes, desired.power, p, cv,
mut.rate, num.samples, signif.level)
```

In this example, drivers present with the ratio-metric feature of interest at 3% of samples above background mutation rate have sufficient power to be detected with 600 samples.

### Expected false positives

```{r, fig.show='hold'}
bbd.params <- rateCvToAlphaBeta(p, cv)
samp.sizes <- seq(200, 4000, by=200)
ratiometric.binom.false.pos(bbd.params$alpha, bbd.params$beta,
samp.sizes, mut.rate)
```

Here, the total number of genes (`num.genes`) was left at the default of 18,500, and likewise for the significance level (5e-6) and effective gene length (1500*3/4).

## Systematically examining power and false postives

To fully understand the effects on power and false positives, a variable sweep over a grid of potential values can be done. This is best done in parallel on a server with multiple cores. Reducing the number of evaluate mutation rates or the effective number of sample sizes evaluated will substantially increase speed, but will provide lower resolution on the shape of statistical power and false positives. One approach is to download the source files from github and run cancerSeqStudy.R as a script. The following command runs the analysis for significantly mutated gene approaches.
Expand Down

0 comments on commit 963bdb2

Please sign in to comment.