updated documentation
gtonkinhill committed Mar 26, 2020
1 parent cc68ed9 commit 4b70f3f
Showing 7 changed files with 105 additions and 8 deletions.
40 changes: 40 additions & 0 deletions README.Rmd
@@ -40,6 +40,20 @@ If you would like to also build the vignette with your installation run:
devtools::install_github("gtonkinhill/fastbaps", build_vignettes = TRUE)
```

### Conda

`fastbaps` can also be installed using Conda

```
conda install -c conda-forge -c bioconda -c defaults r-fastbaps
```

## Choice of Prior

Fastbaps includes a number of options for the Dirichlet prior hyperparameters. In order from most to least conservative, these are `symmetric`, `baps`, `optimise.symmetric` and `optimise.baps`. The choice of prior can be set using the `optimise_prior` function.

It is also possible to condition on a pre-existing phylogeny, which allows a user to partition the phylogeny using the fastbaps algorithm. This is described in more detail further down in the introduction.
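
As a rough illustration of how the prior choice feeds into the clustering, the sketch below runs the Quick Start steps once per prior and counts the resulting clusters. It is not taken from the package documentation: the example file path `extdata/seqs.fa` and the loader `import_fasta_sparse_nt` are assumptions about the package that do not appear in this section, and the type names follow the spelling used in the Quick Start call (`optimise.symmetric`).

```r
# Sketch only: count the clusters found under each prior type.
# Assumptions: the example alignment ships as extdata/seqs.fa and
# import_fasta_sparse_nt() is the loader; neither is shown in this section.
prior.types <- c("symmetric", "baps", "optimise.symmetric", "optimise.baps")

if (requireNamespace("fastbaps", quietly = TRUE) &&
    requireNamespace("ape", quietly = TRUE)) {
  fasta.file.name <- system.file("extdata", "seqs.fa", package = "fastbaps")
  sparse.data <- fastbaps::import_fasta_sparse_nt(fasta.file.name)

  n.clusters <- sapply(prior.types, function(prior.type) {
    # Set the prior, build the hierarchy, then cut it at the best partition
    prior.data <- fastbaps::optimise_prior(sparse.data, type = prior.type)
    baps.hc <- fastbaps::fast_baps(prior.data)
    clusters <- fastbaps::best_baps_partition(prior.data, ape::as.phylo(baps.hc))
    length(unique(clusters))
  })
  print(n.clusters)
}
```

If the ordering described above holds for your data, the cluster counts should not decrease from left to right, but this depends on the data set and is worth checking rather than assuming.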

## Quick Start

Run fastbaps.
@@ -58,6 +72,32 @@ baps.hc <- fast_baps(sparse.data)
clusters <- best_baps_partition(sparse.data, as.phylo(baps.hc))
```

These steps can be combined, and the algorithm run over multiple levels, by running:

```{r}
sparse.data <- optimise_prior(sparse.data, type = "optimise.symmetric")
multi <- multi_res_baps(sparse.data)
```

## Command Line Script

The fastbaps package now includes a command line script. The location of this script can be found by running

```{r, eval=FALSE}
system.file("run_fastbaps", package = "fastbaps")
```

This script can then be copied to a location on the user's path. If you have installed fastbaps using conda, this will already have been done for you.
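
One possible way to do the copy from within R is sketched below. The helper name `install_fastbaps_script` and the `~/bin` destination are hypothetical and not part of fastbaps; any directory on your path would work the same way.

```r
# Hypothetical helper (not part of fastbaps): copy the bundled script into
# a directory on the PATH and mark it executable.
install_fastbaps_script <- function(dest.dir) {
  script <- system.file("run_fastbaps", package = "fastbaps")
  if (!nzchar(script)) {
    # system.file() returns "" when the package or the file is not found
    return(FALSE)
  }
  dir.create(dest.dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest.dir, "run_fastbaps")
  copied <- file.copy(script, dest, overwrite = TRUE)
  if (copied) Sys.chmod(dest, mode = "0755")
  copied
}

# Example call, assuming ~/bin is on your PATH:
# install_fastbaps_script(path.expand("~/bin"))
```

The function returns `TRUE` only if the script was found and copied, so a `FALSE` result usually means the package is not installed in the current library.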

## Citation

To cite fastbaps please use

> Tonkin-Hill,G., Lees,J.A., Bentley,S.D., Frost,S.D.W. and Corander,J. (2019) Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res., 10.1093/nar/gkz361.

## Introduction

```{r, echo = FALSE}
intro_rmd <- 'vignettes/introduction.Rmd'
59 changes: 51 additions & 8 deletions README.md
@@ -22,6 +22,19 @@ If you would like to also build the vignette with your installation run:
devtools::install_github("gtonkinhill/fastbaps", build_vignettes = TRUE)
```

### Conda

`fastbaps` can also be installed using Conda

conda install -c conda-forge -c bioconda -c defaults r-fastbaps

Choice of Prior
---------------

Fastbaps includes a number of options for the Dirichlet prior hyperparameters. In order from most to least conservative, these are `symmetric`, `baps`, `optimise.symmetric` and `optimise.baps`. The choice of prior can be set using the `optimise_prior` function.

It is also possible to condition on a pre-existing phylogeny, which allows a user to partition the phylogeny using the fastbaps algorithm. This is described in more detail further down in the introduction.

Quick Start
-----------

@@ -47,6 +60,35 @@ clusters <- best_baps_partition(sparse.data, as.phylo(baps.hc))
#> [1] "Finding best partition..."
```

These steps can be combined, and the algorithm run over multiple levels, by running:

``` r
sparse.data <- optimise_prior(sparse.data, type = "optimise.symmetric")
#> [1] "Optimised hyperparameter: 0.02"
multi <- multi_res_baps(sparse.data)
```

Command Line Script
-------------------

The fastbaps package now includes a command line script. The location of this script can be found by running

``` r
system.file("run_fastbaps", package = "fastbaps")
```

This script can then be copied to a location on the user's path. If you have installed fastbaps using conda, this will already have been done for you.

Citation
--------

To cite fastbaps please use

> Tonkin-Hill,G., Lees,J.A., Bentley,S.D., Frost,S.D.W. and Corander,J. (2019) Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res., 10.1093/nar/gkz361.

Introduction
------------

The fast BAPS algorithm applies the Bayesian hierarchical clustering (BHC) algorithm of Heller and Ghahramani (2005) to the problem of clustering genetic sequences, using the same likelihood as BAPS (Cheng et al. 2013). The Bayesian hierarchical clustering can be initiated with sequences as individual clusters, or by first running a faster conventional hierarchical clustering and then applying BHC to the resulting clusters.

The algorithm has been written to take advantage of fast sparse matrix libraries and is able to handle thousands of sequences and hundreds of thousands of SNPs in under an hour on a laptop using a single core.
@@ -117,7 +159,7 @@ f2 <- facet_plot(gg, panel = "fastbaps", data = plot.df, geom = geom_tile, aes(x
f2
```

![](inst/vignette-supp/unnamed-chunk-12-1.png)
![](inst/vignette-supp/unnamed-chunk-14-1.png)

We can compare this result to other priors: the un-optimised symmetric or BAPS prior (similar to STRUCTURE and hierBAPS), an optimised BAPS prior, or the population-mean-based prior of Heller et al.

@@ -141,7 +183,7 @@ f2 <- facet_plot(gg, panel = "fastbaps", data = plot.df, geom = geom_tile, aes(x
f2
```

![](inst/vignette-supp/unnamed-chunk-13-1.png)
![](inst/vignette-supp/unnamed-chunk-15-1.png)

We can also use the same prior as used in the BHC algorithm of Heller et al. However, this tends to overpartition population genetic data.

@@ -166,7 +208,7 @@ f2 <- facet_plot(gg, panel = "fastbaps", data = plot.df, geom = geom_tile, aes(x
f2
```

![](inst/vignette-supp/unnamed-chunk-14-1.png)
![](inst/vignette-supp/unnamed-chunk-16-1.png)

We can also investigate multiple levels.

@@ -186,7 +228,7 @@ f2 <- facet_plot(f2, panel = "fastbaps level 2", data = plot.df, geom = geom_til
f2
```

![](inst/vignette-supp/unnamed-chunk-15-1.png)
![](inst/vignette-supp/unnamed-chunk-17-1.png)

We can also partition an initial hierarchy or phylogeny.

@@ -198,16 +240,15 @@ best.partition <- best_baps_partition(sparse.data, iqtree.rooted)
#> [1] "Calculating node marginal llks..."
#> [1] "Finding best partition..."

plot.df <- data.frame(id = iqtree.rooted$tip.label, fastbaps = best.partition,
stringsAsFactors = FALSE)
plot.df <- data.frame(id = iqtree.rooted$tip.label, fastbaps = best.partition, stringsAsFactors = FALSE)

gg <- ggtree(iqtree.rooted)
f2 <- facet_plot(gg, panel = "fastbaps", data = plot.df, geom = geom_tile, aes(x = fastbaps),
color = "blue")
f2
```

![](inst/vignette-supp/unnamed-chunk-16-1.png)
![](inst/vignette-supp/unnamed-chunk-18-1.png)

Finally, we can also look at the stability of the inferred clusters using the bootstrap.

@@ -222,7 +263,7 @@ dendro <- as.dendrogram(fast_baps(sparse.data))
gplots::heatmap.2(boot.result, dendro, dendro, tracecol = NA)
```

![](inst/vignette-supp/unnamed-chunk-17-1.png)
![](inst/vignette-supp/unnamed-chunk-19-1.png)

References
----------
@@ -237,4 +278,6 @@ Paradis, Emmanuel, Julien Claude, and Korbinian Strimmer. 2004. “APE: Analyses

Revell, Liam J. 2012. “Phytools: An R Package for Phylogenetic Comparative Biology (and Other Things).” *Methods Ecol. Evol.* 3 (2). Blackwell Publishing Ltd: 217–23. doi:[10.1111/j.2041-210X.2011.00169.x](https://doi.org/10.1111/j.2041-210X.2011.00169.x).

Tonkin-Hill, Gerry, John A Lees, Stephen D Bentley, Simon D W Frost, and Jukka Corander. 2019. “Fast Hierarchical Bayesian Analysis of Population Structure.” *Nucleic Acids Res.*, May. doi:[10.1093/nar/gkz361](https://doi.org/10.1093/nar/gkz361).

Yu, Guangchuang, David K Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” *Methods Ecol. Evol.* 8 (1): 28–36. doi:[10.1111/2041-210X.12628](https://doi.org/10.1111/2041-210X.12628).
Binary file modified inst/vignette-supp/unnamed-chunk-14-1.png
Binary file modified inst/vignette-supp/unnamed-chunk-15-1.png
Binary file modified inst/vignette-supp/unnamed-chunk-16-1.png
Binary file modified inst/vignette-supp/unnamed-chunk-17-1.png
14 changes: 14 additions & 0 deletions vignettes/bibliography.bib
@@ -1,3 +1,17 @@
@ARTICLE{Tonkin-Hill2019-lc,
title = "Fast hierarchical Bayesian analysis of population structure",
author = "Tonkin-Hill, Gerry and Lees, John A and Bentley, Stephen D and
Frost, Simon D W and Corander, Jukka",
journal = "Nucleic Acids Res.",
month = may,
year = 2019,
url = "http://dx.doi.org/10.1093/nar/gkz361",
language = "en",
issn = "0305-1048, 1362-4962",
pmid = "31076776",
doi = "10.1093/nar/gkz361"
}

@ARTICLE{Cheng2013-mp,
title = "Hierarchical and spatially explicit clustering of {DNA}
sequences with {BAPS} software",