-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.qmd
155 lines (109 loc) · 4.29 KB
/
README.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
format: gfm
knitr:
opts_chunk:
fig.path: "man/figures/"
---
<!--[![](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)-->
[![R CI](https://github.com/USCbiostats/aphylo/actions/workflows/ci.yml/badge.svg)](https://github.com/USCbiostats/aphylo/actions/workflows/ci.yml)
[![Build status](https://ci.appveyor.com/api/projects/status/1xpgpv10yifojyab?svg=true)](https://ci.appveyor.com/project/gvegayon/phylogenetic)
[![Coverage Status](https://codecov.io/gh/USCbiostats/aphylo/branch/master/graph/badge.svg)](https://codecov.io/gh/USCbiostats/aphylo) [![Integrative Methods of Analysis for Genetic Epidemiology](https://raw.githubusercontent.com/USCbiostats/badges/master/tommy-image-badge.svg)](https://image.usc.edu)
[![CRAN status](https://www.r-pkg.org/badges/version/aphylo)](https://CRAN.R-project.org/package=aphylo)
[![status](https://tinyverse.netlify.com/badge/aphylo)](https://CRAN.R-project.org/package=aphylo)
[![CRAN downloads](http://cranlogs.r-pkg.org/badges/grand-total/aphylo)](https://cran.r-project.org/package=aphylo)
# aphylo: Statistical Inference of Annotated Phylogenetic Trees <img src="man/figures/logo.svg" align="right" width="180px"/>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The `aphylo` R package implements estimation and data imputation methods for Functional Annotations in Phylogenetic Trees. The core function consists of the log-likelihood computation of observing a given phylogenetic tree with functional annotation on its leaves and the probabilities associated to gain and loss of function, including probabilities of experimental misclassification. The log-likelihood is computed using peeling algorithms, which required developing and implementing efficient algorithms for re-coding and preparing phylogenetic tree data to be used with the package. Finally, `aphylo` works smoothly with popular tools for analysis of phylogenetic data such as `ape` R package, "Analyses of Phylogenetics and Evolution."
The package is under MIT License and is developed by the Computing and Software Cores of the Biostatistics Division's NIH Project Grant (P01) at the Department of Preventive Medicine at the University of Southern California.
## Citation
```{r, comment = ""}
citation(package="aphylo")
```
## Install
This package depends on another on-development R package, the [`fmcmc`](https://github.com/USCbiostats/fmcmc). So first, you need to install it:
```r
devtools::install_github("USCbiostats/fmcmc")
```
Then you can install the `aphylo` package
```r
devtools::install_github("USCbiostats/aphylo")
```
## Reading data
```{r Loading the package}
library(aphylo)
```
```{r Reading Data}
# This datasets are included in the package
data("fakeexperiment")
data("faketree")
head(fakeexperiment)
head(faketree)
```
```{r get-offspring}
O <- new_aphylo(
tip.annotation = fakeexperiment[,2:3],
tree = as.phylo(faketree)
)
O
as.phylo(O)
# We can visualize it
plot(O)
plot_logLik(O)
```
## Simulating annotated trees
```{r}
set.seed(198)
dat <- raphylo(
50,
P = 1,
psi = c(0.05, 0.05),
mu_d = c(0.8, 0.3),
mu_s = c(0.1, 0.1),
Pi = .4
)
dat
```
## Likelihood
```{r Computing likelihood}
# Parameters and data
psi <- c(0.020,0.010)
mu_d <- c(0.40,.10)
mu_s <- c(0.04,.01)
eta <- c(.7, .9)
pi_root <- .05
# Computing likelihood
str(LogLike(dat, psi = psi, mu_d = mu_d, mu_s = mu_s, eta = eta, Pi = pi_root))
```
# Estimation
```{r MLE}
# Using L-BFGS-B (MLE) to get an initial guess
ans0 <- aphylo_mle(dat ~ psi + mu_d + Pi + eta)
# MCMC method
ans2 <- aphylo_mcmc(
dat ~ mu_d + mu_s + Pi,
prior = bprior(c(9, 1, 1, 1, 5), c(1, 9, 9, 9, 5)),
control = list(nsteps=5e3, burnin=500, thin=10, nchains=2))
ans2
plot(
ans2,
nsample = 200,
loo = TRUE,
ncores = 2L
)
```
```{r MCMC}
# MCMC Diagnostics with coda
library(coda)
gelman.diag(ans2$hist)
plot(ans2$hist)
```
# Prediction
```{r Predict}
pred <- prediction_score(ans2, loo = TRUE)
pred
plot(pred)
```
# Misc
During the development process, we decided to allow the user to choose what 'tree-reader' function he would use, particularly between using either the **rncl** R package or ape. For such, we created a short benchmark that compares both functions [here](playground/ape_now_supports_singletons.md).