-
Notifications
You must be signed in to change notification settings - Fork 0
/
DAISIE.qmd
305 lines (204 loc) · 11.5 KB
/
DAISIE.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
---
title: "DAISIE"
author: Luis Valente
---
## Fitting DAISIE models using the Galápagos birds as example {#sec-daisie}
![Phylogenies of Galápagos birds](images/Galapagos_picture.png)
In this part of the practical, we will learn how to use DAISIE. As a demonstraction, we are going to fit the DAISIE model to phylogenetic data for species of native land birds of the Galápagos islands (data from Valente, Phillimore and Etienne 2015 *Ecology Letters*; and Valente et al 2020 *Nature*). We will use software DAISIE to estimate rates of speciation, colonisation and extinction in the Galápagos. We will simulate islands with these estimated rates to see how species diversity has varied through time in the Galápagos.
### Prepare the R environment
Empty workspace of existing previous objects, just in case.
```{r}
rm(list = ls())
```
### Load the required packages:
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.align = "center",
fig.height = 5,
fig.width = 7
)
```
```{r setup, warning=FALSE,message=FALSE}
library(DAISIE)
library(ape)
```
### Load and visualise Galapágos bird data
The [dataset](data/galapagos_datalist.Rdata) of Galápagos birds includes colonisation and branching times for a total of 27 species of terrestrial birds of the Galápagos islands, distributed across 8 lineages. Some of the lineages have only 1 species, e.g. the Galápagos dove *Zenaida*. Others have radiated, including the Darwin's finch radiation (16 species) and the mockingbirds *Mimus* radiation (4 species). **Note**: the dataset we will use is slightly different from the example dataset that is included as data in the DAISIE R package.
#### Load Galápagos DAISIE datalist
**Download the Galápagos dataset file: [galapagos_datalist.Rdata](data/galapagos_datalist.Rdata)**
Store it locally.
Load it into R, by setting the path to the location of the file on your computer:
```{r , eval=F}
load(file = "PATH_TO_YOUR_DATALIST_FILE")
```
```{r , echo = F}
load(file = "data/galapagos_datalist.Rdata")
```
#### View Galápagos datalist
Just type:
```{r datalist, results='hide'}
galapagos_datalist
```
**This datalist is of the same format that DAISIEprep produces (e.g., same format as the `data_list` object that you just produced in the DAISIEprep tutorial).**
View your Galápagos datalist by scrolling up and down in R. Or use function `View()`
```{r, eval = F }
View(galapagos_datalist)
```
You can find detailed information using the R documentation file:
```{r, results='hide'}
? Galapagos_datalist
```
To view the top of the list
```{r toplist}
galapagos_datalist[[1]]
```
This element contains the island age as well as all the mainland species that do not have extant descendants on the island. These are important pieces of information for DAISIE.
To view just the *Mimus* colonisation:
```{r mimus}
galapagos_datalist[[4]]
```
The first element of the `$branching_times` item is the island age, followed by the colonisation time and the branching times (if any). The `$stac` item is a code that tells DAISIE whether the lineage is endemic, non-endemic, and whether or not a precise age or a maximum-age should be used.
Or to view just the Darwin's finches lineage:
```{r finches}
galapagos_datalist[[3]]
```
As you can see, there is 1 missing species in the Darwin's finch lineage - this species was not sampled in the phylogeny, so does not have a corresponding branching time in the `$branching_times` item. However, `$missing_species` item tells DAISIE that it needs to consider one extra species in its computations.
#### Visualise Galápagos data
You can visualize the data in a different way using:
```{r, echo=TRUE}
DAISIE_plot_island(galapagos_datalist)
```
The above plot shows the 8 colonisation events (independent lineages), the time of colonisation, number of species sampled in each lineage (n), number of missing species (m) and the times of cladogenetic speciation events (horizontal lines). The dashed horizontal line is the island age. Unfilled symbols indicate that the time of colonisation was set as "max-age" (not precise), that is, the colonisation time could have happened any time between the time plotted and the present. This applies for instance to lineages with stem ages older than the island age, whose age is then set to be the age of the island (as was the cases of genus *Mimus*, whose stem age is older than the island age of 4 million years).
#### Plot age versus diversity
To visualise the diversity of each lineage according to their age:
```{r , echo=TRUE}
DAISIE_plot_age_diversity(galapagos_datalist)
```
The above plot shows the number of species per lineage plotted against their colonisation time. You can see that one lineage stands out, it is not the oldest lineage, but it has 16 species, many more than the other lineages (there is no label, but it corresponds to the Darwin's finches). One of the lineages, *Mimus*, is not plotted, because it's age is set as max-age, that is why the number of species is 23 (27-4 species=23).\
\
### Fitting DAISIE models
We will now fit three different DAISIE models using maximum likelihood, using the function `DAISIE_ML`. The DAISIE function searches for the parameters that maximise the likelihood of the data given the specified model. We will then compare the likelihoods of the 3 different models to see which one is preferred
We will fit the 3 different DAISIE models to the phylogenetic data contained in the object `galapagos_datalist`.
\
#### Fit a full DAISIE model (M1)
The first model (M1) we will fit is a full DAISIE model with 5 parameters. The parameters in DAISIE are always placed in this order:
1. Cladogenesis rate (lambda_c) (unit: cladogenesis events per island lineage per time unit)
2. Extinction rate (mu) (unit: extinction events per island lineage per time unit)
3. Carrying capacity (K) (unit: number of species)
4. Colonisation rate (gamma) (unit: colonisation events per mainland lineage per time unit)
5. Anagenesis rate (lambda_a) (unit: anagenesis events per island lineage per time unit)
Important settings of the `DAISIE_ML` function:
- `initparsopt` refers to the starting parameters of the maximum likelihood optimisation (the search for the optimal parameters). 1.5 is the starting cladogenesis rate, 1.1 is the starting extinction rate, 20 is the initial carrying capacity (20 species), 0.009 is the colonisation rate; 1.1 is the initial anagenesis rate.
- `idparsopt` shows the parameters to optimise, in this case all 5, so 1:5
- `parsfix` and `idparsfix` are set to NULL because we are not fixing any parameters this time.
**The code below takes a while to run.**
```{r ML_DAISIE_M1, warning = TRUE, cache = TRUE}
M1_results <- DAISIE_ML(
datalist = galapagos_datalist,
initparsopt = c(1.5,1.1,20,0.009,1.1),
ddmodel = 11,
idparsopt = 1:5,
parsfix = NULL,
idparsfix = NULL
)
M1_results
```
The output gives the ML parameters values, the loglikelihood, and the number of parameters (df).\
#### Fit model with no carrying-capacity (M2)
As you can see above, the value of K in M1 is very high (hundreds of thousands of species). This suggest that there may be no upper bound to diversity (so K is in fact infinite). So let's try a model without K as a free parameter.
The second model (M2) is just like M1, but we will fix K to be infinite. In comparison to M1, we now remove the 3rd parameter (carrying capacity) from the `initparsopt` and `idparsopt` settings. So we now have only parameters 1,2,4 and 5. We fix the parameter number 3 to `Inf` (carrying capacity K fixed to infinite):
```{r ML_DAISIE_M2, warning = TRUE, cache = TRUE}
M2_results <- DAISIE_ML(
datalist = galapagos_datalist,
initparsopt = c(1.5,1.1,0.009,1.1),
idparsopt = c(1,2,4,5),
parsfix = Inf,
idparsfix = 3,
ddmodel = 0
)
M2_results
```
\
#### Fit model with no carrying capacity AND no anagenesis (M3)
Now, it appears that the parameter of anagenesis is close to 0 in M2 (see above). Perhaps a model without anagenesis will perform well.
The third model (M3) is like M2, but with no carrying-capacity and no anagenesis. This time we switch off the 3rd and 5th parameters (carrying-capacity and anagenesis). We fix the 3rd parameter (K) to infinite; and the 5th parameter (anagenesis) to zero:
```{r ML_DAISIE_M3, warning = TRUE, cache = TRUE}
M3_results <- DAISIE_ML(
datalist = galapagos_datalist,
initparsopt = c(1.5,1.1,0.009),
idparsopt = c(1,2,4),
parsfix = c(Inf,0),
idparsfix = c(3,5),
ddmodel = 0
)
M3_results
```
\
#### Select the best model using AIC
Now compare the log-likelihoods of the 3 models. They all have similar likelihoods, but one has fewer parameters, so it is preferred. Which model is it?
A more proper way to compare the models is using AIC (or other information criteria), which takes into account the number of parameters and the loglikelihood.
Create new function to compute AIC values base on the likelihood values and number of parameters for each model.
```{r AICfunction, warning = TRUE}
AIC_compare <- function(LogLik,k){
aic <- (2 * k) - (2 * LogLik)
return(aic)
}
```
Fill in the values from your data in this format:
`AIC_compare(c(likelihood_M1,likelihood_M2,likelihood_M3), c(n_parameters_M1,n_parameters_M2,n_parameters_M3))`.
```{r AICcompare, warning=TRUE}
AICs <- AIC_compare(LogLik = c(M1_results$loglik,M2_results$loglik,M3_results$loglik),
k = c(M1_results$df,M2_results$df,M3_results$df))
names(AICs) <- c('M1','M2','M3')
AICs
```
The model with the **lowest** AIC value is the preferred model. (In this case, M3)
\
### Simulate islands
Now let's use the ML parameters we estimated from the Galápagos bird data to simulate islands. Choose the parameters of the best model. We will simulate for 4 million years (time = 4). M is the mainland pool (keep it at 1000).
#### Simulate islands with the parameters estimated from the best model for the Galápagos bird data (takes a while to run, you can reduce number of replicates if you want)
```{r DAISIE_sim, message=FALSE, warning=FALSE, cache=TRUE}
Galapagos_sims <- DAISIE_sim(
time = 4,
M = 1000,
pars = c(1.26, 1.146, Inf, 0.005,0),
replicates = 100,
plot_sims = FALSE)
```
#### Plot the species-through-time plots resulting from the simulations
```{r, echo = TRUE}
DAISIE_plot_sims(Galapagos_sims)
```
#### Play with different simulations settings
Play with different parameter settings in `DAISIE_sim` (e.g. changing extinction rate, colonisation rate, island age, mainland pool size, replicates). Is equilibrium reached?
\
#### Comparison with bird data from the Azores islands
Now let's briefly compare with simulations for a completely different dataset, of birds from the Azores archipelago (Atlantic Ocean). Load datatable containing Azores bird data from Valente et al 2017 (*Current Biology*)
```{r}
data(Macaronesia_datalist)
Azores <- Macaronesia_datalist[[1]]
```
##### Visualise Azores data
```{r, echo = TRUE}
DAISIE_plot_island(Azores)
```
Three of the species are extinct and only known from fossils.
##### Simulate Azores with pre-identified ML parameters
```{r DAISIE_sim_Azores, message=FALSE, warning=FALSE, cache=TRUE}
Azores_sims <- DAISIE_sim(
time = 6.3,
M = 300,
pars = c(0,1.053151832,Inf,0.052148979,0.512939011),
replicates = 100,
plot_sims = FALSE)
```
\
##### Plot the species-through-time plot for the Azores from the resulting from the simulations
```{r, echo = TRUE}
DAISIE_plot_sims(Azores_sims)
```
How does the Azores plot differ from that of the Galápagos?
\
The end of the DAISIE practical. Well done! Now you are ready to work on the *Insula* exercise