forked from microbiome/OMA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
14_alpha_diversity.Rmd
257 lines (193 loc) · 9.13 KB
/
14_alpha_diversity.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
# Community diversity {#community-diversity}
```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```
Diversity estimates are a central topic in microbiome data analysis.
There are three commonly employed levels of diversity measurements,
which are trying to put a number on different aspects of the questions
associated with diversity [@Whittaker1960].
Many different ways for estimating such diversity measurements have been
described in the literature. Which measurement is best or applicable for your
samples, is not the aim of the following sections.
```{r load-pkg-data}
library(mia)
data("GlobalPatterns", package="mia")
tse <- GlobalPatterns
```
**_Alpha diversity_**, also sometimes interchangeably used with the
term **_species diversity_**, summarizes the distribution of species
abundances in a given sample into a single number that depends on
species richness and evenness. Diversity indices measure the overall
community heterogeneity. A number of ecological diversity measures are
available. The Hill coefficient combines many standard indices into a
single equation that provides observed richness, inverse Simpson, and
Shannon diversity, and generalized diversity as special cases. In
general, diversity increases together with increasing richness and
evenness. Sometimes richness, phylogenetic diversity, evenness, dominance,
and rarity are considered to be variants of alpha diversity.
**Richness** refers to the total number of species in a community
(sample). The simplest richness index is the number of observed
species (observed richness). Assuming limited sampling from the
community, however, this may underestimate the true species
richness. Several estimators are available, including for instance ACE
[@Chao1992] and Chao1 [@Chao1984]. Richness estimates are unaffected
by species abundances.
**Phylogenetic diversity** was first proposed by [@Faith1992]. Unlike the
diversity measures mentioned above, Phylogenetic diversity (PD)
measure incorporates information from phylogenetic relationships
stored in `phylo` tree between species in a community (sample). The
Faith's PD is calculated as the sum of branch length of all species in
a community (sample).
**Evenness** focuses on species abundances, and can thus complement
the number of species. A typical evenness index is the Pielou's
evenness, which is Shannon diversity normalized by the observed
richness.
**Dominance** indices are in general negatively correlated with
diversity, and sometimes used in ecological literature. High
dominance is obtained when one or few species have a high share of
the total species abundance in the community.
**Rarity** indices characterize the concentration of taxa at low abundance.
Prevalence and detection thresholds determine rare taxa whose total concentration
is represented as a rarity index.
## Estimation
Alpha diversity can be estimated with wrapper functions that interact
with other packages implementing the calculation, such as _`vegan`_
[@R-vegan].
### Richness {#richness}
Richness gives the number of features present within a community and can be calculated with `estimateRichness`. Each of the estimate diversity/richness/evenness/dominance functions adds the calculated measure(s) to the `colData` of the `SummarizedExperiment` under the given column `name`. Here, we calculate `observed` features as a measure of richness.
```{r}
tse <- mia::estimateRichness(tse,
assay_name = "counts",
index = "observed",
name="observed")
head(colData(tse)$observed)
```
This allows access to the values to be analyzed directly from the `colData`, for example
by plotting them using `plotColData` from the _`scater`_ package [@R-scater].
```{r plot-div-shannon, message=FALSE, fig.cap="Shannon diversity estimates plotted grouped by sample type with colour-labeled barcode."}
library(scater)
plotColData(tse,
"observed",
"SampleType",
colour_by = "Final_Barcode") +
theme(axis.text.x = element_text(angle=45,hjust=1)) +
ylab(expression(Richness[Observed]))
```
### Diversity {#estimate-diversity}
The main function, `estimateDiversity`, calculates the selected
diversity index based on the selected assay data.
```{r estimate-shannon}
tse <- mia::estimateDiversity(tse,
assay_name = "counts",
index = "shannon",
name = "shannon")
head(colData(tse)$shannon)
```
Alpha diversities can be visualized with boxplot. Here, Shannon index is compared
between different sample type groups. Individual data points are visualized by
plotting them as points with `geom_jitter`.
`geom_signif` is used to test whether these differences are statistically significant.
It adds p-values to plot.
```{r visualize-shannon}
if( !require(ggsignif) ){
install.packages(ggsignif)
}
library(ggplot2)
library(patchwork)
library(ggsignif)
# Subsets the data. Takes only those samples that are from feces, skin, or tongue,
# and creates data frame from the collected data
df <- as.data.frame(colData(tse)[colData(tse)$SampleType %in%
c("Feces", "Skin", "Tongue"), ])
# Changes old levels with new levels
df$SampleType <- factor(df$SampleType)
# For significance testing, all different combinations are determined
comb <- split(t(combn(levels(df$SampleType), 2)),
seq(nrow(t(combn(levels(df$SampleType), 2)))))
ggplot(df, aes(x = SampleType, y = shannon)) +
# Outliers are removed, because otherwise each data point would be plotted twice;
# as an outlier of boxplot and as a point of dotplot.
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.2) +
geom_signif(comparisons = comb, map_signif_level = FALSE) +
theme(text = element_text(size = 10))
```
### Faith phylogenetic diversity {#faith-diversity}
The Faith index is returned by the function `estimateFaith`.
```{r phylo-div-1}
tse <- mia::estimateFaith(tse,
assay_name = "counts")
head(colData(tse)$faith)
```
**Note**: because `tse` is a `TreeSummarizedExperiment` object, its phylogenetic tree is used by default. However, the optional argument `tree` must be provided if `tse` does not contain one.
Below a visual comparison between shannon and faith indices is shown with a violin plot.
```{r phylo-div-2}
plots <- lapply(c("shannon", "faith"),
plotColData,
object = tse, colour_by = "SampleType")
plots[[1]] + plots[[2]] +
plot_layout(guides = "collect")
```
Alternatively, the phylogenetic diversity can be calculated by `mia::estimateDiversity`. This is a faster re-implementation of
the widely used function in _`picante`_ [@R-picante, @Kembel2010].
Load `picante` R package and get the `phylo` stored in `rowTree`.
```{r phylo-div-3}
tse <- mia::estimateDiversity(tse,
assay_name = "counts",
index = "faith",
name = "faith")
```
### Evenness
Evenness can be calculated with `estimateEvenness`.
```{r evenness-1}
tse <- estimateEvenness(tse,
assay_name = "counts",
index="simpson")
head(colData(tse)$simpson)
```
### Dominance
Dominance can be calculated with `estimateDominance`. Here, the `Relative index` is calculated which is the relative abundance of the most dominant species in the sample.
```{r dominance-1}
tse <- estimateDominance(tse,
assay_name = "counts",
index="relative")
head(colData(tse)$relative)
```
### Rarity
`mia` package provides one rarity index called log-modulo skewness. It can be
calculated with `estimateDiversity`.
```{r rarity-1}
tse <- mia::estimateDiversity(tse,
assay_name = "counts",
index = "log_modulo_skewness")
head(colData(tse)$log_modulo_skewness)
```
### Divergence
Divergence can be evaluated with `estimateDivergence`. Reference and algorithm for the calculation of divergence can be specified as `reference` and `FUN`, respectively.
```{r}
tse <- mia::estimateDivergence(tse,
assay_name = "counts",
reference = "median",
FUN = vegan::vegdist)
```
## Visualization
A plot comparing all the diversity measures calculated above and stored in `colData` can then be constructed directly.
```{r plot-all-diversities, fig.width = 6.5}
plots <- lapply(c("observed", "shannon", "simpson", "relative", "faith", "log_modulo_skewness"),
plotColData,
object = tse,
x = "SampleType",
colour_by = "SampleType")
plots <- lapply(plots, "+",
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_blank()))
((plots[[1]] | plots[[2]] | plots[[3]]) /
(plots[[4]] | plots[[5]] | plots[[6]])) +
plot_layout(guides = "collect")
```
## Session Info {-}
```{r sessionInfo, echo=FALSE, results='asis'}
prettySessionInfo()
```