forked from microbiome/OMA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path11_taxonomic_information.Rmd
234 lines (173 loc) · 7.18 KB
/
11_taxonomic_information.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
# Taxonomic Information {#taxonomic-information}
```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```
```{r, message=FALSE}
library(mia)
data("GlobalPatterns", package="mia")
tse <- GlobalPatterns
```
Taxonomic information is a key part of analyzing microbiome data and without
it, any type of data analysis probably will not make much sense. However,
the degree of detail of taxonomic information differs depending on the dataset
and annotation data used.
Therefore, the mia package expects a loose assembly of taxonomic information
and assumes certain key aspects:
* Taxonomic information is given as character vectors or factors in the
`rowData` of a `SummarizedExperiment` object.
* The columns containing the taxonomic information must be named `domain`,
`kingdom`, `phylum`, `class`, `order`, `family`, `genus`, `species` or with
a capital first letter.
* the columns must be given in the order shown above
* column can be omited, but the order must remain
## Assigning taxonomic information.
There are a number of methods to assign taxonomic information. We like to give
a short introduction about the methods available without ranking one over the
other. This has to be your choice based on the result for the individual
dataset.
### dada2
The dada2 package [@R-dada2] implements the `assignTaxonomy` function, which
takes as input the ASV sequences associated with each row of data and a training
dataset. For more information visit the
[dada2 homepage](https://benjjneb.github.io/dada2/assign.html).
### DECIPHER
The DECIPHER package [@R-DECIPHER] implements the `IDTAXA` algorithm to assign
either taxonomic information or function information. For `mia`
only the first option is of interest for now and more information can be
found on the [DECIPHER website](http://www2.decipher.codes/Classification.html).
## Functions to access taxonomic information
`checkTaxonomy` checks whether the taxonomic information is usable for `mia`
```{r}
checkTaxonomy(tse)
```
Since the `rowData` can contain other data, `taxonomyRanks` will return the
columns `mia` assumes to contain the taxonomic information.
```{r}
taxonomyRanks(tse)
```
This can then be used to subset the `rowData` to columns needed.
```{r}
rowData(tse)[,taxonomyRanks(tse)]
```
`taxonomyRankEmpty` checks for empty values in the given `rank` and returns a
logical vector of `length(x)`.
```{r}
all(!taxonomyRankEmpty(tse, rank = "Kingdom"))
table(taxonomyRankEmpty(tse, rank = "Genus"))
table(taxonomyRankEmpty(tse, rank = "Species"))
```
`getTaxonomyLabels` is a multi-purpose function, which turns taxonomic
information into a character vector of `length(x)`
```{r}
head(getTaxonomyLabels(tse))
```
By default, this will use the lowest non-empty information to construct a
string with the following scheme `level:value`. If all levels are the same,
this part is omitted, but can be added by setting `with_rank = TRUE`.
```{r}
phylum <- !is.na(rowData(tse)$Phylum) &
vapply(data.frame(apply(rowData(tse)[,taxonomyRanks(tse)[3:7]],1L,is.na)),all,logical(1))
head(getTaxonomyLabels(tse[phylum,]))
head(getTaxonomyLabels(tse[phylum,], with_rank = TRUE))
```
By default the return value of `getTaxonomyLabels` contains only unique elements
by passing it through `make.unique`. This step can be omitted by setting
`make_unique = FALSE`.
```{r}
head(getTaxonomyLabels(tse[phylum,], with_rank = TRUE, make_unique = FALSE))
```
To apply the loop resolving function `resolveLoop` from the
`TreeSummarizedExperiment` package [@R-TreeSummarizedExperiment] within
`getTaxonomyLabels`, set `resolve_loops = TRUE`.
The function `getUniqueTaxa` gives a list of unique taxa for the specified taxonomic rank.
```{r}
head(getUniqueTaxa(tse, rank = "Phylum"))
```
### Generate a taxonomic tree on the fly
To create a taxonomic tree, `taxonomyTree` used the information and returns a
`phylo` object. Duplicate information from the `rowData` is removed.
```{r}
taxonomyTree(tse)
```
```{r}
tse <- addTaxonomyTree(tse)
tse
```
The implementation is based on the `toTree` function from the
`TreeSummarizedExperiment` package [@R-TreeSummarizedExperiment].
## Data agglomeration {#data-agglomeration}
One of the main applications of taxonomic information in regards to count data
is to agglomerate count data on taxonomic levels and track the influence of
changing conditions through these levels. For this `mia` contains the
`agglomerateByRank` function. The ideal location to store the agglomerated data
is as an alternative experiment.
```{r}
tse <- relAbundanceCounts(tse)
altExp(tse, "Family") <- agglomerateByRank(tse, rank = "Family",
agglomerateTree = TRUE)
altExp(tse, "Family")
```
If multiple assays (counts and relabundance) exist, both will be agglomerated.
```{r}
assayNames(tse)
assayNames(altExp(tse, "Family"))
```
```{r}
assay(altExp(tse, "Family"), "relabundance")[1:5,1:7]
```
```{r}
assay(altExp(tse, "Family"), "counts")[1:5,1:7]
```
`altExpNames` now consists of `Family` level data. This can be extended to use
any level present in `r mia::taxonomyRanks(tse)`.
## Data transformation
Data transformation is a very common procedure in microbiome analysis.
In transformation, each data point is replaced with transformed value that is
calculated by applying transformation formula to the data point. Transformation
can be used, for example, to normalize skewed data, or to reduce weight of bigger
values compared to smaller values.
In mia package, transformations are applied to abundance data. The transformed
abundance table is stored back to 'assays'. mia includes transformation
functions for sample-wise or column-wise transformation ('transformSamples()'),
and for feature-wise or row-wise transformation ('transformFeatures()').
For complete list of available transformations and parameters, see function
[help](https://microbiome.github.io/mia/reference/transformCounts.html).
```{r}
tse <- transformSamples(tse, method = "relabundance")
tse <- transformSamples(x = tse, assay_name = "relabundance", method = "clr",
pseudocount = 1, name = "clr_transformation")
head(assay(tse, "clr_transformation"))
```
- In 'pa' transformation, 'threshold' specifies the value that divides observations to
be absent or present. By default, it is 0.
```{r}
tse <- transformFeatures(tse, method = "pa", threshold = 10)
head(assay(tse, "pa"))
```
```{r}
# list of abundance tables that assays slot contains
assays(tse)
```
## Pick specific
Retrieving of specific elements that are required for specific analysis. For
instance, extracting abundances for a specific taxa in all samples or all taxa
in one sample.
### Abundances of all taxa in specific sample
```{r}
taxa.abund.cc1 <- getAbundanceSample(tse,
sample_id = "CC1",
assay_name = "counts")
taxa.abund.cc1[1:10]
```
### Abundances of specific taxa in all samples
```{r}
taxa.abundances <- getAbundanceFeature(tse,
feature_id = "Phylum:Bacteroidetes",
assay_name = "counts")
taxa.abundances[1:10]
```
## Session Info {-}
```{r sessionInfo, echo=FALSE, results='asis'}
prettySessionInfo()
```