# Microbiome Data {#containers}
```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```
## Data science framework
The building blocks of the framework are the **data containers**
(SummarizedExperiment and its derivatives), **packages** from various
developers that use the TreeSE container, open **demonstration data
sets**, described in Section \@ref(example-data), and **online
tutorials**, including this online book as well as the various package
vignettes and other material.
```{r echo=FALSE}
knitr::include_graphics("general/figures/FigureOverviewV2_mod.png")
```
## Data containers
[`SummarizedExperiment`](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)
(`SE`) is a generic and highly optimized container for complex data
structures. It has become a common choice for analysing various types
of biomedical profiling data, such as RNA-seq, ChIP-seq, microarrays,
flow cytometry, proteomics, and single-cell
sequencing.
[`TreeSummarizedExperiment`](https://www.bioconductor.org/packages/release/bioc/html/TreeSummarizedExperiment.html)
(`TreeSE`) was developed as an extension to incorporate hierarchical
information (such as phylogenetic trees and sample hierarchies) and
reference sequences.
[`MultiAssayExperiment`](https://www.bioconductor.org/packages/release/bioc/html/MultiAssayExperiment.html)
(`MAE`) provides an organized way to bind several different data
structures together in a single object. For example, we can bind
microbiome data (in `TreeSE` format) with metabolomic profiling data
(in `SE` format), with shared sample metadata. This is convenient and
robust, for instance, in subsetting and other data manipulation
tasks. Microbiome data can be part of multi-omics experiments and
analysis strategies, and here we outline how the packages explained
and used in this book relate to such experimental layouts through
`TreeSummarizedExperiment` and related classes.
This section provides an introduction to these data containers. In
microbiome data science, these containers link taxonomic abundance
tables with rich side information on the features and
samples. Taxonomic abundance data can be obtained by 16S rRNA amplicon
or metagenomic sequencing, phylogenetic microarrays, or by other
means. Many microbiome experiments include multiple versions and types
of data generated independently or derived from each other through
transformation or agglomeration. We start by providing recommendations
on how to represent different varieties of multi-table data within the
`TreeSummarizedExperiment` class.
The options and recommendations are summarized in Table \@ref(tab:options).
### Assay data
The original count-based taxonomic abundance tables may have different
transformations, such as logarithmic, Centered Log-Ratio (CLR), or relative
abundance. These are typically stored in _**assays**_.
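As a minimal sketch of what two of these transformations compute, the chunk below applies a logarithmic and a CLR transformation to a small hypothetical count matrix (features in rows, samples in columns) using base R only. It does not reproduce the exact implementation in `mia`; in particular, the pseudocount of 1 added to avoid `log(0)` is an assumption made for illustration.

```{r}
# Hypothetical count matrix: 3 features (rows) x 2 samples (columns)
example_counts <- matrix(c(10, 0, 90,
                           5, 5, 40),
                         nrow = 3,
                         dimnames = list(c("taxon1", "taxon2", "taxon3"),
                                         c("S1", "S2")))
# Pseudocount (an assumption here) so that log() is defined for zero counts
example_pseudo <- example_counts + 1
# Logarithmic transformation, applied element-wise
example_log <- log10(example_pseudo)
# CLR: log of each value minus the mean log value of its sample (column),
# i.e. log(x / geometric mean of the sample)
example_clr <- apply(log(example_pseudo), 2, function(v) v - mean(v))
example_clr
# Each sample's CLR values sum to (numerically) zero by construction
colSums(example_clr)
```

Because CLR centers the log values of each sample, the values of every sample sum to zero, which is a convenient sanity check. Both results keep the dimensions of the counts, so they fit naturally as additional assays.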
```{r}
library(mia)
data(GlobalPatterns, package="mia")
tse <- GlobalPatterns
assays(tse)
```
The `assays` slot contains the experimental data as matrices (here, counts).
Because multiple matrices can be stored, the result of `assays` is actually a
list of matrices.
```{r}
assays(tse)
```
Individual assays can be accessed via `assay`:
```{r}
assay(tse, "counts")[1:5,1:7]
```
To illustrate the use of multiple assays, the relative abundance data can be
calculated and stored alongside the original count data using `relAbundanceCounts`.
```{r}
tse <- relAbundanceCounts(tse)
assays(tse)
```
Now there are two assays available in the `tse` object, `counts` and
`relabundance`.
```{r}
assay(tse, "relabundance")[1:5,1:7]
```
The relative abundance table has the same dimensions as the count table. This
is in fact a requirement: all assays of a `SummarizedExperiment` object must
share the same dimensions.
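Relative abundance is easy to reproduce by hand, which makes the dimension requirement concrete. The sketch below uses base R only (not `relAbundanceCounts`) on a small hypothetical count matrix: each column is divided by its sample total, and the result keeps the shape and dimnames of the counts, so it could be stored next to them as another assay.

```{r}
# Hypothetical counts: 3 features (rows) x 2 samples (columns)
example_counts <- matrix(c(2, 6, 2,
                           0, 3, 7),
                         nrow = 3,
                         dimnames = list(c("taxon1", "taxon2", "taxon3"),
                                         c("S1", "S2")))
# Divide every column by its total so that each sample sums to one
example_rel <- sweep(example_counts, 2, colSums(example_counts), "/")
example_rel
# The transformed table has exactly the same dimensions as the counts
identical(dim(example_rel), dim(example_counts))
```

`prop.table(example_counts, margin = 2)` is an equivalent base R shortcut for the same column-wise scaling.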
### colData
`colData` contains data on the samples.
```{r coldata}
colData(tse)
```
### rowData
`rowData` contains data on the features of the analyzed samples. In the
microbiome field, it is typically used to store taxonomic information.
```{r rowdata}
rowData(tse)
```
### rowTree
Phylogenetic trees also play an important role in the microbiome field. The
`TreeSummarizedExperiment` class is able to keep track of feature and node
relations via two functions, `rowTree` and `rowLinks`.
A tree can be accessed via `rowTree` as a `phylo` object.
```{r rowtree}
rowTree(tse)
```
The links to the individual features are available through `rowLinks`.
```{r rowlinks}
rowLinks(tse)
```
Note that a 1:1 relationship between tree nodes and features is possible but
not required: there can be features that are not linked to nodes, and nodes
that are not linked to features. To change the links in an existing object,
the `changeTree` function is available.
### Alternative experiments
_**Alternative experiments**_ differ from transformations as they can
contain complementary data, which is no longer tied to the same
dimensions as the assay data. However, the number of samples (columns)
must be the same.
This can come into play for instance when one has taxonomic abundance
profiles quantified with different measurement technologies, such as
phylogenetic microarrays, amplicon sequencing, or metagenomic
sequencing. Such alternative experiments that concern the same samples
can be stored as
1. Separate _assays_, assuming that the taxonomic information can be mapped
between features directly 1:1; or
2. Data in the _altExp_ slot of the `TreeSummarizedExperiment`, if the feature
dimensions differ. Each element of the _altExp_ slot is a `SummarizedExperiment`
or an object from a derived class with independent feature data.
As an example, we show how to store taxonomic abundance tables
agglomerated at different taxonomic levels. However, the data could as
well originate from entirely different measurement sources as long as
the samples are matched.
```{r}
# Agglomerate the data to Phylum level
tse_phylum <- agglomerateByRank(tse, "Phylum")
# both have the same number of columns (samples)
dim(tse)
dim(tse_phylum)
# Add the new table as an alternative experiment
altExp(tse, "Phylum") <- tse_phylum
altExpNames(tse)
# Pick a sample subset: this acts on both altExp and assay data
tse[,1:10]
dim(altExp(tse[,1:10],"Phylum"))
```
For more details on altExp, have a look at the [Intro vignette](https://bioconductor.org/packages/release/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html) of the
`SingleCellExperiment` package [@R-SingleCellExperiment].
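Conceptually, agglomerating to a higher rank sums the counts of all features that share the same taxon at that rank, while the samples are left untouched. The base-R sketch below mirrors only this arithmetic with `rowsum()` on a hypothetical count matrix and hypothetical phylum labels; `agglomerateByRank` operates on the whole `TreeSummarizedExperiment` object rather than on a bare matrix.

```{r}
# Hypothetical counts for four features in three samples
example_counts <- matrix(1:12, nrow = 4,
                         dimnames = list(paste0("feature", 1:4),
                                         c("S1", "S2", "S3")))
# Hypothetical phylum assignment of each feature (one label per row)
example_phylum <- c("Bacteroidetes", "Firmicutes", "Firmicutes", "Bacteroidetes")
# Sum the rows that share a phylum; the sample columns are kept as they are
example_phylum_counts <- rowsum(example_counts, group = example_phylum)
example_phylum_counts
dim(example_counts)        # 4 features x 3 samples
dim(example_phylum_counts) # 2 phyla x 3 samples
```

The number of rows shrinks while the number of columns stays the same, which is exactly why such a table fits the _altExp_ slot rather than the _assays_.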
### MultiAssayExperiments
_**Multiple experiments**_ relate to complementary measurement types,
such as transcriptomic or metabolomic profiling of the microbiome or
the host. Multiple experiments can be represented using the same
options as alternative experiments, or by using the
`MultiAssayExperiment` class [@R-MultiAssayExperiment]. Depending on how the
datasets relate to each other the data can be stored as:
1. Separate _altExp_ if the samples can be matched directly 1:1; or
2. As `MultiAssayExperiment` objects, in which the connections between
samples are defined through a `sampleMap`. Each element of the
`ExperimentList` of a `MultiAssayExperiment` is a `matrix` or
`matrix`-like object, including `SummarizedExperiment` objects, and the
number of samples can differ between the elements.
```{r}
#TODO: Find the right dataset to explain a non 1:1 sample relationship
```
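The idea of a `sampleMap` can be illustrated without the `MultiAssayExperiment` package itself: it is a table with one row per column of each experiment, stating which experiment the column belongs to (`assay`), which primary unit such as a patient it refers to (`primary`), and its column name in that experiment (`colname`). The base-R sketch below uses hypothetical names to show a relationship that is not 1:1, where one subject has two metabolite samples and the other subjects have none.

```{r}
# Hypothetical sampleMap-like table: one row per column of each experiment
example_map <- data.frame(
  assay   = factor(c("microbiome", "microbiome", "microbiome",
                     "metabolite", "metabolite")),
  primary = c("P1", "P2", "P3", "P1", "P1"),
  colname = c("mb_1", "mb_2", "mb_3", "met_1a", "met_1b")
)
# Count how many columns each primary unit has in each experiment
example_tab <- table(example_map$primary, example_map$assay)
example_tab
```

In the actual class, a table with these three columns is supplied as a `DataFrame` through the `sampleMap` argument of the `MultiAssayExperiment()` constructor.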
For more information, have a look at the [intro vignette](https://bioconductor.org/packages/release/bioc/vignettes/MultiAssayExperiment/inst/doc/MultiAssayExperiment.html) of the `MultiAssayExperiment` package.

Option       Rows (features)   Cols (samples)   Recommended
-----------  ----------------  ---------------  -------------------------
assays       match             match            Data transformations
altExp       free              match            Alternative experiments
MultiAssay   free              free (mapping)   Multi-omic experiments

Table: (\#tab:options) **Recommended options for storing multiple data tables in microbiome studies.** The _assays_ are best suited for data transformations (a one-to-one match of both features and samples across the assays). The _alternative experiments_ are particularly suitable for alternative versions of the data that are of the same type but may have a different number of features (e.g. taxonomic groups); this is the case, for instance, with taxonomic abundance tables agglomerated at different levels (e.g. genus vs. phylum) or with alternative profiling technologies (e.g. amplicon sequencing vs. shallow shotgun metagenomics). For alternative experiments, a one-to-one match between samples (columns) is required, but the alternative experiment tables can have different numbers of features (rows). Finally, elements of the _MultiAssayExperiment_ provide the most flexible way to incorporate multi-omic data tables with flexible numbers of samples and features. We recommend these conventions as the basis for methods development and application in microbiome studies.
## Loading experimental microbiome data
### 16S workflow
The result of amplicon sequencing is a large number of files that contain all the sequences
that were read from the samples. Those sequences need to be matched with taxa. Additionally,
we need to know how many times each taxon was found in each sample.
There are several algorithms to do that, and DADA2 is one of the most common.
You can find a DADA2 pipeline tutorial, for example,
[here](https://benjjneb.github.io/dada2/tutorial.html).
At the end of the DADA2 portion of the tutorial, the data is stored in a _phyloseq_ object
(section "Bonus: Handoff to phyloseq"). To store the data in a _TreeSummarizedExperiment_ instead,
follow the example below.
You can find the full workflow script, without further explanations and comments,
[here](https://github.com/microbiome/OMA/blob/master/dada2_workflow.Rmd).
```{r dada2_1, include=FALSE}
# Load objects
seqtab.nochim <- readRDS("data/dada2_seqtab.nochim")
taxa <- readRDS("data/dada2_taxa")
```
Load required packages.
```{r dada2_2}
library(mia)
library(ggplot2)
if( !require("BiocManager") ){
install.packages("BiocManager")
library("BiocManager")
}
if( !require("Biostrings") ){
BiocManager::install("Biostrings")
library("Biostrings")
}
library(Biostrings)
```
Create arbitrary example sample metadata, as was done in the tutorial. Usually,
sample metadata is imported from a file.
```{r dada2_3}
samples.out <- rownames(seqtab.nochim)
subject <- sapply(strsplit(samples.out, "D"), `[`, 1)
gender <- substr(subject,1,1)
subject <- substr(subject,2,999)
day <- as.integer(sapply(strsplit(samples.out, "D"), `[`, 2))
samdf <- data.frame(Subject=subject, Gender=gender, Day=day)
samdf$When <- "Early"
samdf$When[samdf$Day>100] <- "Late"
rownames(samdf) <- samples.out
```
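To see what the parsing above does, the sketch below applies the same steps to a few hypothetical sample names of the form `<gender letter><subject number>D<day>`, such as `"F3D0"`. The name `"M1D9"` is made up for illustration, and `ifelse()` is used here as a compact equivalent of the two-step `When` assignment.

```{r}
# Hypothetical sample names of the form <gender><subject>D<day>
example_names <- c("F3D0", "F3D141", "M1D9")
example_parts <- strsplit(example_names, "D")
# Text before the "D": gender letter plus subject number, e.g. "F3"
example_subject <- sapply(example_parts, `[`, 1)
example_gender <- substr(example_subject, 1, 1)
example_subject <- substr(example_subject, 2, 999)
# Text after the "D": the sampling day
example_day <- as.integer(sapply(example_parts, `[`, 2))
# Same rule as above: samples taken after day 100 are "Late"
example_when <- ifelse(example_day > 100, "Late", "Early")
data.frame(Subject = example_subject, Gender = example_gender,
           Day = example_day, When = example_when,
           row.names = example_names)
```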
Convert the data into the right format and create a _TreeSE_ object.
```{r dada2_4}
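# DADA2 stores samples as rows and sequences as columns, whereas TreeSE
# expects features as rows and samples as columns, so transpose the table
# and then create a list that contains the assays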
counts <- t(seqtab.nochim)
counts <- as.matrix(counts)
assays <- SimpleList(counts = counts)
# Convert colData and rowData into DataFrame
samdf <- DataFrame(samdf)
taxa <- DataFrame(taxa)
# Create TreeSE
tse <- TreeSummarizedExperiment(assays = assays,
colData = samdf,
rowData = taxa
)
# Remove the mock sample, as is also done in the DADA2 pipeline tutorial
tse <- tse[ , colnames(tse) != "mock"]
```
Add the sequences to the _referenceSeq_ slot and convert the rownames into a simpler format.
```{r dada2_5}
# The rownames are the ASV sequences; convert them into a DNAStringSet
dna <- Biostrings::DNAStringSet( rownames(tse) )
# Add sequences into referenceSeq slot
referenceSeq(tse) <- dna
# Convert rownames into ASV_number format
rownames(tse) <- paste0("ASV", seq( nrow(tse) ))
tse
```
### Import from external files
Microbiome (taxonomic) profiling data is commonly distributed in
various file formats. You can import such external data files as a
(Tree)SummarizedExperiment object, but the details depend on the file
format. Here, we provide examples for common formats.
**CSV data tables** can be imported with the standard R functions,
then converted to the desired format. For detailed examples, you can
check the [Bioconductor course
material](https://bioconductor.org/help/course-materials/2019/BSS2019/04_Practical_CoreApproachesInBioconductor.html)
by Martin Morgan. The following example reads abundance tables,
taxonomic mapping tables, and sample metadata, assuming that the
input data files are properly prepared with appropriate row and
column names.
```{r importingcsv1, message=FALSE}
count_file <- "data/assay_taxa.csv"
tax_file <- "data/rowdata_taxa.csv"
sample_file <- "data/coldata.csv"
# Load files
counts <- read.csv(count_file) # Abundance table (e.g. ASV data; to assay data)
tax <- read.csv(tax_file) # Taxonomy table (to rowData)
samples <- read.csv(sample_file) # Sample data (to colData)
```
**Always ensure that the tables have rownames!** The _TreeSE_ constructor compares
these names, which helps to ensure that, for example, the right samples are linked
with the right patients.
```{r importingcsv2}
# Add rownames and remove an additional column
rownames(counts) <- counts$X
counts$X <- NULL
# Add rownames and remove an additional column
rownames(samples) <- samples$X
samples$X <- NULL
# Add rownames and remove an additional column
rownames(tax) <- tax$X
tax$X <- NULL
# As an example:
# If e.g. samples do not match between colData and counts table, you must order
# counts based on colData
if( any( colnames(counts) != rownames(samples) ) ){
counts <- counts[ , rownames(samples) ]
}
# And same with rowData and counts...
if( any( rownames(counts) != rownames(tax) ) ){
counts <- counts[ rownames(tax), ]
}
```
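A compact alternative, sketched below on hypothetical inline CSV content, is to let `read.csv()` use the first column as rownames via `row.names = 1`, which avoids the extra `X` column altogether. The example then shows why matching by name matters: the sample metadata lists the samples in a different order than the count table, and indexing by `rownames()` reorders the counts accordingly.

```{r}
# Hypothetical CSV contents: an unnamed first column holds the identifiers
example_counts_csv <- ",S1,S2,S3
taxon1,10,0,5
taxon2,3,7,1"
example_samples_csv <- ",group
S3,control
S1,case
S2,case"
# row.names = 1 turns the first column directly into rownames
example_counts <- read.csv(text = example_counts_csv, row.names = 1)
example_samples <- read.csv(text = example_samples_csv, row.names = 1)
# A positional comparison reveals that the sample order differs
any(colnames(example_counts) != rownames(example_samples))
# Reorder the count columns by name so that they follow the sample metadata
example_counts <- example_counts[, rownames(example_samples)]
example_counts
```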
The tables must be in the correct format:
- counts --> matrix
- rowData --> DataFrame
- colData --> DataFrame
```{r importingcsv3}
# Ensure that the data is in correct format
# counts should be in matrix format
counts <- as.matrix(counts)
# And it should be added to a SimpleList
assays <- SimpleList(counts = counts)
# colData and rowData should be in DataFrame format
samples <- DataFrame(samples)
tax <- DataFrame(tax)
# Create a TreeSE
tse_taxa <- TreeSummarizedExperiment(assays = assays,
colData = samples,
rowData = tax)
tse_taxa
```
To construct a _MultiAssayExperiment_ object, just combine multiple _TreeSE_ data containers.
Here we import metabolite data from the same study.
```{r importingcsv4, message=FALSE}
count_file <- "data/assay_metabolites.csv"
sample_file <- "data/coldata.csv"
# Load files
counts <- read.csv(count_file)
samples <- read.csv(sample_file)
# Add rownames and remove an additional column
rownames(counts) <- counts$X
counts$X <- NULL
rownames(samples) <- samples$X
samples$X <- NULL
# Convert into right format
counts <- as.matrix(counts)
assays <- SimpleList(concs = counts)
samples <- DataFrame(samples)
# Create a TreeSE
tse_metabolite <- TreeSummarizedExperiment(assays = assays,
colData = samples)
tse_metabolite
```
Now we can combine these two experiments into an _MAE_.
```{r importingcsv5}
# Create an ExperimentList that includes experiments
experiments <- ExperimentList(microbiome = tse_taxa,
metabolite = tse_metabolite)
# Create a MAE
mae <- MultiAssayExperiment(experiments = experiments)
mae
```
Specific import functions are provided for:
- Biom files (see `help(mia::loadFromBiom)`)
- QIIME2 files (see `help(mia::loadFromQIIME2)`)
- Mothur files (see `help(mia::loadFromMothur)`)
#### Biom example
This example shows how Biom files are imported into a
`TreeSummarizedExperiment` object.
The data is from the following publication:
Tengeler AC _et al._ (2020) [**Gut microbiota from persons with
attention-deficit/hyperactivity disorder affects the brain in
mice**](https://doi.org/10.1186/s40168-020-00816-x).
The data set consists of 3 files:
- biom file: abundance table and taxonomy information
- csv file: sample metadata
- tree file: phylogenetic tree
Store the data in your desired local directory (for instance, _data/_ under the
working directory), and define the source file paths:
```{r}
biom_file_path <- "data/Aggregated_humanization2.biom"
sample_meta_file_path <- "data/Mapping_file_ADHD_aggregated.csv"
tree_file_path <- "data/Data_humanization_phylo_aggregation.tre"
```
Now we can load the biom data into a SummarizedExperiment (SE) object.
```{r}
library(mia)
# Imports the data
se <- loadFromBiom(biom_file_path)
# Check
se
```
The `assays` slot includes a list of abundance tables. The imported
abundance table is named "counts". Let us inspect only the first three
rows and columns.
```{r}
assays(se)$counts[1:3, 1:3]
```
The `rowData` includes taxonomic information from the biom file. The `head()` command
shows just the beginning of the data table for an overview.
```{r}
head(rowData(se))
```
These taxonomic rank names (column names) are not real rank
names. Let's replace them with real rank names.
In addition, the taxon names include prefixes such as '"k__', so let's
make them cleaner by removing these prefixes.
```{r}
names(rowData(se)) <- c("Kingdom", "Phylum", "Class", "Order",
"Family", "Genus")
# Goes through the whole DataFrame. Removes '.*[kpcofg]__' from strings, where [kpcofg]
# is any one of the listed characters and .* is any sequence of characters.
rowdata_modified <- BiocParallel::bplapply(rowData(se),
FUN = stringr::str_remove,
pattern = '.*[kpcofg]__')
# Genus level has additional '\"', so let's delete that also
rowdata_modified <- BiocParallel::bplapply(rowdata_modified,
FUN = stringr::str_remove,
pattern = '\"')
# rowdata_modified is a list, so it is converted back to DataFrame format.
rowdata_modified <- DataFrame(rowdata_modified)
# And then assigned back to the SE object
rowData(se) <- rowdata_modified
# Now we have a nicer table
head(rowData(se))
```
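To see what the two patterns do, the sketch below applies them with base R `sub()`, which, like `stringr::str_remove`, removes only the first match, to a few hypothetical taxon strings of the kind described above.

```{r}
# Hypothetical raw taxonomy strings with rank prefixes and stray quotes
example_taxa <- c('"k__Bacteria', ' p__Firmicutes', ' g__Blautia"')
# '.*[kpcofg]__': any characters followed by a rank letter and two underscores
example_clean <- sub(".*[kpcofg]__", "", example_taxa)
# Remove the remaining double quote (first occurrence only)
example_clean <- sub('"', "", example_clean)
example_clean
```

Because `.*` also matches any leading quote or whitespace, the first pattern removes everything up to and including the rank prefix.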
The imported biom file does not contain sample metadata,
so `colData` is an empty data frame.
```{r}
head(colData(se))
```
Let us add a sample metadata file.
```{r}
# We use this to check what type of data it is
# read.table(sample_meta_file_path)
# It seems like a comma separated file and it does not include headers
# Let us read it and then convert from data.frame to DataFrame
# (required for our purposes)
sample_meta <- DataFrame(read.table(sample_meta_file_path, sep = ",", header = FALSE))
# Add sample names to rownames
rownames(sample_meta) <- sample_meta[,1]
# Delete column that included sample names
sample_meta[,1] <- NULL
# We can add headers
colnames(sample_meta) <- c("patient_status", "cohort", "patient_status_vs_cohort", "sample_name")
# Then it can be added to colData
colData(se) <- sample_meta
```
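The same steps can be tried on a small inline example. The sketch below reads hypothetical headerless CSV content with `read.table(text = ...)`, moves the first column into the rownames, and names the remaining columns; the sample IDs, values, and column names are made up for illustration and do not reflect the actual mapping file.

```{r}
# Hypothetical headerless metadata: sample ID followed by two attributes
example_meta_txt <- "A1,ADHD,Cohort_1
A2,Control,Cohort_1
B1,ADHD,Cohort_2"
example_meta <- read.table(text = example_meta_txt, sep = ",", header = FALSE)
# The first column holds the sample IDs: use them as rownames, then drop it
rownames(example_meta) <- example_meta[, 1]
example_meta <- example_meta[, -1, drop = FALSE]
# Name the remaining columns
colnames(example_meta) <- c("patient_status", "cohort")
example_meta
```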
Now `colData` includes the sample metadata.
```{r}
head(colData(se))
```
Now, let's add a phylogenetic tree.
The current data object, `se`, is a `SummarizedExperiment` object, which
does not include a slot for a phylogenetic tree. To add one, we can
convert the SE object into an extended `TreeSummarizedExperiment`
object, which also includes a `rowTree` slot.
`TreeSummarizedExperiment` also contains other additional slots and features,
which is why we recommend using `TreeSE`.
```{r}
tse <- as(se, "TreeSummarizedExperiment")
# tse includes same data as se
tse
```
Next, let us read the tree data file and add it to the R data object (tse).
```{r}
# Reads the tree file
tree <- ape::read.tree(tree_file_path)
# Add tree to rowTree
rowTree(tse) <- tree
# Check
tse
```
Now `rowTree` includes a phylogenetic tree:
```{r, eval=FALSE}
head(rowTree(tse))
```
### Conversions between data formats in R
If the data has already been imported in R in another format, it
can be readily converted into `TreeSummarizedExperiment`, as shown in our next
example. Note that similar conversion functions to
`TreeSummarizedExperiment` are available for multiple data formats via
the `mia` package (see `makeTreeSummarizedExperimentFrom*` for phyloseq,
Biom, and DADA2).
```{r, message=FALSE}
library(mia)
# phyloseq example data
data(GlobalPatterns, package="phyloseq")
GlobalPatterns_phyloseq <- GlobalPatterns
GlobalPatterns_phyloseq
```
```{r, message=FALSE}
# convert phyloseq to TSE
GlobalPatterns_TSE <- makeTreeSummarizedExperimentFromPhyloseq(GlobalPatterns_phyloseq)
GlobalPatterns_TSE
```
We can also convert `TreeSummarizedExperiment` objects into `phyloseq`
with respect to the shared components that are supported by both
formats (i.e. taxonomic abundance table, sample metadata, taxonomic
table, phylogenetic tree, sequence information). This is useful for
instance when additional methods are available for `phyloseq`.
```{r, message=FALSE}
# convert TSE to phyloseq
GlobalPatterns_phyloseq2 <- makePhyloseqFromTreeSummarizedExperiment(GlobalPatterns_TSE)
GlobalPatterns_phyloseq2
```
Conversion is possible between other data formats. Interested readers can refer to the following functions:
* [makeTreeSummarizedExperimentFromDADA2](https://microbiome.github.io/mia/reference/makeTreeSummarizedExperimentFromDADA2.html)
* [makeSummarizedExperimentFromBiom](https://microbiome.github.io/mia/reference/makeSummarizedExperimentFromBiom.html)
* [loadFromMetaphlan](https://microbiome.github.io/mia/reference/loadFromMetaphlan.html)
* [readQZA](https://microbiome.github.io/mia/reference/loadFromQIIME2.html)
## Demonstration data {#example-data}
Open demonstration data for testing and benchmarking purposes is
available from multiple locations. This chapter introduces some
options. The other chapters of this book provide ample examples about
the use of the data.
### Package data {#package-data}
The `mia` R package contains example data sets that are direct
conversions from the alternative `phyloseq` container to the
`TreeSummarizedExperiment` container.
List the [available
datasets](https://microbiome.github.io/mia/reference/index.html) in
the `mia` package:
```{r, message=FALSE, eval=FALSE}
library(mia)
data(package="mia")
```
Load the `GlobalPatterns` data from the `mia` package:
```{r, message=FALSE}
data("GlobalPatterns", package="mia")
GlobalPatterns
```
Check the documentation for this data set:
```{r, message=FALSE, echo=FALSE}
help(GlobalPatterns)
```
### ExperimentHub data
[ExperimentHub](https://bioconductor.org/packages/release/bioc/vignettes/ExperimentHub/inst/doc/ExperimentHub.html)
provides a variety of data resources, including the
[microbiomeDataSets](https://bioconductor.org/packages/devel/data/experiment/html/microbiomeDataSets.html)
package.
A table of the available data sets is available through the `availableDataSets`
function.
```{r, message=FALSE}
library(microbiomeDataSets)
availableDataSets()
```
All data are downloaded from ExperimentHub and cached for local
re-use. Check the [man pages of each
function](https://microbiome.github.io/microbiomeDataSets/reference/index.html)
for a detailed documentation of the data contents and references. Let
us retrieve a `r Biocpkg("MultiAssayExperiment")` data set:
```{r, message=FALSE, eval=FALSE}
mae <- HintikkaXOData()
```
Data is available in `r Biocpkg("SummarizedExperiment")`, `r
Biocpkg("TreeSummarizedExperiment")`, and `r
Biocpkg("MultiAssayExperiment")` data containers; see the separate
page on [alternative
containers](https://microbiome.github.io/OMA/multitable.html) for more
details.
### Other data sources
The
[curatedMetagenomicData](https://waldronlab.io/curatedMetagenomicData)
is an independent source that provides various example data sets as
`(Tree)SummarizedExperiment` objects. This resource provides curated
human microbiome data including gene families, marker abundance,
marker presence, pathway abundance, pathway coverage, and relative
abundance for samples from different body sites. See the package
homepage for more details on data availability and access.
As one example, let us retrieve the Vatanen (2016) [@Vatanen2016] data
set. This is a larger collection with a longer download time.
```{r, message=FALSE, eval=FALSE}
library(curatedMetagenomicData)
tse <- curatedMetagenomicData("Vatanen*", dryrun = FALSE, counts = TRUE)
```
## Session Info {-}
```{r sessionInfo, echo=FALSE, results='asis'}
prettySessionInfo()
```