forked from microbiome/course_2021_radboud
-
Notifications
You must be signed in to change notification settings - Fork 0
/
10-misc.Rmd
66 lines (51 loc) · 2.23 KB
/
10-misc.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
```{r, echo=FALSE}
library(mia)
library(miaViz)
library(dplyr)
library(ANCOMBC)
library(DESeq2)
se <- readRDS("data/se.rds")
tse <- readRDS("data/tse.rds")
tse <- transformCounts(tse, method = "relabundance")
# Agglomerates data to Genus level
tse_genus <- agglomerateByRank(tse, rank = "Genus")
# Perform clr transformation. A Pseudocount of 1 needs to be added,
# because the data contains zeros and the clr transformation includes a
# log transformation.
tse_genus <- transformCounts(tse_genus, method = "clr", pseudocount = 1)
# Does transpose, so samples are in rows, then creates a data frame.
abundance_analysis_data <- data.frame(t(assay(tse_genus, "clr")))
# We will analyse whether abundances differ depending on the"patient_status".
# There are two groups: "ADHD" and "control".
# Let's include that to the data frame.
abundance_analysis_data <- cbind(
abundance_analysis_data,
patient_status = colData(tse_genus)$patient_status
)
```
# Miscellaneous material
## Shapiro-Wilk test
If necessary, it is possible to assess normality of the data with Shapiro-Wilk test.
```{r}
# Does Shapiro-Wilk test. Does it only for columns that contain abundances, not for
# column that contain Groups.
normality_test_p <- c()
for (column in
abundance_analysis_data[, !names(abundance_analysis_data) %in% "patient_status"]){
# Does Shapiro-Wilk test
result <- shapiro.test(column)
# Stores p-value to vector
normality_test_p <- c(normality_test_p, result$p.value)
}
print(paste0("P-values over 0.05: ", sum(normality_test_p>0.05), "/",
length(normality_test_p)))
```
## Deseq details
1. Raw counts are normalized by log-based scaling.
2. Taxa-wise variance is estimated. These values tell how much each taxa varies between samples.
3. A curve is fitted over all those taxa-wise variance estimates that we got in the last step.
This model tells how big the variance is in a specific abundance level.
4. The model is used to shrink those individual variance estimates to avoid the effect of,
e.g., small sample size and higher variance. This reduces the likelihood to get
false positives.
5. Variance estimates are used to compare different groups. We receive a result that shows whether the variance is explained by groups.