-
Notifications
You must be signed in to change notification settings - Fork 4
/
05-statistics.Rmd
207 lines (159 loc) · 10 KB
/
05-statistics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
# Basic Statistics
## Preamble
This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory measures. Please understand that these stats should *NOT* be what is included in your final manuscript. They are simply a quick and dirty way to gain an understanding of your data.
## Running your models
For starters I am a big fan of using something that allows me to run through several models to get an overall *feel* for the data. This usually involves either [broom](http://varianceexplained.org/r/broom-intro/) or [rstatix](https://rpkgs.datanovia.com/rstatix/) package which allows for pipe friendly modelling. I will commonly do this in order to evaluate if random factors should be included in my final model (covered in the mixed model chapter).
```{r, eval = FALSE, echo = TRUE}
# Question: Is there a subject effect in our data?
# if there is then we introduce it as a random effect.
tbl.SubjectEffect <- df.nirs %>%
group_by(metric) %>%
do(tidy(lm(value ~ Subject, .))) # requires broom
tbl.SubjectEffect <- df.nirs %>%
group_by(metric) %>%
rstatix::anova_test(value~Subject)
# answer is yes. There is a subject effect that we must accomodate
```
For a very complete guide on running ANOVAs in R for your first time. [This](https://www.datanovia.com/en/lessons/anova-in-r/) post is immaculate.
You can see some exercises [here](https://stats.idre.ucla.edu/r/seminars/interactions-r/)
## Modeling your data
To test the normality of your data you can use a few different methods
* http://www.sthda.com/english/wiki/normality-test-in-r
* https://rstudio-pubs-static.s3.amazonaws.com/2002_1f803b2bc84c46008d3599a07867a95a.html
* https://www.datanovia.com/en/lessons/anova-in-r/
1. Plot your data
2. Check skewness and kurtosis
3. Shapiro test.
In order to visualize your linear model I use 3 packages/BlandAltmanLeh/vignettes/Intro
1. [jtools](
https://cran.r-project.org/web/packages/jtools/vignettes/summ.html)
2. [interactions](https://github.com/jacob-long/interactions)
3. [ggeffects](https://strengejacke.github.io/ggeffects/]) (used less)
4. To visualize `lmer` mixed models see [here](https://rstudio-pubs-static.s3.amazonaws.com/38628_54b19baf70b64eb5936a3f1f84beb7da.html)
5. To follow up a mixed-model [here](https://benwhalley.github.io/just-enough-r/contrasts-lmer.html)
If you want a very quick way to look at your data, using [rstatix](https://rpkgs.datanovia.com/rstatix/) is my #1 recommendation.
[ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot_slides/slides/ggstatsplot_presentation.html#1) is a nice package that gets around a few problems you may run into.
The [`broom`](http://varianceexplained.org/r/broom-intro/) package for visualising models
## Statistics - Resources
### Correlations in R/random_slopes
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/
https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html
### Correlation Plots:
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
### General Good Sites
https://statisticsbyjim.com/jim_frost/
### Learn about Residuals:
http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/
https://rpubs.com/iabrady/residual-analysis
### Regression Diagnostics
https://www.statmethods.net/stats/rdiagnostics.html
tidymodels : https://www.tidymodels.org/
### Bland-Altman Plots
https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html
### ANOVA
https://www.datanovia.com/en/lessons/anova-in-r/
### Mixed Models
Good introduction: https://ourcodingclub.github.io/tutorials/mixed-models/
More intro: https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r
https://dynamicecology.wordpress.com/2015/02/05/how-many-terms-in-your-model-before-statistical-machismo/
https://m-clark.github.io/mixed-models-with-R/random_slopes.html
Package that gives pseudo-R^2 for mixed models --> https://cran.r-project.org/web/packages/jtools/vignettes/summ.html#report_robust_standard_errors
https://rinterested.github.io/statistics/mixed_effects_comparison.html
### Cluster Analysis (see bottom of the page for more links)
https://www.datanovia.com/en/product/practical-guide-to-cluster-analysis-in-r/
## Looking at interactions ------
src: https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html
### Packages
mertool is a gui for modelling --> https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools
tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/
To assess model performance I use the [performance](https://easystats.github.io/performance/) package. You can visualize your model using this [vignette](https://easystats.github.io/modelbased/articles/estimate_contrasts.html)
```{r}
# Running an ANOVA ----------------
# testing to see if the type of service has an effect on the Journey Time
#model <- lm(journey_time_avg ~ service, data=df)
model <- aov(journey_time_avg ~ service, data=df)
modsum <- summary(model)
#anova(model) #show me the answer in an ANOVA table
modsum
# ANOVA w/ Type III SS --------------
# requires the "car" library
model <- Anova(lm(Measure ~ Group + Brain_Region + Group*Brain_Region, data = df, contrasts=list(Group=contr.sum, Brain_Region=contr.sum), type=3))
report(model) #requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) #requires apastats
options(contrasts = c("contr.sum", "contr.poly"))
Anova(model, type="III") # Type III tests
# Post-Hoc according to Field's Book ---------
contrasts(df$Group)=contr.poly(3)
#model.1=aov(dv~covariate+factorvariable, data=dataname)
model.1=aov(Measure ~ Baseline + Group + Brain_Region + Group*Brain_Region, data = df)
Anova(model.1, type="III")
# Make sure you use capital "A" Anova here and not anova. This will give results using type III SS.
posthoc=glht(model.1, linfct=mcp(Group="Tukey")) ##gives the post-hoc Tukey analysis
summary(posthoc) ##shows the output in a nice format.
confint(posthoc) # Give me the confidence intervals for the post-hoc tests
effect("Group", model.1) # Give me the group effects
```
Below is another loop
```{r, echo=TRUE}
# Run Several models using sapply --------------------
# IVs and covariates
ivs <- c("Neurological_Infection")
covariates <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")
univ_formulas <- sapply(covariates,
function(x) as.formula(paste('Neurological_Infection~', x)))
univ_models <- lapply( univ_formulas, function(x){glm(x, ,family=binomial(link='logit'),data=df1)})
# Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
cond = levels(erp$Condition)
comp = levels(erp$ERP_component)
erp$NumUndiagConc <- as.factor(erp$NumUndiagConc)
erp$NumPriorConc <- as.factor(erp$NumPriorConc)
res <- data.frame()
for (yy in (1:(length(cond)))) {
data <- subset(erp, Condition == cond[yy])
if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
comp <- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
} else if ((cond[yy]!="rGoC") | (cond[yy]!="rNgW")){
comp = c( "N2_Amp", "N2_Lat", "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
}
res1 <- data.frame() #reset after each condition loop
for (zz in (1:(length(comp)))) {
compData <- subset(data,ERP_component == comp[zz])
electNum = compData$electNum
if (length(unique(electNum))==1){
fm1 <- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
} else if (length(unique(electNum))>=1){
fm1 <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time + (1|id) + (1|electNum),data=compData)
}
assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
tmod <- as.data.frame(summary(fm1)$coefficients)
tmod <- round(tmod[-c(1), ],3)
tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
if (nrow(tmod)==0){
a <- 1
} else if (nrow(tmod)>=0){
Test <- rownames(tmod)
tmod <- cbind(Test,tmod)
rownames(tmod) <- NULL
#Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
Component <- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
tmod <- cbind(Component,tmod)
res <- rbind(res,tmod)
tmod$`Pr(>|t|)`[ tmod$`Pr(>|t|)` <0.001] <- "<0.001"
tmod$`Pr(>|t|)` <- sub("[0].", ".", tmod$`Pr(>|t|)`)
#tmod$`t value` <- round(tmod$`t value`,digits=2)
tmod$Result <- paste0("t(", round(tmod$df,0), ")=", sprintf("%.2f", tmod$`t value`), ", p=", tmod$`Pr(>|t|)`)
tmod <- subset(tmod,select=c("Component", "Test","Result"))
tmod$Test <- gsub("AthleteType.L","Athlete Type",tmod$Test)
tmod$Test <- gsub(":","$\\times$", tmod$Test, fixed=TRUE)
tmod$Component <- gsub("_"," ",tmod$Component)
tmod$Test <- gsub("Time","Time ",tmod$Test)
res1 <- rbind(res1,tmod)
}
assign(paste0(cond[yy],"statsTable"), res1) #save it in the form of modsum_GoC_N2Amp
}
}
```