05-statistics.Rmd

# Basic Statistics

## Preamble 

This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory measures. Please understand that these stats should *NOT* be what is included in your final manuscript. They are simply a quick and dirty way to gain an understanding of your data.

## Running your models

For starters I am a big fan of using something that allows me to run through several models to get an overall *feel* for the data. This usually involves either [broom](http://varianceexplained.org/r/broom-intro/) or [rstatix](https://rpkgs.datanovia.com/rstatix/) package which allows for pipe friendly modelling. I will commonly do this in order to evaluate if random factors should be included in my final model (covered in the mixed model chapter).

```{r, eval = FALSE, echo = TRUE}
# Question: Is there a subject effect in our data?
# if there is then we introduce it as a random effect.

tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  do(tidy(lm(value ~ Subject, .))) # requires broom

tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  rstatix::anova_test(value~Subject)
# answer is yes. There is a subject effect that we must accomodate
```

For a very complete guide on running ANOVAs in R for your first time. [This](https://www.datanovia.com/en/lessons/anova-in-r/) post is immaculate.

You can see some exercises [here](https://stats.idre.ucla.edu/r/seminars/interactions-r/)

## Modeling your data

To test the normality of your data you can use a few different methods

* http://www.sthda.com/english/wiki/normality-test-in-r
* https://rstudio-pubs-static.s3.amazonaws.com/2002_1f803b2bc84c46008d3599a07867a95a.html
* https://www.datanovia.com/en/lessons/anova-in-r/

1. Plot your data
2. Check skewness and kurtosis
3. Shapiro test.

In order to visualize your linear model I use 3 packages/BlandAltmanLeh/vignettes/Intro

1. [jtools](
https://cran.r-project.org/web/packages/jtools/vignettes/summ.html)
2. [interactions](https://github.com/jacob-long/interactions)
3. [ggeffects](https://strengejacke.github.io/ggeffects/]) (used less)
4. To visualize `lmer` mixed models see [here](https://rstudio-pubs-static.s3.amazonaws.com/38628_54b19baf70b64eb5936a3f1f84beb7da.html)
5. To follow up a mixed-model [here](https://benwhalley.github.io/just-enough-r/contrasts-lmer.html)


If you want a very quick way to look at your data, using [rstatix](https://rpkgs.datanovia.com/rstatix/) is my #1 recommendation.


[ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot_slides/slides/ggstatsplot_presentation.html#1) is a nice package that gets around a few problems you may run into.


The [`broom`](http://varianceexplained.org/r/broom-intro/) package for visualising models

## Statistics - Resources
### Correlations in R/random_slopes
	http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/
  https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html
				
### Correlation Plots: 
		http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
### General Good Sites
	https://statisticsbyjim.com/jim_frost/
	
### Learn about Residuals: 
	http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
	https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/
	https://rpubs.com/iabrady/residual-analysis
	
### Regression Diagnostics
		https://www.statmethods.net/stats/rdiagnostics.html
			
		tidymodels : https://www.tidymodels.org/
			
### Bland-Altman Plots 
		https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html

### ANOVA 
	https://www.datanovia.com/en/lessons/anova-in-r/

### Mixed Models
Good introduction: https://ourcodingclub.github.io/tutorials/mixed-models/
More intro: https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r
https://dynamicecology.wordpress.com/2015/02/05/how-many-terms-in-your-model-before-statistical-machismo/
https://m-clark.github.io/mixed-models-with-R/random_slopes.html
Package that gives pseudo-R^2 for mixed models --> https://cran.r-project.org/web/packages/jtools/vignettes/summ.html#report_robust_standard_errors
https://rinterested.github.io/statistics/mixed_effects_comparison.html
				
### Cluster Analysis (see bottom of the page for more links)
https://www.datanovia.com/en/product/practical-guide-to-cluster-analysis-in-r/

## Looking at interactions ------

src: https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html
			
### Packages
mertool is a gui for modelling --> https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools
tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/

To assess model performance I use the [performance](https://easystats.github.io/performance/) package. You can visualize your model using this [vignette](https://easystats.github.io/modelbased/articles/estimate_contrasts.html)

```{r}
# Running an ANOVA ----------------
# testing to see if the type of service has an effect on the Journey Time 
#model <- lm(journey_time_avg ~ service, data=df)
model <- aov(journey_time_avg ~ service, data=df)
modsum <- summary(model)
#anova(model) #show me the answer in an ANOVA table
modsum

# ANOVA w/ Type III SS --------------
# requires the "car" library
model <- Anova(lm(Measure ~ Group + Brain_Region + Group*Brain_Region, data = df, contrasts=list(Group=contr.sum, Brain_Region=contr.sum), type=3))
report(model) #requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) #requires apastats

options(contrasts = c("contr.sum", "contr.poly"))
Anova(model, type="III")  # Type III tests


# Post-Hoc according to Field's Book ---------
  contrasts(df$Group)=contr.poly(3)
  #model.1=aov(dv~covariate+factorvariable, data=dataname)
  model.1=aov(Measure ~ Baseline + Group + Brain_Region + Group*Brain_Region, data = df)
  Anova(model.1, type="III") 
  # Make sure you use capital "A" Anova here and not anova. This will give results using type III SS.
  posthoc=glht(model.1, linfct=mcp(Group="Tukey"))  ##gives the post-hoc Tukey analysis
  summary(posthoc) ##shows the output in a nice format.
  
  confint(posthoc) # Give me the confidence intervals for the post-hoc tests
  effect("Group", model.1) # Give me the group effects
 
```


Below is another loop

```{r, echo=TRUE}
# Run Several models using sapply --------------------
# IVs and covariates 
ivs <- c("Neurological_Infection")
covariates <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")

univ_formulas <- sapply(covariates,
                        function(x) as.formula(paste('Neurological_Infection~', x)))

univ_models <- lapply( univ_formulas, function(x){glm(x, ,family=binomial(link='logit'),data=df1)})
# Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
cond = levels(erp$Condition)
comp = levels(erp$ERP_component)
erp$NumUndiagConc <- as.factor(erp$NumUndiagConc)
erp$NumPriorConc <- as.factor(erp$NumPriorConc)

res <- data.frame()
for (yy in (1:(length(cond)))) {
  data <- subset(erp, Condition == cond[yy])
  if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
    comp <- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
  } else if ((cond[yy]!="rGoC") | (cond[yy]!="rNgW")){
    comp = c( "N2_Amp",  "N2_Lat",  "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
  }
  res1 <- data.frame() #reset after each condition loop
  for (zz in (1:(length(comp)))) {
    compData <- subset(data,ERP_component == comp[zz])
    electNum = compData$electNum
    if (length(unique(electNum))==1){
      fm1 <- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
    } else if (length(unique(electNum))>=1){
      fm1 <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time  + (1|id) + (1|electNum),data=compData)
    }
    assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
    assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
    tmod <- as.data.frame(summary(fm1)$coefficients)
    tmod <- round(tmod[-c(1), ],3)
    tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
    if (nrow(tmod)==0){
      a <- 1
    } else if (nrow(tmod)>=0){
      Test <- rownames(tmod)
      tmod <- cbind(Test,tmod)
      rownames(tmod) <- NULL
      #Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
      Component <- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
      tmod <- cbind(Component,tmod)
      res <- rbind(res,tmod)
      tmod$`Pr(>|t|)`[ tmod$`Pr(>|t|)` <0.001] <- "<0.001"
      tmod$`Pr(>|t|)` <- sub("[0].", ".", tmod$`Pr(>|t|)`) 
      #tmod$`t value` <- round(tmod$`t value`,digits=2)
      tmod$Result <- paste0("t(", round(tmod$df,0), ")=", sprintf("%.2f", tmod$`t value`), ", p=", tmod$`Pr(>|t|)`)
      tmod <- subset(tmod,select=c("Component", "Test","Result"))
      tmod$Test <- gsub("AthleteType.L","Athlete Type",tmod$Test)
      tmod$Test <- gsub(":","$\\times$", tmod$Test, fixed=TRUE)
      tmod$Component <- gsub("_"," ",tmod$Component)
      tmod$Test <- gsub("Time","Time ",tmod$Test)
      res1 <- rbind(res1,tmod)
    }
    assign(paste0(cond[yy],"statsTable"), res1) #save it in the form of modsum_GoC_N2Amp
  }
}
```