more on HE_manova vignette

friendly · Apr 28, 2024 · 605cc85 · 605cc85
1 parent 94c3582
commit 605cc85
Show file tree

Hide file tree

Showing 9 changed files with 335 additions and 111 deletions.
diff --git a/docs/articles/HE_manova.html b/docs/articles/HE_manova.html
diff --git a/docs/articles/HE_mmra.html b/docs/articles/HE_mmra.html
diff --git a/docs/articles/fig/manova-addhealth-means-each-1.png b/docs/articles/fig/manova-addhealth-means-each-1.png
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
@@ -4,7 +4,7 @@ pkgdown_sha: ~
 articles:
   HE_manova: HE_manova.html
   HE_mmra: HE_mmra.html
-last_built: 2024-04-27T22:11Z
+last_built: 2024-04-28T18:21Z
 urls:
   reference: https://friendly.github.io/heplots/reference
   article: https://friendly.github.io/heplots/articles

diff --git a/docs/reference/ellipse3d.axes.html b/docs/reference/ellipse3d.axes.html
diff --git a/docs/search.json b/docs/search.json
diff --git a/extra/addhealth-plot.R b/extra/addhealth-plot.R
@@ -3,6 +3,7 @@ data("AddHealth", package = "heplots")
 library(ggplot2)
 library(dplyr, warn=FALSE)
 library(car)
+library(patchwork)
 
 
 means <- AddHealth |>
@@ -32,6 +33,25 @@ means <- AddHealth |>
 
 # plot means with error bars
 
+p1 <-ggplot(data = means, aes(x = grade, y = anxiety)) +
+  geom_point(size = 4) +
+  geom_line(aes(group = 1), linewidth = 1.2) +
+  geom_errorbar(aes(ymin = anxiety - anx_se, 
+                   ymax = anxiety + anx_se),
+                width = .2) +
+  theme_bw(base_size = 15)
+
+p2 <-ggplot(data = means, aes(x = grade, y = depression)) +
+  geom_point(size = 4) +
+  geom_line(aes(group = 1), linewidth = 1.2) +
+  geom_errorbar(aes(ymin = depression - dep_se, 
+                    ymax = depression + dep_se),
+                width = .2) +
+  theme_bw(base_size = 15)
+
+p1 + p2
+
+
 ggplot(data = means, aes(x = anxiety, y = depression, 
                          color = grade)) +
   geom_point(size = 3) +
@@ -93,6 +113,8 @@ lmtest::coeftest(AH.mlm)
 # test for all non-linear trends together
 # linearHypothesis(AH.mlm, rownames(coef(AH.mlm)[-(1:2)]))
 linearHypothesis(AH.mlm, "grade.L")
+linearHypothesis(AH.mlm, "grade.Q")
+
 linearHypothesis(AH.mlm, rownames(coef(AH.mlm))[3:5])
 
 

diff --git a/vignettes/HE_manova.Rmd b/vignettes/HE_manova.Rmd
@@ -98,16 +98,35 @@ The first question can be answered by fitting separate linear models for each re
 (e.g., `lm(anxiety ~ grade))`). However the second question is more interesting because it 
 considers the two responses together and takes their correlation into account. This would
 be fit as the MLM:
-```{r ah-mlm, eval=FALSE}
+
+$$
+\mathbf{y} = \boldsymbol{\beta}_0 + \boldsymbol{\beta}_1 x + \boldsymbol{\beta}_2 x^2 + \cdots \boldsymbol{\beta}_5 x^5
+(\#eq:AH-mod)
+$$
+or,
+$$
+\begin{eqnarray*}
+\begin{bmatrix} y_{\text{anx}} \\y_{\text{dep}} \end{bmatrix} & = &
+\begin{bmatrix} \beta_{0,\text{anx}} \\ \beta_{0,\text{dep}} \end{bmatrix} +
+\begin{bmatrix} \beta_{1,\text{anx}} \\ \beta_{1,\text{dep}} \end{bmatrix} \text{grade} +
+\begin{bmatrix} \beta_{2,\text{anx}} \\ \beta_{2,\text{dep}} \end{bmatrix} \text{grade}^2 + \cdots
+\begin{bmatrix} \beta_{5,\text{anx}} \\ \beta_{5,\text{dep}} \end{bmatrix} \text{grade}^5
+\end{eqnarray*}
+$$
+
+Using `lm()` we get the coefficients for each of the polynomial terms in `grade`:
+
+```{r ah-mlm}
 lm(cbind(anxiety, depression) ~ grade, data=AddHealth)
 ```
 
-## Exploratory plots
+## Exploratory plots {-}
 Some exploratory analysis is useful before fitting and visualizing models.
 As a first step, find the means, standard deviations, and standard errors of the means:
 ```{r addhealth-means}
 library(ggplot2)
 library(dplyr)
+library(patchwork)
 
 means <- AddHealth |>
   group_by(grade) |>
@@ -126,8 +145,33 @@ means <- AddHealth |>
 Now, plot the means with $\pm 1$ error bars. It appears that average level of both depression
 and anxiety increase steadily with grade, except for grades 11 and 12 which don't differ much.
 
+```{r addhealth-means-each}
+#| out.width = "100%",
+#| fig.width = 7,
+#| fig.height = 4,
+#| fig.cap = "Means of anxiety and depression by grade, with $\\pm 1$ standard error bars."
+p1 <-ggplot(data = means, aes(x = grade, y = anxiety)) +
+  geom_point(size = 4) +
+  geom_line(aes(group = 1), linewidth = 1.2) +
+  geom_errorbar(aes(ymin = anxiety - anx_se, 
+                   ymax = anxiety + anx_se),
+                width = .2) +
+  theme_bw(base_size = 15)
+
+p2 <-ggplot(data = means, aes(x = grade, y = depression)) +
+  geom_point(size = 4) +
+  geom_line(aes(group = 1), linewidth = 1.2) +
+  geom_errorbar(aes(ymin = depression - dep_se, 
+                    ymax = depression + dep_se),
+                width = .2) +
+  theme_bw(base_size = 15)
+
+p1 + p2
+```
+
+Treating anxiety and depression as multivariate outcomes, we can also plot their bivariate means.
 ```{r addhealth-means-plot}
-#| fig.cap = "Means of anxiety and depression by grade, with $\\pm 1$ error bars"
+#| fig.cap = "Joint plot of means of anxiety and depression by grade, with $\\pm 1$ standard error bars."
 ggplot(data = means, aes(x = anxiety, y = depression, 
                          color = grade)) +
   geom_point(size = 3) +
@@ -155,39 +199,85 @@ covEllipses(AddHealth[, 3:2], group = AddHealth$grade,
             fill = TRUE, fill.alpha = 0.05)
 ```
 
-## Fit the MLM
+## Fit the MLM {-}
+
 
 Now, let's fit the MLM for both responses jointly in relation to `grade`. The null hypothesis is that
 the means for anxiety and depression are the same at all six grades,
 $$
-H_0 : \mathbf{\mu}_7 = \mathbf{\mu}_8 = \cdots = \mathbf{\mu}_{12} \; .
+H_0 : \mathbf{\mu}_7 = \mathbf{\mu}_8 = \cdots = \mathbf{\mu}_{12} \; ,
+$$
+or equivalently, that all coefficients except the intercept in the model \@ref(eq:AH-mod) are zero,
+$$
+H_0 : \boldsymbol{\beta}_1 =  \boldsymbol{\beta}_2  = \cdots =  \boldsymbol{\beta}_5 = \boldsymbol{0} \; .
 $$
+
 The overall test, with 5 degrees of freedom is diffuse, in that it can be rejected if any
 pair of means differ.
-Given that `grade` is an ordered factor, it makes sense to examine narrower hypotheses
-of linear and nonlinear trends.
 
 `car::Anova()` gives a simple display of the multivariate test, using the Pillai trace criterion.
-The `summary()` method for this gives all four test statistics.
 ```{r addhealth-mlm}
 AH.mlm <- lm(cbind(anxiety, depression) ~ grade, data = AddHealth)
 
 # overall test of `grade`
 Anova(AH.mlm)
+```
 
-# show separate multivariate tests
+The `summary()` method for this gives all four test statistics.
+```{r addhealth-summary}
+## show separate multivariate tests
 summary(Anova(AH.mlm)) |> print(SSP = FALSE)
 ```
 
-## HE plot
+## Testing linear hypotheses {-}
+Given that `grade` is an ordered factor, it makes sense to examine narrower hypotheses
+of linear and nonlinear trends. `car::linearHypothesis()` provides a general way to
+do this, giving multivariate tests for one or more linear combinations of coefficients.
+
+The joint test of the linear coefficients for anxiety and depression, 
+$H_0 : \boldsymbol{\beta}_1 = \boldsymbol{0}$ is highly significant,
+```{r addhealth-linhyp1}
+## linear effect
+linearHypothesis(AH.mlm, "grade.L") |> print(SSP = FALSE)
+```
+
+The test of the quadratic coefficients $H_0 : \boldsymbol{\beta}_2 = \boldsymbol{0}$
+indicates significant curvature in trends across grade, as we saw in the plots of their means,
+Figures \@ref(fig:addhealth-means-each) and \@ref(fig:addhealth-means-plot).
+```{r addhealth-linhyp2}
+## quadratic effect
+linearHypothesis(AH.mlm, "grade.Q") |> print(SSP = FALSE)
+```
+
+We can also test the hypothesis that all higher order terms beyond the quadratic are zero,
+H_0 : \boldsymbol{\beta}_3 =  \boldsymbol{\beta}_4 =  \boldsymbol{\beta}_5 = \boldsymbol{0}$:
+```{r addhealth-linhyp3}
+## joint test of all higher terms
+linearHypothesis(AH.mlm, rownames(coef(AH.mlm))[3:5]) |> print(SSP = FALSE)
+```
+
+
+## HE plot {-}
+
+Figure \@ref(fig:addhealth-heplot) shows the HE plot for this problem.
+The **H** ellipse for the `grade` effect reflects the increasing pattern in the means across grades:
+depression increases along with anxiety. The error **E** ellipse reflects the pooled with-group
+covariance, the weighted average of those shown in \@ref{fig-addhealth-covellipse}.
+
+You can include any linear hypotheses or contrasts using the `hypotheses` argument.
+The **H** ellipses for the 1 df linear and quadratic terms plot as lines.
+The linear effect corresponds to the major axis of the **H** ellipse for the `grade` effect.
 
-Figure \@ref(fig:addhealth-heplot)
+Again, to preserve resolution in the plot, I show the **H** and **E** ellipses with only 10% coverage,
+but it is only the relative size of an **H** ellipse relative to **E** that matters:
+With the default significance scaling, any effect is significant _iff_ the
+corresponding **H** ellipse projects anywhere outside the **E** ellipse.
 
 ```{r addhealth-heplot, echo = -1}
-#| fig.cap = "HE plot for the model `AH.mlm`."
+#| fig.cap = "HE plot for the multivariate model `AH.mlm`, showing the overall effect of `grade` as well as tests for the linear and quadratic terms in this model."
 op <- par(mar = c(4,4,1,1)+0.1)
 heplot(AH.mlm, 
-       hypotheses=c("grade.L", "grade.Q"), 
+       hypotheses = c("grade.L", "grade.Q"), 
        hyp.labels = c("linear", "quad"),
        label.pos = c(4, 3, 1, 1),
        fill=c(TRUE, FALSE),
@@ -219,7 +309,7 @@ This example illustrates:
 * the difference between "effect" scaling and "evidence" (significance) scaling, and 
 * visualizing composite linear hypotheses.
 
-### Multivariate tests
+## Multivariate tests {-}
 
 We begin with an overall MANOVA for the two-way MANOVA model. In all these analyses, we use
 `car::Anova()` for significance tests rather than `stats::anova()`, which only provides
@@ -262,7 +352,7 @@ Traditional univariate plots of the means for each variable separately
 are useful, but they don't allow visualization of the
 *relations* among the response variables.
 
-### HE plots
+## HE plots {-}
 We can visualize these
 effects for pairs of variables in an HE plot, showing the "size" and
 orientation of hypothesis variation ($\mathbf{H}$) in relation to error
@@ -373,11 +463,13 @@ Thus, for example, the joint test of both main effects tests the parameters
 `rateHigh` and `additiveHigh`.
 
 ```{r, plastic-tests}
-print(linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh"), title="Main effects"), 
-      SSP=FALSE)
+linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh"), 
+                 title="Main effects") |>
+  print(SSP=FALSE)
 
-print(linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh", "rateHigh:additiveHigh"), 
-                       title="Groups"), SSP=FALSE)
+linearHypothesis(plastic.mod, c("rateHigh", "additiveHigh", "rateHigh:additiveHigh"),
+                 title="Groups") |>
+  print(SSP=FALSE)
 ```
 
 Correspondingly, we can display these tests in the HE plot by specifying these tests in the
@@ -453,7 +545,7 @@ The main questions of interest were:
 * Does attractiveness of the "defendant" influence the sentence or perceived seriousness of the crime?  
 * Does attractiveness interact with the nature of the crime?
 
-## Manipulation check
+## Manipulation check {-}
 
 But first, as a check on the manipulation of attractiveness,
 we try to assess the ratings of the photos in relation to the 
@@ -550,7 +642,7 @@ is determined nearly entirely by ratings on `happy` and `independent`.
 This display gives a relatively simple account of the results of the MANOVA
 and the relations of each of the ratings to discrimination among the photos.
 
-## Main analysis
+## Main analysis {-}
 
 Proceeding to the main questions of interest, we carry out a two-way MANOVA of the responses
 `Years` and `Serious` in relation to the independent variables
@@ -800,4 +892,4 @@ but nowhere does it protrude beyond the boundary of the $\mathbf{E}$ ellipsoid.
 
 
 
-## References
+# References
diff --git a/vignettes/fig/manova-addhealth-means-each-1.png b/vignettes/fig/manova-addhealth-means-each-1.png