---
output: github_document
editor_options:
  markdown:
    wrap: 72
---
<!-- README.md is generated from README.Rmd. Please edit that file and knit again -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
warning = FALSE, # avoid warnings and messages in the output
message = FALSE,
collapse = TRUE,
fig.width = 4,
fig.height = 4,
dpi = 96,
comment = "#>",
fig.path = "man/figures/README-"
)
par(mar=c(3,3,1,1)+.1)
```
```{r, echo=FALSE}
library(heplots)
```
<!-- badges: start -->
[![Lifecycle:
stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/heplots)](http://cran.r-project.org/package=heplots)
[![R_Universe](https://friendly.r-universe.dev/badges/heplots)](https://friendly.r-universe.dev)
[![Last Commit](https://img.shields.io/github/last-commit/friendly/heplots)](https://github.com/friendly/heplots)
[![](http://cranlogs.r-pkg.org/badges/grand-total/heplots)](https://cran.r-project.org/package=heplots)
[![DOI](https://zenodo.org/badge/13908453.svg)](https://zenodo.org/badge/latestdoi/13908453)
[![](https://img.shields.io/badge/pkgdown%20site-blue)](https://friendly.github.io/heplots)
<!-- badges: end -->
# heplots <img src="man/figures/logo.png" height="200" style="float:right; height:200px;"/>
## **Visualizing Hypothesis Tests in Multivariate Linear Models**
<!-- Version 1.7.2 -->
Version `r getNamespaceVersion("heplots")`
## Description
The `heplots` package provides functions for visualizing hypothesis
tests in multivariate linear models (MANOVA, multivariate multiple
regression, MANCOVA, and repeated measures designs).
HE plots represent sums-of-squares-and-products matrices for linear
hypotheses (**H**) and for error (**E**) using ellipses (in two
dimensions), ellipsoids (in three dimensions), or by line segments in
one dimension. For the theory and applications, see:
- [Friendly (2007)](http://datavis.ca/papers/jcgs-heplots.pdf) for the basic theory on which this is based,
- [Fox, Friendly and Monette (2009)](http://datavis.ca/papers/FoxFriendlyMonette-2009.pdf) for a brief introduction,
- [Friendly (2010)](http://www.jstatsoft.org/v37/i04/paper) for the application of these ideas to repeated
measures designs,
- [Friendly, Monette and Fox (2013)](http://datavis.ca/papers/ellipses-STS402.pdf) for a general discussion of the role of elliptical geometry in statistical understanding,
- [Friendly & Sigal (2017)](https://doi.org/10.20982/tqmp.13.1.p020) for an applied R tutorial,
- [Friendly & Sigal (2018)](https://www.datavis.ca/papers/EqCov-TAS.pdf) for theory and examples of visualizing equality of covariance matrices.
If you use this work in teaching or research, please cite it as given by `citation("heplots")` or see [Citation](authors.html#citation).
Other topics now addressed here include:
- robust MLMs, using iteratively re-weighted least squares to
down-weight observations with large multivariate residuals,
`robmlm()`.
- `Mahalanobis()` calculates classical and robust Mahalanobis squared
distances using MCD and MVE estimators of center and covariance.
- visualizing tests for equality of covariance matrices in MLMs (Box's
M test), `boxM()` and `plot.boxM()`.
- $\chi^2$ Q-Q plots for MLMs (`cqplot()`) to detect outliers and
assess multivariate normality of residuals.
- bivariate coefficient plots showing elliptical confidence regions
(`coefplot()`).
In this respect, the `heplots` package now aims to provide a wide range
of tools for analyzing and visualizing multivariate response linear
models, together with other packages:
<a href="https://friendly.github.io/candisc/"><img src='https://raw.githubusercontent.com/friendly/candisc/master/candisc-logo.png' height='80' style="float:right; height:80px;"></a>
- The related [`candisc`](https://friendly.github.io/candisc/) package
provides HE plots in **canonical discriminant** space, the space of
linear combinations of the responses that show the maximum possible
effects, and for canonical correlation in multivariate regression
designs.
<a href="https://friendly.github.io/mvinfluence/"><img src='https://raw.githubusercontent.com/friendly/mvinfluence/master/man/figures/logo.png' height='80' style="float:right; height:80px;"></a>
- Another package,
[`mvinfluence`](https://friendly.github.io/mvinfluence/), provides
diagnostic measures and plots for **influential observations** in MLM
designs. See the [package documentation](https://friendly.github.io/mvinfluence/)
for details.
Several tutorial vignettes are also included. See
`vignette(package="heplots")`.
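As a rough illustration of the additional tools listed above, the following sketch applies `robmlm()`, `cqplot()` and `Mahalanobis()` to the built-in `iris` data. The object names (`iris.mlm`, `iris.rlm`, `d2.classical`, `d2.robust`) are chosen here for illustration only, and the choice of `iris` is an assumption made to keep the example self-contained.

```r
library(heplots)

# Classical multivariate linear model for the four iris measurements
iris.mlm <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                 Species, data = iris)

# Robust fit: observations with large multivariate residuals get small weights
iris.rlm <- robmlm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                     Species, data = iris)
summary(iris.rlm$weights)

# Chi-square Q-Q plot of the residuals to look for outliers
cqplot(iris.mlm)

# Classical and robust (MCD) squared Mahalanobis distances of the residuals
res <- residuals(iris.mlm)
d2.classical <- Mahalanobis(res)
d2.robust    <- Mahalanobis(res, method = "mcd")
head(cbind(d2.classical, d2.robust))
```

Observations that receive small weights in the robust fit would be expected to stand out in the Q-Q plot and to have large robust distances.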
## Installation
+-------------------+-------------------------------------------------+
| CRAN version | `install.packages("heplots")` |
+-------------------+-------------------------------------------------+
| Development | `remotes::install_github("friendly/heplots")` |
| version | |
+-------------------+-------------------------------------------------+
## HE plot functions
The graphical functions contained here all display multivariate model
effects in variable (**data**) space, for one or more response variables
(or contrasts among response variables in repeated measures designs).
- `heplot()` constructs two-dimensional HE plots for model terms and
linear hypotheses for pairs of response variables in multivariate
linear models.
- `heplot3d()` constructs analogous 3D plots for triples of response
variables.
- The `pairs` method, `pairs.mlm()`, constructs a scatterplot matrix of
pairwise HE plots.
- `heplot1d()` constructs 1-dimensional analogs of HE plots for model
terms and linear hypotheses for single response variables.
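A minimal sketch of how these functions might be called on one fitted model is shown below. It assumes the built-in `iris` data and uses the illustrative object name `iris.mlm`; the `variables` argument selects which responses are plotted.

```r
library(heplots)

# Multivariate linear model with four responses and one factor
iris.mlm <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                 Species, data = iris)

# 2D HE plot for a chosen pair of responses (the default is the first two)
heplot(iris.mlm, variables = 3:4)

# Scatterplot matrix of all pairwise HE plots
pairs(iris.mlm)

# 1D analog for a single response
heplot1d(iris.mlm, variables = 1)

# 3D version; it opens an rgl window, so it is left commented out here
# heplot3d(iris.mlm, variables = 1:3)
```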
## Other functions
- `glance.mlm()` extends `broom::glance.lm()` to multivariate response
models, giving a one-line statistical summary for each response
variable.
- `boxM()` calculates Box's *M* test for homogeneity of covariance
matrices in a MANOVA design. A `plot` method displays a visual
representation of the components of the test. Associated with this,
`bartlettTests()` and `leveneTests()` give the univariate tests of
homogeneity of variance for each response measure in an MLM.
- `covEllipses()` draws covariance (data) ellipses for one or more
groups, optionally including the ellipse for the pooled within-group
covariance.
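The homogeneity checks described above can be sketched as follows, again assuming the `iris` data for illustration. The names `Y` and `iris.boxm` are hypothetical; the calls use the matrix-plus-grouping-factor form of each function.

```r
library(heplots)

Y <- iris[, 1:4]          # response variables
species <- iris$Species   # grouping factor

# Multivariate test of equal covariance matrices
iris.boxm <- boxM(Y, species)
iris.boxm
plot(iris.boxm)           # visual display of the components of the test

# Univariate tests of equal variances, one per response
leveneTests(Y, species)
bartlettTests(Y, species)

# Data ellipses for each group, for one pair of responses
covEllipses(Y, species, variables = 3:4, fill = TRUE, fill.alpha = 0.1)
```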
### Repeated measures designs
For repeated measures designs, between-subject effects and within-subject
effects must be plotted separately, because the error terms (**E**
matrices) differ. For terms involving within-subject effects, these
functions carry out a linear transformation of the matrix **Y** of
responses to a matrix **Y M**, where **M** is the model matrix for a
term in the intra-subject design, and produce plots of the **H** and **E**
matrices in this transformed space. The vignette `"repeated"` describes
these graphical methods for repeated measures designs. (At present, this
vignette is only available at [HE plots for repeated measures
designs](http://www.jstatsoft.org/v37/i04/paper).)
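The transformation **Y M** can be illustrated with a small base R sketch. The data here are simulated purely for illustration, and all names (`group`, `Y`, `M`, `YM`, `rm.mod`) are hypothetical; this is not the package's internal code, which handles the transformation for you through the intra-subject design arguments of `heplot()`.

```r
set.seed(42)                         # simulated data, for illustration only
n <- 20
group <- gl(2, n / 2, labels = c("control", "treated"))
Y <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(NULL, c("t1", "t2", "t3")))

# M: orthogonal polynomial contrasts among the 3 occasions
M <- contr.poly(3)
YM <- Y %*% M                        # n x 2 matrix of transformed responses
colnames(YM) <- c("linear", "quadratic")

# Between-subject model fitted to the transformed responses
rm.mod <- lm(YM ~ group)

# E and H in the transformed space; H here corresponds to group x time
E <- crossprod(residuals(rm.mod))
H <- crossprod(sweep(fitted(rm.mod), 2, colMeans(YM)))
H
E
```

Because the responses are transformed before fitting, the resulting **E** matrix differs from the one for between-subject effects, which is why the two kinds of effects are plotted separately.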
## Datasets
The package also provides a large collection of data sets illustrating a
variety of multivariate linear models of the types listed above,
together with graphical displays. The table below classifies these with
method tags. Their names are linked to their documentation with graphical output on the
`pkgdown` website, <http://friendly.github.io/heplots>.
```{r datasets, echo=FALSE}
library(here)
library(dplyr)
library(tinytable)
#dsets <- read.csv(here::here("extra", "datasets.csv"))
dsets <- read.csv("https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv")
dsets <- dsets |> dplyr::select(-X) |> arrange(tolower(dataset))
# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/heplots/reference/"
dsets <- dsets |>
mutate(dataset = glue::glue("[{dataset}]({refurl}{dataset}.html)"))
tinytable::tt(dsets)
#knitr::kable(dsets)
```
## Examples
This example illustrates HE plots using the classic `iris` data set. How
do the means of the flower variables differ by `Species`? This dataset
was the impetus for R. A. Fisher (1936) to propose a method of
discriminant analysis using data collected by Edgar Anderson (1928).
Though some may rightly deprecate Fisher for being a supporter of
eugenics, Anderson's `iris` dataset should not be blamed.
A basic HE plot shows the **H** and **E** ellipses for the first two
response variables (here: `Sepal.Length` and `Sepal.Width`). The
multivariate test is significant (by Roy's test) *iff* the **H** ellipse
projects *anywhere* outside the **E** ellipse.
The positions of the group means show how they differ on the two
response variables shown, and provide an interpretation of the
orientation of the **H** ellipse: it is long in the directions of
differences among the means.
```{r iris1}
#| echo=-1,
#| out.width="70%"
par(mar=c(4,4,1,1)+.1)
iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
Species, data=iris)
heplot(iris.mod)
```
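To make the **H** and **E** matrices concrete, the following base R sketch computes them by hand for this one-way design. The names `Y`, `Yhat`, `H`, `E` and `Tot` are illustrative only. **E** is the sum-of-squares-and-products (SSP) matrix of the residuals, and **H** is the SSP matrix of the fitted group means about the grand mean.

```r
iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                 Species, data = iris)

Y    <- as.matrix(iris[, 1:4])
Yhat <- fitted(iris.mod)                      # group means, one row per flower

E <- crossprod(residuals(iris.mod))           # error SSP matrix
H <- crossprod(sweep(Yhat, 2, colMeans(Y)))   # hypothesis SSP matrix for Species

# H and E partition the total SSP matrix about the grand mean
Tot <- crossprod(sweep(Y, 2, colMeans(Y)))
all.equal(H + E, Tot, check.attributes = FALSE)

# The 2 x 2 blocks for the pair of responses shown in the plot above
H[1:2, 1:2]
E[1:2, 1:2]
```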
### Contrasts
Contrasts or other linear hypotheses can be shown as well, and the
ellipses look better if they are filled. We create contrasts to test the
differences between `versicolor` and `virginica` and also between
`setosa` and the average of the other two. Each 1 df contrast plots as
a degenerate 1D ellipse, that is, a line.
Because these contrasts are orthogonal, they add to the total 2 df
effect of `Species`. Note how the first contrast, labeled `V:V`,
distinguishes the means of *versicolor* from *virginica*; the second
contrast, `S:VV`, distinguishes *setosa* from the other two.
```{r iris2, out.width="70%"}
par(mar=c(4,4,1,1)+.1)
contrasts(iris$Species)<-matrix(c(0, -1, 1,
2, -1, -1), nrow=3, ncol=2)
contrasts(iris$Species)
iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
Species, data=iris)
hyp <- list("V:V"="Species1","S:VV"="Species2")
heplot(iris.mod, hypotheses=hyp,
fill=TRUE, fill.alpha=0.1)
```
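The additivity of orthogonal contrasts can be checked numerically. The base R sketch below relies on the balanced design of `iris` (equal group sizes); the helper `H_contrast()` and the other names are hypothetical and written only for this illustration.

```r
# The two contrasts used above, one per column
C <- matrix(c(0, -1,  1,
              2, -1, -1), nrow = 3, ncol = 2,
            dimnames = list(levels(iris$Species), c("V:V", "S:VV")))
crossprod(C)            # zero off-diagonal elements: the contrasts are orthogonal

Y <- as.matrix(iris[, 1:4])
n <- nrow(iris) / nlevels(iris$Species)   # common group size
means <- rowsum(Y, iris$Species) / n      # 3 x 4 matrix of group means

# H matrix (rank 1) for a single contrast among the group means
H_contrast <- function(cvec) {
  psi <- crossprod(cvec, means)           # 1 x 4 contrast of the means
  n * crossprod(psi) / sum(cvec^2)
}
H1 <- H_contrast(C[, "V:V"])
H2 <- H_contrast(C[, "S:VV"])

# H matrix (rank 2) for the overall Species effect
H_species <- n * crossprod(sweep(means, 2, colMeans(means)))

all.equal(H1 + H2, H_species, check.attributes = FALSE)
```

Each of `H1` and `H2` has rank 1, which is why the corresponding ellipses collapse to lines in the plot.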
### All pairwise HE plots
All pairwise HE plots are produced using the `pairs()` method for MLM
objects. In the plot, note how the means of most pairs of variables are very
highly correlated, in the order setosa < versicolor < virginica, but this
pattern doesn't hold for relations with `Sepal.Width`.
```{r, iris3}
#| out.width="100%",
#| fig.height = 6,
#| fig.width = 6
pairs(iris.mod, hypotheses=hyp, hyp.labels=FALSE,
fill=TRUE, fill.alpha=0.1)
```
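The pattern among the group means can also be inspected numerically. This short base R sketch computes the species means and the correlations among the four variables over those three means; the name `means` is illustrative only.

```r
# Group means of the four responses (3 species x 4 variables)
means <- aggregate(. ~ Species, data = iris, FUN = mean)
rownames(means) <- means$Species
means <- as.matrix(means[, -1])
round(means, 2)

# Correlations among the variables, computed over the three group means
round(cor(means), 2)
```

With only three groups these correlations are descriptive at best, but they show which pairs of variables order the species in the same way.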
### Canonical discriminant view
For more than two response variables, a multivariate effect can be viewed more simply by projecting
the data into canonical space --- the linear combinations of the responses which show the greatest
differences among the group means relative to within-group scatter. The computations are performed
with the [`candisc`](http://github.com/friendly/candisc) package, which has an `heplot.candisc()`
method.
```{r iris-can0}
library(candisc)
iris.can <- candisc(iris.mod) |> print()
```
The HE plot in canonical space shows that the differences among species are nearly
entirely one-dimensional. The weights for the variables on the first dimension
show how `Sepal.Width` differs from the other size variables.
```{r iris-can}
#| out.width = "60%",
#| echo = -1
par(mar=c(4,4,1,1)+.1)
# HE plot in canonical space
heplot(iris.can, var.pos = 1, scale = 40)
```
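The claim that the differences are nearly one-dimensional can be checked with a base R sketch of the underlying eigen-decomposition. This is an illustration under the assumption that the canonical dimensions correspond to the eigenvalues of **E**<sup>-1</sup>**H**; it is not the code used by `candisc()`, and the names `H`, `E` and `lambda` are hypothetical.

```r
iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                 Species, data = iris)

E <- crossprod(residuals(iris.mod))
Yhat <- fitted(iris.mod)
H <- crossprod(sweep(Yhat, 2, colMeans(Yhat)))

# With 3 groups there are at most 2 non-zero eigenvalues of E^{-1} H.
# Re() guards against tiny imaginary parts from the non-symmetric product.
lambda <- Re(eigen(solve(E) %*% H)$values)[1:2]

# Percent of the between-group variation captured by each canonical dimension
round(100 * lambda / sum(lambda), 1)
```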
### Covariance ellipses
MANOVA relies on the assumption that within-group covariance matrices are all equal.
It is useful to visualize these in the space of some of the response variables.
`covEllipses()` provides this both for classical and robust (`method="mve"`) estimates.
The figure below shows these for the three Iris species and the
pooled covariance matrix, which is the same as the **E** matrix used
in MANOVA tests, divided by its error degrees of freedom.
```{r iris4, out.width="80%"}
par(mar=c(4,4,1,1)+.1)
covEllipses(iris[,1:4], iris$Species)
covEllipses(iris[,1:4], iris$Species,
fill=TRUE, method="mve", add=TRUE, labels="")
```
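The relation between the pooled covariance matrix and **E** can be verified with a few lines of base R. The names below (`S_list`, `S_pooled`, `n_g`) are illustrative; the sketch pools the within-group covariance matrices, weighting each by its degrees of freedom.

```r
Y <- as.matrix(iris[, 1:4])
species <- iris$Species

# Within-group covariance matrices, one per species
S_list <- lapply(split(as.data.frame(Y), species), cov)
n_g <- as.vector(table(species))
N <- nrow(Y)
g <- nlevels(species)

# Pooled covariance: within-group matrices weighted by their degrees of freedom
S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_list, n_g)) / (N - g)

# The same matrix, obtained from the residuals of the MANOVA model
iris.mod <- lm(Y ~ species)
E <- crossprod(residuals(iris.mod))
all.equal(S_pooled, E / (N - g), check.attributes = FALSE)

# Log determinants of the group covariance matrices, the quantities
# compared in Box's M test
sapply(S_list, function(S) log(det(S)))
```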
## References
Anderson, E. (1928). The Problem of Species in the Northern Blue Flags,
Iris versicolor L. and Iris virginica L. *Annals of the Missouri
Botanical Garden*, **15**, 241--332.
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic
Problems. *Annals of Eugenics*, **7**, 179--188.
Friendly, M. (2006).
[Data Ellipses, HE Plots and Reduced-Rank Displays for Multivariate Linear Models:
SAS Software and Examples](https://www.jstatsoft.org/article/view/v017i06).
*Journal of Statistical Software*, **17**, 1--42.
Friendly, M. (2007). [HE plots for Multivariate General Linear Models](http://datavis.ca/papers/jcgs-heplots.pdf).
*Journal of Computational and Graphical Statistics*, **16**(2), 421--444.
[DOI](http://dx.doi.org/10.1198/106186007X208407).
Fox, J., Friendly, M. & Monette, G. (2009). [Visualizing hypothesis
tests in multivariate linear models: The heplots package for
R](http://datavis.ca/papers/FoxFriendlyMonette-2009.pdf). *Computational
Statistics*, **24**, 233--246.
Friendly, M. (2010). [HE plots for repeated measures
designs](http://www.jstatsoft.org/v37/i04/paper). *Journal of
Statistical Software*, **37**, 1--37.
Friendly, M., Monette, G. & Fox, J. (2013). [Elliptical Insights:
Understanding Statistical Methods Through Elliptical
Geometry](http://datavis.ca/papers/ellipses-STS402.pdf). *Statistical
Science*, **28**, 1--39.
Friendly, M. & Sigal, M. (2017). [Graphical Methods for Multivariate
Linear Models in Psychological Research: An R
Tutorial](https://doi.org/10.20982/tqmp.13.1.p020). *The Quantitative
Methods for Psychology*, **13**, 20--45.
Friendly, M. & Sigal, M. (2018). [Visualizing Tests for Equality of
Covariance Matrices](https://www.datavis.ca/papers/EqCov-TAS.pdf). *The American Statistician*. [DOI](https://doi.org/10.1080/00031305.2018.1497537).