---
title: "Beeswarm-style plots with ggplot2"
output: github_document
---
```{r setup, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(
fig.retina=2,
fig.width=6,
fig.height=4
)
```
[![Build Status](https://travis-ci.org/eclarke/ggbeeswarm.svg?branch=master)](https://travis-ci.org/eclarke/ggbeeswarm)
[![CRAN status](https://www.r-pkg.org/badges/version/ggbeeswarm)](https://cran.r-project.org/package=ggbeeswarm)
## Introduction
Beeswarm plots (also known as column scatter plots or violin scatter plots) offset points that would ordinarily overlap so that they fall next to each other instead. In addition to reducing overplotting, this helps visualize the density of the data at each value (similar to a violin plot) while still showing each data point individually.
`ggbeeswarm` provides two different methods to create beeswarm-style plots using [ggplot2](http://ggplot2.org). It does this by adding two new ggplot geom objects:
- `geom_quasirandom`: Uses a [van der Corput sequence](http://en.wikipedia.org/wiki/Van_der_Corput_sequence) or Tukey texturing (Tukey and Tukey "Strips displaying empirical distributions: I. textured dot strips") to space the dots to avoid overplotting. This uses [sherrillmix/vipor](https://github.com/sherrillmix/vipor).
- `geom_beeswarm`: Uses the [beeswarm](https://cran.r-project.org/web/packages/beeswarm/index.html) library to offset points based on their plotted size.
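
To give a feel for the van der Corput sequence mentioned for `geom_quasirandom`, here is a minimal base-R sketch. The helper `van_der_corput()` is a hypothetical name written for this illustration and is not the code used by `vipor`; it only shows why the sequence spreads values evenly across the unit interval.

```{r}
# Illustrative helper (not part of ggbeeswarm or vipor):
# returns the first n terms of the van der Corput sequence in a given base.
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    # Reverse the base-`base` digits of i around the radix point
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom
      i <- i %/% base
    }
    value
  }, numeric(1))
}

van_der_corput(4)
# 0.500 0.250 0.750 0.125
```

Each new term falls into one of the largest remaining gaps, which is what makes the sequence useful for spacing points without clumping.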
Features:
- Can handle categorical variables on the y-axis (thanks @smsaladi, @koncina)
- Automatically dodges if a grouping variable is categorical and `dodge.width` is specified (thanks @josesho)
See the examples below.
## Installation
This package is on CRAN, so installation should be as simple as:
```{r, eval=FALSE}
install.packages('ggbeeswarm')
```
If you want the development version from GitHub, you can do:
```{r, eval=FALSE}
devtools::install_github("eclarke/ggbeeswarm")
```
## Examples
Here is a comparison between `geom_jitter` and `geom_quasirandom` on the `iris` dataset:
```{r ggplot2-compare}
set.seed(12345)
library(ggplot2)
library(ggbeeswarm)
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
```
### geom_quasirandom()
Using `geom_quasirandom`:
```{r ggplot2-examples}
#default geom_quasirandom
ggplot(mpg,aes(class, hwy)) + geom_quasirandom()
# With categorical y-axis
ggplot(mpg,aes(hwy, class)) + geom_quasirandom()
# Some groups may have only a few points. Use `varwidth=TRUE` to adjust width dynamically.
ggplot(mpg,aes(class, hwy)) + geom_quasirandom(varwidth = TRUE)
# Automatic dodging
sub_mpg <- mpg[mpg$class %in% c("midsize", "pickup", "suv"),]
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_quasirandom(dodge.width=1)
```
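
Because `geom_quasirandom` is an ordinary ggplot2 layer, it can be combined with other geoms. The sketch below overlays the points on a violin outline; the styling values (`fill`, `alpha`, and `width = 0.3`) are arbitrary choices for illustration, not recommended defaults.

```{r}
library(ggplot2)
library(ggbeeswarm)

# Violin outline underneath, individual points on top.
# `width` limits how far the points spread within each category.
p_overlay <- ggplot(mpg, aes(class, hwy)) +
  geom_violin(fill = "grey90", colour = NA, scale = "width") +
  geom_quasirandom(width = 0.3, alpha = 0.6)
p_overlay
```

Keeping `width` smaller than the violin width keeps the points inside the outline in most cases.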
#### Alternative methods
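`geom_quasirandom` can also use several other methods to distribute points, selected with the `method` argument. The examples below show each in turn, with a `geom_beeswarm` plot at the end for comparison: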
```{r ggplot2-methods,tidy=TRUE}
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom(method='tukey') + ggtitle('Tukey texture')
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom(method='tukeyDense') + ggtitle('Tukey + density')
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom(method='frowney') + ggtitle('Banded frowns')
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom(method='smiley') + ggtitle('Banded smiles')
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom(method='pseudorandom') + ggtitle('Jittered density')
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm() + ggtitle('Beeswarm')
```
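
To inspect the offsets themselves rather than the finished plot, one option is to call `vipor` directly. This sketch assumes `vipor::offsetX()` takes the values, the grouping variable, and a `method` argument passed on to the underlying offset routine; check `?vipor::offsetX` for the exact interface in your installed version.

```{r}
library(vipor)

set.seed(12345)  # the Tukey methods involve random permutations
# One horizontal offset per observation, computed within each species
tukey_offsets <- offsetX(iris$Sepal.Length, iris$Species, method = "tukey")

head(data.frame(
  Species = iris$Species,
  Sepal.Length = iris$Sepal.Length,
  offset = tukey_offsets
))
```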
### geom_beeswarm()
Using `geom_beeswarm`:
```{r ggplot2-beeswarm}
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm(side = 1L)
ggplot(mpg,aes(class, hwy)) + geom_beeswarm(size=.5)
# With categorical y-axis
ggplot(mpg,aes(hwy, class)) + geom_beeswarm(size=.5)
# Also watch out for points escaping from the plot with geom_beeswarm
ggplot(mpg,aes(hwy, class)) + geom_beeswarm(size=.5) + scale_y_discrete(expand=expansion(add=c(0.5,1)))
ggplot(mpg,aes(class, hwy)) + geom_beeswarm(size=1.1)
# With automatic dodging
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_beeswarm(dodge.width=0.5)
```
#### Alternative methods
```{r ggplot2-beeswarm-alt}
df <- data.frame(
x = "A",
y = sample(1:100, 200, replace = TRUE)
)
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "swarm") + ggtitle('method = "swarm" (default)')
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "compactswarm") + ggtitle('method = "compactswarm"')
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "hex") + ggtitle('method = "hex"')
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "square") + ggtitle('method = "square"')
ggplot(df, aes(x = x, y = y)) + geom_beeswarm(cex = 2.5, method = "center") + ggtitle('method = "center"')
```
#### Different point distribution priority
```{r ggplot2-priority}
#With different beeswarm point distribution priority
dat<-data.frame(x=rep(1:3,c(20,40,80)))
dat$y<-rnorm(nrow(dat),dat$x)
ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2) + ggtitle('Default (ascending)') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))
ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='descending') + ggtitle('Descending') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))
ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='density') + ggtitle('Density') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))
ggplot(dat,aes(x,y)) + geom_beeswarm(cex=2,priority='random') + ggtitle('Random') + scale_x_continuous(expand=expansion(add=c(0.5,.5)))
```
#### Corral runaway points
```{r ggplot2-corral}
set.seed(1995)
df2 <- data.frame(
y = rnorm(1000),
id = sample(c("G1", "G2", "G3"), size = 1000, replace = TRUE)
)
p <- ggplot(df2, aes(x = id, y = y, colour = id))
# use corral.width to control corral width
p + geom_beeswarm(cex = 2.5, corral = "none", corral.width = 0.9) + ggtitle('corral = "none" (default)')
p + geom_beeswarm(cex = 2.5, corral = "gutter", corral.width = 0.9) + ggtitle('corral = "gutter"')
p + geom_beeswarm(cex = 2.5, corral = "wrap", corral.width = 0.9) + ggtitle('corral = "wrap"')
p + geom_beeswarm(cex = 2.5, corral = "random", corral.width = 0.9) + ggtitle('corral = "random"')
p + geom_beeswarm(cex = 2.5, corral = "omit", corral.width = 0.9) + ggtitle('corral = "omit"')
```
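
The corral modes can be thought of as different rules for points whose horizontal offset exceeds the allowed width. The function below is a simplified base-R sketch of that idea, not the implementation used by `beeswarm` or `ggbeeswarm`; `corral_offsets()` is a hypothetical helper, and the `"random"` mode is left out to keep the output deterministic.

```{r}
# Illustrative only: apply a corral rule to a vector of horizontal offsets.
corral_offsets <- function(offset, width = 0.9,
                           mode = c("none", "gutter", "wrap", "omit")) {
  mode <- match.arg(mode)
  half <- width / 2
  switch(mode,
    none   = offset,                                     # leave runaway points alone
    gutter = pmin(pmax(offset, -half), half),            # pile them up at the boundary
    wrap   = ((offset + half) %% width) - half,          # periodic boundaries
    omit   = ifelse(abs(offset) > half, NA_real_, offset) # drop them
  )
}

offsets <- c(-0.6, -0.2, 0, 0.3, 0.7)
corral_offsets(offsets, mode = "gutter")
# -0.45 -0.20  0.00  0.30  0.45
corral_offsets(offsets, mode = "wrap")
corral_offsets(offsets, mode = "omit")
```

With `"gutter"` the two runaway points land on the edges of the corral, with `"wrap"` they re-enter from the opposite side, and with `"omit"` they become `NA`.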
------
Authors: Erik Clarke, Scott Sherrill-Mix, and Charlotte Dawson