-
Notifications
You must be signed in to change notification settings - Fork 1
/
ch-04.Rmd
111 lines (73 loc) · 3.63 KB
/
ch-04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
---
title: "Solutions to Chapter 4 Exercises"
author: "Howard Baek"
date: "Last compiled on `r format(Sys.time(), '%B %d, %Y')`"
output: html_document
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, include=FALSE}
library(tidyverse)
```
### 4.5 Exercises
1. Draw a boxplot of `hwy` for each value of `cyl`, without turning `cyl` into a factor. What extra aesthetic do you need to set?
```{r}
mpg %>%
ggplot(aes(cyl, hwy, group = cyl)) +
geom_boxplot()
```
- Since the variable `cyl` is an integer, you need to set `group = cyl`. According to the [ggplot2 docs](https://ggplot2.tidyverse.org/reference/aes_group_order.html), "when no discrete variable is used in the plot, you will need to explicitly define the grouping structure by mapping group to a variable that has a different value for each group."
2. Modify the following plot so that you get one boxplot per integer value of `displ`.
```{r}
ggplot(mpg, aes(displ, cty)) +
geom_boxplot()
```
- As discussed in the previous question, you need to set `group = displ` because `displ` is not a discrete variaable.
```{r}
ggplot(mpg, aes(displ, cty, group = displ)) +
geom_boxplot()
```
3. When illustrating the difference between mapping continuous and discrete colours to a line, the discrete example needed `aes(group = 1)`. Why? What happens if that is omitted? What's the difference between `aes(group = 1)` and `aes(group = 2)`? Why?
- Let's examine the example in the book:
```{r}
df <- data.frame(x = 1:3, y = 1:3, colour = c(1,3,5))
ggplot(df, aes(x, y, colour = factor(colour))) +
geom_line(aes(group = 2), size = 2) +
geom_point(size = 5)
ggplot(df, aes(x, y, colour = colour)) +
geom_line(aes(group = 1), size = 2) +
geom_point(size = 5)
```
- When omitted, we don't get a line that connects all these points. In fact, we get a message saying "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?" This happens because we included the colour aesthetic and made each color include only one observation. In order to tell ggplot that all these points are in the same group, we need to include `aes(group = 1)`. It doesn't matter what group is equal to. As long as we include all the points in the same group, we should be able to connect the points with a line.
4. How many bars are in each of the following plots?
```{r}
ggplot(mpg, aes(drv)) +
geom_bar()
# There are 3 bars in this plot.
ggplot(mpg, aes(drv, fill = hwy, group = hwy)) +
geom_bar()
# In this plot, the “shaded bars” for each drv have been constructed by stacking many distinct bars on top of each other, each filled with a different shade based on the value of hwy.
mpg2 <- mpg %>%
arrange(hwy) %>%
mutate(id = seq_along(hwy))
ggplot(mpg2, aes(drv, fill = hwy, group = id)) +
geom_bar()
# In this plot, the “shaded bars” for each drv have been constructed by stacking many distinct bars on top of each other, each filled with a different shade based on the value of hwy.
```
5. Install the babynames package. It contains data about the popularity of babynames in the US. Run the following code and fix the resulting graph. Why does this graph make me unhappy?
```{r}
# install.packages("babynames")
library(babynames)
hadley <- dplyr::filter(babynames, name == "Hadley")
ggplot(hadley, aes(year, n)) +
geom_line()
```
- To fix this, you need to differentiate the sex- Male and Female.
```{r}
ggplot(hadley, aes(year, n, group = sex, color = sex)) +
geom_line()
```
- The reason this graph makes Hadley unhappy is there are alot more female babies named Hadley than male babies!