-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdata_visualization_example.Rmd
815 lines (617 loc) · 19.4 KB
/
data_visualization_example.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
---
title: "3 Data Visualization Example"
author: "Russ Conte"
date: "8/4/2018"
output: html_document
---
This chapter will be a look at ggplot2. ggplot2 is part of tidyverse. We will look at the mpg data set.
```{r, echo=FALSE}
library(ggplot2)
mpg
```
#3.2 First Steps
Let's start by looking at a scatterplot of engine size (displacement) vs highway mileage (hwy), and answer this question:
Do cars with bigger enginues user more gas? Let's look at the graph of mpg vs engine size
Let's start by looking at a ggplot of the data:
```{r fig.width=7, fig.height=4, echo=FALSE}
library(ggplot2)
ggplot(data = mpg,mapping = aes(x=displ, y=hwy)) +
geom_point()
```
###3.2.4 Exercises
1. Run ggplot(data = mpg). What do you see?
```{r}
library(ggplot2)
ggplot(data=mpg)
```
Answer: A blank plot
2. How many rows are in mpg? How many columns?
```{r}
library(ggplot2)
print(paste0("number of rows in mpg = ",nrow(mpg)))
print(paste0("number of columns in mpg = ",ncol(mpg)))
```
What does the drv variable describe? Read the help for ?mpg to find out.
```{r}
library(ggplot2)
?mpg
```
The drv variable describes the types of drive systems on the cars:
f = front-wheel drive, r = rear wheel drive, 4 = 4wd
4. Make a scatterplot of hwy vs cyl.
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_point(mapping = aes(x=hwy, y=cyl))
```
5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
```{r}
library(ggplot2)
ggplot(mpg, aes(x=class, y=drv)) +
geom_point()
```
The plot of class vs drv does not provide useful information.
#3.3 Aesthetic Mappings
Separate class of the cars by color (note aes in geom_point to get the color):
```{r}
library(ggplot2)
ggplot(mpg, aes(y=hwy, x=displ)) +
geom_point(aes(color=class))
```
What does the data look like if we map class to size?
```{r}
library(ggplot2)
ggplot(mpg, aes(y=hwy, x=displ)) +
geom_point(aes(size=class))
```
We can also map class to alpha (transparency), or shape, we follows:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
```
Set the entire data set to the color blue. Note that the color command goes outside of the aes command:
```{r}
library(ggplot2)
ggplot(mpg)+
geom_point(mapping=aes(y=hwy, x=displ), color="blue")
```
### 3.3.1 Exercises
1. What's wrong with this code? Why aren't the points blue?
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
The points are not blue because the color command goes *outside* of the aes command if it pertains to all points:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```
2. Which variables in mpg are categorical? Which variables are continuous?
```{r}
library(ggplot2)
str(mpg)
```
Categorical: Manufacturer, Model, Trans, Drv, Fl, Class <br>
Continuous: displ, year, cyl, cty, hwy
3. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
Three examples of continuous variables:
```{r, warning=FALSE}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=displ, y=cty), color=mpg$hwy)
ggplot(mpg) +
geom_point(aes(x=displ, y=cty), size=mpg$hwy)
ggplot(mpg) +
geom_point(aes(x=displ, y=cty), shape=mpg$hwy)
```
An example of discrete variable mapping, this gives multiple errors and halts execution:
4. What happens if you map the same variable to multiple aesthetics?
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty), color=mpg$displ, shape=mpg$displ, size=mpg$displ)
```
It works, lots of layers on the same plot
5. What does the stroke aesthetic do? What shapes does it work with?
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty), stroke=mpg$cty)
```
It adds stroke to the shape - the outline surrounding a shape
6 What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)?
This set of commands returns errors and halts execution
# 3.5 Facets
When doing a facet on one variable (such as class), use facet_wrap:
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty)) +
facet_wrap(~class, nrow=2)
```
when using 2 factors, use facet_grid (note this is set to 4 rows, but that can be changed):
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty)) +
facet_wrap(~cty, nrow=4)
```
### 3.5.1 Exercises
1. What happens if you facet on a continuous variable, for example milage in the city? (noted as cty in the data)
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty)) +
facet_wrap(~cty, nrow=4)
```
2 What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
```{r}
library(ggplot2)
ggplot(mpg) +
geom_point(aes(x=mpg$displ, y=mpg$cty)) +
facet_grid(drv~cyl)
```
No values match the intersection of those variables where the plots are blank
3 What plots does the following code make? What does . do?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv~.) # facets on the drv variable
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv~.)
```
This creates facets on the drv variable
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
```
This plot creates facets on the number of cylinders
Let's remove the . and see what is different:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(~cyl)
```
hmmm, no differences that I can see.
# 3.6 Geometric Objects
Left plot in the book:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
```
Right plot in the book:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```
How to set up different linetypes by a factor (in this case the drv variable, the car's drivetrain)
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_smooth(mapping = aes(x=displ, y=cty, linetype=drv))
```
Also vary by color according to color:
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_smooth(mapping = aes(x=displ, y=cty, linetype=drv, color=drv)) +
geom_point(mapping = aes(x=displ, y=cty, color=drv))
```
Let's look at grouping and how that impacts the plot:
First, no grouping:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```
Now group by drive system:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
```
```{r}
library(ggplot2)
ggplot(data=mpg)+
geom_smooth(mapping = aes(x=displ, y=hwy, color=drv),
show.legend = FALSE)
```
Here is an example of showing multiple geoms on the same plot:
```{r}
library(ggplot2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
ggplot(data=mpg,mapping = aes(x=displ, y=cty)) +
geom_point(mapping = aes(color=class)) +
geom_smooth()
```
Let's look at how to filter (more on that later) for a specific variable:
```{r}
library(ggplot2)
library(dplyr)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
```
## 3.6.1 Exercises
1. What geom would you use to draw a line chart? geom_line()
A boxplot? geom_box()
A histogram? geom_hist()
An area chart? geom_area()
```{r}
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_line(mapping = aes(x=displ, y=hwy))
#boxplot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_boxplot()
#histogram
ggplot(data = mpg, mapping = aes(x=hwy)) +
geom_histogram()
# area cheart
ggplot(data = mpg, mapping = aes(x=hwy, y=displ)) +
geom_area()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
```
2. Run this code in your head and predict what it will do, then see what it actually looks like:
```{r}
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
```
3. What does show.legend = FALSE do? Removes the legend for the graph
What happens if you remove it? Show the legend, if there is one
Why do you think I used it earlier in the chapter? To show the legend :)
4. What does the se argument to geom_smooth() do?
It shows the standard error for the line/point/etc.
here is se=FALSE:
```{r}
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
```
Here is se=TRUE
```{r}
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = TRUE)
```
5. Will these two graphs look different? Why/why not? - they will look the same.
```{r}
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
6. Recreate the R code necessary to generate the following graphs. (A few of these were a very fun challenge!)
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ, y=hwy)) +
geom_smooth(mapping=aes(x=displ, y=hwy),se=FALSE)
```
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ, y=hwy)) +
geom_smooth(mapping=aes(x=displ, y=hwy, group=drv),se=FALSE)
```
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ, y=hwy, color=drv)) +
geom_smooth(mapping=aes(x=displ, y=hwy),se=FALSE)
```
```{r}
library(ggplot2)
ggplot(data=mpg) +
geom_point(mapping=aes(x=displ, y=hwy, color=drv)) +
geom_smooth(mapping=aes(x=displ, y=hwy, group=drv, linetype=drv),se=FALSE)
```
Note - I have not yet been able to reproduce the sixth graph in that set.
# 3.7 Statistical Transformations
Interesting question - where do the y-axis values come from in this plot, since they are not in the data set?
```{r}
library(ggplot2)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
```
The previous chart can be reproduced using stat_count as well as geom_bar:
```{r}
library(ggplot2)
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
```
An example where the y value is in the data set alreadey (note the use of stat="identity":
```{r}
library(tidyverse)
demo <- tribble(
~cut, ~freq,
"Fair", 1610,
"Good", 4906,
"Very Good", 12082,
"Premium", 13791,
"Ideal", 21551
)
ggplot(data=demo) +
geom_bar(mapping = aes(x=cut, y=freq), stat="identity")
```
It is possible to override the defauls and show proportions (note the use of ..prop..):
```{r}
library(ggplot2)
ggplot(data=diamonds)+
geom_bar(mapping = aes(x=cut, y=..prop.., group=1))
```
Here is another way to calculate a "stat summary", to summarize y-values for distinct x-values:
```{r}
```
```{r}
library(ggplot2)
ggplot(data=diamonds) +
stat_summary(
mapping = aes(x=cut, y=depth),
fun.ymin = min,
fun.ymax = max,
fun.y=median
)
```
## 3.7.1 Exercises
1. What is the default geom associated with stat_summary()?
```{r}
library(ggplot2)
?stat_summary
```
stat_summary_bin(mapping = NULL, data = NULL, geom = "pointrange"...
so the default geom is pointrange
How could you rewrite the previous plot to use that geom function instead of the stat function?
```{r}
library(tidyverse)
ggplot(data=diamonds) +
geom_pointrange(
mapping = aes(x=cut, y=depth),
ymin = min(diamonds$depth),
ymax = max(diamonds$depth)
)
```
2. What does geom_col() do? How is it different to geom_bar()?
```{r}
library(tidyverse)
?geom_col
```
From the help file: geom_col uses stat_identity: it leaves the data as is.
3. Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?
The names are consistent:
(geom_bar OR geom_col) and stat_count
geom_boxplot and stat_boxplot
geom_bin2d and stat_bin_2d
geom_contour and stat_contour
geom_count and stat_sum
geom_density and stat_density
geom_density_2d and stat_density_2d
geom_hex and stat_bin_hex
(geom_freqpoly OR geom_histogram) and stat_bin
geom_qq and stat_qq
geom_quantile and stat_quantile
geom_ribbon and geom_area
geom_segment and geom_curve
geom_smooth and stat_smooth
geom_label and geom_text
geom_violin and stat_ydensity
4. What variables does stat_smooth() compute? What parameters control its behaviour?
```{r}
library(tidyverse)
?stat_smooth
```
From the help file: Aids the eye in seeing patterns in the presence of overplotting.
5. In our proportion bar chart, we need to set group = 1. Why? In other words what is the problem with these two graphs?
```{r}
library(tidyverse)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))
```
The graphs do not show accurate proportions, but using group=1 fixes this problem:
```{r}
library(tidyverse)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group=1))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group=1))
```
# 3.8 Position Adjustments
It's possible to add color to a bar chart (or other charts?) using color or fill. Note the color outline on the bars
on the first chart:
```{r}
library(tidyverse)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
```
Color bars:
```{r}
library(tidyverse)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
```
If color is mapped to a factor with multiple levels, such as Clarity in the Diamonds data set, we get:
```{r}
library(tidyverse)
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity))
```
Note there are three other options:
1. Position = Identity (note that Identity is the defaul for 2d geoms), it creates overlaps, which is why
alpha is reduced:
```{r}
library(tidyverse)
ggplot(data=diamonds, mapping = aes(x=cut, fill=clarity)) +
geom_bar(alpha=1/5, position="identity")
```
2. Position = "dodge" instead places overlapping objects <i>beside</i> one another
```{r}
library(tidyverse)
ggplot(data=diamonds, mapping = aes(x=cut, fill=clarity)) +
geom_bar(alpha=3/5, position="dodge")
```
3. Position = fill - this makes all the bars the same height
```{r}
library(tidyverse)
ggplot(data=diamonds, mapping = aes(x=cut, fill=clarity)) +
geom_bar(position="fill")
```
4. Position = "jitter" is very good for basic scatter plots. Recall our first scatterplot, it shows 126 points
and we can not tell (for example) if one pair of values has 109 points on the graph, even if the size
of the points are changed:
```{r}
library(tidyverse)
ggplot(mpg,mapping = aes(x=displ, y=hwy))+
geom_point(mapping = aes(x=displ, y=hwy),size=0.005)
```
but the actual data set has 234 observations. How to show all the observations? Use "jitter" and reduce
the point size:
```{r}
library(tidyverse)
ggplot(mpg,mapping = aes(x=displ, y=hwy))+
geom_point(mapping = aes(x=displ, y=hwy),position = "jitter", size=0.0005)
```
## 3.8.1 Exercises
1. What is the problem with this plot? How would you improve it?
```{r}
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
```
One solution to show all the points is to add jitter and reduce the point size:
```{r}
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point(mapping = aes(x = cty, y = hwy), position="jitter", size=0.0005)
```
2. What parameters to geom_jitter() control the amount of jittering?
```{r}
?geom_jitter
```
The position adjustment, color, size, width of jitter, height of jitter
3. Compare and contrast geom_jitter() with geom_count().
geom_jitter() adds variation to each point, so it's easier to see points
geom_count will count overlapping points
4. What’s the default position adjustment for geom_boxplot()? It is "dodge". Create a visualisation of the mpg dataset that demonstrates it.
(note that ggplot requires values for x an y, this substitutes "" for x, giving the boxplot for city mileage)
```{r}
ggplot(data=mpg) +
geom_boxplot(mapping = aes(x=mpg$trans, y=cty, group=mpg$trans), position="dodge")
```
Here is an example of the same graph using position = "identity":
```{r}
ggplot(data=mpg) +
geom_boxplot(mapping = aes(x=mpg$trans, y=cty, group=mpg$trans), position="identity")
```
# 3.9 Coordinate Systems
For example, coord_flip is good for horizontal boxplots. Here is a regular boxplot:
```{r}
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
```
Here is the same boxplot in a horizontal plot:
```{r}
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
```
Another example is coord_quickmap, which plots maps. This is a map of the lower 48 states with <i>incorrect</i> proportions:
```{r}
library(tidyverse)
install.packages("maps")
usa <- map_data("usa")
ggplot(usa, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
```
This is a map of the lower 48 states with correct proportions:
```{r}
library(tidyverse)
library("maps")
usa <- map_data("usa")
ggplot(usa, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()
```
coord_polar uses polar coordinates, and that can reveal many interesting relationships that
might not be obvious from other charts, such as a bar chart:
```{r}
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
```
## 3.9.1 Exercises
1. Turn a stacked bar chart into a pie chart using polar_coord()
```{r}
library(tidyverse)
bar1 <- ggplot(data=diamonds, mapping = aes(x=cut, fill=clarity)) +
geom_bar(alpha=4/5, position="identity")
bar1+coord_polar()
```
2. What does labs() do? Read the documentation.
```{r}
?labs
```
Good labels will:
1. Ensure x and y axes are easily understood
2. Plot title will let people know the subject/topic of the plot
3. Captions can be used to help the reader understand the plot in more detail
3. What’s the difference between coord_quickmap() and coord_map()?
Here is coord_quickmap():
```{r}
library(tidyverse)
library("maps")
usa <- map_data("usa")
ggplot(usa, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()
```
Here is an example of quickmap that shows the lower 48 states:
```{r}
library(tidyverse)
library(evaluate)
states <- map_data("state")
usamap <- ggplot(states, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
usamap + coord_quickmap()
```
4. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
```{r}
install.packages("ggplot2")
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed()
```
This shows that highway and city mileage tend to go up together. As one goes up, the other goes up, too.
geom_abline() draws a straight line that shows the straight line relationship between highway and city mileage.
Why is coord_fixed() important?