forked from roncampbell/NICAR2022
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Handson_GISMapping.Rmd
614 lines (451 loc) · 16.6 KB
/
Handson_GISMapping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
---
title: "Intro to GIS / Mapping Visualization in R"
author: "Aaron Kessler"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
html_document:
code_folding: show
df_print: kable
toc: yes
toc_float: yes
editor_options:
markdown:
wrap: sentence
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(tigris)
library(sf)
library(tmap)
library(tmaptools)
library(htmltools)
library(janitor)
library(rmapshaper)
library(here)
options(tigris_class = "sf")
```
# Getting Geospatial Data Into R
There are two primary ways to get geodata into R: through a saved file you import, or though a package that will help download it directly from the web.
R can handle almost any type of GIS data, including shapefiles, geodatabases, geoJson, etc.
We'll look at the SF package, one of the best packages for doing importing and processing data.
What are "Simple Features" (SF)?
The simple feature geodata format creates an object that behaves much like a dataframe in R, yet has spatial fields that contain the geographic information.
This represents a big improvement over previous ways of handling geospatial data in R.
You may see older references online to the SP package, which is the former way of doing geospatial work.
If you're just starting out, you're much better off focusing your efforts on the SF package from the get-go.
## Packages we'll be using
There are a bunch of different R packages designed to work with geospatial data.
We'll touch on a few of them here, primarily the tmap package, but there are many more.
Even ggplot2 itself now has functions to help handle sf objects!
Let's look as some actual code and examples to get started...
# An Example: Plotting Points
We'll use the tigris package to pull census boundary geo data into our session, for a state map of the US.
Note that at the end we'll discuss strategies for handling Alaska, Hawaii and Puerto Rico - for now we'll take them out for expediency's sake in the example below.
The tigris package is a wonderful resource for all kinds of boundary files several options for resolution - when using whole nation, 20m is usually better for individual states 5m may be preferable.
By setting options(tigris_class = "sf") at the top, we've told tigris we want simple feature objects returned.
```{r, include=FALSE}
states_geo <- tigris::states(resolution = "20m", cb = TRUE)
# let's take a look as what we have
states_geo
```
Looks a lot like a dataframe right?
Note the "geometry" field.
Also take note of the CRS, which stands for coordinate reference system; we'll come back to that shortly.
Ok, it's nice I have this GIS data, how do I actually see anything?
How do I map it out?
This is where you have many different options.
But we're going to start by using the powerful tmap package.
Keep in mind tmap uses the + sign not the pipe, similar ggplot2.
Watch how simple it is to get something initial up to see:
```{r}
tm_shape(states_geo) +
tm_polygons()
```
What just happened there?
Let's discuss.
I said for now we'll focus on the lower 48, how can we do that?
Well you can filter sf objects much like you can a regular dataframe/tibble.
First, let's start with just getting rid of U.S. territories, and just keep states, since this is something you'll find yourself doing quite frequently.
Tigris also comes with a handle fips code table built in.
All we have to do is reference it and can utilize its goodness.
```{r}
head(fips_codes)
```
This can be a great tool to help get down to just U.S. states and DC.
```{r}
vector_50anddc <- unique(fips_codes$state)[1:51]
```
Now we'll filter using our vector, must like we would a normal dataframe.
```{r}
states_geo <- states_geo %>%
filter(STUSPS %in% vector_50anddc)
```
Did it work?
Let's see how many rows we have now.
```{r}
states_geo %>%
nrow()
```
Great, now let's take out AK and HI too for now.
```{r}
states_geo <- states_geo %>%
filter(!STUSPS %in% c("AK", "HI"))
```
Let's map things out now to see what we have.
```{r}
tm_shape(states_geo) +
tm_polygons()
```
Bingo.
tm_polygons also takes some other arguments, including assigning an ID.
One of the powerful arguments is to symbolize the data based on a certain column.
All you have to do is feed in the name of the column you want to use to visualize by.
```{r}
tm_shape(states_geo) +
tm_polygons("ALAND", id = "GEOID")#here we feed in the land area, ALAND
```
Generate it again but this time adding labels.
```{r}
tm_shape(states_geo) +
tm_polygons("ALAND", id = "GEOID") +
tm_text("STUSPS", size = .5) #this line adds the labels
# there are numerous parameters and customizations you can do
```
At this point, let's touch on two geospatial concepts:
1) What are "cartographic boundaries" and why do we almost always want to use them for visualizations?
- you may have seen the parameter "cb = TRUE" above
- when to use them? when not to use them?
```{r}
# with cb
states_geo <- tigris::states(resolution = "20m", cb = TRUE) %>%
filter(STUSPS %in% vector_50anddc,
!STUSPS %in% c("AK", "HI"))
tm_shape(states_geo) +
tm_polygons()
```
```{r}
# without cb
# not run
# states_geo_nocb <- tigris::states(resolution = "20m") %>%
# filter(STUSPS %in% vector_50anddc,
# !STUSPS %in% c("AK", "HI"))
#
# tm_shape(states_geo_nocb) +
# tm_polygons()
```
Notice what's different? (Hint: look at MI and VA)
Also it took a lot longer to plot the non-cb version, even with 20m resolution. Why is that?
For large sf objects where you're more interested in visualizing, as opposed to spatial analysis based on distances etc,
you can "simplify" the map object to decrease the size and boost rendering speed
```{r}
#not run
# states_geo_nocb_SIMPLIFIED <- rmapshaper::ms_simplify(states_geo_nocb, keep = 0.1)
```
```{r}
# let's see the difference
# tm_shape(states_geo_nocb_SIMPLIFIED) +
# tm_polygons()
```
2) What is a Coordinate Reference System (CRS) and why is it important for mapping more than one element?
- how can I check the CRS? How can I change it?
- what's the difference between planar and geodesic?
- https://ihatecoordinatesystems.com/
the sf package's st_crs() function returns the CRS of a simple feature object
```{r}
st_crs(states_geo)
```
This becomes very important when you want to layer different geo datasets, and when doing processing
work such as spatial joins, or measuring distances and such.
You want all the data to be using the same CRS, or you could wind up with distorted or incorrect results.
# LET'S ADD SOME CITIES AS POINTS
```{r}
# load a sample of US cities
cities <- read_csv(here("data/table_data", "cities_with_coordinates.csv"))
cities
```
```{r}
# now we can create a geospatial object using the coordinates
# can can specify a crs
cities_geo <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4269)
```
```{r}
st_crs(cities_geo)
st_crs(states_geo)
```
looks like they're the same crs - but are they exactly the same?
sometimes helps to be extra sure that the text is identical
we can change a CRS by using st_transform()
```{r}
cities_geo <- st_transform(cities_geo, st_crs(states_geo))
```
```{r}
# now let's look again
st_crs(cities_geo)
st_crs(states_geo)
```
```{r}
# great, now let's map our new point layer on top of the base map
tm_shape(states_geo) + tm_polygons() +
tm_shape(cities_geo) + tm_dots()
```
```{r}
# They're on there!
# Little hard to see though, let's fiddle with the size and color
tm_shape(states_geo) + tm_polygons() +
tm_shape(cities_geo) + tm_dots(col = "red", size = 1)
```
```{r}
# we can actually save our tmap as its own object as well
mymap <- tm_shape(states_geo) + tm_polygons() +
tm_shape(cities_geo) + tm_dots(col = "red", size = 0.1)
mymap
```
```{r}
# We can either use the "export" button directly from the viewer to save as pdf...
# ...or do it using the following code:
tmap_save(mymap, here("mymap.pdf"))
```
```{r}
# We can also save it as an RDS file - the entire map becomes the saved object
saveRDS(mymap, here("mymap.rds"))
```
```{r}
# why might we want to do this?
# one use case: if displaying in a complex map on an rmarkdown document / website,
# you don't have to compute it on the fly, can use the pre-processed result instead
map_to_include <- readRDS(here("mymap.rds"))
```
```{r}
map_to_include
```
Wish your tmap was interactive instead of static?
While it doesn't have the same level of specific customization as using the leaflet package
directly (example of that later on), you can actually turn your map object in tmap into a
leaflet map by running a single line of code: setting the tmap_mode()
```{r}
# let's take a look
tmap_mode(mode = "view")
# what's what happens
mymap
```
you can also use the tmap_leaflet() function to convert to a leaflet object and
further customize using the leaflet's own methods
want to go back to static?
```{r}
tmap_mode(mode = "plot")
mymap
# nice...
```
# AN EXAMPLE: CHOROPLETH MAPS OF HOUSE DISTRICT DEMOGRAPHICS AND VOTING
```{r}
# We've done something with points, let's now look at a real-world chloropleth use case
# load dataset of district characteristics for pre-2018 election U.S. House districts
alldistricts <- readRDS(here("data/table_data", "alldistricts.rds"))
alldistricts
```
```{r}
# Since above we used the tigris package to get our base map, this time let's see what's
# involved in loading a geospatial file you already have yourself and want to bring into R.
cd_geo <- st_read(here("data/geospatial_data", "cb_2018_us_cd116_20m/cb_2018_us_cd116_20m.shp"))
head(cd_geo)
```
```{r}
# join our district dataset to its geography
# note that we can use dplyr's inner join here to join on the FIPS code (named geoid in the tables)
districtmap <- inner_join(cd_geo, alldistricts, by = c("GEOID" = "geoid"))
```
```{r}
# did it work?
glimpse(districtmap)# woohoo
```
```{r}
# remove AK and HI for expediency here again
districtmap <- districtmap %>%
filter(state_name != "Alaska",
state_name != "Hawaii")
```
```{r}
# Use TMAP to map it out
tmap_mode(mode = "plot")
tm_shape(districtmap) +
tm_polygons(id = "house_dist")
```
```{r}
# now let's actually do some analysis...
# which districts did Trump or Clinton carry in 2016?
# again you can customize color schemes, labels, etc... we'll just keep defaults for now.
tm_shape(districtmap) +
tm_polygons("prez_winner_2016", id = "house_dist")
```
```{r}
# let's look at whether a district is above/below the national average for pct with a college degree?
tm_shape(districtmap) +
tm_polygons("pct_ed_college_all_abovebelow_natl", id = "house_dist")
```
```{r}
# you can also filter our geospatial dataset to create subsets
# let's look at just GOP-held seats where the race was favoring the Dems or a tossup
rheld_demadvantage <- districtmap %>%
filter(incumbent_2018 == "R",
race_rating_2018 %in% c("likely democratic", "lean democratic", "tossup"))
# and then once again use tmap to display that
tm_shape(rheld_demadvantage) +
tm_polygons(id = "house_dist")
```
```{r}
# Of course what happens here - we don't have the rest of the CD map shown.
# So we can simply layer the base CD map underneath the filtered districts
tm_shape(districtmap) +
tm_polygons(id = "house_dist") +
tm_shape(rheld_demadvantage) +
tm_polygons(col = "red", id = "house_dist")
```
```{r}
# we can also symbolize the filtered districts by a variable if we want
tm_shape(districtmap) +
tm_polygons(id = "house_dist") +
tm_shape(rheld_demadvantage) +
tm_polygons("pct_ed_college_all_abovebelow_natl", id = "house_dist")
```
```{r}
# once we have something we want can save it as its own object
# here we'll add a title for the map, as well as a title for the legend
map_rheld_demadvantage_byeducation <- tm_shape(districtmap) +
tm_polygons() +
tm_shape(rheld_demadvantage) +
tm_polygons(col = "red", id = "house_dist") +
tm_layout(main.title = "GOP-Held Seats in 2018 Where Democrats Were Most Competitive",
main.title.position = "center",
main.title.color = "darkred",
main.title.size = 1.2)
map_rheld_demadvantage_byeducation
```
```{r}
# now we can export it
tmap_save(map_rheld_demadvantage_byeducation, here("map_rheld_demadvantage_byeducation.pdf"))
```
# SPATIAL JOINING
A spatial join is where instead of joining to tables based on matching a key field, you join two
datasets based on their geospatial position in the world so to speak.
Spatial joining is a mainstay of GIS work, so let's show a very quick example of one way to do it
in R, using the data we've already seen.
In this case, we had cities above mapped out. What if we wanted to know which congressional districts
each city was in?
We have our cities:
```{r}
mymap
```
```{r}
# And we have our district boundaries:
tm_shape(districtmap) +
tm_polygons()
```
Before we join, it's always good practice to visually look at what you have.
This will tell you if anything weird is happening you need to deal with first.
```{r}
tm_shape(districtmap) + tm_polygons() +
tm_shape(cities_geo) + tm_dots(col = "red", size = 0.1)
```
Even though cities within districts may turn out ok as we are, let's apply a planar CRS just to be sure.
More information on the differences between coordinate systems here: https://tinyurl.com/t97h947v
We'll create two new R objects here to keep them distinct from the previous ones
```{r}
districtmap_forjoin <- st_transform(districtmap, 2163)
# now we'll assign cities to match what we just assigned for districts
cities_geo_forjoin <- st_transform(cities_geo, st_crs(districtmap_forjoin))
```
```{r}
st_crs(cities_geo_forjoin)
st_crs(districtmap_forjoin)
```
```{r}
# Now let's do the join using sf's st_join() function
joined <- st_join(cities_geo_forjoin, districtmap_forjoin)
```
```{r}
joined # %>% View()
```
```{r}
# We can select just some relevant columns to simplify
joined %>%
select(city, state, house_dist)
```
```{r}
# Cool, that's better. Though notice the geometry always travels with it.
# What if you wanted to just have a table as the result, without the geometry?
# The sf package to the rescue again...
matched_table <- joined %>%
select(city, state, house_dist) %>%
st_set_geometry(NULL)
matched_table
```
# AUTOMATING REPETITIVE GIS WORK
You may be asking yourself, I'm already a QGIS or ArcGIS user and can do what you've shown.
Is there a benefit to doing it in R?
The truth is it's always dependent on your situation and what you're trying to accomplish.
One area where R can be especially helpful is when you need to do a bunch of variations of the
same thing. Let's take a quick look at an example.
```{r}
# We'll use a measure from our congressional district data from above to examine pct with college degrees
tm_shape(districtmap) +
tm_polygons("pct_ed_college_all", id = "GEOID")
```
Cool, so for the the whole country, that was easy enough.
But what about if our need was to have maps for EVERY STATE.
A separate map for each showing the same. Perhaps to go with state-specific pages on our website etc.
First, let's solve for one state. How could we do that...
```{r}
# Since the state abbreviation isn't in our current districtmap object, let's get it in there
fips_statelookup <- fips_codes %>%
as_tibble() %>%
select(state, state_code) %>%
distinct()
fips_statelookup
```
```{r}
districtmap <- inner_join(districtmap, fips_statelookup, by = c("STATEFP" = "state_code"))
```
```{r}
# create slice of just one state
cd_onestate <- districtmap %>%
filter(state == "GA")
#let's see what we've got
tm_shape(cd_onestate) +
tm_polygons("pct_ed_college_all", id = "GEOID")
```
Let's make a FUNCTION now to do this for a state
We'll use the state abbreviation to feed into it
```{r}
make_state_map <- function(stateabbr){
#choose state
cd_onestate <- districtmap %>%
filter(state == stateabbr)
# create cd map for the state
mymap_test <- tm_shape(cd_onestate) +
tm_polygons("pct_ed_college_all", id = "GEOID") +
tm_text("CD116FP", size = .5)
#export file to pdf
filename <- paste0("districtmap_", stateabbr, ".pdf")
print(filename)
tmap_save(mymap_test, here("stateoutputs", filename))
}
```
```{r}
# try for just one state
make_state_map("GA")
# let's take a look at the generated pdf file. Did it work?
```
```{r}
# Now that we know it works for one, we can do it for ALL states in a list we determine.
# Let's create a vector of all the states in our original map
vector_targetstates <- districtmap %>%
st_set_geometry(NULL) %>%
count(state) %>%
pull(state)
```
```{r}
# Then we'll use that to feed into our new function to loop through everything at once.
# We'll iterate through them all using purrr's walk() function
walk(vector_targetstates, make_state_map)
```