forked from hadley/ggplot2-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
maps.Rmd
414 lines (293 loc) · 28 KB
/
maps.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
# Maps {#maps}
```{r setup, include = FALSE}
source("common.R")
columns(1, 2 / 3)
```
Plotting geospatial data is a common visualisation task, and one that requires specialised tools. Typically the problem can be decomposed into two problems: using one data source to draw a map, and adding metadata from another information source to the map. This chapter will help you tackle both problems. I've structured the chapter in the following way: Section \@ref(polygonmaps) outlines a simple way to draw maps using `geom_polygon()`, which is followed in Section \@ref(sf) by a modern "simple features" (sf) approach using `geom_sf()`. Next, Sections \@ref(mapproj) and \@ref(sfdetail) discuss how to work with map projections and the underlying sf data structure. Finally, Section \@ref(rastermaps) discusses how to draw maps based on raster data.
## Polygon maps {#polygonmaps}
\index{Maps!geoms}
\index{Data!spatial}
\index{geom\_polygon}
Perhaps the simplest approach to drawing maps is to use `geom_polygon()` to draw boundaries for different regions. For this example we take data from the maps package using `ggplot2::map_data()`. The maps package isn't particularly accurate or up-to-date, but it's built into R so it's an easy place to start. Here's a data set specifying the county boundaries for Michigan:
```{r}
mi_counties <- map_data("county", "michigan") %>%
select(lon = long, lat, group, id = subregion)
head(mi_counties)
```
In this data set we have four variables: `lat` and `long` specify the latitude and longitude of a vertex (i.e. a corner of the polygon), `id` specifies the name of a region, and `group` provides a unique identifier for contiguous areas within a region (e.g. if a region consisted of multiple islands). To get a better sense of what the data contains, we can plot `mi_counties` using `geom_point()`, as shown in the left panel below. In this plot, each row in the data frame is plotted as a single point, producing a scatterplot that shows the corners of every county. To turn this scatterplot into a map, we use `geom_polygon()` instead, which draws each county as a distinct polygon. This is illustrated in the right panel below.
`r columns(2, 2/3)`
```{r}
ggplot(mi_counties, aes(lon, lat)) +
geom_point(size = .25, show.legend = FALSE) +
coord_quickmap()
ggplot(mi_counties, aes(lon, lat, group = group)) +
geom_polygon(fill = "white", colour = "grey50") +
coord_quickmap()
```
In both plots I use `coord_quickmap()` to adjust the axes to ensure that longitude and latitude are rendered on the same scale. Chapter \@ref(coord) discusses coordinate systems in ggplot2 in more general terms, but as we'll see below, geospatial data often require a more exacting approach. For this reason, ggplot2 provides `geom_sf()` and `coord_sf()` to handle spatial data specified in simple features format.
## Simple features maps {#sf}
There are a few limitations to the approach outlined above, not least of which is the fact that the simple "longitude-latitude" data format is not typically used in real world mapping. Vector data for maps are typically encoded using the "simple features" standard produced by the Open Geospatial Consortium. The sf package [@sf] developed by Edzer Pebesma <https://github.com/r-spatial/sf> provides an excellent toolset for working with such data, and the `geom_sf()` and `coord_sf()` functions in ggplot2 are designed to work together with the sf package.
To introduce these functions, we rely on the ozmaps package by Michael Sumner <https://github.com/mdsumner/ozmaps/> which provides maps for Australian state boundaries, local government areas, electoral boundaries, and so on [@ozmaps]. To illustrate what an sf data set looks like, we import a data set depicting the borders of Australian states and territories:
`r columns(1, 1)`
```{r}
library(ozmaps)
library(sf)
oz_states <- ozmaps::ozmap_states
oz_states
```
This output shows some of the metadata associated with the data (discussed momentarily), and tells us that the data is essentially a tibble with 9 rows and 2 columns. One advantage to sf data is immediately apparent, we can easily see the overall structure of the data: Australia is comprised of six states and some territories. There are 9 distinct geographical units, so there are 9 rows in this tibble (cf. `mi_counties data` where there is one row per polygon vertex).
The most important column is `geometry`, which specifies the spatial geometry for each of the states and territories. Each element in the `geometry` column is a multipolygon object which, as the name suggests, contains data specifying the vertices of one or more polygons that demark the border of a region. Given data in this format, we can use `geom_sf()` and `coord_sf()` to draw a serviceable map without specifying any parameters or even explicitly declaring any aesthetics:
```{r}
ggplot(oz_states) +
geom_sf() +
coord_sf()
```
To understand why this works, note that `geom_sf()` relies on a `geometry` aesthetic that is not used elsewhere in ggplot2. This aesthetic can be specified in one of three ways:
* In the simplest case (illustrated above) when the user does nothing,
`geom_sf()` will attempt to map it to a column named `geometry`.
* If the `data` argument is an sf object then `geom_sf()` can automatically
detect a geometry column, even if it's not called `geometry`.
* You can specify the mapping manually in the usual way with
`aes(geometry = my_column)`. This is useful if you have multiple geometry
columns.
The `coord_sf()` function governs the map projection, discussed in Section \@ref(mapproj).
### Layered maps
In some instances you may want to overlay one map on top of another. The ggplot2 package supports this by allowing you to add multiple `geom_sf()` layers to a plot. As an example, I'll use the `oz_states` data to draw the Australian states in different colours, and will overlay this plot with the boundaries of Australian electoral regions. To do this, there are two preprocessing steps to perform. First, I'll use `dplyr::filter()` to remove the "Other Territories" from the state boundaries.
The code below draws a plot with two map layers: the first uses `oz_states` to fill the states in different colours, and the second uses `oz_votes` to draw the electoral boundaries. Second, I'll extract the electoral boundaries in a simplified form using the `ms_simplify()` function from the rmapshaper package [@rmapshaper]. This is generally a good idea if the original data set (in this case `ozmaps::abs_ced`) is stored at a higher resolution than your plot requires, in order to reduce the time taken to render the plot.
`r columns(n = 1, aspect_ratio = 1)`
```{r}
oz_states <- ozmaps::ozmap_states %>% filter(NAME != "Other Territories")
oz_votes <- rmapshaper::ms_simplify(ozmaps::abs_ced)
```
Now that I have data sets `oz_states` and `oz_votes` to represent the state borders and electoral borders respectively, the desired plot can be constructed by adding two `geom_sf()` layers to the plot:
```{r}
ggplot() +
geom_sf(data = oz_states, mapping = aes(fill = NAME), show.legend = FALSE) +
geom_sf(data = oz_votes, fill = NA) +
coord_sf()
```
It is worth noting that the first layer to this plot maps the `fill` aesthetic in onto a variable in the data. In this instance the `NAME` variable is a categorical variable and does not convey any additional information, but the same approach can be used to visualise other kinds of area metadata. For example, if `oz_states` had an additional column specifying the unemployment level in each state, we could map the `fill` aesthetic to that variable.
### Labelled maps {#geom_sf_label}
Adding labels to maps is an example of annotating plots (Chapter \@ref(annotations)) and is supported by `geom_sf_label()` and `geom_sf_text()`. For example, while an Australian audience might be reasonably expected to know the names of the Australian states (and are left unlabelled in the plot above) few Australians would know the names of different electorates in the Sydney metropolitan region. In order to draw an electoral map of Sydney, then, we would first need to extract the
map data for the relevant elextorates, and then add the label. The plot below zooms in on the Sydney region by specifying `xlim` and `ylim` in `coord_sf()` and then uses `geom_sf_label()` to overlay each electorate with a label:
```{r}
# filter electorates in the Sydney metropolitan region
sydney_map <- ozmaps::abs_ced %>% filter(NAME %in% c(
"Sydney", "Wentworth", "Warringah", "Kingsford Smith", "Grayndler", "Lowe",
"North Sydney", "Barton", "Bradfield", "Banks", "Blaxland", "Reid",
"Watson", "Fowler", "Werriwa", "Prospect", "Parramatta", "Bennelong",
"Mackellar", "Greenway", "Mitchell", "Chifley", "McMahon"
))
# draw the electoral map of Sydney
ggplot(sydney_map) +
geom_sf(aes(fill = NAME), show.legend = FALSE) +
coord_sf(xlim = c(150.97, 151.3), ylim = c(-33.98, -33.79)) +
geom_sf_label(aes(label = NAME), label.padding = unit(1, "mm"))
```
The warning message is worth noting. Internally `geom_sf_label()` uses the function
`st_point_on_surface()` from the sf package to place labels, and the warning message
occurs because most algorithms used by sf to compute geometric quantities (e.g., centroids,
interior points) are based on an assumption that the points lie in on a flat two dimensional
surface and parameterised with Cartesian co-ordinates. This assumption is not strictly
warranted, and in some cases (e.g., regions near the poles) calculations that treat
longitude and latitude in this way will give erroneous answers. For this reason, the sf
package produces warning messages when it relies on this approximation.
<!-- HW: worth commenting on warning? "st_point_on_surface may not give correct results for longitude/latitude data" -->
### Adding other geoms
Though `geom_sf()` is special in some ways, it nevertheless behaves in much the same fashion as any other geom, allowing additional data to be plotted on a map with standard geoms. For example, we may wish to plot the locations of the Australian capital cities on the map using `geom_point()`. The code below illustrates how this is done:
`r columns(n = 1, aspect_ratio = 1)`
```{r}
oz_capitals <- tibble::tribble(
~city, ~lat, ~lon,
"Sydney", -33.8688, 151.2093,
"Melbourne", -37.8136, 144.9631,
"Brisbane", -27.4698, 153.0251,
"Adelaide", -34.9285, 138.6007,
"Perth", -31.9505, 115.8605,
"Hobart", -42.8821, 147.3272,
"Canberra", -35.2809, 149.1300,
"Darwin", -12.4634, 130.8456,
)
ggplot() +
geom_sf(data = oz_votes) +
geom_sf(data = oz_states, colour = "black", fill = NA) +
geom_point(data = oz_capitals, mapping = aes(x = lon, y = lat), colour = "red") +
coord_sf()
```
In this example `geom_point` is used only to specify the locations of the capital cities, but the basic idea can be extended to handle point metadata more generally. For example if the oz_capitals data were to include an additional variable specifying the number of electorates within each metropolitan area, we could encode that data using the `size` aesthetic.
## Map projections {#mapproj}
At the start of the chapter I drew maps by plotting longitude and latitude on a Cartesian plane, as if geospatial data were no different to other kinds of data one might want to plot. To a first approximation this is okay, but it's not good enough if you care about accuracy. There are two fundamental problems with the approach.
The first issue is the shape of the planet. The Earth is neither a flat plane, nor indeed is it a perfect sphere. As a consequence, to map a co-ordinate value (longitude and latitude) to a location we need to make assumptions about all kinds of things. How ellipsoidal is the Earth? Where is the centre of the planet? Where is the origin point for longitude and latitude? Where is the sea level? How do the tectonic plates move? All these things are relevant, and depending on what assumptions one makes the same co-ordinate can be mapped to locations that are many meters apart. The set of assumptions about the shape of the Earth is referred to as the **geodetic datum** and while it might not matter for some data visualisations, for others it is critical. There are several different choices one might consider: if your focus is North America the "North American Datum" (NAD83) is a good choice, whereas if your perspective is global the "World Geodetic System" (WGS84) is probably better.
The second issue is the shape of your map. The Earth is approximately ellipsoidal, but in most instances your spatial data need to be drawn on a two dimensional plane. It is not possible to map the surface of an ellipsoid to a plane without some distortion or cutting, and you will have to make choices about what distortions you are prepared to accept when drawing a map. This is the job of the **map projection**.
<!-- HW: this discussion is great, and was sorely missing from the previous edition of the book, and indeed my understanding of spatial data when I wrote it -->
Map projections are often classified in terms of the geometric properties that they preserve, e.g.
* Area-preserving projections ensure that regions of equal area on the globe are
drawn with equal area on the map.
* Shape-preserving (or conformal) projections ensure that the local shape of
regions is preserved.
And unfortunately, it's not possible for any projection to be shape-preserving and area-preserving. This makes it a little beyond the scope of this book to discuss map projections in detail, other than to note that the simple features specification allows you to indicate which map projection you want to use. For more information on map projections, see Geocomputation with R <https://geocompr.robinlovelace.net/> [@lovelace_geocomputation_2019].
Taken together, the geodetic datum (e.g, WGS84), the type of map projection (e.g., Mercator) and the parameters of the projection (e.g., location of the origin) specify a **coordinate reference system**, or CRS, a complete set of assumptions used to translate the latitude and longitude information into a two dimensional map. An sf object often includes a default CRS, as illustrated below:
```{r}
st_crs(oz_votes)
```
Most of this output corresponds to a **well-known text** (WKT) string that unambiguously describes the CRS. This verbose WKT representation is used by sf internally, but there are several ways to provide user input that sf understands. One such method is to provide numeric input in the form of an **EPSG code** (see <http://www.epsg.org/>). The default CRS in the `oz_votes` data corresponds to EPSG code 4283:
```{r}
st_crs(oz_votes) == st_crs(4283)
```
In ggplot2, the CRS is controlled by `coord_sf()`, which ensures that every layer in the plot uses the same projection. By default, `coord_sf()` uses the CRS associated with the geometry column of the data[^first-layer]. Because sf data typically supply a sensible choice of CRS, this process usually unfolds invisibly, requiring no intervention from the user. However, should you need to set the CRS yourself, you can specify the `crs` parameter by passing valid user input to `st_crs()`. The example below illustrates how to switch from the default CRS to EPSG code 3112:
[^first-layer]: If there are multiple data sets with a different associated CRS, it uses the CRS from the first layer.
`r columns(2,1)`
```{r}
ggplot(oz_votes) + geom_sf()
ggplot(oz_votes) + geom_sf() + coord_sf(crs = st_crs(3112))
```
## Working with sf data {#sfdetail}
As noted earlier, maps created using `geom_sf()` and `coord_sf()` rely heavily on tools
provided by the sf package, and indeed the sf package contains many more useful tools for
manipulating simple features data. In this section I provide an introduction to a few
such tools; more detailed coverage can be found on the sf package website
https://r-spatial.github.io/sf/.
To get started, recall that one advantage to simple features over other representations
of spatial data is that geographical units can have complicated structure. A good example
of this in the Australian maps data is the electoral district of Eden-Monaro, plotted below:
`r columns(2, 2/3)`
```{r}
edenmonaro <- ozmaps::abs_ced %>% filter(NAME == "Eden-Monaro")
p <- ggplot(edenmonaro) + geom_sf()
p + coord_sf(xlim = c(147.75, 150.25), ylim = c(-37.5, -34.5))
p + coord_sf(xlim = c(150, 150.25), ylim = c(-36.3, -36))
```
As this illustrates, Eden-Monaro is defined in terms of two distinct polygons, a large one on the Australian mainland and a small island. However, the large region has a hole in the middle (the hole exists because the Australian Capital Territory is a distinct political unit that is wholly contained within Eden-Monaro, and as illustrated above, electoral boundaries in Australia do not cross state lines). In sf terminology this is an example of a `MULTIPOLYGON` geometry. In this section I'll talk about the structure of these objects and how to work with them.
First, let's use dplyr to grab only the geometry object:
```{r}
edenmonaro <- edenmonaro %>% pull(geometry)
```
The metadata for the edenmonaro object can accessed using helper functions. For example, `st_geometry_type()` extracts the geometry type (e.g., `MULTIPOLYGON`), `st_dimension()` extracts the number of dimensions (2 for XY data, 3 for XYZ), `st_bbox()` extracts the bounding box as a numeric vector, and `st_crs()` extracts the CRS as a list with two components, one for the EPSG code and the other for the proj4string. For example:
```{r}
st_bbox(edenmonaro)
```
Normally when we print the `edenmonaro` object the output would display all the additional information (dimension, bounding box, geodetic datum etc) but for the remainder of this section I will show only the relevant lines of the output. In this case edenmonaro is defined by a MULTIPOLYGON geometry containing one feature:
```{r, output.lines = -(3:6)}
edenmonaro
```
However, we can "cast" the MULTIPOLYGON into the two distinct POLYGON geometries from which it is constructed using `st_cast()`:
```{r, output.lines = -(3:6)}
st_cast(edenmonaro, "POLYGON")
```
To illustrate when this might be useful, consider the Dawson electorate, which consists of 69 islands in addition to a coastal region on the Australian mainland.
```{r, output.lines = -(3:6)}
dawson <- ozmaps::abs_ced %>%
filter(NAME == "Dawson") %>%
pull(geometry)
dawson
ggplot(dawson) +
geom_sf() +
coord_sf()
```
Suppose, however, our interest is only in mapping the islands. If so, we can first use the `st_cast()` function to break the Dawson electorate into the constituent polygons. After doing so, we can use `st_area()` to calculate the area of each polygon and `which.max()` to find the polygon with maximum area:
```{r}
dawson <- st_cast(dawson, "POLYGON")
which.max(st_area(dawson))
```
The large mainland region corresponds to the 69th polygon within Dawson. Armed with this knowledge, we can draw a map showing only the islands:
```{r}
ggplot(dawson[-69]) +
geom_sf() +
coord_sf()
```
## Raster maps {#rastermaps}
<!-- HW: do you have any thoughts on a modern approach to this problem? -->
A second way to supply geospatial information for mapping is to rely on **raster data**. Unlike the simple features format, in which geographical entities are specified in terms of a set of lines, points and polygons, rasters take the form of images. In the simplest case raster data might be nothing more than a bitmap file, but there are many different image formats out there. In the geospatial context specifically, there are image formats that include metadata (e.g., geodetic datum, coordinate reference system) that can be used to map the image information to the surface of the Earth. For example, one common format is GeoTIFF, which is a regular TIFF file with additional metadata supplied. Happily, most formats can be easily read into R with the assistance of GDAL (the Geospatial Data Abstraction Library, https://gdal.org/). For example the sf package contains a function `sf::gdal_read()` that provides access to the GDAL raster drivers from R. However, you rarely need to call this function directly, as there are other high level functions that take care of this for you.
As an illustration, suppose we wish to plot satellite images made publicly available by the Australian Bureau of Meterorology (BOM) on their FTP server. The bomrang package [@bomrang] provides a convenient interface to the server, including a `get_available_imagery()` function that returns a vector of filenames and a `get_satellite_imagery()` function that downloads a file and imports it directly into R. For expository purposes, however, I'll use a more flexible method that could be adapted to any FTP server, and use the `download.file()` function:
```{r eval=FALSE}
# list of all file names with time stamp 2020-01-07 21:00 GMT
# (BOM images are retained for 24 hours, so this will return an
# empty vector if you run this code without editing the time stamp)
files <- bomrang::get_available_imagery() %>%
stringr::str_subset("202001072100")
# use curl_download() to obtain a single file, and purrr to
# vectorise this operation
purrr::walk2(
.x = paste0("ftp://ftp.bom.gov.au/anon/gen/gms/", files),
.y = file.path("raster", files),
.f = ~ download.file(url = .x, destfile = .y)
)
```
Note that if you want to run this code yourself you will need to change the time stamp string from `"202001072100"` to one day prior to the current date, and you will need to make sure there is a folder called "raster" in your working directory into which files will be downloaded. After caching the files locally (which is generally a good idea) we can inspect the list of files we have downloaded:
```{r}
dir("raster")
```
All 14 files are constructed from images taken by the Himawari-8 geostationary satellite operated by the Japan Meteorological Agency and takes images across 13 distinct bands. The images released by the Australian BOM include data on the visible spectrum (channel 3) and the infrared spectrum (channel 13):
```{r}
img_vis <- file.path("raster", "IDE00422.202001072100.tif")
img_inf <- file.path("raster", "IDE00421.202001072100.tif")
```
To import the data in the img_visible file into R, I'll use the stars package [@stars] to import the data as stars objects:
```{r}
library(stars)
sat_vis <- read_stars(img_vis, RasterIO = list(nBufXSize = 600, nBufYSize = 600))
sat_inf <- read_stars(img_inf, RasterIO = list(nBufXSize = 600, nBufYSize = 600))
```
In the code above, the first argument specifies the path to the raster file, and the `RasterIO` argument is used to pass a list of low-level parameters to GDAL. In this case, I have used `nBufXSize` and `nBufYSize` to ensure that R reads the data at low resolution (as a 600x600 pixel image). To see what information R has imported, we can inspect the `sat_vis` object:
```{r}
sat_vis
```
This output tells us something about the structure of a stars object. For the `sat_vis` object the underlying data is stored as a three dimensional array, with `x` and `y` dimensions specifying the spatial data. The `band` dimension in this case corresponds to the colour channel (RGB) but is redundant for this image as the data are greyscale. In other data sets there might be bands corresponding to different sensors, and possibly a time dimension as well. Note that the spatial data is also associated with a coordinate reference system (referred to as "refsys" in the output).
To plot the `sat_vis` data in ggplot2, we can use the `geom_stars()` function provided by the stars package. A minimal plot might look like this:
`r columns(2, 1/2, max_width = 2)`
```{r}
ggplot() +
geom_stars(data = sat_vis) +
coord_equal()
```
The `geom_stars()` function requires the `data` argument to be a stars object, and maps the raster data to the fill aesthetic. Accordingly, the blue shading in the satellite image above is determined by the ggplot2 scale, not the image itself. That is, although `sat_vis` contains three bands, the plot above only displays the first one, and the raw data values (which range from 0 to 255) are mapped onto the default blue palette that ggplot2 uses for continuous data. To see what the image file "really" looks like we can separate the bands using `facet_wrap()`:
```{r}
ggplot() +
geom_stars(data = sat_vis, show.legend = FALSE) +
facet_wrap(vars(band)) +
coord_equal() +
scale_fill_gradient(low = "black", high = "white")
```
One limitation to displaying only the raw image is that it is not easy to work out where the relevant landmasses are, and we may wish to overlay the satellite data with the `oz_states` vector map to show the outlines of Australian political entities. However, some care is required in doing so because the two data sources are associated with different coordinate reference systems. To project the `oz_states` data correctly, the data should be transformed using the `st_transform()` function from the sf package. In the code below, I extract the CRS from the `sat_vis` raster object, and transform the `oz_states` data to use the same system.
```{r}
oz_states <- st_transform(oz_states, crs = st_crs(sat_vis))
```
Having done so, I can now draw the vector map over the top of the raster image to make the image more interpretable to the reader. It is now clear from inspection that the satellite image was taken during the Australian sunrise:
```{r}
ggplot() +
geom_stars(data = sat_vis, show.legend = FALSE) +
geom_sf(data = oz_states, fill = NA, color = "white") +
coord_sf() +
theme_void() +
scale_fill_gradient(low = "black", high = "white")
```
What if we wanted to plot more conventional data over the top? A simple example of this would be to plot the locations of the Australian capital cities per the `oz_capitals` data frame that contains latitude and longitude data. However, because these data are not associated with a CRS and are not on the same scale as the raster data in `sat_vis`, these will *also* need to be transformed. To do so, we first need to create an sf object from the `oz_capitals` data using `st_as_sf()`:
```{r}
cities <- oz_capitals %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE)
```
This projection is set using the EPSG code 4326, an ellipsoidal projection using the latitude and longitude values as co-ordinates and relying on the WGS84 datum. Having done so we can now transform the co-ordinates from the latitude-longitude geometry to the match the geometry of our `sat_vis` data:
```{r}
cities <- st_transform(cities, st_crs(sat_vis))
```
The transformed data can now be overlaid using `geom_sf()`:
```{r}
ggplot() +
geom_stars(data = sat_vis, show.legend = FALSE) +
geom_sf(data = oz_states, fill = NA, color = "white") +
geom_sf(data = cities, color = "red") +
coord_sf() +
theme_void() +
scale_fill_gradient(low = "black", high = "white")
```
This version of the image makes clearer that the satellite image was taken at approximately sunrise in Darwin: the sun had risen for all the eastern cities but not in Perth. This could be made clearer in the data visualisation using the `geom_sf_text()` function to add labels to each city. For instance we could add another layer to the plot using code like this,
```{r eval = FALSE}
geom_sf_text(data = cities, mapping = aes(label = city))
```
though some care would be required to ensure the text is positioned nicely (see Chapter \@ref(annotations)).
## Data sources
* The USAboundaries package, <https://github.com/ropensci/USAboundaries> contains state, county and zip code data for the US [@USAboundaries]. As well as current boundaries, it also has state and county boundaries going back to the 1600s.
* The tigris package, <https://github.com/walkerke/tigris>, makes it easy to access the US Census TIGRIS shapefiles [@tigris]. It contains state, county, zipcode, and census tract boundaries, as well as many other useful datasets.
* The rnaturalearth package [@rnaturalearth] bundles up the free, high-quality data from <http://naturalearthdata.com/>. It contains country borders, and borders for the top-level region within each country (e.g. states in the USA, regions in France, counties in the UK).
* The osmar package, <https://cran.r-project.org/package=osmar> wraps up the OpenStreetMap API so you can access a wide range of vector data including individual streets and buildings [@osmar]
* If you have your own shape files (`.shp`) you can load them into R with `sf::read_sf()`