-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathoccupancy_example_walkthrough.Rmd
360 lines (247 loc) · 14.7 KB
/
occupancy_example_walkthrough.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
---
title: "Occupancy example walkthrough"
author: "Rob Cooke & Nick Isaac"
date: "12/05/2023"
output:
html_notebook:
toc: TRUE
---
# Overview of notebook
This notebook will walkthrough preparing data (from <a href = "https://nbnatlas.org/" target = "_blank">NBN Atlas</a>) and building an occupancy model for <a href = "https://www.bwars.com/ant/formicidae/formicinae/lasius-niger" target = "_blank">*Lasius niger*</a> (Small Black Ant) across the UK. Although the model will be built for a single species we will use the target-group approach for the detection model and therefore we will use records for all Formicidae (Ants).
# Set-up environment
Here we load the necessary packages
```{r}
# # install JAGS in terminal
# sudo apt install jags
# # sparta package from github
# remotes::install_github("BiologicalRecordsCentre/sparta")
# # BRCmap package from github
# remotes::install_github("colinharrower/BRCmap")
# load necessary packages
library(sparta)
library(dplyr)
library(R2jags)
library(BRCmap)
# all other packages are called directly; other packages needed:
# readr, skimr, janitor, stringr, DT, tibble, ggplot2
# global parameters
first_year <- 1970
last_year <- 2023
focal_taxa <- "Lasius niger"
# load_rdata function
# loads an RData file, and assigns it to an object name
load_rdata <- function(fileName) {
load(fileName)
get(ls()[ls() != "fileName"])
}
# check where project is located
xfun::proj_root()
```
# Raw data
Download data for *Formicidae* (Ants) from <a href = "https://nbnatlas.org/" target = "_blank">NBN Atlas</a>
N.B. you need to be logged in to your NBN Atlas account to be able to download data
We excluded records with 'unconfirmed identifications' and 'fossil records'
Load raw downloaded data (downloaded Mon May 15 15:12:40 UTC 2023)
Data is licensed by CC-BY-NC, CC-BY, CC0 or OGL, see the <a href = "https://docs.nbnatlas.org/data-licenses/" target = "_blank">NBN Licences page</a> for more details
```{r}
# original raw records
recs_raw_orig <- readr::read_csv("data/raw/formicidae_records-2023-05-15.csv", col_types = list(`Abundance scale` = readr::col_skip())) # readr performs safer data reading (e.g., dates are read as dates, strings are read as character)
# skip Abundance scale column which is problematic
# overview of data and columns
skimr::skim(recs_raw_orig) %>%
dplyr::select(skim_type, skim_variable, n_missing)
# tidy data
recs <- recs_raw_orig %>%
# tidy column names
janitor::clean_names() %>%
# only species-level records
dplyr::filter(taxon_rank == "species") %>%
# only records recorded to day precision
dplyr::filter(!is.na(start_date)) %>%
dplyr::filter(is.na(end_date)) %>%
# only records recorded to 4-figure grid ref (1km)
dplyr::filter(!is.na(osgr_1km)) %>%
# get year from date
dplyr::mutate(year = as.numeric(format(start_date, "%Y"))) %>%
# temporal window
dplyr::filter(year >= first_year & year <= last_year) %>%
# remove duplicates
dplyr::distinct(scientific_name, start_date, osgr_1km, year)
# number of records after cleaning
nrow(recs)
# number of records for Lasius niger
foc_recs <- dplyr::filter(recs, scientific_name == focal_taxa)
nrow(foc_recs)
# convert grid references to easting and northing
grcoord <- suppressWarnings(BRCmap::gr2sp_points(foc_recs$osgr_1km)) %>%
as.data.frame(.)
# join easting and northing to data
foc_recs_gr <- dplyr::left_join(foc_recs, grcoord, by = c("osgr_1km" = "GRIDREF")) %>%
# convert eastings and northings to longitude and latitude
dplyr::mutate(latitude = BRCmap::OSGridstoLatLong(.$EASTING, .$NORTHING)[[1]],
longitude = BRCmap::OSGridstoLatLong(.$EASTING, .$NORTHING)[[2]]) %>%
# suppress warning about many to many
suppressWarnings(.)
# simple ggplot of records for Lasius niger
simple_map <- ggplot2::ggplot(data = foc_recs_gr, ggplot2::aes(x = EASTING, y = NORTHING)) +
# add points
ggplot2::geom_point() +
ggplot2::coord_equal() +
ggplot2::theme_void()
# display simple_map
simple_map
leaflet_map <- leaflet::leaflet(data = foc_recs_gr) %>%
# default basemap
leaflet::addTiles() %>%
# identify columns in dataframe containing coords
leaflet::addCircleMarkers(lng = ~longitude, lat = ~latitude)
# display leaflet_map
leaflet_map
```
## Data diagnostics
Run some data diagnostics within sparta
The plot produced shows the number of records for each year in the top plot and the average list length in a box plot at the bottom. List length is the number of taxa observed on a visit to a site, where a visit is taken to be a unique combination of ‘where’ and ‘when’. A trend in the number of observations across time is not uncommon and a formal test for such a trend is performed in the form of a linear model. Trends in the number of records over time are handled by all of the methods presented in sparta in a variety of different ways. A trend in list length can cause some methods such as the reporting rate methods to fail (see ‘LessEffortPerVisit’ scenario in <a href = "https://doi.org/10.1111/2041-210X.12254" target = "_blank">Isaac et al. (2014)</a>
```{r}
# run data diagnostics - time period is year
dd <- sparta::dataDiagnostics(taxa = recs$scientific_name,
site = recs$osgr_1km,
time_period = recs$year,
progress_bar = FALSE)
```
# Format the data for sparta
Here we format the data to match the structure needed to run a 'sparta' occupancy model
```{r}
vis_data <- sparta::formatOccData(taxa = recs$scientific_name,
site = recs$osgr_1km,
survey = recs$start_date)
# saveRDS(vis_data, "data/processed/vis_data_Ants.rds")
# summary of structure of vis_data
skimr::skim(vis_data) %>%
# remove unnecessary columns
dplyr::select(-n_missing, -complete_rate, -character.min, -character.max, -character.empty, -character.n_unique, -character.whitespace)
# view components of vis_data
## spp_vis
head(vis_data$spp_vis)
## occDetdata
head(vis_data$occDetdata)
```
# Run sparta occupancy model
Here we run a 'sparta' occupancy model
Specifically, we fit a multi-season Bayesian occupancy model to the occurrence records for each species. These models separate occupancy (the proportion of occupied 1 km grid cells) and detection into hierarchically coupled submodels to allow for imperfect detection and temporal changes in recorder intensity, which are common biases in occurrence record datasets.
We use a closure period (the temporal precision of the occupancy submodel) of one year. We therefore estimate occupancy annually between `r first_year` and `r last_year`, using a separate model for each species.
The detection submodel estimates the probability of detection based on repeat visits (a visit is a unique combination of 1 km grid cell and day) within years. As the data are presence-only, we use records of other species within the same taxonomic group (the target-group approach) to infer non-detections of the focal species.
We include the number of species recorded during a visit, categorized into lists of 1, 2–3 or 4+ records, in the detection submodel to estimate sampling intensity and variability in selective reporting.
We also add a random effect (intercept) of grid cell to allow for variation in occupancy status and uneven sampling among grid cells.
For the priors, we select a random walk prior on the year effect, which enables the sharing of information between the current and previous year in the occupancy submodel - imposing an *a priori* judgement that a species' occupancy is likely to be similar from one year to the next. We use uninformative priors for the remaining parameters within the model.
We fit occupancy models using the occDetFunc function in the sparta package, which uses a Markov Chain Monte Carlo algorithm to fit the models via JAGS.
We specify three chains, 32,000 iterations, a burn in of 30,000, and a thinning rate of six.
```{r}
system.time({
mod_out <- sparta::occDetFunc(taxa_name = as.character(focal_taxa),
occDetdata = vis_data$occDetdata,
spp_vis = vis_data$spp_vis,
write_results = TRUE,
output_dir = "data/processed/",
n_chains = 3,
n_iterations = 32000,
burnin = 30000,
thinning = 6,
nyr = 2,
modeltype = c('ranwalk', 'halfcauchy', 'catlistlength'),
provenance = "NBN Atlas Formicidae Mon May 15 15:12:40 UTC 2023. vis_data_Ants.rds",
return_data = FALSE,
seed = 842845)
})
```
# Investigating the model
## Model object and summary
```{r}
# load sparta model
mod_out <- load_rdata("data/processed/Lasius niger.rdata")
# # model code
# mod_out$model
# # The model used as provided to JAGS. Also contained is a list of fully observed variables. These are those listed in the BUGS data.
# # bugs output summary
# mod_out$BUGSoutput$summary
# # A summary table of the monitored parameters. The posterior distribution for each parameter is summaried with the mean, standard deviation, various credible intervals, a formal convergence metric (Rhat), and a measure of effective sample size (n.eff).
# data summary from model
attributes(mod_out)$metadata$analysis$summary %>%
as.data.frame() %>%
# view
head(.)
```
## Convergence
We determine convergence based on the Gelman-Rubin statistic (Rhat < 1.1)
```{r}
# compile rhats for species:year occupancy estimates (psi)
psi_rhats <- mod_out$BUGSoutput$summary %>%
data.frame() %>%
tibble::rownames_to_column("para") %>%
dplyr::filter(stringr::str_detect(para, "psi.fs"))
# view psi_rhats
DT::datatable(dplyr::select(psi_rhats, para, mean, sd, Rhat, n.eff))
# Percentage of species:year occupancy estimates (psi) that converged
signif((nrow(dplyr::filter(psi_rhats, Rhat <= 1.1)) / nrow(psi_rhats)) * 100, 2)
```
# Model results
Here we plot the occupancy trend for Lasius niger between 1970 and 2023 with the plot function in sparta. The uncertainty envelope is the 2.5th and 97.5th quantiles.
```{r}
# plot shows trend and rhats
plot(mod_out)
```
## Metrics of change
Here we calculate some metrics of change across the focal period
```{r}
# simple difference between first and last year
simp <- sparta::occurrenceChange(mod_out, firstYear = first_year, lastYear = last_year, change = "difference")
# mean, median, cis
simp[1:3]
# metric of change for 999 posterior samples
DT::datatable(simp$data)
# percentage difference between first and last year
perc <- sparta::occurrenceChange(mod_out, firstYear = first_year, lastYear = last_year, change = "percentdif")
# mean, median, cis
perc[1:3]
# metric of change for 999 posterior samples
DT::datatable(perc$data)
# annual growth rate between first and last year
grow <- sparta::occurrenceChange(mod_out, firstYear = first_year, lastYear = last_year, change = "growthrate")
# mean, median, cis
grow[1:3]
# metric of change for 999 posterior samples
DT::datatable(grow$data)
# linear growth rate between first and last year
lin <- sparta::occurrenceChange(mod_out, firstYear = first_year, lastYear = last_year, change = "lineargrowth")
# mean, median, cis
lin[1:3]
# metric of change for 999 posterior samples
DT::datatable(lin$data)
# last 10 years
# percentage difference for last 10 years
perc_10yr <- sparta::occurrenceChange(mod_out, firstYear = last_year - 10, lastYear = last_year, change = "percentdif")
# mean, median, cis
perc_10yr[1:3]
# annual growth rate for last 10 years
grow_10yr <- sparta::occurrenceChange(mod_out, firstYear = last_year - 10, lastYear = last_year, change = "growthrate")
# mean, median, cis
grow_10yr[1:3]
```
# Resources
<br/>
**Going further:**
<br/>
Multi-species indicators (see <a href = "https://github.com/BiologicalRecordsCentre/BRCindicators" target = "_blank">BRC indicators package</a>)<br/>
Species richness, species trends, temporal beta diversity (see <a href = "https://github.com/03rcooke/pa_occ" target = "_blank">Protected Areas GitHub repository</a>)<br/>
Running code on HPC (e.g., JASMIN) (see <a href = "https://github.com/03rcooke/pa_occ" target = "_blank">Protected Areas GitHub repository</a>)<br/>
ROBITT assessment (see <a href = "https://github.com/03rcooke/pa_occ" target = "_blank">Protected Areas GitHub repository</a>), including use of occAssess (<a href = "https://github.com/robboyd/occAssess" target = "_blank">occAssess GitHub repository</a>)<br/>
<br/>
**Useful papers for sparta-style occupancy models:**
<br/>
Isaac, N.J., van Strien, A.J., August, T.A., de Zeeuw, M.P. and Roy, D.B., 2014.
<a href = "https://doi.org/10.1111/2041-210X.12254" target = "_blank">Statistics for citizen science: extracting signals of change from noisy ecological data</a>, pp.1052-1060.<br/>
Outhwaite, C.L., Chandler, R.E., Powney, G.D., Collen, B., Gregory, R.D. and Isaac, N.J., 2018. <a href = "https://doi.org/10.1016/j.ecolind.2018.05.010" target = "_blank">Prior specification in Bayesian occupancy modelling improves analysis of species occurrence data</a>. Ecological Indicators, 93, pp.333-343.<br/>
Outhwaite, C.L., Powney, G.D., August, T.A., Chandler, R.E., Rorke, S., Pescott, O.L., Harvey, M., Roy, H.E., Fox, R., Roy, D.B. and Alexander, K., 2019. <a href = "https://www.nature.com/articles/s41597-019-0269-1" target = "_blank">Annual estimates of occupancy for bryophytes, lichens and invertebrates in the UK, 1970–2015</a>. Scientific data, 6(1), p.259.<br/>
Pocock, M.J., Logie, M.W., Isaac, N.J., Outhwaite, C.L. and August, T., 2019. <a href = "https://www.biorxiv.org/content/10.1101/813626v1" target = "_blank">Rapid assessment of the suitability of multi-species citizen science datasets for occupancy trend analysis</a>. bioRxiv, p.813626.
Outhwaite, C.L., Gregory, R.D., Chandler, R.E., Collen, B. and Isaac, N.J., 2020. <a href = "https://www.nature.com/articles/s41559-020-1111-z" target = "_blank">Complex long-term biodiversity change among invertebrates, bryophytes and lichens</a>. Nature ecology & evolution, 4(3), pp.384-392.<br/>
Cooke, R., Mancini, F., Boyd, R.J., Evans, K.L., Shaw, A., Webb, T.J. and Isaac, N.J., 2023. <a href = "https://doi.org/10.1016/j.biocon.2022.109884" target = "_blank">Protected areas support more species than unprotected areas in Great Britain, but lose them equally rapidly</a>. Biological Conservation, 278, p.109884.<br/>
Boyd, R.J., August, T.A., Cooke, R., Logie, M., Mancini, F., Powney, G.D., Roy, D.B., Turvey, K. and Isaac, N.J., 2023. <a href = "https://onlinelibrary.wiley.com/doi/full/10.1111/brv.12961" target = "_blank">An operational workflow for producing periodic estimates of species occupancy at national scales</a>. Biological Reviews.