-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathZoopVASTworkflow.Rmd
465 lines (353 loc) · 19.2 KB
/
ZoopVASTworkflow.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
---
title: "Zooplankton VAST index standardization"
author: "Sarah Gaichas"
date: "`r Sys.Date()`"
output:
html_document:
code_fold: hide
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE)
library(tidyverse)
library(here)
library(DT)
library(ggiraph)
library(patchwork)
```
# Introduction
The Atlantic herring research track working group is interested in exploring zooplankton indices within the assessment.
These indices could potentially explain:
1. herring growth/condition
1. herring recruitment
1. patterns in herring "survival" random effects
We have zooplankton indices already in `ecodata` at the EPU scale. These were reviewed with the working group in March 2024 (see [here](https://noaa-edab.github.io/forageindex/ZooplanktonOverview.html)).
However, we want to examine the scale of herring assessment and different combinations of zooplankton species than currently exist in `ecodata`.
We'll try VAST models for index standardization using the ECOMON data as an input.
Set of decisions:
1. zooplankton species
2. which zooplankton metric (area or volume)
3. spatial footprint
4. temporal scale (seasonal? annual?)
5. potential covariates
# Explore data
## Zooplankton data
Read in file obtained from Scott Large, source of his zooplankton models:
```{r}
ecomonall <- read_csv("data/EcoMon_Plankton_Data_v3_8.csv")
names(ecomonall)
```
A lookup of these column headings is here: https://www.fisheries.noaa.gov/inport/item/35054
## Herring diets
What herring eat from our food habits data is summarized here: https://fwdp.shinyapps.io/tm2020/
Copepods are highest diet proportion, followed by well digested prey (AR), and euphausiids. These are aggregate categories so I should look at the breakdown to species.
```{r, eval=FALSE}
# run the diet script that we used for bluefish? gives diet by season and year
# Load NEFSC stomach data received from Brian Smith
# put the get_diet function on this repo in R folder
source(here::here("R/get_diet.R"))
# what is the code
ecodata::species_groupings |> dplyr::filter(RPATH == "AtlHerring")
# get diet takes SVSPP as argument
herringdiet <- get_diet(32)
saveRDS(herringdiet, here::here("data/herringdiet.rds"))
```
```{r}
herringdiet <- readRDS(here::here("data/herringdiet.rds"))
p1 <- ggplot(herringdiet, aes(year, relmsw, fill=prey)) +
geom_bar(stat = "identity") + #
ylab("Percent in Diet") +
xlab("Year") +
facet_wrap("season", nrow=4) +
theme_bw() +
viridis::scale_fill_viridis(discrete = TRUE) +
theme(legend.position = "none"
#legend.position="bottom",
#legend.text=element_text(size=5)
) +
geom_bar_interactive(stat = "identity", aes(tooltip = prey, data_id = prey))
ggiraph(code = print(p1), height=14)
```
# Initial decisions
## 1. Zooplankton species: VAST Categories
Maybe start with total zooplankton volume, Copepoda, Euphausiids, and *Calanus finmarchicus*? And keep both spatial units. These would be separate univariate models, though we could try a multivariate model with all too.
It appears that Copepoda in the ecomon dataset is not all copepods together, as I hoped, but is unid copepods. Very few are in the data.
So I will sum everything that Harvey classified as a copeopod in [this spreadsheet that he shared with Scott](https://github.com/NOAA-EDAB/zooplanktonindex/blob/main/data/ForageTaxaUse_August2022.xlsx).
```{r}
sumcopepods <- ecomonall |>
dplyr::rowwise() |>
dplyr::mutate(allcopepods_10m2 = sum(ctyp_10m2,
calfin_10m2,
pseudo_10m2,
tlong_10m2,
cham_10m2,
para_10m2,
acarspp_10m2,
mlucens_10m2,
calminor_10m2,
clauso_10m2,
acarlong_10m2,
euc_10m2,
fur_10m2,
calspp_10m2,
ost_10m2,
temspp_10m2,
tort_10m2,
paraspp_10m2,
na.rm = TRUE
),
allcopepods_100m3 = sum(ctyp_100m3,
calfin_100m3,
pseudo_100m3,
tlong_100m3,
cham_100m3,
para_100m3,
acarspp_100m3,
mlucens_100m3,
calminor_100m3,
clauso,
acarlong_100m3,
euc_100m3,
fur_100m3,
calspp_100m3,
ost_100m3,
temspp_100m3,
tort_100m3,
paraspp_100m3,
na.rm = TRUE
)
) |>
dplyr::select(cruise_name, station, lat, lon, date, time, depth,
sfc_temp, sfc_salt, btm_temp, btm_salt, volume_1m2, volume_100m3,
allcopepods_10m2, allcopepods_100m3)
herringfood <- ecomonall |>
dplyr::left_join(sumcopepods) |>
dplyr::select(cruise_name, station, lat, lon, date, time, depth,
sfc_temp, sfc_salt, btm_temp, btm_salt, volume_1m2, volume_100m3,
allcopepods_10m2, allcopepods_100m3, euph_10m2, euph_100m3,
hyper_10m2, hyper_100m3, calfin_10m2, calfin_100m3) |>
dplyr::mutate(date = lubridate::dmy(date),
year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date),
stationid = paste0(cruise_name, "_", station),
season_ng = case_when(month <= 6 ~ "SPRING",
month >= 7 ~ "FALL",
TRUE ~ as.character(NA)),
season_4 = case_when(month %in% c(12,1,2) ~ "winter",
month %in% c(3,4,5) ~ "spring",
month %in% c(6,7,8) ~ "summer",
month %in% c(9,10,11) ~ "fall",
TRUE ~ as.character(NA)))
nonzeroobs <- herringfood |>
dplyr::group_by(year, month) |>
dplyr::summarise(totzoo = sum(volume_100m3 != 0, na.rm = TRUE),
cope = sum(allcopepods_100m3 != 0, na.rm = TRUE),
euph = sum(euph_100m3 != 0, na.rm = TRUE),
hyp = sum(hyper_100m3 != 0, na.rm = TRUE),
calfin = sum(calfin_100m3 != 0, na.rm = TRUE))
```
So we will look at total zooplankton displacement volume: `volume_1m2, volume_100m3`
Total euphausiids: `euph_10m2, euph_100m3`
Total hyperiids: `hyper_10m2, hyper_100m3`
*Calanus finmarchicus*: `calfin_10m2, calfin_100m3`
Total copeopds: summing over the following names for each of the area or volume metrics, only area shown here
`ctyp_10m2, calfin_10m2, pseudo_10m2, tlong_10m2, cham_10m2, para_10m2, acarspp_10m2, mlucens_10m2, calminor_10m2, clauso_10m2, acarlong_10m2, euc_10m2, fur_10m2, calspp_10m2, ost_10m2, temspp_10m2, tort_10m2, paraspp_10m2`
These are the corresponding species names, with Large copepods denoted by *
Centropages typicus, Calanus finmarchicus\*, Pseudocalanus spp., Temora longicornis, Centropages hamatus, Paracalanus parvus, Acartia spp., Metridia lucens\*, Calanus minor\*, Clausocalanus arcuicornis, Acartia longiremis, Eucalanus spp.\*, Clausocalanus furcatus, Calanus spp.\*, Temora stylifera, Temora spp., Tortanus discaudatus, Paracalanus spp.
Added four more categories based on discussion with ToR 1 subgroup:
Small copepods ALL:
Centropages typicus, Pseudocalanus spp., Temora longicornis, Centropages hamatus, Paracalanus parvus, Acartia spp., Clausocalanus arcuicornis, Acartia longiremis, Clausocalanus furcatus, Temora stylifera, Temora spp., Tortanus discaudatus, Paracalanus spp.
Large copepods ALL:
Calanus finmarchicus\*, Metridia lucens\*, Calanus minor\*, Eucalanus spp.\*, Calanus spp.\*
Small copeopods SOE (used in small-large index):
Centropages typicus, Pseudocalanus spp., Temora longicornis, Centropages hamatus
Large copeopds SOE (used in small-large index) = Calanus finmarchicus\*
Dataset years extend from `r sort(unique(herringfood$year))[1]` to `r sort(unique(herringfood$year))[length(unique(herringfood$year))]`
(As of June 21, data expanded to include Fall 2022, but table still covers old dataset because I've been asked not to distribute the new dataset yet. Models run after 21 June will use data through 2022 from new dataset.)
This is a list of stations with nonzero samples by year and month to better understand seasonal coverage.
```{r}
DT::datatable(nonzeroobs)
```
## 2. Which metric?
Ryan recommends the numbers per 100 cubic meters of filtered water volume `_100m3` data, which should match what is used in the SOE indices. However, he thinks that anomalies calculated from the area `_10m2` data should be the same as those calculated from the volume based numbers. This makes sense to me.
Harvey is using the numbers per 100 cubic meters for other analyses as well.
Let's start with the `_100m3` numbers for each category. We can try one with the other metric and see if it does make a difference as well.
## 3. Spatial scale options:
1. Herring survey strata (differ by spring and fall)
2. EPUs for comparison with SOE datasets
3. Full shelf/AllEPU
Lets look at the spatial extent of the data in each season to see if we need to limit model domain. A test model of total zoop biovolume (present at each station in many years) failed using the full set of EPUs because the spatial random effects parameter was headed for 0. This is what happened to the herring index fall model when there were no observations in the southern end of the domain.
This is zooplankton volume by season (half years)
Now using data processed in [`VASTzoopindex_processinputs.R`](https://github.com/NOAA-EDAB/zooplanktonindex/blob/main/data/VASTzoopindex_processinputs.R)
```{r}
herringfood_stn <- readRDS(here::here("data/herringfood_stn_all.rds"))
# make SST column that uses surftemp unless missing or 0
#herringfood_stn <- herringfood_stn %>%
# dplyr::mutate(sstfill = ifelse((is.na(sfc_temp)|sfc_temp==0), oisst, sfc_temp))
# code Vessel as AL=0, HB=1, NEAMAP=2
herringfood_stn_fall <- herringfood_stn %>%
#ungroup() %>%
filter(season_ng == "FALL", # Annual model for MRIP index
year > 1976) %>%
mutate(AreaSwept_km2 = 1, #Elizabeth's code
#declon = -declon already done before neamap merge
Vessel = 1 #as.numeric(as.factor(vessel))-1
) %>%
dplyr::select(Catch_g = volume_100m3, #use megabenwt for individuals input in example
Year = year,
Vessel,
AreaSwept_km2,
Lat = lat,
Lon = lon #,
#btm_temp, #this leaves out many stations
#sfc_temp, #this leaves out many stations
#oisst,
#sstfill
) %>%
na.omit() %>%
as.data.frame()
herringfood_stn_spring <- herringfood_stn %>%
#ungroup() %>%
filter(season_ng == "SPRING", # Annual model for MRIP index
year > 1976) %>%
mutate(AreaSwept_km2 = 1, #Elizabeth's code
#declon = -declon already done before neamap merge
Vessel = 1 #as.numeric(as.factor(vessel))-1
) %>%
dplyr::select(Catch_g = volume_100m3, #use megabenwt for individuals input in example
Year = year,
Vessel,
AreaSwept_km2,
Lat = lat,
Lon = lon #,
#btm_temp, #this leaves out many stations
#sfc_temp, #this leaves out many stations
#oisst,
#sstfill
) %>%
na.omit() %>%
as.data.frame()
nonzerofall <- herringfood_stn_fall |>
dplyr::filter(Catch_g > 0) #,
#Year > 1981)
nonzerospring <- herringfood_stn_spring |>
dplyr::filter(Catch_g > 0) #,
# Year > 1981)
Fall <- ggplot(data = ecodata::coast) +
geom_sf() +
geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat), colour = "coral4", size=0.05, alpha=0.1) +
geom_point(data = nonzerofall, aes(x = Lon, y = Lat), colour = "blue", size=0.5, alpha=1) +
coord_sf(xlim =c(-78.5, -65.5), ylim = c(33, 45)) + #zoomed to Hatteras and N
xlab("") +
ylab("") +
ggtitle("Fall zoo biovolume 1982-2022")+
theme(plot.margin = margin(0, 0, 0, 0, "cm"))
Spring <- ggplot(data = ecodata::coast) +
geom_sf() +
geom_point(data = FishStatsUtils::northwest_atlantic_grid, aes(x = Lon, y = Lat), colour = "coral4", size=0.05, alpha=0.1) +
geom_point(data = nonzerospring, aes(x = Lon, y = Lat), colour = "blue", size=0.5, alpha=1) +
coord_sf(xlim =c(-78.5, -65.5), ylim = c(33, 45)) + #zoomed to Hatteras and N
xlab("") +
ylab("") +
ggtitle("Spring zoo biovolume 1982-2022")+
theme(plot.margin = margin(0, 0, 0, 0, "cm"))
Spring + Fall
```
## 4. Temporal scale options:
I need to decide how to break up the data temporally, thinking by season associated with herring population processes. So there could be a winter index, a spring index, a summer index, and a fall index. What months?
Mark W's [herring spawn timing](https://drive.google.com/drive/folders/1vMnK6j01gM0ujv6sy5AClkXNn3w0UsAX) info could be useful here. Do they feed while spawning?
Alternatively, at least two seasons should match the bottom trawl survey timing if we want to use these indices as covariates for availability/catchability.
Mark W's spawn timing paper also looked at changes in survey timing.
What is NEFSC BTS survey timing? Histograms of survey month grouped by FALL, SPRING:
```{r}
survdat <- readRDS(url("https://github.com/NOAA-EDAB/benthosindex/raw/main/data/survdat_nobio.rds"))
survdat$survdat |>
dplyr::mutate(date = lubridate::ymd_hms(EST_TOWDATE),
month = lubridate::month(date)) |>
dplyr::group_by(SEASON) |>
dplyr::group_walk(~ hist(.x$month,
breaks = c(1:12),
right = TRUE))
```
1. 4 models, 2 matching survey seasons (spring/fall) and try two for winter and summer to cover other portions of herring life history. Current designations are
* "winter" months 12, 1, 2
* "spring" months 3, 4, 5
* "summer" months 6, 7, 8
* "fall" months 9, 10, 11
1. Alternatively, roll winter into spring and summer into fall as done for bluefish.
* "SPRING" months 1-6
* "FALL" months 7-12
1. Not sure an annual model makes sense for zooplankton but we could try it, may compare with SOE indices?
## 5. Covariates
Probably at a minimum temperature should be explored. Catchability or habitat? Its both
Day of year or day of season? potential catchability covariate for this high turnover group
A frontal index might be useful? As a habitat covaraite?
### Temperature
We will need to merge in alternative temperature to use all the data since some are NAs.
We have `r dim(herringfood)[1]` rows (stations) total, `r dim(herringfood |> dplyr::filter(!is.na(sfc_temp)))[1]` that have surface temperature, and `r dim(herringfood |> dplyr::filter(!is.na(btm_temp)))[1]` that have bottom temperature.
Likely surface temperature is more relevant for zooplankton?
Missing temperature data by year and season (SPRING and FALL):
```{r}
stnsummary <- herringfood_stn %>%
#dplyr::filter(year>1984) %>%
dplyr::group_by(year, season_ng) %>%
dplyr::summarise(nstation = n(),
hastemp = sum(!is.na(sfc_temp))) %>%
dplyr::ungroup() %>%
dplyr::mutate(percentmiss = (nstation - hastemp)/nstation*100) %>%
tidyr::pivot_wider(names_from = season_ng,
values_from = c(nstation, hastemp, percentmiss)) %>%
dplyr::select(year, nstation_SPRING, hastemp_SPRING, percentmiss_SPRING,
nstation_FALL, hastemp_FALL, percentmiss_FALL)
flextable::flextable(stnsummary) %>%
flextable::set_header_labels(year = 'Year',
nstation_SPRING = "Spring N stations",
hastemp_SPRING = "Spring N stations with in situ temperature",
percentmiss_SPRING = "Spring percent missing temperature",
nstation_FALL = "Fall N stations",
hastemp_FALL = "Fall N stations with in situ temperature",
percentmiss_FALL = "Fall percent missing temperature") %>%
flextable::set_caption("Number of survey stations by year and season with in situ sea surface temperature measurements.") %>%
flextable::colformat_double(big.mark = "", digits = 0) %>%
#flextable::autofit()
flextable::width(width = 1)
```
Missing temperature data by year and season (winter, spring, summer, fall):
```{r}
stnsummary <- herringfood_stn %>%
#dplyr::filter(year>1984) %>%
dplyr::group_by(year, season_4) %>%
dplyr::summarise(nstation = n(),
hastemp = sum(!is.na(sfc_temp))) %>%
dplyr::ungroup() %>%
dplyr::mutate(percentmiss = (nstation - hastemp)/nstation*100) %>%
tidyr::pivot_wider(names_from = season_4,
values_from = c(nstation, hastemp, percentmiss)) %>%
dplyr::select(year, nstation_winter, hastemp_winter, percentmiss_winter,
nstation_spring, hastemp_spring, percentmiss_spring,
nstation_summer, hastemp_summer, percentmiss_summer,
nstation_fall, hastemp_fall, percentmiss_fall)
flextable::flextable(stnsummary) %>%
flextable::set_header_labels(year = 'Year',
nstation_winter = "Winter N stations",
hastemp_winter = "Winter N stations with in situ temperature",
percentmiss_winter = "Winter percent missing temperature",
nstation_spring = "Spring N stations",
hastemp_spring = "Spring N stations with in situ temperature",
percentmiss_spring = "Spring percent missing temperature",
nstation_summer = "Summer N stations",
hastemp_summer = "Summer N stations with in situ temperature",
percentmiss_summer = "Summer percent missing temperature",
nstation_fall = "Fall N stations",
hastemp_fall = "Fall N stations with in situ temperature",
percentmiss_fall = "Fall percent missing temperature") %>%
flextable::set_caption("Number of survey stations by year and season with in situ sea surface temperature measurements.") %>%
flextable::colformat_double(big.mark = "", digits = 0) %>%
#flextable::autofit()
flextable::width(width = 1)
```
This can be done with the OISST data as was done for the forage index, or the bottom temperature data as was done for the benthos index.
### Day of year
We can derive from date field in dataset, convert to season for seasonal models