Modelling_growth_and_its_uncertainty_in_FLa4a.Rmd

---
title: Modelling Individual Growth and Using Stochastic Slicing to Convert Length-based Data Into Age-based Data
subtitle: Assessment for All initiative (a4a) 
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
github_document:
 mathjax: TRUE
pdf_document:
  fig_width: 6 
  fig_height: 4 
tags: [FLR]
license: Creative Commons CC-BY SA
---

```{r, ini, echo=FALSE, results='hide', message=FALSE, warnings=FALSE, cache=FALSE}
library(knitr)
source("R/ini.R")
```
 
The document explains the approach being developed by a4a to integrate uncertainty in individual growth into stock assessment and advice. It presents a mixture of text and code, where the former explains the concepts behind the methods, while the latter shows how these can be run with the software provided.

## Required packages

To follow this tutorial you should have installed the following packages:

- CRAN: [copula](https://cran.r-project.org/web/packages/copula/index.html), [triangle](https://cran.r-project.org/web/packages/triangle/index.html), [coda](https://cran.r-project.org/web/packages/coda/index.html), [XML](https://cran.r-project.org/web/packages/XML/index.html),[reshape2](https://cran.r-project.org/web/packages/reshape2/index.html), [latticeExtra](https://cran.r-project.org/web/packages/latticeExtra/index.html)

- FLR: [FLCore](http://www.flr-project.org/FLCore/), [FLa4a](http://www.flr-project.org/FLa4a/)

You can do so as follows,

```{r, eval=FALSE}
install.packages(c("copula","triangle", "coda", "XML", "reshape2", "latticeExtra"))
# from FLR
install.packages(c("FLCore", "FLa4a"), repos="http://flr-project.org/R")
```

```{r, pkgs}
# Load all necessary packages and datasets; trim pkg messages
library(FLa4a)
library(XML)
library(reshape2)
library(latticeExtra)
data(ple4)
data(ple4.indices)
data(ple4.index)
data(rfLen)
```

# Background

The a4a stock assessment framework is based on age dynamics. Therefore, to use length information, this length information must be processed before it can be used in an assessment. The rationale is that the processing should give the analyst the flexibility to use a range of sources of information, e.g. literature or online databases, to obtain information about the species growth model and the uncertainty about model parameters.

Within the a4a framework, this is handled using the `a4aGr` class. In this section we introduce the `a4aGr` class and look at a variety of ways that parameter uncertainty can be included.

For more information on the a4a methodologies refer to [Jardim, et.al, 2014](http://icesjms.oxfordjournals.org/content/early/2014/04/03/icesjms.fsu050.abstract), [Millar, et.al, 2014](http://icesjms.oxfordjournals.org/content/early/2014/03/31/icesjms.fsu043.abstract) and [Scott, et.al, 2016](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154922).

# `a4aGr` - The growth class

The conversion of length data to age is performed through a growth model. The implementation is done through the `a4aGr` class.

```{r, show_a4aGr}
showClass("a4aGr")
```

To construct an `a4aGr` object, the growth model and parameters must be provided. Check the help file for more information.

Here we show an example using the von Bertalanffy growth model. To create the `a4aGr` object, it is necessary to pass the model equation ($length \sim time$), the inverse model equation ($time \sim length$) and the parameters. Any growth model can be used as long as it is possible to write the model (and the inverse) as an R formula.

```{r, a4aGr_vB_example, echo=TRUE}
vbObj <- a4aGr(
	grMod=~linf*(1-exp(-k*(t-t0))),      
	grInvMod=~t0-1/k*log(1-len/linf),      
	params=FLPar(linf=58.5, k=0.086, t0=0.001, units=c('cm','year-1','year')))

# Check the model and its inverse
lc=20
c(predict(vbObj, len=lc))
c(predict(vbObj, t=predict(vbObj, len=lc))==lc)
```

The predict method allows the transformation between age and lengths using the growth model.

```{r, predict_araGr_example}
c(predict(vbObj, len=5:10+0.5))
c(predict(vbObj, t=5:10+0.5))
```

# Adding uncertainty to growth parameters with a multivariate normal distribution

Uncertainty in the growth model is introduced through the inclusion of parameter uncertainty.
This is done by making use of the parameter variance-covariance matrix (the *vcov* slot of the `a4aGr` class) and assuming a distribution. The numbers in the variance-covariance matrix could come from the parameter uncertainty from fitting the growth model parameters.

Here we set the variance-covariance matrix by scaling a correlation matrix, using a cv of 0.2, based on 

$$\rho_{x,y}=\frac{\Sigma_{x,y}}{\sigma_x \sigma_y}$$

and 

$$CV_x=\frac{\sigma_x}{\mu_x}$$

```{r, set_vcov_example}
# Make an empty cor matrix
cm <- diag(c(1,1,1))
# k and linf are negatively correlated while t0 is independent
cm[1,2] <- cm[2,1] <- -0.5
# scale cor to var using CV=0.2
cv <- 0.2
p <- c(linf=60, k=0.09, t0=-0.01)
vc <- matrix(1, ncol=3, nrow=3)
l <- vc
l[1,] <- l[,1] <- p[1]*cv
k <- vc
k[,2] <- k[2,] <- p[2]*cv
t <- vc
t[3,] <- t[,3] <- p[3]*cv
mm <- t*k*l
diag(mm) <- diag(mm)^2
mm <- mm*cm
# check that we have the intended correlation
all.equal(cm, cov2cor(mm))
```

Create the `a4aGr` object as before, but now we also include the *vcov* argument for the variance-covariance matrix.

```{r, making_vcov_example}
vbObj <- a4aGr(grMod=~linf*(1-exp(-k*(t-t0))), grInvMod=~t0-1/k*log(1-len/linf), 
               params=FLPar(linf=p['linf'], k=p['k'], t0=p['t0'], 
                            units=c('cm','year-1','year')), vcov=mm)
```

First we show a simple example where we assume that the parameters are represented using a multivariate normal distribution.

```{r, simulate_vcov_example}
# Note that the object we have just created has a single iteration for each parameter
vbObj@params
dim(vbObj@params)
# We simulate 10000 iterations from the a4aGr object by calling mvrnorm() using the 
# variance-covariance matrix we created earlier.
vbNorm <- mvrnorm(10000,vbObj)
# Now we have 10000 iterations of each parameter, randomly sampled from the 
# multivariate normal distribution
vbNorm@params
dim(vbNorm@params)
```
We can now convert from length to ages data based on the 10000 parameter iterations. This gives us 10000 sets of age data. For example, here we convert a single length vector using each of the 10000 parameter iterations: 

```{r, }
ages <- predict(vbNorm, len=5:10+0.5)
dim(ages)
# We show the first ten iterations only as an illustration
ages[,1:10]
```

The marginal distributions can be seen in `r fign('plot_norm_params')`.

```{r, plot_norm_params, fig.cap="The marginal distributions of each of the parameters when using a multivariate normal distribution.", echo=FALSE}
par(mfrow=c(3,1))
hist(c(params(vbNorm)['linf',]), main='linf', prob=TRUE, xlab='')
hist(c(params(vbNorm)['k',]), main='k', prob=TRUE, xlab='')
hist(c(params(vbNorm)['t0',]), main='t0', prob=TRUE, xlab='')
```

The shape of the correlation can be seen in `r fign('plot_norm_scatter')`.

```{r, plot_norm_scatter, fig.cap="Scatter plot of the 10000 samples parameter from the multivariate normal distribution.", echo=FALSE}
splom(data.frame(t(params(vbNorm)@.Data)), par.settings=list(plot.symbol=list(pch=19, cex=0.1, col=1)))
```

Growth curves for the 10000 iterations can be seen in `r fign('plot_mv_growth')`.


```{r, plot_mv_growth, fig.cap="Growth curves using parameters simulated from a multivariate normal distribution.", echo=FALSE}
bwplot(value~factor(Var1), data=melt(predict(vbNorm, t=0:50+0.5)), par.settings=list(plot.symbol=list(cex=0.2, col='gray50'), box.umbrella=list(col='gray40'), box.rectangle=list(col='gray30')), ylab='length (cm)', xlab='age (years)', scales=list(x=list(rot=90)))
```

# Adding uncertainty to growth parameters with a multivariate triangle distribution
\label{sec:growth_triangle_cop}

One alternative to using a normal distribution is to use a [triangle distribution](http://en.wikipedia.org/wiki/Triangle\_distribution). We use the package [triangle](http://cran.r-project.org/web/packages/triangle/index.html), where this distribution is parametrized using the minimum, maximum and median values. This can be very attractive if the analyst needs to, for example, obtain information from the web or literature and perform a meta-analysis.

Here we show an example of setting a triangle distribution with values taken from Fishbase. Note, this example needs an internet connection to access Fishbase.

```{r, tri_example}
# The web address for the growth parameters for redfish (Sebastes norvegicus)
addr <- 'http://www.fishbase.org/PopDyn/PopGrowthList.php?ID=501'
# Scrape the data
tab <- try(readHTMLTable(addr))
# Interrogate the data table and get vectors of the values
linf <- as.numeric(as.character(tab$dataTable[,2]))
k <- as.numeric(as.character(tab$dataTable[,4]))
t0 <- as.numeric(as.character(tab$dataTable[,5]))
# Set the min (a), max (b) and median (c) values for the parameter as a list of lists
# Note in this example that t0 has no 'c' (median) value which makes the distribution symmetrical
triPars <- list(list(a=min(linf), b=max(linf), c=median(linf)),
             list(a=min(k), b=max(k), c=median(k)),
             list(a=median(t0, na.rm=T)-IQR(t0, na.rm=T)/2, b=median(t0, na.rm=T)+
                    IQR(t0, na.rm=T)/2))
# Simulate 10000 times using mvrtriangle
vbTri <- mvrtriangle(10000, vbObj, paramMargins=triPars)
```
The marginals will reflect the uncertainty in the parameter values that were scraped from [Fishbase](http://fishbase.org) but, because we don't really believe the parameters are multivariate normal, here we adopted a distribution based on a *t* copula with triangle marginals.

The marginal distributions can be seen in `r fign('plot_tri_params')` and the shape of the correlation can be seen in `r fign('plot_tri_scatter')`.

```{r, plot_tri_params, echo=FALSE, fig.cap="The marginal distributions of each of the parameters based on a multivariate triangle distribution."}
par(mfrow=c(3,1))
hist(c(params(vbTri)['linf',]), main='linf', prob=TRUE, xlab='')
hist(c(params(vbTri)['k',]), main='k', prob=TRUE, xlab='')
hist(c(params(vbTri)['t0',]), main='t0', prob=TRUE, xlab='')
```
```{r, plot_tri_scatter, echo=FALSE, fig.cap="Scatter plot of the 10000 parameter sets from the multivariate triangle distribution."}
splom(data.frame(t(params(vbTri)@.Data)), par.settings=list(plot.symbol=list(pch=19, cex=0.1, col=1)))
```
We can still use `predict()` to see the growth model uncertainty (`r fign('plot_tri_growth')`).

```{r, plot_tri_growth, echo=FALSE, fig.cap="Growth curves using parameters simulated from a multivariate triangle distribution."}
bwplot(value~factor(Var1), data=melt(predict(vbTri, t=0:50+0.5)), par.settings=list(plot.symbol=list(cex=0.2, col='gray50'), box.umbrella=list(col='gray40'), box.rectangle=list(col='gray30')), ylab='length (cm)', xlab='age (years)', scales=list(x=list(rot=90)))
```
Note that the above examples use variance-covariance matrices that were made up. An alternative would be to scrape the entire growth parameters dataset from Fishbase and compute the shape of the variance-covariance matrix from this dataset.

# Adding uncertainty to growth parameters with statistical copulas

A more general approach to adding parameter uncertainty is to make use of [statistical copulas](http://www.encyclopediaofmath.org/index.php/Copula) and marginal distributions of choice. This is possible with the `mvrcop()` function borrowed from the package [copula](http://cran.r-project.org/web/packages/copula/). The example below keeps the same parameters and changes only the copula type and family, but a lot more can be done. Check the package *copula* for more information. 

```{r, copula_triangle_example}
vbCop <- mvrcop(10000, vbObj, copula='archmCopula', family='clayton', param=2, 
                margins='triangle', paramMargins=triPars)
```
The shape of the correlation changes is shown in (`r fign('plot_cop_tri_scatter')`) and the resulting growth curves in (`r fign('plot_cop_tri_growth')`). 

```{r, plot_cop_tri_scatter, echo=FALSE, fig.cap="Scatter plot of the 10000 parameter sets based on the archmCopula copula with triangle margins."}
splom(data.frame(t(params(vbCop)@.Data)), par.settings=list(plot.symbol=list(pch=19, cex=0.1, col=1)))
```
```{r, plot_cop_tri_growth, fig.cap="Growth curves based on the archmCopula copula with triangle margins.", echo=FALSE}
bwplot(value~factor(Var1), data=melt(predict(vbCop, t=0:50+0.5)), par.settings=list(plot.symbol=list(cex=0.2, col='gray50'), box.umbrella=list(col='gray40'), box.rectangle=list(col='gray30')), ylab='length (cm)', xlab='age (years)', scales=list(x=list(rot=90)))
```

# Converting from length to age based data - the `l2a()` method

After introducing uncertainty in the growth model through the parameters, we can transform the length-based dataset into an age-based one. The method that deals with this process is `l2a()`. The implementation of this method for the `FLQuant` class is the main workhorse. There are two other implementations, for the `FLStock` and `FLIndex` classes, which are mainly wrappers that call the `FLQuant` method several times.

When converting from length-based data to age-based data, we need to be aware of how the aggregation of length classes is performed. For example, individuals in length classes 1-2, 2-3, and 3-4 cm may all be considered age 1 fish (depending on the growth model). How should the values in those length classes be combined?

If the values are abundances, they should be summed. Summing other types of values, such as mean weight, does not make sense. Instead, these values are averaged over the length classes (possibly weighted by the abundance). This is controlled using the *stat* argument, which can either be *mean* or *sum* (the default). Fishing mortality is not computed to avoid making inappropriate assumptions about the meaning of F at length.

We demonstrate the method by converting a catch-at-length `FLQuant` to a catch-at-age `FLQuant`. First we make an `a4aGr` object with a multivariate triangle distribution (using the parameters we derive above). We use 10 iterations as an example, and call `l2a()` by passing it the length-based `FLQuant` and the `a4aGr` object.

```{r, FLQ_l2a, message=FALSE}
vbTriSmall <- mvrtriangle(10, vbObj, paramMargins=triPars)
cth.n <- l2a(catch.n(rfLen.stk), vbTriSmall)
```
```{r, example_flq_slice}
dim(cth.n)
```
In the previous example, the `FLQuant` object that was sliced (`catch.n(rfLen.stk)`) had only one iteration. This iteration was sliced by each of the iterations in the growth model. It is possible for the `FLQuant` object to have the same number of iterations as the growth model, in which case each iteration of the `FLQuant` and the growth model are used together. It is also possible for the growth model to have only one iteration while the `FLQuant` object has many iterations. The same growth model is then used for each of the `FLQuant` iterations. As with all FLR objects, the general rule is *one or n* iterations.

As well as converting one `FLQuant` at a time, we can convert entire `FLStock` and `FLIndex` objects. In these cases, the individual `FLQuant` slots of those classes are converted from length to age. As mentioned above, the aggregation method depends on the types of value the slots contain. The abundance slots (.n, such as stock.n) are summed. The wt, m, mat, harvest.spwn and m.spwn slots of an `FLStock` object are averaged. The catch.wt and sel.pattern slots of an `FLIndex` object are averaged, while the index, index.var and catch.n slots are summed. 

The method for `FLStock` classes takes an additional argument for the plusgroup.

```{r, FLS_FLI_l2a, message=FALSE, warning=TRUE}
aStk <- l2a(rfLen.stk, vbTriSmall, plusgroup=14)
aIdx <- l2a(rfTrawl.idx, vbTriSmall)
```

When converting with `l2a()` all lengths above Linf are converted to the maximum age, because there is no information in the growth model about how to deal with individuals larger than Linf. 

# References

Ernesto Jardim, Colin P. Millar, Iago Mosqueira, Finlay Scott, Giacomo Chato Osio, Marco Ferretti, Nekane Alzorriz, Alessandro Orio; What if stock assessment is as simple as a linear model? The a4a initiative. ICES J Mar Sci 2015; 72 (1): 232-236. DOI: https://doi.org/10.1093/icesjms/fsu050

Colin P. Millar, Ernesto Jardim, Finlay Scott, Giacomo Chato Osio, Iago Mosqueira, Nekane Alzorriz; Model averaging to streamline the stock assessment process. ICES J Mar Sci 2015; 72 (1): 93-98. DOI: https://doi.org/10.1093/icesjms/fsu043

Scott F, Jardim E, Millar CP, Cervino S (2016) An Applied Framework for Incorporating Multiple Sources of Uncertainty in Fisheries Stock Assessments. PLoS ONE 11(5): e0154922. DOI: https://doi.org/10.1371/journal.pone.0154922

# More information

Documentation can be found at (http://flr-project.org/FLa4a). You are welcome to:

- Submit suggestions and bug-reports at: (https://github.com/flr/FLa4a/issues)
- Send a pull request on: (https://github.com/flr/FLa4a/)
- Compose a friendly e-mail to the maintainer, see `packageDescription('FLa4a')`

## Software Versions

* `r version$version.string`
* FLCore: `r packageVersion('FLCore')`
* FLa4a: `r packageVersion('FLa4a')`
* **Compiled**: `r date()`

## License

This document is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0) license.

## Author information

**Ernesto Jardim**. European Commission, DG Joint Research Centre, Directorate D - Sustainable Resources, Unit D.02 Water and Marine Resources, Via E. Fermi 2749, 21027 Ispra VA, Italy. <https://ec.europa.eu/jrc/>