---
title: "Here be dragons"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Here be dragons}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
bibliography: vignette.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE
)
# job::job({
# knitr::knit("vignettes/here-be-dragons.Rmd.orig", "vignettes/here-be-dragons.Rmd")
# })

options(rmarkdown.html_vignette.check_title = FALSE)
```

## Load dependencies
```{r}
#remotes::install_github("traitecoevo/hmde@3435682c09f1378b3d1b7a23563ec4ae2240cbc4", force = TRUE)

library(hmde)
library(dplyr)
library(ggplot2)
library(deSolve)
library(cowplot)
library(mixtools)
library(MASS)
library(rstan)

source("R/helper_here_be_dragons.R")
```

This vignette demonstrates an interaction between numerical integration error and MCMC sampling that produces a bimodal posterior distribution. We stumbled across the problem and document it here. The package accounts for it by using analytic solutions where they exist, and numerical methods with adaptive step sizes where they do not.

The underlying issue is that if, given errors in the chosen numerical integration method, two sets of parameters for a differential equation $f$ give "the same" (more in a second) output for
$$Y(t_{j+1}) = Y(t_j) + \int_{t_j}^{t_{j+1}} f(Y(t), \boldsymbol{\theta})\,dt\qquad (1)$$
the MCMC sampler will be unable to meaningfully distinguish the parameter combinations. What you see in practice is chains converging to extremely different parameter combinations, one of which is the 'true' combination, while the other is wrong but produces the same $\hat{Y}(t_j)$ values due to the numerical error. Thus, a longitudinal model based on Equation (1) can suffer a form of non-identifiability that arises from numerical error, is separate from other sources of non-identifiability, and is currently under-explored in the literature. In this demonstration we use a fourth-order Runge-Kutta method [@butcher2016numerical] with different step sizes to show that even 'small' step sizes can cause problems.

It is important to state that "the same" $Y(t)$ values in this context means within statistical error of each other. We assume that our data consists of observations of the form $y_{j}$ at time $t_j$ that look like
$$y_j = Y(t_j) + \text{error},$$
and have some finite level of precision. For different parameter combinations, the numerical method may produce estimates $\hat{Y}(t_j)$ that differ by much less than the measurement precision or observation error, so the MCMC sampler cannot meaningfully distinguish them.

For this demonstration we will simulate data as though it is measured in centimetres. We use rounding to produce simulated data with measurement precision of 0.1cm, and error of $\mathcal{N}(0, 0.1)$, analogous to the 1mm measurement precision and approximate standard deviation of the real-world source data used in @obrien2024allindividuals.

## The model
We are implementing a longitudinal model of the form in Equation (1) within a hierarchical Bayesian longitudinal model where
$$f(Y(t), \beta_0, \beta_1) = \beta_0 - \beta_1 Y(t)\qquad (2)$$
Equation (2) is known to produce pathological behaviour from numerical methods [@butcher2016numerical], so it serves as an ideal simple example of the interaction between those pathologies and the MCMC sampling process. We are attempting to estimate the parameters $\beta_0$ and $\beta_1$ from observations $y_j$, which relies on estimating $\hat{Y}_j$ under the observation model
$$y_j \sim \mathcal{N}(\hat{Y}_j, 0.1).$$

As we are looking at a single individual, we have prior distributions for the parameters which are
$$\beta_k \sim \log\mathcal{N}(0,2),$$
and enforce $\beta_k > 0$.
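
One way to preview why this model is troublesome: for a linear ODE, a single RK4 step of size $h$ multiplies the deviation from the equilibrium $\beta_0/\beta_1$ by the polynomial $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$ with $z = -\beta_1 h$, whereas the exact solution multiplies it by $e^{z}$. The sketch below compares the per-unit-time decay for the true $\beta_1 = 1$ and for $\beta_1 = 4.92$, the erroneous value for step size 0.5 that is used in the prior tests later in this vignette. The function and variable names are illustrative and not part of `hmde`.

```{r}
# Illustrative sketch, not part of hmde: RK4 amplification factor for
# dY/dt = beta_0 - beta_1 * Y, applied to the deviation from beta_0 / beta_1.
rk4_factor <- function(beta_1, h) {
  z <- -beta_1 * h
  1 + z + z^2 / 2 + z^3 / 6 + z^4 / 24
}

sketch_h <- 0.5 # Fixed step size

# Decay of the deviation over one unit of time (1 / h RK4 steps)
rk4_true  <- rk4_factor(beta_1 = 1,    h = sketch_h)^(1 / sketch_h)
rk4_error <- rk4_factor(beta_1 = 4.92, h = sketch_h)^(1 / sketch_h)

# Exact decay over one unit of time for the true beta_1 = 1
exact_decay <- exp(-1)

c(exact = exact_decay, rk4_true = rk4_true, rk4_error = rk4_error)
```

Both RK4 values land within about 0.005 of $e^{-1}$, so with this step size the two very different values of $\beta_1$ imply nearly the same decay between observation times.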

## Simulating data
For this demonstration we use $\beta_0 = 10$ and $\beta_1 = 1$, which gives an asymptotic size of 10. If you wish to experiment with other values, change them in the next block and the rest of the code will run from those values. We use the analytic solution to simulate true sizes over time, then add measurement error and round to the chosen measurement precision of 0.1cm to give a sequence of observations that become the `y_obs` data for the model fit. Notice that `y_obs` can produce values larger than the theoretical asymptotic size $\beta_0/\beta_1$ due to error.
```{r}
#Change these values to change the model parameters. Must be positive values.
beta_0 <- 10
beta_1 <- 1

#True initial condition
true_y_0 <- 1
max_time <- 9
time <- 0:max_time

#Analytic solution
analytic_solution <- function(x = NULL, pars = NULL){ #Pars is list of beta_0, beta_1, y_0
  return(
    (pars[[1]]/pars[[2]]) + (pars[[3]] - (pars[[1]]/pars[[2]])) * exp(-pars[[2]] * x)
  )
}
true_pars <- list(
  beta_0 = beta_0,
  beta_1 = beta_1,
  true_y_0 = true_y_0
)
true_args_list <- list(pars = c(beta_0,
                             beta_1,
                             true_y_0))

y_true <- analytic_solution(time, true_pars)
```
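
For reference, the closed form that `analytic_solution()` evaluates follows from solving Equation (2) with initial condition $Y(0) = Y_0$ (the symbol $Y_0$ is introduced here for the value stored in `true_y_0`):
$$Y(t) = \frac{\beta_0}{\beta_1} + \left(Y_0 - \frac{\beta_0}{\beta_1}\right) e^{-\beta_1 t}.$$
As $t \to \infty$ the exponential term vanishes and $Y(t) \to \beta_0/\beta_1$, which is the asymptotic size of 10 for the default parameters.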

From the analytic solution we produce observations by adding measurement error and rounding to a precision of 0.1.
```{r}
#Produce observations with error and limited precision
y_obs <- round(y_true + rnorm(length(y_true), 0, 0.1), digits = 1)

#Unrounded data if needed
#y_obs <- y_true + rnorm(length(y_true), 0, 0.1)

#Observed data frame
obs_data_frame <- tibble(
  time = time,
  y_obs = y_obs,
  obs_index = 1:length(y_obs)
)

#Have a look at the true and 'observed' data
plot_data <- tibble(
  x = c(time, time),
  y = c(y_true,y_obs),
  Data = c(rep("True sizes", times = length(y_true)),
             rep("Obs. sizes", times = length(y_obs)))
)

sizes_over_time <- ggplot(plot_data, aes(x = x, y = y, group = Data)) +
  geom_point(aes(colour = Data, shape = Data), size = 2) +
  geom_line(aes(colour = Data, linetype = Data), linewidth = 1) +
  scale_linetype_manual(values = c("dashed", "dotted")) +
  ylim(0, 10.5) +
  labs(x = "Time", y = "Y") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7, 0.3))
sizes_over_time

#Have a look at the observations against the analytic solution
analytic_observed <- ggplot(obs_data_frame, aes(x = time, y = y_obs)) +
  geom_function(fun=analytic_solution, args = true_args_list,
                linewidth = 1, colour = "black") +
  geom_point(colour = "darkorchid", size = 3) +
  geom_line(colour = "darkorchid", linewidth = 1,
            linetype = "dashed") +
  labs(x = "Time", y = "Y",
       title = "Data simulation") +
  ylim(0, 10.5) +
  theme_classic()
analytic_observed
```

A note on error and precision: the same bad behaviour occurs even if you use unrounded data with all the precision R offers and much smaller error. You can test this by uncommenting the line that generates unrounded data and/or changing the measurement error standard deviation. The chosen values, and rounding, are intended to demonstrate that this is a problem that may occur with realistic precision and error.

## Implementing models and collecting posterior estimates
In this section we are going to run a sequence of sets of models. The first batch is about checking for the existence of bimodal posterior distributions across different step sizes for the same numerical method. We then use a different numerical method to see if the problem persists. Lastly, we use different means in the parameter priors to see if the posterior can be constrained by such methods.

### Step size data
We're going to run 100 fits for each of the step sizes 0.5, 0.25, and 0.125 using the custom RK4 solver, with a single chain per fit. Each chain is expected to converge to a parameter combination that gives estimated $\hat{Y}(t_j)$ close to the analytic solution, but which combination it converges to is random. We use single chains because each can fall into the numerical error trap, and the easiest way to identify that trap is to extract the estimates afterwards.

There are likely to be diagnostic problems but that is part of what we are here to explore so we will be ignoring them. Divergent transitions in particular are to be expected when we have the very different parameter estimates we see. Results are hidden for this block.
```{r, results='hide'}
runs <- 100
step_sizes <- c(0.5, 0.25, 0.125)
par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())

for(j in 1:length(step_sizes)){
  print(paste0("Fits for step size ", step_sizes[j]))
  for(i in 1:runs){
    temp <- fit_affine_model(run_no = i, 
                     step_size = step_sizes[j], 
                     obs_data_frame,
                     int_method = 1)
    
    par_est_tibble <- rbind(par_est_tibble, temp)
  } 
}
```

### Better numerical method: Runge-Kutta 4-5 algorithm with adaptive step size
To show that these problems can be dealt with, we will use the RK45 solver which includes an adaptive step size to reduce error. These models can also take time to run because of the adaptive step size, so this block does not run by default. Occasionally the sampler will start at a bad parameter combination and the RK45 algorithm will fail to converge, but this is usually solved by the next sample iteration and is a result of the affine ODE being a numerical nightmare.

```{r, eval = FALSE}
runs <- 100
rk45_par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())

for(i in 1:runs){
  temp <- fit_affine_model(run_no = i, 
                     step_size = 1, 
                     obs_data_frame,
                     int_method = 2)
  
  temp$step_size[1] <- "Adaptive"
  
  rk45_par_est_tibble <- rbind(rk45_par_est_tibble, temp)
} 
```
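
To illustrate why an adaptive step size helps, here is a minimal sketch that uses `deSolve` directly rather than the Stan solver in `hmde`. It integrates the affine ODE with the erroneous parameter combination for step size 0.5 (the values 49.4 and 4.92 used in the prior tests later in this vignette), once with fixed-step `rk4` and once with the adaptive `ode45` method. All names prefixed with `sketch_` are illustrative.

```{r}
library(deSolve)

# Affine ODE in the form deSolve expects
sketch_affine_de <- function(t, state, pars) {
  with(as.list(c(state, pars)), {
    list(c(beta_0 - beta_1 * Y))
  })
}

sketch_pars  <- c(beta_0 = 49.4, beta_1 = 4.92) # Erroneous combination
sketch_y0    <- c(Y = 1)
sketch_times <- seq(0, 9, by = 0.5) # rk4 takes its step size from this grid

fixed_rk4 <- ode(sketch_y0, sketch_times, sketch_affine_de, sketch_pars,
                 method = "rk4")
adaptive  <- ode(sketch_y0, sketch_times, sketch_affine_de, sketch_pars,
                 method = "ode45")

# Analytic solution for the erroneous parameters, and for the true ones (10, 1)
sketch_exact <- 49.4 / 4.92 + (1 - 49.4 / 4.92) * exp(-4.92 * sketch_times)
sketch_truth <- 10 + (1 - 10) * exp(-1 * sketch_times)
at_integers  <- sketch_times %% 1 == 0 # Observation times

c(
  rk4_vs_own_exact    = max(abs(fixed_rk4[, 2] - sketch_exact)),
  ode45_vs_own_exact  = max(abs(adaptive[, 2] - sketch_exact)),
  rk4_vs_truth_at_obs = max(abs(fixed_rk4[at_integers, 2] -
                                  sketch_truth[at_integers]))
)
```

The fixed-step solution is far from the analytic solution of its own parameters, yet at the observation times it stays well within the 0.1 observation error of the true trajectory. The adaptive solver tracks the analytic solution for the erroneous parameters instead, so it does not reproduce the data with them.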

### Analytic solution
To check the bias on the observed data and for any other posterior pathologies we use the analytic solution to get estimates.
```{r}
runs <- 10000
analytic_par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())

for(i in 1:runs){
  temp <- fit_affine_model(run_no = i, 
                     step_size = 1, 
                     obs_data_frame,
                     int_method = 3)
  
  temp$step_size[1] <- "Analytic"
  
  analytic_par_est_tibble <- rbind(analytic_par_est_tibble, temp)
} 
```

### Independent error
To show that the bias in the better estimate mode is due to the observations we do another set of runs with the analytic solution, numerical solution, and independent error for each chain instead of using the same set of observations.
```{r}
runs <- 10000
indep_err_par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())

for(i in 1:runs){ #Analytic solution
  temp_obs <- obs_data_frame %>%
  mutate(
      y_obs = round((y_true + 
                       rnorm(length(y_true), 0, 0.1)), 
                    digits = 1)
    )
  
  temp <- fit_affine_model(run_no = i, 
                     step_size = 1, 
                     temp_obs,
                     int_method = 3)
  
  temp$step_size[1] <- "Analytic"
  
  indep_err_par_est_tibble <- rbind(indep_err_par_est_tibble, temp)
} 

#Numerical method
indep_err_numeric_par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())
for(i in 1:runs){ #RK4
  temp_obs <- obs_data_frame %>%
  mutate(
      y_obs = round((y_true + 
                       rnorm(length(y_true), 0, 0.1)), 
                    digits = 1)
    )
  
  temp <- fit_affine_model(run_no = i, 
                     step_size = 0.5, 
                     temp_obs,
                     int_method = 1)
  
  temp$step_size[1] <- "0.5"
  
  indep_err_numeric_par_est_tibble <- rbind(indep_err_numeric_par_est_tibble, temp)
} 
```

### Testing different priors
As the default priors for the $\beta$ parameters are closer to the true parameter combination than any of the erroneous clusters we did follow-up testing to check if, and how much, this might bias the probability of a chain falling into the second mode in the posterior distribution. The Stan code for the affine model is set up to have a default pair of prior means at $\log(1) = 0$, but user-defined values can be passed as part of the data structure. We do this for the 0.5 step size only.

We are going to test two different prior configurations: a set of means that is half-way between the correct parameters and the (previously estimated) known error point for step size 0.5; and a much smaller standard deviation for the default mean of 1.
```{r, eval = FALSE}
step_size <- 0.5
runs <- 100
beta_prior_testing <- list(
  average_means_test = list(
    prior_means = c(mean(c(10, 49.4)),
                     mean(c(1, 4.92))
    ),
    prior_sds = c(2,2) #Default value
  ),
  small_sd_test = list( #Named element; `<-` here would leave the entry unnamed
    prior_means = c(1,1),
    prior_sds = c(0.1,0.1)
  )
)

prior_test_est_tibble_list <- list()

for(j in 1:length(beta_prior_testing)){
  prior_test_est_tibble <- tibble(run = c(),
                                step_size = c(),
                                beta_0 = c(),
                                beta_1 = c())
  
  for(i in 1:runs){
    temp <- fit_affine_model(run_no = i, 
                     step_size = 0.5, 
                     obs_data_frame,
                     int_method = 1,
                     prior_means = beta_prior_testing[[j]]$prior_means,
                     prior_sds = beta_prior_testing[[j]]$prior_sds)
    
    prior_test_est_tibble <- rbind(prior_test_est_tibble, temp)
  }
  
  prior_test_est_tibble_list[[j]] <- prior_test_est_tibble #Index by prior configuration
}
```
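
For context on how much room these priors leave, the sketch below computes the median and central 95% interval of a log-normal prior with log-scale mean 0, first with the default standard deviation of 2 and then with the tightened value of 0.1. It assumes the prior mean and standard deviation are specified on the log scale, as in $\beta_k \sim \log\mathcal{N}(0,2)$; the variable names are illustrative.

```{r}
# Quantiles of interest: lower 2.5%, median, upper 97.5%
prior_probs <- c(0.025, 0.5, 0.975)

# Default prior: log-scale mean 0, log-scale sd 2 (assumed parameterisation)
default_prior_q <- qlnorm(prior_probs, meanlog = 0, sdlog = 2)

# Tightened prior: same log-scale mean, log-scale sd 0.1
tight_prior_q <- qlnorm(prior_probs, meanlog = 0, sdlog = 0.1)

rbind(default = default_prior_q, tight = tight_prior_q)
```

Under these assumptions the default prior's central 95% interval extends past the erroneous $\beta_0$ of roughly 49.4, so it does little to rule out the second mode, while the tightened prior places both erroneous values far into its upper tail.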

### Optimisation tests
To check if what we observe is a quirk of MCMC, and to better understand the posterior distribution, we investigate a deterministic optimisation algorithm instead.
```{r}
#Load model
affine_model <- stan_model(file = "stan/affine_single_ind.stan")

#-----------------------------------------------------------------------------#
#RK4 with 0.5 step size
rstan_data <- hmde_model("affine_single_ind") |>
      hmde_assign_data(n_obs = nrow(obs_data_frame),
                       y_obs = obs_data_frame$y_obs,
                       obs_index = obs_data_frame$obs_index,
                       time = obs_data_frame$time,
                       y_bar = mean(obs_data_frame$y_obs),
                       step_size = 0.5,
                       int_method = 1)

rk4_random_post_est <- tibble()
rk4_random_hessian <- list()
for(i in 1:10){
  #if(((i-1) %% 50) == 0){
    print(paste0("Run: ", i))
  #}
  
  temp <- opt_affine_model(rstan_data, run_no = i, verbose = TRUE)
  
  rk4_random_post_est <- rbind(rk4_random_post_est, temp$est_tibble_temp)
  rk4_random_hessian[[i]] <- temp$hessian
}
save_data <- list(rk4_random_post_est, rk4_random_hessian)
saveRDS(save_data, file = "output/rk4_random_optimizing.rds")

#-----------------------------------------------------------------------------#
#Analytic solution
rstan_data <- hmde_model("affine_single_ind") |>
      hmde_assign_data(n_obs = nrow(obs_data_frame),
                       y_obs = obs_data_frame$y_obs,
                       obs_index = obs_data_frame$obs_index,
                       time = obs_data_frame$time,
                       y_bar = mean(obs_data_frame$y_obs),
                       step_size = 0.5,
                       int_method = 3)

analytic_random_post_est <- tibble()
analytic_random_hessian <- list()
for(i in 1:10000){
  if(((i-1) %% 50) == 0){
    print(paste0("Run: ", i))
  }
  
  temp <- opt_affine_model(rstan_data, run_no = i)
  
  analytic_random_post_est <- rbind(analytic_random_post_est, temp$est_tibble_temp)
  analytic_random_hessian[[i]] <- temp$hessian
}

save_data <- list(analytic_random_post_est, analytic_random_hessian)
saveRDS(save_data, file = "output/analytic_random_optimizing.rds")
```

### Alternate parameter values
To show that bimodality is not a quirk of our chosen $\beta$s we'll do a short run of chains for different parameter values.
```{r}
beta_0_diff <- 8
beta_1_diff <- 1.2
time <- 0:9

true_pars_diff <- list(
  beta_0 = beta_0_diff,
  beta_1 = beta_1_diff,
  true_y_0 = true_y_0
)

y_true_diff <- analytic_solution(time, true_pars_diff)
y_obs_diff <- round(y_true_diff + rnorm(length(y_true_diff), 0, 0.1), digits = 1)

#Observed data frame
obs_diff_data_frame <- tibble(
  time = time,
  y_obs = y_obs_diff,
  obs_index = 1:length(y_obs_diff)
)

runs <- 10000
step_size <- 0.25
diff_par_est_tibble <- tibble(run = c(),
                         step_size = c(),
                         beta_0 = c(),
                         beta_1 = c())

for(i in 1:runs){
  print(paste0("Fit ", i))
  temp <- fit_affine_model(run_no = i, 
                   step_size = step_size, 
                   obs_diff_data_frame,
                   int_method = 1)
  
  diff_par_est_tibble <- rbind(diff_par_est_tibble, temp)
} 
```

### Checking Canham model for multi-modality in the posterior
For reassurance we check whether the Canham model is also susceptible to posterior multimodality.
```{r}
#Build data
Canham_DE <- function(Time, State, Pars) { #Pars: g_max, y_max, k
  with(as.list(c(State, Pars)), {
    dY <- g_max * exp(-0.5 * (log(Y / y_max) / k)^2)
    
    return(list(c(dY)))
  })
}

pars_combo <- c(g_max = 0.8,
                y_max = 8,
                k = 1)
times <- seq(0, 49, by = 0.0001)
yini  <- c(Y = 1) #Initial condition
canham_true_y <- ode(yini, times, Canham_DE, pars_combo, method = "rk4")[,2]
canham_true_data <- tibble(
  y = canham_true_y,
  time = times
) %>%
  filter(time %in% seq(from = 0, to = 49, by = 5))

#Get forward projection with a larger step size and see if there is a meaningful difference
test_times <- seq(0, 49, by = 0.001)
canham_test_y <- ode(yini, test_times, Canham_DE, pars_combo, method = "rk4")[,2]
canham_test_data <- tibble(
  y = canham_test_y,
  time = test_times
) %>%
  filter(time %in% seq(from = 0, to = 49, by = 5))
test <- canham_true_data$y - canham_test_data$y
max(abs(test)) #Largest absolute difference between the two step sizes

#Add error, rounding
canham_obs_data <- tibble(
  y_obs = round((canham_true_data$y + rnorm(nrow(canham_true_data), 0, 0.1)), digits = 1),
  time = canham_true_data$time,
  obs_index = 1:nrow(canham_true_data)
)

#Plot Canham function and data
y_0 <- yini[1] #Starting size
y_final <- canham_obs_data$y_obs[nrow(canham_obs_data)]
line_plot_data <- tibble(
  Time = c(canham_obs_data$time, test_times),
  Y = c(canham_obs_data$y_obs, canham_test_y),
  source = c(rep("Simulated obs.", times = nrow(canham_obs_data)),
             rep("Precise numerics", times = length(canham_test_y)))
) %>%
  filter(Time <= 45)

#Plot of growth function
ggplot() +
  xlim(y_0, y_final) +
  labs(x = "Y(t)", y = "f", title = "Canham growth") +
  geom_function(fun=hmde_model_des("canham_single_ind"),
                args=list(pars = list(pars_combo[1],
                                      pars_combo[2],
                                      pars_combo[3])),
                colour="darkorchid", linewidth=1,
                xlim=c(y_0, y_final)) +
  theme_classic()

#Plot Canham simulated data
canham_obs_true <- ggplot(line_plot_data, aes(x = Time, y = Y, group = source)) +
  geom_line(aes(colour = source, linetype = source), linewidth = 1) +
  geom_point(aes(colour = source, shape = source), size = 3, stroke = 1) +
  scale_colour_manual(values = c("black", "darkorchid")) +
  scale_shape_manual(values = c(NA, 25)) +
  scale_linetype_manual(values = c("solid", NA)) +
  labs(x = "Time", y = "Y(t)",
       title = "True and observed", 
       shape = "",
       colour = "",
       linetype = "") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.3,0.8))
canham_obs_true

# Get posterior estimates
runs <- 5000
canham_par_est_tibble <- tibble(run = c(),
                                g_max = c(),
                                y_max = c(),
                                k = c())

for(i in 1:runs){
  print(paste0("Run: ", i))
  
  #Independent error
  canham_obs_data_temp <- canham_obs_data %>%
    mutate(
      y_obs = round((canham_true_data$y + 
                       rnorm(nrow(canham_true_data), 0, 0.1)), 
                    digits = 1)
    )
  
  #Run the model
  suppressWarnings(
    fit <- hmde_model("canham_single_ind") |>
      hmde_assign_data(n_obs = nrow(canham_obs_data_temp),
                       y_obs = canham_obs_data_temp$y_obs,
                       obs_index = canham_obs_data_temp$obs_index,
                       time = canham_obs_data_temp$time)  |> 
      hmde_run(chains = 1, cores = 1, iter = 2000)
  )
  
  #Extract parameter estimates
  ests <- hmde_extract_estimates(model = "canham_single_ind",
                                 fit = fit,
                                 input_measurement_data = canham_obs_data_temp)
  
  temp <- tibble(
    run = i,
    g_max = ests$individual_data$ind_max_growth_mean,
    y_max = ests$individual_data$ind_size_at_max_growth_mean,
    k = ests$individual_data$ind_k_mean
  )
  
  canham_par_est_tibble <- rbind(canham_par_est_tibble, temp)
} 
```

## Analysis
The general workflow for each model is the same: fit a finite mixture model to identify clustering, get an estimate of the distance between clusters, then plot the posterior distribution clusters. Additional plots may be produced depending on which model is being tested.

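Some aesthetics for plots. The legend labels are built from `step_size_mix_models_par_ests`, which is created in the mixture model chunk under "Different step sizes" below, so that chunk needs to be run before this one.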
```{r}
legend_spec <- tibble(
  step_size_name = c("0.5", "0.25", "0.125"),
  step_size = c(0.5, 0.25, 0.125),
  x = c(-10, -10, -10),
  y = c(-10, -10, -10),
  colours = c("#f8766d", "#00ba38", "#609cff"),
  linetypes = c("longdash", "dashed", "dotted"),
  shapes = c(19, 17, 15)
) 

legend_spec_with_true <- tibble(
  step_size_name = c("0.5", "0.25", "0.125", "True pars"),
  step_size = c(0.5, 0.25, 0.125, NA),
  x = c(-10, -10, -10, -10),
  y = c(-10, -10, -10, -10),
  colours = c("#f8766d", "#00ba38", "#609cff", "black"),
  linetypes = c("longdash", "dashed", "dotted", "solid"),
  shapes = c(19, 17, 15, 3)
)

for(i in 1:nrow(legend_spec)){
  fancy_name_no_step_size <- 
  paste0("Beta_0 = ",
         signif(step_size_mix_models_par_ests$error_beta_0[i],
                digits = 3),
         ",\n Beta_1 = ",
         signif(step_size_mix_models_par_ests$error_beta_1[i], 
                             digits = 3))
  legend_spec$fancy_name_no_step_size[i] <- fancy_name_no_step_size
  legend_spec_with_true$fancy_name_no_step_size[i] <- fancy_name_no_step_size
  
  fancy_name <- paste0("Step size ", step_size_mix_models_par_ests$step_size[i], 
                       "\n", fancy_name_no_step_size)
  
  legend_spec$fancy_name[i] <- fancy_name
  legend_spec_with_true$fancy_name[i] <- fancy_name
}

legend_spec_with_true$fancy_name_no_step_size[4] <-
  paste0("Beta_0 = ",
         beta_0,
         ",\n Beta_1 = ",
         beta_1)

legend_spec_with_true$fancy_name[4] <-
  paste0("True values\n Beta_0 = ",
         beta_0,
         ",\n Beta_1 = ",
         beta_1)
```

### Different step sizes
We are going to fit a finite mixture model for each of the step sizes to tell us about the clustering in the posterior distributions. We assume that there are two clusters (you can check with scatter plots), with one close to the true values and the other some distance away with much larger estimates for both parameters. To avoid singularities, we use the mean of the rows where $\hat{\beta}_0 > \operatorname{mean}(\hat{\beta}_0)$ as the starting value for the second cluster in the iterative process. The overall mean of the estimates works as a threshold for extreme values because of the bimodality and distance between clusters.

```{r}
step_size_mix_models <- list()
step_size_mix_model_plots <- list()
step_size_mix_models_par_ests <- tibble(
  good_beta_0 = c(),
  good_beta_1 = c(),
  error_beta_0 = c(),
  error_beta_1 = c(),
  step_size = c(),
  error_prob = c(),
  dist = c()
)

for(i in 1:length(unique(par_est_tibble$step_size))){
  #Get data for single step size
  step_size_selected <- unique(par_est_tibble$step_size)[i]
  
  analysis_data <- par_est_tibble %>%
    filter(step_size == step_size_selected)
  
#  temp <- fit_mix_model(par_est_data = analysis_data, 
#                par_names = c("beta_0", "beta_1"), 
#                true_pars = c(beta_0, beta_1))
  
  #Get some extreme estimates
  possible_error <- analysis_data %>%
    filter(beta_0 > mean(analysis_data$beta_0))
  
  #To speed up the iterative algorithm we provide some initial conditions
  mu <- list( #Means from true parameters and extreme estimates
    true = c(beta_0, beta_1),
    error = c(mean(possible_error$beta_0), 
              mean(possible_error$beta_1))
  )
  
  #Fit multivariate normal finite mixture model to the estimates
  step_size_mix_models[[i]] <- mvnormalmixEM(x = analysis_data[,c(3,4)], mu = mu)
  
  print(paste0("Summary of mixture model for step size ", step_size_selected))
  print(summary(step_size_mix_models[[i]]))
  
  step_size_mix_model_plots[[i]] <- plot(step_size_mix_models[[i]], 
                               whichplots = 2, 
                               xlab2 = "Beta 0", 
                               ylab2 = "Beta 1")
  
  dist_table <- tibble( #Data to calculate distance
    b_0 = c(step_size_mix_models[[i]]$mu[[2]][1], 
            step_size_mix_models[[i]]$mu[[1]][1]),
    b_1 = c(step_size_mix_models[[i]]$mu[[2]][2], 
            step_size_mix_models[[i]]$mu[[1]][2])
  )
  
  #Extract values
  step_size_mix_models_par_ests_temp <- tibble(
    good_beta_0 = step_size_mix_models[[i]]$mu[[1]][1],
    good_beta_1 = step_size_mix_models[[i]]$mu[[1]][2],
    error_beta_0 = step_size_mix_models[[i]]$mu[[2]][1],
    error_beta_1 = step_size_mix_models[[i]]$mu[[2]][2],
    step_size = step_size_selected,
    error_prob = step_size_mix_models[[i]]$lambda[2],
    dist = as.numeric(dist(dist_table)) #Euclidean distance between cluster means
  )
  
  step_size_mix_models_par_ests <- rbind(step_size_mix_models_par_ests, 
                                         step_size_mix_models_par_ests_temp)
}

#Have a look at the estimates
step_size_mix_models_par_ests
```
We get bimodality in the posterior distributions, although it is rarer for the smaller step sizes. Bias in the estimates closest to the true values arises because all the fits share the same measurement error in the 'observed' data. The second mode in the estimates arises from the numerical integration error, as we will verify shortly. The extreme bimodality of the posterior is consistent behaviour, even though the location of the second mode shifts with the step size.

We use scatter plots of the clusters for qualitative analysis. As the clusters are so distant from each other, and so tight around their means, we separate them out for plots. The mixture model gives a classification of each point in the data that we use to filter observations. As the clusters are so distant we can use other heuristics such as $\hat{\beta}_0 > 2\beta_0$, which agree perfectly with the cluster analysis.

The contours over the scatter plot come from data simulated from the cluster's multivariate normal distribution identified by the finite mixture model.
```{r}
scatterplot_errors_only <- list()
scatterplot_good_only <- list()

for(i in 1:length(unique(par_est_tibble$step_size))){
  step_size_select <- unique(par_est_tibble$step_size)[i]
  
  plot_data <- par_est_tibble %>%
    filter(step_size == step_size_select)
  
  #Get classification from mixture model
  #Threshold the membership probability so tiny non-zero values are not read as TRUE
  plot_data$good_est <- step_size_mix_models[[i]][["posterior"]][,1] > 0.5
  
  error_ests_scatter <- plot_data %>%
    filter(!as.logical(good_est))
  good_ests_scatter <- plot_data %>%
    filter(as.logical(good_est))
  
  #Scatter plot of erroneous parameters
  xpos <- (min(error_ests_scatter$beta_0) + 
             0.2*(max(error_ests_scatter$beta_0) - 
                    min(error_ests_scatter$beta_0)))
  ypos <- (max(error_ests_scatter$beta_1) - 
             0.1*(max(error_ests_scatter$beta_1) - 
                    min(error_ests_scatter$beta_1)))
  norm_data <- as.data.frame(mvrnorm(n = 10000,
                       mu = step_size_mix_models[[i]][["mu"]][[2]],
                       Sigma = step_size_mix_models[[i]][["sigma"]][[2]]))
  names(norm_data) <- c("beta_0", "beta_1")
    
  scatterplot_errors_only[[i]] <- ggplot(data = error_ests_scatter, 
                                         aes(x = beta_0, y = beta_1)) +
    geom_point(colour = legend_spec$colours[i], 
               shape = legend_spec$shapes[i], 
               alpha = 0.5,
               size = 2) +
    geom_density_2d(data = norm_data, colour = "black") +
    labs(x = "beta_0 est.",
           y = "beta_1 est.",
           title = paste0("Second cluster: step size ", step_size_select)) +
    annotate("text", x = xpos, y = ypos, 
             label = paste0("Probability: \n",
                            step_size_mix_models_par_ests$error_prob[i])) +
    theme_classic()
  
  #Scatter plot of good parameter estimates
  xpos <- (min(good_ests_scatter$beta_0) + 
             0.2*(max(good_ests_scatter$beta_0) - 
                    min(good_ests_scatter$beta_0)))
  ypos <- (max(good_ests_scatter$beta_1) - 
             0.1*(max(good_ests_scatter$beta_1) - 
                    min(good_ests_scatter$beta_1)))
  norm_data <- as.data.frame(mvrnorm(n = 10000,
                       mu = step_size_mix_models[[i]][["mu"]][[1]],
                       Sigma = step_size_mix_models[[i]][["sigma"]][[1]]))
  names(norm_data) <- c("beta_0", "beta_1")
  
  scatterplot_good_only[[i]] <- ggplot(data = good_ests_scatter, 
                                         aes(x = beta_0, y = beta_1)) +
    geom_point(colour = legend_spec$colours[i], 
               shape = legend_spec$shapes[i], 
               alpha = 0.5,
               size = 2) +
    geom_density_2d(data = norm_data, colour = "black") +
    labs(x = "beta_0 est.",
           y = "beta_1 est.",
           title = paste0("First cluster: step size ", step_size_select)) +
    annotate("text", x = xpos, y = ypos, 
             label = paste0("Probability: \n",
                            (1-step_size_mix_models_par_ests$error_prob[i]))) +
      theme_classic()
}
```

We can double-check the numerical error by using an independent solver with the same step size. We use the `deSolve` package which has an implementation of RK4 and allows us to choose the step sizes using the time parameter.
```{r}
#install.packages("deSolve")
library(deSolve)

#Create DE function
DE <- function(Time, State, Pars) { #Implementation of DE
  with(as.list(c(State, Pars)), {
    dY <- beta_0 - beta_1 * Y

    return(list(c(dY)))
  })
}
```

### Second cluster analysis
We want to look at the behaviour of the numerical method for the bad estimate clusters. To do so we project forward from the initial condition using the chosen step size and the bad parameter combination, and see what happens. We can compare the numerical solution to both the true sizes over time, and to the analytic solution with those same bad parameter estimates.

First we generate the numerical and analytic solution data.
```{r}
yini  <- c(Y = true_y_0) #Initial condition
y_over_time <- tibble(model="True Sizes",
                      y_analytic = y_true,
                      y_numeric = y_true,
                      time = 0:max_time,
                      beta_0_par = beta_0,
                      beta_1_par = beta_1
                      )

#Generate Y(t) with RK4
for(i in 1:nrow(step_size_mix_models_par_ests)){
  pars_combo <- c(beta_0 = step_size_mix_models_par_ests$error_beta_0[i],
                    beta_1 = step_size_mix_models_par_ests$error_beta_1[i])
  times <- seq(0, max_time, by = step_size_mix_models_par_ests$step_size[i])
  
  solution_pars <- c(pars_combo, true_y_0)
  y_true_temp <- analytic_solution(times, solution_pars)
    
  numerical_output <- ode(yini, times, DE, pars_combo, method = "rk4")
  
  y_over_time_temp <- tibble(
    model = step_size_mix_models_par_ests$step_size[i],
    y_analytic = y_true_temp,
    y_numeric = numerical_output[,2],
    time = times,
    beta_0_par = step_size_mix_models_par_ests$error_beta_0[i],
    beta_1_par = step_size_mix_models_par_ests$error_beta_1[i]
  )
  
  y_over_time <- rbind(y_over_time, y_over_time_temp)
}
```

Here is a figure that shows all of the estimated sizes over time for the bad parameter combinations across different step sizes, compared to the true values.
```{r}
y_over_time_filtered <- y_over_time %>%
  filter(time %in% 0:max_time)
  
#Plot sizes over time for all models
compare_sizes_over_time <- ggplot(y_over_time_filtered, 
                                  aes(x=time, y=y_numeric, group = as.factor(model))) +
  geom_point(aes(colour = as.factor(model),
             shape = as.factor(model)),
             alpha=0.5, size = 2, stroke = 1.5) +
  geom_line(aes(colour = as.factor(model)), alpha=0.5, linewidth = 1) +
  scale_colour_manual(values = legend_spec_with_true$colours) +
  scale_shape_manual(values = legend_spec_with_true$shapes) +
  labs(x = "Time", y = "Y(t)", title = "Estimated Y(t) with bad parameters",
       colour = "Step size", shape = "Step size") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7, 0.3))

compare_sizes_over_time
```
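At the observation times, the numerical solutions from the bad parameter combinations are practically indistinguishable from the true values. If you look extremely closely there is some deviation due to parameter bias, but it is too small to be statistically relevant to the process.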

### Numerics
We can demonstrate the source of the problem at the erroneous cluster by looking at what the RK4 algorithm is doing. First, here is a direct implementation of a single step of the RK4 algorithm.
```{r}
#Define DE function
example_DE <- function(y, pars){
  return(pars[1] - pars[2]*y)
}

#Define RK4 function for a single step
rk4_step <- function(y, pars, interval, DE){
  k1 <- DE(y, pars)
  k2 <- DE(y+interval*k1/2.0, pars)
  k3 <- DE(y+interval*k2/2.0, pars)
  k4 <- DE(y+interval*k3, pars)
  
  y_hat <- y + (1.0/6.0) * (k1 + 2.0*k2 + 2.0*k3 + k4) * interval
  
  intermediate_step <- c(y+interval*k1/2.0, 
                         y+interval*k2/2.0, 
                         y+interval*k3)
  k_vals <- c(k1, k2, k3, k4)
  
  return(list(
    y_hat = y_hat,
    k_vals = k_vals,
    intermediate_step = intermediate_step)
  )
}
```
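For the affine DE used here, the single-step update can also be written in closed form, which helps when reading the plots that follow. The notation $Y^* = \beta_0/\beta_1$, $h$ (the step size, `interval` in the code), and $z = -\beta_1 h$ is introduced here for illustration. Substituting $f(Y) = -\beta_1 (Y - Y^*)$ into the four stages gives

$$\hat{Y}_{j+1} - Y^* = R(z)\,\left(\hat{Y}_j - Y^*\right), \qquad R(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \frac{z^4}{24},$$

whereas the exact solution satisfies $Y(t + h) - Y^* = e^{z}\,\left(Y(t) - Y^*\right)$. RK4 therefore replaces $e^{z}$ with its degree-4 Taylor polynomial. Unlike $e^{z}$, the quartic $R(z)$ is not monotone for negative $z$, so two different values of $\beta_1$ can produce the same per-step factor for a fixed $h$.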

The behaviour of the sub-step values is of interest to us. Because the affine DE is negative for $Y(t) > \beta_0/\beta_1$, we can actually get sub-steps that oscillate back and forth, with both positive and negative increments. Here we will plot that for the erroneous parameters that arise at step size 0.5.
```{r}
pars <- c(49.4, 4.94)
step_size <- 0.5
steps <- 18
y_0 <- 1
y_est <- tibble(
  y_hat = y_0,
  source = "Est. Y(t_j)",
  k_vals = NA,
  step = 0
)

y_start <- y_0
for(i in 1:steps){
  numeric_sol <- rk4_step(y_start, pars,  step_size, example_DE)
  
  y_est_temp <- tibble(
    y_hat = c(numeric_sol$intermediate_step, numeric_sol$y_hat),
    source = c(rep("RK4 weights k_j", times = 3), "Est. Y(t_j)"),
    k_vals = c(numeric_sol$k_vals),
    step = i
  )
  
  y_est <- rbind(y_est, y_est_temp)
  y_start <- numeric_sol$y_hat
}

y_est$index <- 1:nrow(y_est)

plot_data_first_step <- y_est %>%
  filter(source == "RK4 weights k_j",
         step == 1) %>%
  mutate(source = "Substep Y(t) ests")

plot_data_other_steps <- y_est %>%
  filter(source == "RK4 weights k_j",
         step > 1) %>%
  mutate(source = "Substep Y(t) ests")

plot_data_y_hat <- y_est %>%
  filter(source == "Est. Y(t_j)") %>%
  mutate(source = "Numerical Est.")

plot_data_Y_t <- plot_data_y_hat[seq(from = 1, to = 20, by = 2),] %>%
  mutate(source = "True Y(t_j)")
plot_data_Y_t$y_hat = y_true

plot_data <- rbind(plot_data_first_step, plot_data_Y_t, plot_data_y_hat)

numeric_plot_bad <- ggplot(plot_data, aes(x = index, y=y_hat, group=source)) +
  geom_line(aes(colour = source, linewidth = source), alpha = 0.8) +
  geom_point(aes(colour = source, shape =source, size = source), 
             alpha = 0.8, stroke = 0.8) +
  scale_colour_manual(values = c("darkred", "black", "#f8766d")) +
  scale_linewidth_manual(values = c(0.5, 0.5, 0.8)) +
  scale_size_manual(values = c(2, 3, 3)) +
  scale_shape_manual(values = c(4, 3, 19)) +
  geom_line(data = plot_data_other_steps, aes(group = as.factor(step)),
            colour = "black", linewidth = 0.5, alpha = 0.8) +
  geom_point(data = plot_data_other_steps,
             colour = "black", shape = 3, size = 2, 
             alpha = 0.8, stroke = 0.8) +
  labs(x = "Step",
       y = "Estimated Y",
       title = "Numerical est.: step size 0.5, beta_0 = 49.4, beta_1 = 4.94",
       colour = "Source",
       shape = "Source",
       linewidth = "Source",
       size = "Source") +
  geom_hline(yintercept=0, colour = "black", linewidth = 0.25, linetype = "dashed")+
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7,0.7))

#Plot of within-step k-values
k_plot_data <- y_est %>% filter(!is.na(k_vals))
  
weights_plot_bad <- ggplot(k_plot_data, aes(x = index, y=k_vals, group=as.factor(step))) +
  geom_line(colour = "black", linewidth = 0.5, alpha = 0.8) +
  geom_point(colour = "black", shape = 3, size = 2, 
             alpha = 0.8, stroke = 0.8) +
  labs(x = "Index",
       y = "k value",
       title = "Gradient est.: step size 0.5, beta_0 = 49.4, beta_1 = 4.94") +
  geom_hline(yintercept=0, colour = "black", linewidth = 0.25, linetype = "dashed")+
  theme_classic()
```

For comparison, here's the numerical behaviour at the true parameter combination.
```{r}
pars <- c(10, 1)
step_size <- 0.5
steps <- 18
y_0 <- 1
y_est <- tibble(
  y_hat = y_0,
  source = "Est. Y(t_j)",
  k_vals = NA,
  step = 0
)

y_start <- y_0
for(i in 1:steps){
  numeric_sol <- rk4_step(y_start, pars,  step_size, example_DE)
  
  y_est_temp <- tibble(
    y_hat = c(numeric_sol$intermediate_step, numeric_sol$y_hat),
    source = c(rep("RK4 weights k_j", times = 3), "Est. Y(t_j)"),
    k_vals = c(numeric_sol$k_vals),
    step = i
  )
  
  y_est <- rbind(y_est, y_est_temp)
  y_start <- numeric_sol$y_hat
}

y_est$index <- 1:nrow(y_est)

plot_data_first_step <- y_est %>%
  filter(source == "RK4 weights k_j",
         step == 1) %>%
  mutate(source = "Substep Y(t) ests")

plot_data_other_steps <- y_est %>%
  filter(source == "RK4 weights k_j",
         step > 1) %>%
  mutate(source = "Substep Y(t) ests")

plot_data_y_hat <- y_est %>%
  filter(source == "Est. Y(t_j)") %>%
  mutate(source = "Numerical Est.")

plot_data_Y_t <- plot_data_y_hat[seq(from = 1, to = 20, by = 2),] %>%
  mutate(source = "True Y(t_j)")
plot_data_Y_t$y_hat = y_true

plot_data <- rbind(plot_data_first_step, plot_data_Y_t, plot_data_y_hat)

numeric_plot_good <- ggplot(plot_data, aes(x = index, y=y_hat, group=source)) +
  geom_line(aes(colour = source, linewidth = source), alpha = 0.8) +
  geom_point(aes(colour = source, shape =source, size = source), 
             alpha = 0.8, stroke = 0.8) +
  scale_colour_manual(values = c("darkred", "black", "#f8766d")) +
  scale_linewidth_manual(values = c(0.5, 0.5, 0.8)) +
  scale_size_manual(values = c(2, 3, 3)) +
  scale_shape_manual(values = c(4, 3, 19)) +
  geom_line(data = plot_data_other_steps, aes(group = as.factor(step)),
            colour = "black", linewidth = 0.5, alpha = 0.8) +
  geom_point(data = plot_data_other_steps,
             colour = "black", shape = 3, size = 2, 
             alpha = 0.8, stroke = 0.8) +
  labs(x = "Step",
       y = "Estimated Y",
       title = "Numerical est.: step size 0.5, beta_0 = 10, beta_1 = 1",
       colour = "Source",
       shape = "Source",
       linewidth = "Source",
       size = "Source") +
  geom_hline(yintercept=0, colour = "black", linewidth = 0.25, linetype = "dashed")+
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7,0.4))

#Plot of within-step k-values
k_plot_data <- y_est %>% filter(!is.na(k_vals))
weights_plot_good <- ggplot(k_plot_data, aes(x = index, y=k_vals, group=as.factor(step))) +
  geom_line(colour = "black", linewidth = 0.5, alpha = 0.8) +
  geom_point(colour = "black", shape = 3, size = 2, 
             alpha = 0.8, stroke = 0.8) +
  labs(x = "Index",
       y = "k value",
       title = "Gradient est.: step size 0.5, beta_0 = 10, beta_1 = 1") +
  geom_hline(yintercept=0, colour = "black", linewidth = 0.25, linetype = "dashed")+
  theme_classic()
```
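The contrast between the two parameter sets can also be seen directly in the signs of the four RK4 gradient evaluations for the first step. This is a minimal sketch using the same starting value, step size, and parameter pairs as above; the `sketch_*` names are hypothetical helpers introduced only for this illustration.

```{r}
#Affine DE with explicit parameters
sketch_affine <- function(y, b0, b1) b0 - b1 * y

#Return the four RK4 gradient evaluations for a single step
sketch_k_values <- function(y, b0, b1, h) {
  k1 <- sketch_affine(y, b0, b1)
  k2 <- sketch_affine(y + h * k1 / 2, b0, b1)
  k3 <- sketch_affine(y + h * k2 / 2, b0, b1)
  k4 <- sketch_affine(y + h * k3, b0, b1)
  c(k1 = k1, k2 = k2, k3 = k3, k4 = k4)
}

sketch_k_bad <- sketch_k_values(1, b0 = 49.4, b1 = 4.94, h = 0.5)
sketch_k_good <- sketch_k_values(1, b0 = 10, b1 = 1, h = 0.5)

#Signs of the gradient evaluations within the first step
sign(sketch_k_bad)
sign(sketch_k_good)
```

With the erroneous parameters the gradient evaluations alternate in sign because each sub-step overshoots the asymptote, while with the true parameters they all stay positive.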

### Step sizes
Back to analysis of the individual step sizes. This time we will look at the numerical method stability and error directly for the erroneous parameter values.
```{r}
#Plot sizes over time with analytic solution for each step size individually
size_analytic_plots <- list()
size_scatter_plot <- list()
stability_plots <- list()

for(i in 1:nrow(step_size_mix_models_par_ests)){
  #Get data for that step size
  plot_data <- y_over_time %>%
    filter(model == step_size_mix_models_par_ests$step_size[i]) %>%
    mutate(numerical_error = y_numeric - y_analytic,
           solname = "Analytic solution:\n Beta_0 = 10,\n Beta_1 = 1",
           pointname = paste0("Numerical method:\n Beta_0 = ", 
                      signif(step_size_mix_models_par_ests$error_beta_0[i], 
                             digits = 3),
                      ",\n Beta_1 = ", 
                      signif(step_size_mix_models_par_ests$error_beta_1[i], 
                             digits = 3)))
  
  #Exclude intermediate steps
  plot_data_reduced <- plot_data %>%
    filter(time %in% 0:max_time)
  
  #Get error est
  temp_rmse_true <- sqrt(
    sum(plot_data_reduced$numerical_error^2)/nrow(plot_data_reduced)
  )
  temp_rmse_obs <- sqrt(
    sum((plot_data_reduced$y_numeric- y_true)^2)/nrow(plot_data_reduced)
  )
  
  #Plot numerical stability
  stability_plots[[i]] <- ggplot(plot_data, 
                                 aes(x = time,
                                     y = numerical_error)) +
    geom_point(colour = legend_spec$colours[i], 
               shape = legend_spec$shapes[i], size = 3) +
    geom_line(colour = legend_spec$colours[i], 
              linetype = legend_spec$linetypes[i], 
              linewidth = 1) +
    geom_hline(yintercept = 0, linetype = "solid", colour = "black") +
    annotate("text", x = 7, y = (0.25*min(plot_data$numerical_error)),
             label = paste0("RMSE: ", signif(temp_rmse_true, digits = 3))) +
    labs(title = paste0("Numerical error: step size ", plot_data$model[1]),
         x = "Time",
         y = "Error (numerical - analytic)") +
    theme_classic()
  
  #Scatter of numerical and analytic values
  size_scatter_plot[[i]] <- ggplot(plot_data, aes(x = y_analytic, y = y_numeric)) +
    geom_point(colour = legend_spec$colours[i], 
               shape = legend_spec$shapes[i], size = 2) +
    geom_line(colour = legend_spec$colours[i], 
              linetype = "solid", linewidth = 1) +
    geom_abline(intercept = 0, slope = 1, linetype = "solid", colour = "black") +
    labs(title = paste0("Step size ", plot_data$model[1],
                        ", Beta_0 = ", signif(plot_data$beta_0_par[1], digits = 3),
                        ", Beta_1 = ", signif(plot_data$beta_1_par[1], digits = 3)),
         x = "Analytic solution",
         y = "Numerical solution") +
    theme_classic()
  
  
  size_analytic_plots[[i]] <- ggplot(plot_data_reduced, 
                                  aes(x=time, y=y_numeric)) +
  geom_function(fun=analytic_solution, args=true_args_list,
                colour="black", 
                aes(linetype = solname),
                linewidth=1) +
  scale_linetype_manual(values = "solid") +
  geom_point(aes(colour = pointname,
             shape = pointname),
             alpha=1, size = 2, stroke = 1.5) +
  scale_shape_manual(values = legend_spec$shapes[i]) +
  scale_colour_manual(values = legend_spec$colours[i]) +
    annotate("text", x = 7, y = 7.5,
             label = paste0("RMSE: ", signif(temp_rmse_obs, digits = 3))) +
  labs(x = "Time", y = "Y(t)", 
       title = paste0("Estimated Y(t): step size ", plot_data_reduced$model[1]),
       colour = NULL,
       shape = NULL,
       linetype = NULL) +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.8,0.3))
}
```

What we're interested in here is that, despite the wildly wrong parameter estimates, the numerical error produces strong alignment between the estimated sizes over time and the true $Y(t)$ values.

To demonstrate that re-running the numerical method with a smaller step size is enough to identify bad estimates, we show that RK4 with step size 0.001 and the bad parameter combinations diverges from the true sizes over time. The lines in this plot are based on the small step size numerical estimates, while the points are the $Y(t_j)$ values at the discrete observation times.
```{r}
#Generate y(t) with RK4 given the parameter estimates
y_over_time_smallstep <- tibble(model=legend_spec_with_true$fancy_name[4],
                      y_hat = y_true,
                      time = 0:max_time
                      )

for(i in 1:nrow(step_size_mix_models_par_ests)){
  pars_combo <- c(beta_0 = step_size_mix_models_par_ests$error_beta_0[i],
                    beta_1 = step_size_mix_models_par_ests$error_beta_1[i])
    times <- seq(0, max_time, by = 0.001)
    
    numerical_output <- ode(yini, times, DE, pars_combo, method = "rk4")
    
    y_over_time_temp <- tibble(
      model = legend_spec_with_true$fancy_name_no_step_size[i],
      y_hat = numerical_output[,2],
      time = times
    )
    
    y_over_time_smallstep <- rbind(y_over_time_smallstep, y_over_time_temp)
}

point_data <- y_over_time_smallstep %>%
  filter(time %in% 0:max_time)

#Plot sizes over time
compare_sizes_over_time_smallstep <- ggplot(y_over_time_smallstep, 
                                  aes(x=time, y=y_hat, group = as.factor(model))) +
  geom_line(aes(colour = as.factor(model),
                linetype = as.factor(model)), alpha=0.5, linewidth = 1) +
  geom_point(data = point_data,
             aes(colour = as.factor(model),
             shape = as.factor(model)),
             alpha=0.5, size = 2, stroke = 1.5) +
  geom_function(fun=analytic_solution, args=true_args_list,
                colour="black",
                linetype = "solid",
                linewidth=1) +
  scale_shape_manual(values = legend_spec_with_true$shapes) +
  scale_colour_manual(values = legend_spec_with_true$colours) +
  scale_linetype_manual(values = c(legend_spec$linetypes, NA)) +
  labs(x = "Time", y = "Y(t)", title = "Small step size",
       colour = "Parameters", 
       shape = "Parameters",
       linetype = "Parameters") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7, 0.4),
        legend.key.spacing.y = unit(2, 'mm')) +
  guides(colour = guide_legend(byrow = TRUE))

compare_sizes_over_time_smallstep
```
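The same check can be done numerically for a single observation time. The sketch below, which assumes $Y(0) = 1$, true parameters $(10, 1)$, and the erroneous pair $(49.4, 4.94)$ from the Numerics section, compares $\hat{Y}(1)$ from a coarse and a fine RK4 grid against the true value. The function `sketch_rk4_end` is a hypothetical helper, not part of the package.

```{r}
#Fixed-step RK4 for dY/dt = b0 - b1 * Y, returning the estimate at t_end
sketch_rk4_end <- function(y0, b0, b1, h, t_end) {
  f <- function(y) b0 - b1 * y
  y <- y0
  for (s in seq_len(round(t_end / h))) {
    k1 <- f(y)
    k2 <- f(y + h * k1 / 2)
    k3 <- f(y + h * k2 / 2)
    k4 <- f(y + h * k3)
    y <- y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  y
}

#Analytic Y(1) for the true parameters beta_0 = 10, beta_1 = 1
sketch_true_y1 <- 10 + (1 - 10) * exp(-1)

#Erroneous parameters with a coarse and a fine step size
sketch_coarse <- sketch_rk4_end(1, 49.4, 4.94, h = 0.5, t_end = 1)
sketch_fine <- sketch_rk4_end(1, 49.4, 4.94, h = 0.001, t_end = 1)

c(true = sketch_true_y1, coarse = sketch_coarse, fine = sketch_fine)
```

The coarse grid lands close to the true value, while the fine grid follows the analytic solution for the erroneous parameters and ends up near the asymptote.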

Here are two plots showing the ODEs and analytic solutions with the bad estimates compared to the true parameter values. To plot the ODEs we exploit the fact that they are straight lines, so we only need the two end points of each line rather than plotting the functions properly.
```{r}
#Get asymptotic size
step_size_mix_models_par_ests$Y_max <- step_size_mix_models_par_ests$error_beta_0/step_size_mix_models_par_ests$error_beta_1

#Build points for start and end of lines
plot_data <- tibble(
  x = c(0, (beta_0/beta_1), 
        rep(0, times = nrow(step_size_mix_models_par_ests)), 
        step_size_mix_models_par_ests$Y_max),
  y = c(beta_0, 0, 
        step_size_mix_models_par_ests$error_beta_0, 
        rep(0, times = nrow(step_size_mix_models_par_ests))),
  step_size = c("True pars", "True pars", 
                step_size_mix_models_par_ests$step_size, 
                step_size_mix_models_par_ests$step_size)
)

#Plot DEs 
error_de_plot <- ggplot(data = plot_data, aes(x,y)) +
  geom_line(aes(colour = as.factor(step_size),
                linetype = as.factor(step_size)),
            linewidth = 1) +
  scale_colour_manual(values = c(legend_spec$colours[3:1], "black")) +
  scale_linetype_manual(values = c(legend_spec$linetypes[3:1], "solid")) +
  labs(title = "ODEs",
       x = "Y(t)", y = "f", 
       colour = "Step size", 
       linetype = "Step size") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7, 0.7))
error_de_plot

#Plot analytic solutions
error_solution_plot <- ggplot() +
  geom_function(fun=analytic_solution, args=true_args_list,
                  colour="black", 
                  linetype = "solid",
                  linewidth=1) 

for(i in 1:nrow(step_size_mix_models_par_ests)){ #Add the analytic solutions
  args_list <- list(pars=c(step_size_mix_models_par_ests$error_beta_0[i],
                           step_size_mix_models_par_ests$error_beta_1[i],
                           true_y_0))
  error_solution_plot <- error_solution_plot +
    geom_function(fun=analytic_solution, args=args_list,
                  colour=legend_spec$colours[i], 
                  linetype = legend_spec$linetypes[i],
                  linewidth=1)
}

error_solution_plot <- error_solution_plot +
  geom_line(data = legend_spec_with_true,
            linewidth=1,
            aes(colour = fancy_name_no_step_size,
                linetype = fancy_name_no_step_size,
                x = x, y = y)) +
  scale_colour_manual(values = c(legend_spec_with_true$colours[4], 
                                 legend_spec$colours[c(2,3,1)])) +
  scale_linetype_manual(values = c(legend_spec_with_true$linetypes[4], 
                                   legend_spec$linetypes[c(2,3,1)])) +
  xlim(0, max_time) +
  ylim(true_y_0, (beta_0/beta_1+0.5)) +
  labs(x = "Time", y = "Y(t)", 
       title = "Analytic solutions",
       colour = "Parameters", 
       linetype = "Parameters") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.7, 0.4),
        legend.key.spacing.y = unit(2, 'mm')) +
  guides(colour = guide_legend(byrow = TRUE))

error_solution_plot
```
The limiting behaviour at $\beta_0/\beta_1$ is consistent across parameter combinations. What changes is how fast each line approaches the asymptote.
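This follows from the standard closed-form solution of the affine DE $f(Y) = \beta_0 - \beta_1 Y$ (stated here for reference; we assume it is what the `analytic_solution` helper evaluates):

$$Y(t) = \frac{\beta_0}{\beta_1} + \left(Y(0) - \frac{\beta_0}{\beta_1}\right) e^{-\beta_1 t}.$$

The ratio $\beta_0/\beta_1$ sets the asymptote, and $\beta_1$ alone sets the rate of approach. Parameter pairs with the same ratio but a larger $\beta_1$ therefore share the limit and reach it sooner.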

If you have fitted models with multiple step sizes, this block produces a scatter plot of all the parameter estimates together. We can see a very strong linear trend (the slope is 1/10, which is the inverse of the limiting size $\beta_0/\beta_1 = 10$), with the gap between the true parameters and the erroneous ones getting larger as the step size shrinks. It is also important to note that the frequency of bad estimates decreases along with the step size. We believe this comes from the greater distance: the further the bad estimates are from the true ones, the less likely the MCMC algorithm is to fall into them, particularly as the priors are close to the true values.
```{r, eval = FALSE}
est_scatter <- ggplot(data = par_est_tibble, aes(x = beta_0, y = beta_1)) +
  geom_point(aes(colour = as.factor(step_size), 
                 shape = as.factor(step_size)), 
             alpha = 0.5,
             size = 2) +
  labs(colour = "Step size", shape = "Step size",
       title = "All posterior estimates") +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.2, 0.8))
est_scatter
```
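One way to see why the gap grows as the step size shrinks is to ask, for each step size, which other $\beta_1$ gives RK4 the same per-step contraction towards the asymptote as the exact solution with $\beta_1 = 1$. The sketch below is a simplification under stated assumptions: it ignores measurement error and priors, so the roots only approximate where a posterior cluster might sit, and the step sizes listed are illustrative rather than taken from the fitted models.

```{r}
#RK4 per-step factor for an affine DE, with z = -beta_1 * h
sketch_rk4_growth <- function(z) 1 + z + z^2 / 2 + z^3 / 6 + z^4 / 24

#Second value of beta_1 whose RK4 factor matches the exact factor exp(-beta_1_true * h)
sketch_spurious_beta_1 <- function(h, beta_1_true = 1) {
  target <- exp(-beta_1_true * h)
  uniroot(function(b1) sketch_rk4_growth(-b1 * h) - target,
          interval = c(1.6, 2.78) / h, #Bracket on the increasing branch of the quartic
          tol = 1e-10)$root
}

sketch_step_sizes <- c(0.5, 0.25, 0.125) #Assumed step sizes
sketch_roots <- sapply(sketch_step_sizes, sketch_spurious_beta_1)

data.frame(step_size = sketch_step_sizes,
           beta_1 = sketch_roots,
           beta_0 = 10 * sketch_roots) #Same asymptote beta_0 / beta_1 = 10
```

Because the spurious root sits at a roughly fixed value of $\beta_1 h$, halving the step size roughly doubles the spurious $\beta_1$, which matches the trend in the scatter plot.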

### RK45 analysis
We're going to take a look at the RK45 method parameter estimates.
```{r}
rk45_hist_0 <- ggplot(data = rk45_par_est_tibble, aes(x=beta_0)) +
  geom_histogram(fill = "lightblue",
                 colour = "black") +
  labs(x = "beta_0 estimates",
       title = "Parameter histograms from RK45") +
  theme_classic()
rk45_hist_1 <- ggplot(data = rk45_par_est_tibble, aes(x=beta_1)) +
  geom_histogram(fill = "lightgreen",
                 colour = "black") +
  labs(x = "beta_1 estimates") +
  theme_classic()

#Get some extreme estimates
possible_error <- rk45_par_est_tibble %>%
  filter(beta_0 > mean(rk45_par_est_tibble$beta_0))

#To speed up the iterative algorithm we provide some initial conditions
mu <- list( #Means from true parameters and extreme estimates
  true = c(beta_0, beta_1),
  error = c(mean(possible_error$beta_0), 
            mean(possible_error$beta_1))
)

#Fit multivariate normal finite mixture model to the estimates
rk45_mix_models <- mvnormalmixEM(x = rk45_par_est_tibble[,c(3,4)], mu = mu)

summary(rk45_mix_models)

plot(rk45_mix_models,
     whichplots = 2,
     xlab2 = "Beta 0",
     ylab2 = "Beta 1")

dist_table <- tibble( #Data to calculate distance
    b_0 = c(rk45_mix_models$mu[[2]][1], 
            rk45_mix_models$mu[[1]][1]),
    b_1 = c(rk45_mix_models$mu[[2]][2], 
            rk45_mix_models$mu[[1]][2])
  )
dist(dist_table)
```
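For readers unfamiliar with the mixture model step, here is a minimal sketch of the same workflow on simulated estimates. The simulated cluster locations, spreads, and sample sizes are assumptions chosen for illustration and do not come from any fitted model.

```{r}
library(mixtools)

set.seed(1)
#Simulated "posterior estimates": 180 near the true values, 20 near a bad mode
sketch_good <- MASS::mvrnorm(180, mu = c(10, 1), Sigma = diag(c(0.05, 0.0005)))
sketch_bad <- MASS::mvrnorm(20, mu = c(49.4, 4.94), Sigma = diag(c(0.05, 0.0005)))
sketch_ests <- rbind(sketch_good, sketch_bad)

#Initial means, one per cluster, to help the EM algorithm converge quickly
sketch_mu <- list(c(10, 1), c(49.4, 4.94))

sketch_fit <- mvnormalmixEM(x = sketch_ests, mu = sketch_mu)

#Mixing proportions estimate the probability of each cluster
sketch_fit$lambda

#Euclidean distance between the fitted cluster means
dist(rbind(sketch_fit$mu[[1]], sketch_fit$mu[[2]]))
```

The mixing proportions `lambda` play the role of the cluster probabilities reported elsewhere in this vignette, and the distance between fitted means quantifies how far apart the two modes are.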

### Alternate prior analysis
For each combination of priors we run a mixture model to get probability estimates.
```{r}
beta_prior_testing <- list(
  average_means_test = list(
    prior_means = c(mean(c(10, 49.4)),
                     mean(c(1, 4.92))
    ),
    prior_sds = c(2,2) #Default value
  ),
  small_sd_test = list(
    prior_means = c(1,1),
    prior_sds = c(0.1,0.1)
  )
)

mix_model_prior_list <- list()
mix_model_prior_scatter <- list()
#Scatter plot to check for evidence of further clusters
for(i in 1:length(beta_prior_testing)){
  prior_test_est_tibble <- prior_test_est_tibble_list[[i]]
  
  mix_model_prior_scatter[[i]] <- ggplot(data = prior_test_est_tibble,
       aes(x = beta_0, y = beta_1)) +
    geom_point(colour = legend_spec$colours[1], 
               shape = legend_spec$shapes[1], 
               alpha = 0.5,
               size = 2) +
    labs(x = "beta_0 est.",
           y = "beta_1 est.") +
    theme_classic()

  #Get some initialising estimates
  possible_error <- prior_test_est_tibble %>%
    filter(beta_0 > mean(prior_test_est_tibble$beta_0))
  
  #To speed up the iterative algorithm we provide some initial conditions
  mu <- list( #Means from true parameters and extreme estimates
    true = c(beta_0, beta_1),
    error = c(mean(possible_error$beta_0), 
              mean(possible_error$beta_1))
  )
  
  #Fit multivariate normal finite mixture model to the estimates
  mix_model_prior_list[[i]] <- mvnormalmixEM(x = prior_test_est_tibble[,c(3,4)], mu = mu)
  summary(mix_model_prior_list[[i]])
  print(mix_model_prior_list[[i]]$lambda)
  
  dist_table <- tibble( #Data to calculate distance
    b_0 = c(mix_model_prior_list[[i]]$mu[[2]][1], 
            mix_model_prior_list[[i]]$mu[[1]][1]),
    b_1 = c(mix_model_prior_list[[i]]$mu[[2]][2], 
            mix_model_prior_list[[i]]$mu[[1]][2])
  )
  print(dist(dist_table))
}

```

### Analytic solution analysis
```{r}
mix_model_analytic_scatter <- ggplot(data = analytic_par_est_tibble,
     aes(x = beta_0, y = beta_1)) +
  geom_point(colour = "darkorchid", 
             shape = 5, 
             alpha = 0.5,
             size = 2) +
  labs(x = "beta_0 est.",
         y = "beta_1 est.") +
  theme_classic()

analytic_hist_0 <- ggplot(data = analytic_par_est_tibble, aes(x=beta_0)) +
  geom_histogram(fill = "lightblue",
                 colour = "black") +
  labs(x = "beta_0 estimates",
       title = "Parameter histograms from analytic solution") +
  theme_classic()
analytic_hist_1 <- ggplot(data = analytic_par_est_tibble, aes(x=beta_1)) +
  geom_histogram(fill = "lightgreen",
                 colour = "black") +
  labs(x = "beta_1 estimates") +
  theme_classic()

#Get some initialising estimates
possible_error <- analytic_par_est_tibble %>%
  filter(beta_0 > mean(analytic_par_est_tibble$beta_0))

#To speed up the iterative algorithm we provide some initial conditions
mu <- list( #Means from true parameters and extreme estimates
  true = c(beta_0, beta_1),
  error = c(mean(possible_error$beta_0), 
            mean(possible_error$beta_1))
)

#Fit multivariate normal finite mixture model to the estimates
mix_model_analytic <- mvnormalmixEM(x = analytic_par_est_tibble[,c(3,4)], mu = mu)
summary(mix_model_analytic)
print(mix_model_analytic$lambda)

dist_table <- tibble( #Data to calculate distance
  b_0 = c(mix_model_analytic$mu[[2]][1], 
          mix_model_analytic$mu[[1]][1]),
  b_1 = c(mix_model_analytic$mu[[2]][2], 
          mix_model_analytic$mu[[1]][2])
)
print(dist(dist_table))
```

### Independent errors
We check that fitting the analytic solution to data with independent errors produces approximately unbiased (at most minimally biased) estimates of the parameters. We get more variance in the estimates, but they are centred at the correct values.
```{r}
pars_combo <- c(beta_0, beta_1)

indep_error_hist_0 <- ggplot(data = indep_err_par_est_tibble, 
                             aes(x=beta_0)) +
  geom_histogram(fill = "lightblue",
                 colour = "black") +
  labs(x = "beta_0 estimates",
       title = "Parameter histograms from independent errors") +
  theme_classic()
indep_error_hist_1 <- ggplot(data = indep_err_par_est_tibble, 
                             aes(x=beta_1)) +
  geom_histogram(fill = "lightgreen",
                 colour = "black") +
  labs(x = "beta_1 estimates") +
  theme_classic()

#Split into clusters
cluster_1 <- indep_err_par_est_tibble %>%
  filter(beta_0 > mean(indep_err_par_est_tibble$beta_0))
cluster_2 <- indep_err_par_est_tibble %>%
  filter(beta_0 <= mean(indep_err_par_est_tibble$beta_0))

mu <- list( #Initial means from the two provisional clusters
  cluster_1 = c(mean(cluster_1$beta_0), 
                mean(cluster_1$beta_1)),
  cluster_2 = c(mean(cluster_2$beta_0), 
            mean(cluster_2$beta_1))
)

#Fit multivariate normal finite mixture model to the estimates
analytic_indep_mix_models <- mvnormalmixEM(x = indep_err_par_est_tibble[,c(3,4)], mu = mu)

summary(analytic_indep_mix_models)

indep_error_post_par_table <- tibble(
  par_name = c("$\\beta_0$", "$\\beta_1$"),
  error = "Indep",
  int = "Analytic",
  par_true = pars_combo,
  par_mean = c(
    mean(indep_err_par_est_tibble$beta_0),
    mean(indep_err_par_est_tibble$beta_1)
  ),
  ci_lower = c(
    as.numeric(quantile(indep_err_par_est_tibble$beta_0, probs = 0.025)),
    as.numeric(quantile(indep_err_par_est_tibble$beta_1, probs = 0.025))
  ),
  ci_upper = c(
    as.numeric(quantile(indep_err_par_est_tibble$beta_0, probs = 0.975)),
    as.numeric(quantile(indep_err_par_est_tibble$beta_1, probs = 0.975))
  )
)

#Get mean and CIs
same_error_post_par_table <- tibble(
  par_name = c("$\\beta_0$", "$\\beta_1$"),
  error = "Same obs.",
  int = "Analytic",
  par_true = pars_combo,
  par_mean = c(
    mean(analytic_par_est_tibble$beta_0),
    mean(analytic_par_est_tibble$beta_1)
  ),
  ci_lower = c(
    as.numeric(quantile(analytic_par_est_tibble$beta_0, probs = 0.025)),
    as.numeric(quantile(analytic_par_est_tibble$beta_1, probs = 0.025))
  ),
  ci_upper = c(
    as.numeric(quantile(analytic_par_est_tibble$beta_0, probs = 0.975)),
    as.numeric(quantile(analytic_par_est_tibble$beta_1, probs = 0.975))
  )
)

#Get mean and CIs for numerics with same obs
par_est_tibble_0_5 <- par_est_tibble %>% #Filter out other step sizes and bad mode
  filter(step_size == 0.5, beta_0 < 20)
same_error_rk4_post_par_table <- tibble(
  par_name = c("$\\beta_0$", "$\\beta_1$"),
  error = "Same obs.",
  int = "RK4",
  par_true = pars_combo,
  par_mean = c(
    mean(par_est_tibble_0_5$beta_0),
    mean(par_est_tibble_0_5$beta_1)
  ),
  ci_lower = c(
    as.numeric(quantile(par_est_tibble_0_5$beta_0, probs = 0.025)),
    as.numeric(quantile(par_est_tibble_0_5$beta_1, probs = 0.025))
  ),
  ci_upper = c(
    as.numeric(quantile(par_est_tibble_0_5$beta_0, probs = 0.975)),
    as.numeric(quantile(par_est_tibble_0_5$beta_1, probs = 0.975))
  )
)

#Numerics with indep. error
#Get some extreme estimates
possible_error <- indep_err_numeric_par_est_tibble %>%
  filter(beta_0 > mean(indep_err_numeric_par_est_tibble$beta_0))

#To speed up the iterative algorithm we provide some initial conditions
mu <- list( #Means from true parameters and extreme estimates
  true = c(beta_0, beta_1),
  error = c(mean(possible_error$beta_0), 
            mean(possible_error$beta_1))
)

#Fit multivariate normal finite mixture model to the estimates
numeric_indep_mix_models <- mvnormalmixEM(x = indep_err_numeric_par_est_tibble[,c(3,4)], mu = mu)

summary(numeric_indep_mix_models)

par_est_tibble_0_5_indep <- indep_err_numeric_par_est_tibble %>% #Filter out bad mode
  filter(beta_0 < 20)
indep_error_rk4_post_par_table <- tibble(
  par_name = c("$\\beta_0$", "$\\beta_1$"),
  error = "Indep",
  int = "RK4",
  par_true = pars_combo,
  par_mean = c(
    mean(par_est_tibble_0_5_indep$beta_0),
    mean(par_est_tibble_0_5_indep$beta_1)
  ),
  ci_lower = c(
    as.numeric(quantile(par_est_tibble_0_5_indep$beta_0, probs = 0.025)),
    as.numeric(quantile(par_est_tibble_0_5_indep$beta_1, probs = 0.025))
  ),
  ci_upper = c(
    as.numeric(quantile(par_est_tibble_0_5_indep$beta_0, probs = 0.975)),
    as.numeric(quantile(par_est_tibble_0_5_indep$beta_1, probs = 0.975))
  )
)

bias_analysis_table <- rbind(indep_error_post_par_table,
      indep_error_rk4_post_par_table,
      same_error_post_par_table,
      same_error_rk4_post_par_table
      ) %>%
  arrange(par_name, error)

bias_analysis_table
```
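The four tables above repeat the same mean and quantile calculation. As a sketch of how that could be factored out, the hypothetical helper `sketch_post_summary` below (not part of the package) returns one row per parameter from a vector of posterior estimates.

```{r}
#Posterior mean and central 95% interval for one parameter
sketch_post_summary <- function(draws, par_name, par_true) {
  data.frame(
    par_name = par_name,
    par_true = par_true,
    par_mean = mean(draws),
    ci_lower = unname(quantile(draws, probs = 0.025)),
    ci_upper = unname(quantile(draws, probs = 0.975))
  )
}

#Toy input so the result is easy to verify by hand
sketch_summary <- sketch_post_summary(1:101, par_name = "example", par_true = 51)
sketch_summary
```

Applied to each parameter column of an estimates tibble and combined with `rbind`, this would give the same columns as the tables built by hand above, apart from the `error` and `int` labels.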

### Alternate parameters
```{r}
diff_par_hist_0 <- ggplot(data = diff_par_est_tibble, 
                             aes(x=beta_0)) +
  geom_histogram(fill = "lightblue",
                 colour = "black") +
  labs(x = "beta_0 estimates",
       title = "Parameter histograms from alternate parameters") +
  theme_classic()
diff_par_hist_1 <- ggplot(data = diff_par_est_tibble, 
                             aes(x=beta_1)) +
  geom_histogram(fill = "lightgreen",
                 colour = "black") +
  labs(x = "beta_1 estimates") +
  theme_classic()

ggplot(data = diff_par_est_tibble, 
                             aes(x=beta_0, y=beta_1)) +
  geom_point(colour = "lightgreen") +
  labs(x = "beta_0 estimates",
       y = "beta_1 estimates") +
  theme_classic()

#Get some extreme estimates
possible_error <- diff_par_est_tibble %>%
  filter(beta_0 > mean(diff_par_est_tibble$beta_0))

#To speed up the iterative algorithm we provide some initial conditions
mu <- list( #Means from true parameters and extreme estimates
  true = c(beta_0, beta_1),
  error = c(mean(possible_error$beta_0), 
            mean(possible_error$beta_1))
)

diff_par_mix_models <- mvnormalmixEM(x = diff_par_est_tibble[,c(3,4)], mu = mu)

summary(diff_par_mix_models)

plot(diff_par_mix_models,
     whichplots = 2,
     xlab2 = "Beta 0",
     ylab2 = "Beta 1")

dist_table <- tibble( #Data to calculate distance
    b_0 = c(diff_par_mix_models$mu[[2]][1], 
            diff_par_mix_models$mu[[1]][1]),
    b_1 = c(diff_par_mix_models$mu[[2]][2], 
            diff_par_mix_models$mu[[1]][2])
  )
dist(dist_table)

diff_par_mix_models$mu[[2]][1]/diff_par_mix_models$mu[[2]][2]
diff_par_mix_models$mu[[1]][1]/diff_par_mix_models$mu[[1]][2]

```

### Optimisation tests
```{r}
optimizing_tests <- c("analytic", "rk4")
optimizing_mix_model_list <- list()
optimizing_scatter <- list()

for(i in 1:length(optimizing_tests)){
  data <- readRDS(paste0("output/", optimizing_tests[i], "_random_optimizing.rds"))
  est_tibble <- data[[1]]
  
  optimizing_scatter[[i]] <- ggplot(data = est_tibble,
       aes(x = beta_0, y = beta_1)) +
    geom_point(colour = legend_spec$colours[1], 
               shape = legend_spec$shapes[1], 
               alpha = 0.5,
               size = 2) +
    labs(x = "beta_0 est.",
           y = "beta_1 est.") +
    theme_classic()

  #Get some initialising estimates
  cluster_1 <- est_tibble %>%
    filter(beta_0 <= mean(est_tibble$beta_0))
  cluster_2 <- est_tibble %>%
    filter(beta_0 > mean(est_tibble$beta_0))
  
  #To speed up the iterative algorithm we provide some initial conditions
  mu <- list( #Initial means from the two provisional clusters
    cluster_1_mean = c(mean(cluster_1$beta_0), 
              mean(cluster_1$beta_1)),
    cluster_2_mean = c(mean(cluster_2$beta_0), 
              mean(cluster_2$beta_1))
  )
  
  #Fit multivariate normal finite mixture model to the estimates
  optimizing_mix_model_list[[i]] <- mvnormalmixEM(x = est_tibble[,c(4,5)], mu = mu)
  summary(optimizing_mix_model_list[[i]])
  print(optimizing_mix_model_list[[i]]$lambda)
  
  dist_table <- tibble( #Data to calculate distance
    b_0 = c(optimizing_mix_model_list[[i]]$mu[[2]][1], 
            optimizing_mix_model_list[[i]]$mu[[1]][1]),
    b_1 = c(optimizing_mix_model_list[[i]]$mu[[2]][2], 
            optimizing_mix_model_list[[i]]$mu[[1]][2])
  )
  print(dist(dist_table))
}

```

### Canham testing
Fortunately, there is no evidence of secondary posterior clusters in the Canham estimates.
```{r}
pars_combo <- c(g_max = 0.8,
                y_max = 8,
                k = 1)
y_0 <- 1
par_names <- c("g_max", "y_max", "k")

Canham_DE <- function(Time, State, Pars) { #Pars: g_max, y_max, k
  with(as.list(c(State, Pars)), {
    dY <- g_max * exp(-0.5 * (log(Y / y_max) / k)^2)
    
    return(list(c(dY)))
  })
}

times <- seq(0, 49, by = 0.0001)
yini  <- c(Y = y_0) #Initial condition
canham_true_y <- ode(yini, times, Canham_DE, pars_combo, method = "rk4")[,2]
canham_true_data <- tibble(
  y = canham_true_y,
  time = times
) %>%
  filter(time %in% seq(from = 0, to = 49, by = 5))

canham_hist_1 <- ggplot(data = canham_par_est_tibble, aes(x=g_max)) +
  geom_histogram(fill = "lightblue",
                 colour = "black") +
  labs(x = "g_max estimates") +
  theme_classic()
canham_hist_2 <- ggplot(data = canham_par_est_tibble, aes(x=y_max)) +
  geom_histogram(fill = "lightgreen",
                 colour = "black") +
  labs(x = "y_max estimates") +
  theme_classic()
canham_hist_3 <- ggplot(data = canham_par_est_tibble, aes(x=k)) +
  geom_histogram(fill = "#DDB8F9",
                 colour = "black") +
  labs(x = "k estimates") +
  theme_classic()

#Scatter plots
pairs_index <- list(
  c(1,2),
  c(2,3),
  c(3,1)
)
canham_scatter_list <- list()
for(i in pairs_index){
  plot_data <- tibble(
      x = canham_par_est_tibble[[par_names[i[1]]]],
      y = canham_par_est_tibble[[par_names[i[2]]]]
    )
  canham_scatter_temp_plot <- ggplot(plot_data,
                                       aes(x = x,
                                           y = y)) +
    geom_point(colour = "darkorchid", alpha = 0.1) +
    labs(x = par_names[i[1]], y = par_names[i[2]]) +
    theme_classic()
  
  canham_scatter_temp <- list(canham_scatter_temp_plot)
  canham_scatter_list <- c(canham_scatter_list, 
                         canham_scatter_temp)
}

#Mean posterior estimates
est_pars <- list(
  pars = list(
    g_max = mean(canham_par_est_tibble$g_max),
    y_max = mean(canham_par_est_tibble$y_max),
    k = mean(canham_par_est_tibble$k)
  )
)

canham_post_par_table <- tibble(
  par_name = c("$g_{max}$", "$y_{max}$", "$k$"),
  par_true = pars_combo,
  par_mean = c(
    mean(canham_par_est_tibble$g_max),
    mean(canham_par_est_tibble$y_max),
    mean(canham_par_est_tibble$k)
  ),
  ci_lower = c(
    as.numeric(quantile(canham_par_est_tibble$g_max, probs = 0.025)),
    as.numeric(quantile(canham_par_est_tibble$y_max, probs = 0.025)),
    as.numeric(quantile(canham_par_est_tibble$k, probs = 0.025))
  ),
  ci_upper = c(
    as.numeric(quantile(canham_par_est_tibble$g_max, probs = 0.975)),
    as.numeric(quantile(canham_par_est_tibble$y_max, probs = 0.975)),
    as.numeric(quantile(canham_par_est_tibble$k, probs = 0.975))
  )
)

#Plot of growth functions
line_data <- tibble(
  x = c(100, 101, 100, 101),
  y = c(10, 11, 10, 11),
  source = c("True values", "True values",
             "Estimates", "Estimates")
)
ymin <- canham_true_data$y[1]
yfin <- canham_true_data$y[nrow(canham_true_data)]
canham_function_plot <- ggplot(line_data, aes(x = x, y = y, group = source)) +
  geom_line(aes(linetype = source, colour = source), linewidth = 0.8) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  scale_colour_manual(values = c("darkorchid", "black")) +
  xlim(ymin, yfin) +
  ylim(0, 0.82) +
  labs(x = "Y(t)", y = "f", title = "Canham growth functions",
       colour = "Parameters", linetype = "Parameters") +
  geom_function(fun=hmde_model_des("canham_single_ind"),
                args=est_pars,
                colour="darkorchid", linewidth=1,
                xlim=c(ymin, yfin)) +
  geom_function(fun=hmde_model_des("canham_single_ind"),
                args=list(pars = list(pars_combo[1],
                                      pars_combo[2],
                                      pars_combo[3])),
                colour="black", linewidth=0.8,
                linetype = "dashed",
                xlim=c(ymin, yfin)) +
  theme_classic() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.5, 0.3))
canham_function_plot
```
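
The claim that the Canham estimates show no secondary clusters is based on the histograms and scatter plots above. As a rough numerical companion to that visual check, the sketch below computes the largest gap between neighbouring sorted draws as a fraction of their range. The helper `largest_gap_fraction()` and the simulated draws are hypothetical and are not part of `hmde`. A large fraction suggests well-separated clusters, but the heuristic cannot detect clusters that overlap, so it complements the plots rather than replacing them.

```r
# Hypothetical helper (not part of hmde): largest gap between neighbouring
# sorted draws, expressed as a fraction of the full range of the draws.
largest_gap_fraction <- function(x) {
  x <- sort(x)
  max(diff(x)) / (x[length(x)] - x[1])
}

# Simulated stand-ins for posterior draws (illustrative values only)
set.seed(2024)
one_cluster  <- rnorm(2000, mean = 0.8, sd = 0.05)
two_clusters <- c(rnorm(1000, mean = 0.8, sd = 0.05),
                  rnorm(1000, mean = 2.4, sd = 0.05))

gap_summary <- c(one_cluster  = largest_gap_fraction(one_cluster),
                 two_clusters = largest_gap_fraction(two_clusters))
print(round(gap_summary, 3))
```

The same helper could be applied to each parameter column, for example with `sapply(canham_par_est_tibble[par_names], largest_gap_fraction)`, assuming those columns hold the posterior estimates plotted above.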

## Plots for paper
The following chunk uses `cowplot::plot_grid()` to arrange the figures and is set up for three step sizes.
```{r, eval=FALSE}
#Histograms
plot_grid(beta_0_plot_list[[1]], 
          beta_0_plot_list[[2]],
          beta_0_plot_list[[3]],
          beta_1_plot_list[[1]], 
          beta_1_plot_list[[2]],
          beta_1_plot_list[[3]],
          nrow = 3, byrow = FALSE,
          align = "v")

#Labels
lab_vec_4 <- c("(a)", "(b)", "(c)", "(d)")
lab_vec_5 <- c("(a)", "(b)", "(c)", "(d)", "(e)")
lab_vec_6 <- c("(a)", "(b)", "(c)", "(d)", "(e)", "(f)")
lab_vec_8 <- c("(a)", "(b)", "(c)", "(d)", "(e)", "(f)", "(g)", "(h)")

#Numerics plots
plot_grid(
  numeric_plot_bad,
  numeric_plot_good,
  weights_plot_bad,
  weights_plot_good,
  nrow = 2,
  labels= lab_vec_4
)

#DE, analytic solutions, all parameters, and sizes over time
plot_grid(
  analytic_observed, est_scatter,
  error_de_plot, error_solution_plot,
  nrow = 2,
  byrow = TRUE,
  labels= lab_vec_4
)

#Step size figures
plot_grid(
  size_analytic_plots[[1]],
    stability_plots[[1]],
  size_analytic_plots[[2]],
    stability_plots[[2]],
  size_analytic_plots[[3]],
    stability_plots[[3]],
  nrow = 3,
    byrow = TRUE,
    labels=lab_vec_6
)

plot_grid(
  scatterplot_good_only[[1]],
    scatterplot_errors_only[[1]],
  scatterplot_good_only[[2]],
    scatterplot_errors_only[[2]],
  scatterplot_good_only[[3]],
    scatterplot_errors_only[[3]],
  nrow = 3,
    byrow = TRUE,
    labels=lab_vec_6
)


#RK45 and analytic solution
plot_grid(
  rk45_hist_0,
  rk45_hist_1,
  analytic_hist_0,
  analytic_hist_1,
  ncol = 1,
  labels = lab_vec_4
)

#Canham
upper_plot <- plot_grid(
  canham_obs_true,
  canham_function_plot,
  nrow = 1,
  labels = lab_vec_5[1:2]
)
scatter_plot <- plot_grid(
  canham_scatter_list[[1]],
  canham_scatter_list[[2]],
  canham_scatter_list[[3]],
  nrow = 1,
  labels = lab_vec_5[3:5]
)
hist_plot <- plot_grid(
  canham_hist_1,
  canham_hist_2,
  canham_hist_3,
  nrow = 1,
  labels = lab_vec_5[3:5]
)
scatter_hist <- plot_grid(
  canham_hist_1,
  canham_hist_2,
  canham_hist_3,
  canham_scatter_list[[1]],
  canham_scatter_list[[2]],
  canham_scatter_list[[3]],
  nrow = 2,
  labels = lab_vec_8[3:8],
  align = "v"
)

plot_grid(upper_plot,
          scatter_plot,
          nrow = 2)
plot_grid(upper_plot,
          hist_plot,
          nrow = 2)
plot_grid(upper_plot, #Dims: 750,850 
          scatter_hist,
          nrow = 2,
          rel_heights = c(0.4, 0.6))

```

# Where to from here?
For the purposes of the `hmde` package that this vignette is part of, we account for the pathologies in this particular model by using the analytic solution of the von Bertalanffy equation. More work is needed to understand the interaction between numerical methods and MCMC sampling. What we have demonstrated is that the problem exists, and that numerical stability at the true parameter values is not enough: because the MCMC sampler moves around, you need numerical stability across a potentially large part of the parameter space. The good news is that simulated data, together with posterior plots from more accurate numerical methods, can at least show that something is going wrong.
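
To illustrate why stability at the true parameter values is not enough, here is a minimal base-R sketch that scans a grid of `beta_1` values with a fixed-step RK4 method and compares the result with the analytic solution. It assumes a linear growth function $f(Y) = \beta_0 - \beta_1 Y$; the step size, grid, and helper names (`linear_de`, `analytic_solution`, `rk4_solve`) are hypothetical choices for illustration and are not taken from the `hmde` source. For this linear problem the RK4 update multiplies the deviation from the asymptote by $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$ with $z = -\beta_1 h$, so the method is only stable where $|R(z)| \le 1$.

```r
# Assumed linear growth function f(Y) = beta_0 - beta_1 * Y (illustration only)
linear_de <- function(y, beta_0, beta_1) beta_0 - beta_1 * y

# Analytic solution of the assumed model at time t
analytic_solution <- function(t, y_0, beta_0, beta_1) {
  beta_0 / beta_1 + (y_0 - beta_0 / beta_1) * exp(-beta_1 * t)
}

# Classical fixed-step RK4, returning the value at t_end
rk4_solve <- function(y_0, t_end, step, beta_0, beta_1) {
  y <- y_0
  for (j in seq_len(round(t_end / step))) {
    k1 <- linear_de(y, beta_0, beta_1)
    k2 <- linear_de(y + 0.5 * step * k1, beta_0, beta_1)
    k3 <- linear_de(y + 0.5 * step * k2, beta_0, beta_1)
    k4 <- linear_de(y + step * k3, beta_0, beta_1)
    y <- y + step / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
  }
  y
}

# RK4 stability function for the linear test problem, z = -beta_1 * step
rk4_stability <- function(z) 1 + z + z^2 / 2 + z^3 / 6 + z^4 / 24

# Hypothetical settings: the asymptote beta_0 / beta_1 is held at y_max so
# that only the rate parameter changes across the scan.
y_0 <- 1
y_max <- 10
t_end <- 50
step <- 1
beta_1_grid <- c(0.1, 0.5, 1, 2, 2.5, 3, 4)

scan <- data.frame(beta_1 = beta_1_grid)
scan$amplification <- abs(rk4_stability(-beta_1_grid * step))
scan$analytic <- analytic_solution(t_end, y_0, beta_1_grid * y_max, beta_1_grid)
scan$rk4 <- sapply(beta_1_grid, function(b1) {
  rk4_solve(y_0, t_end, step, b1 * y_max, b1)
})
scan$abs_error <- abs(scan$rk4 - scan$analytic)
print(scan)
```

Under these assumptions the error is negligible for small `beta_1` but grows without bound once the amplification factor exceeds 1, even though the step size never changes. A sampler that wanders into that region sees numerically distorted trajectories, which is why checking stability only at the true parameters can be misleading.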