Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vignettes/variation #225

Merged
merged 21 commits into from
May 24, 2023
Merged
Changes from 20 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
00e3b9b
First edits to add plotting functions in base R
kellymccain28 Feb 15, 2023
71e78f8
Updated some plots to base R
kellymccain28 Feb 15, 2023
ff529e6
More plot modifications to avoid ggplot2
kellymccain28 Feb 15, 2023
db6a683
Removing cowplot dependency and changing last plot to base r
kellymccain28 Feb 15, 2023
8095e07
Updating plot rendering
kellymccain28 Feb 21, 2023
b212c28
Improved explanations and plots
kellymccain28 Feb 23, 2023
45b0d87
Fixing formatting, edits to explanations
kellymccain28 Feb 27, 2023
557c033
adding clarification/explanation
kellymccain28 Feb 28, 2023
ea354a6
fix y axis label
kellymccain28 Feb 28, 2023
5283b19
Updating plots, editing text, removed p_detect references
kellymccain28 Mar 2, 2023
0000e12
Last minutes updates to the text
kellymccain28 Mar 2, 2023
edb9336
updating to rmarkdown::html_vignette to match other vignettes
kellymccain28 Mar 2, 2023
bb25014
added section on parameter variation
Mar 9, 2023
3a5af92
Edits from pete/hillary's comments; added incidence plot, cleaned up …
kellymccain28 Mar 10, 2023
fe02151
Merge branch 'vignettes/variation' of https://github.com/mrc-ide/mala…
kellymccain28 Mar 10, 2023
fd42b12
Removed Lydia's section - she will make new vignette
kellymccain28 Mar 10, 2023
2b057d0
changed last plots to plot prevalence / fixed y axes
kellymccain28 Mar 10, 2023
1583772
remove Estimating stochastic variation, add stochastic elimination
kellymccain28 Mar 28, 2023
77b371c
Removing unclear sentence
kellymccain28 Mar 30, 2023
dbd2323
last edits
kellymccain28 Mar 30, 2023
05da94c
Changed plot so that beginning of sim is not 0% prev
kellymccain28 Apr 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
161 changes: 100 additions & 61 deletions vignettes/Variation.Rmd
Original file line number Diff line number Diff line change
@@ -1,91 +1,130 @@
---
title: "Variation"
output: html_document
title: "Stochastic Variation"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Variation}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
```{r setup}
library(malariasimulation)
library(cowplot)
library(ggplot2)
set.seed(123)
```

## Variation in outputs
`malariasimulation` is a stochastic, individual-based model and, as such, simulations run with identical parameterisations will generate non-identical, sometimes highly variable, outputs. To illustrate this, we will compare the prevalence and incidence of malaria over a year in simulations with a small and a larger population.

Malariasimulation is a stochastic model, so there will be be variation in model outputs. For example, we could look at detected malaria cases over a year...
### Plotting functions
First, we will create a few plotting functions to visualise outputs.
```{r}
cols <- c("#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")

plot_prev <- function(output, ylab = TRUE, ylim = c(0,1)){
if (ylab == TRUE) {
ylab = "Prevalence in children aged 2-10 years"
} else {ylab = ""}
plot(x = output$timestep, y = output$n_detect_730_3650 / output$n_730_3650,
type = "l", col = cols[3], ylim = ylim,
xlab = "Time (days)", ylab = ylab, lwd = 2,
xaxs = "i", yaxs = "i", bty = "l")
grid(lty = 2, col = "grey80", lwd = 1)
}

plot_inci <- function(output, ylab = TRUE, ylim){
if (ylab == TRUE) {
ylab = "Incidence per 1000 children aged 0-5 years"
} else {ylab = ""}
plot(x = output$timestep, y = output$n_inc_clinical_0_1825 / output$n_0_1825 * 1000,
type = "l", col = cols[5], ylim = ylim,
xlab = "Time (days)", ylab = ylab, lwd = 2,
xaxs = "i", yaxs = "i", bty = "l")
grid(lty = 2, col = "grey80", lwd = 1)
}
```

## Parameterisation
First, we will use the `get_parameters()` function to generate a list of parameters, accepting the default values, for two different population sizes and use the `set_equilibrium()` function to initialise the model at a given entomological inoculation rate (EIR). The only parameter which changes between the two parameter sets is the argument for `human_population`.
```{r}
# A small population
p <- get_parameters(list(
# A small population
simparams_small <- get_parameters(list(
human_population = 1000,
clinical_incidence_rendering_min_ages = 0,
clinical_incidence_rendering_max_ages = 5 * 365,
individual_mosquitoes = FALSE
))
p <- set_equilibrium(p, 2)
small_o <- run_simulation(365, p)
p <- get_parameters(list(

simparams_small <- set_equilibrium(parameters = simparams_small, init_EIR = 50)

# A larger population
simparams_big <- get_parameters(list(
human_population = 10000,
clinical_incidence_rendering_min_ages = 0,
clinical_incidence_rendering_max_ages = 5 * 365,
individual_mosquitoes = FALSE
))
p <- set_equilibrium(p, 2)
big_o <- run_simulation(365, p)

plot_grid(
ggplot(small_o) + geom_line(aes(timestep, n_detect_730_3650)) + ggtitle('n_detect at n = 1,000'),
ggplot(small_o) + geom_line(aes(timestep, p_detect_730_3650)) + ggtitle('p_detect at n = 1,000'),
ggplot(big_o) + geom_line(aes(timestep, n_detect_730_3650)) + ggtitle('n_detect at n = 10,000'),
ggplot(big_o) + geom_line(aes(timestep, p_detect_730_3650)) + ggtitle('p_detect at n = 10,000')
)

simparams_big <- set_equilibrium(parameters = simparams_big, init_EIR = 50)
```

The n_detect output shows the result of sampling individuals who would be detected by microscopy.

While the p_detect output shows the sum of probabilities of detection in the population.
## Simulations

Notice that the p_detect output is slightly smoother. That's because it forgoes the sampling step. At low population sizes, p_detect will be smoother than the n_detect counterpart.
The `n_detect_730_3650` output below shows the total number of individuals in the age group rendered (here, 730-3650 timesteps or 2-10 years) who have microscopy-detectable malaria. Notice that the output is smoother at a higher population.

## Estimating variation
Some outcomes will be more sensitive than others to stochastic variation even with the same population size. In the plots below, prevalence is smoother than incidence even at the same population. This is because prevalence is a measure of existing infection, while incidence is recording new cases per timestep.

We can estimate the variation in the number of detectable cases by repeating the simulation several times...
```{r, fig.align="center",fig.height=6, fig.width=8}
# A small population
output_small_pop <- run_simulation(timesteps = 365, parameters = simparams_small)

# A larger population
output_big_pop <- run_simulation(timesteps = 365, parameters = simparams_big)

# Plotting
kellymccain28 marked this conversation as resolved.
Show resolved Hide resolved
par(mfrow = c(2,2))
plot_prev(output_small_pop, ylim = c(0.6, 0.8)); title(expression(paste(italic(Pf),"PR"[2-10], " at n = 1,000")))
plot_inci(output_small_pop, ylim = c(0, 25)); title("Incidence per 1000 children at n = 1,000")
plot_prev(output_big_pop, ylim = c(0.6, 0.8)); title(expression(paste(italic(Pf),"PR"[2-10], " at n = 10,000")))
plot_inci(output_big_pop, ylim = c(0, 25)); title("Incidence per 1000 children at n = 10,000")
```

## Stochastic elimination
With stochastic models, random elimination of malaria in a small population with low transmisison may happen. In the example below, we run two simulations: one with a very small population, and one with a larger population. There is stochastic fade out (elimination) in the smaller population, while the larger population has stable transmission over time. For this reason, it is important to run models with large-enough populations to avoid stochastic elimination.
```{r}
outputs <- run_simulation_with_repetitions(
timesteps = 365,
repetitions = 10,
overrides = p,
parallel=TRUE
)

df <- aggregate(
outputs$n_detect_730_3650,
by=list(outputs$timestep),
FUN = function(x) {
c(
median = median(x),
lowq = unname(quantile(x, probs = .25)),
highq = unname(quantile(x, probs = .75)),
mean = mean(x),
lowci = mean(x) + 1.96*sd(x),
highci = mean(x) - 1.96*sd(x)
)
}
)

df <- data.frame(cbind(t = df$Group.1, df$x))

plot_grid(
ggplot(df)
+ geom_line(aes(x = t, y = median, group = 1))
+ geom_ribbon(aes(x = t, ymin = lowq, ymax = highq), alpha = .2)
+ xlab('timestep') + ylab('n_detect') + ggtitle('IQR spread'),
ggplot(df)
+ geom_line(aes(x = t, y = mean, group = 1))
+ geom_ribbon(aes(x = t, ymin = lowci, ymax = highci), alpha = .2)
+ xlab('timestep') + ylab('n_detect') + ggtitle('95% confidence interval')
)
# A small population
simparams_small <- get_parameters(list(
human_population = 50,
clinical_incidence_rendering_min_ages = 0,
clinical_incidence_rendering_max_ages = 5 * 365,
individual_mosquitoes = FALSE
))

simparams_small <- set_equilibrium(parameters = simparams_small, init_EIR = 1)

# A larger population
simparams_big <- get_parameters(list(
human_population = 1000,
clinical_incidence_rendering_min_ages = 0,
clinical_incidence_rendering_max_ages = 5 * 365,
individual_mosquitoes = FALSE
))

simparams_big <- set_equilibrium(parameters = simparams_big, init_EIR = 1)
```

These are useful techniques for comparing the expected variance from model outputs to outcomes from trials with lower population sizes.
## Simulations

```{r, fig.align="center",fig.height=6, fig.width=8}
set.seed(555)
# A small population
output_small_pop <- run_simulation(timesteps = 365 * 2, parameters = simparams_small)

# A larger population
output_big_pop <- run_simulation(timesteps = 365 * 2, parameters = simparams_big)

# Plotting
par(mfrow = c(1, 2))
plot_prev(output_small_pop, ylim = c(0, 1)); title(expression(paste(italic(Pf),"PR"[2-10], " at n = 50")))
plot_prev(output_big_pop, ylab = FALSE, ylim = c(0, 1)); title(expression(paste(italic(Pf),"PR"[2-10], " at n = 1,000")))
```