Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simulate_scenarios() to easily calculate outbreak size distribution #275

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -54,4 +54,4 @@ Encoding: UTF-8
Language: en-GB
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ export(rborel)
export(rgborel)
export(simulate_chain_stats)
export(simulate_chains)
export(simulate_scenarios)
importFrom(stats,aggregate)
importFrom(stats,rgamma)
importFrom(stats,rpois)
Expand Down
104 changes: 104 additions & 0 deletions R/simulate.R
Original file line number Diff line number Diff line change
Expand Up @@ -477,3 +477,107 @@ simulate_chain_stats <- function(n_chains,

return(out)
}

#' Calculate transmission chain statistics across a range of scenarios and
#' divide into intervals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a comment below on the fact that the part about dividing into intervals is just one use case and so this function should probably return the raw results.

#'
#' @details
#' The `offspring_dist` argument is not equivalent to the `offspring_dist`
#' argument in [simulate_chains()] and [simulate_chain_stats()], in those
#' functions `offspring_dist` should be a `<function>`, whereas in
#' `simulate_scenarios()` `offspring_dist` should be a vector of `character`
#' strings which match function names.
Comment on lines +485 to +489
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for deviating from the package architecture here? If this is to become part of the package I think we should follow a consistent design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was due to an issue with expand.grid() and functions. I can't remember the exact issue when we were working on it, but it was something like this:

R_seq = seq(0.1, 1, 0.1)
k_seq = seq(0.1, 0.5, 0.1)
offspring_dist = rpois
statistic = c("size", "length")
scenarios <- expand.grid(
  offspring_dist = offspring_dist,
  statistic = statistic,
  R = R_seq,
  k = k_seq,
  stringsAsFactors = FALSE
)
#> Error in paste0(nmc[i], "=", if (is.numeric(x)) format(x) else x): cannot coerce type 'closure' to vector of type 'character'

Created on 2024-09-06 with reprex v2.1.0

#'
#' @inheritParams simulate_chains
#' @param offspring_dist Offspring distribution: a `character` string with the
#' name of the random number generator function. For example those
#' provided by R to generate random numbers from given distributions (e.g.,
#' [rpois()] for Poisson). If the `character` string does not match
#' a function then the function will error (see [match.fun()] for details).
#' Examples that can be provided here are `"rpois"` for Poisson distributed
#' offspring, `"rnbinom"` for negative binomial offspring.
#' @param R_seq A `numeric` vector of reproduction number values to simulate
#' the branching process for.
#' @param k_seq A `numeric` vector of dispersion values to simulate the
#' branching process for. Only applicable for `offspring_dist = "rnbinom"`.
#' @inheritParams base::cut
Comment on lines +499 to +503
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not call them R and k? They don't generally have to be a sequence here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they can be renamed.

#' @param ... [dots] Extra arguments to be passed to [simulate_chain_stats()].
#' If argument does not match one from [simulate_chain_stats()] the function
#' will error.
#'
#' @return A `<data.frame>`.
#' @export
#'
#' @examples
#' simulate_scenarios(
#' statistic = c("size", "length"),
#' offspring_dist = c("rpois", "rnbinom"),
#' R_seq = seq(0.1, 0.5, 0.1),
#' k_seq = seq(0.1, 0.5, 0.1),
#' breaks = c(0, 2, 5, 10, 20, 50, 100, Inf)
#' )
#'
#' # when R > 1 pass a stopping criteria to simulate_chain_stats()
#' simulate_scenarios(
#' statistic = c("size", "length"),
#' offspring_dist = c("rpois", "rnbinom"),
#' R_seq = seq(0.5, 2.0, 0.5),
#' k_seq = seq(0.1, 0.5, 0.1),
#' breaks = c(0, 2, 5, 10, 20, 50, 100, Inf),
#' stat_threshold = 10
#' )
simulate_scenarios <- function(statistic,
offspring_dist,
R_seq,
k_seq,
breaks,
...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this function simulates only one run for each scenario, so we will need to allow for multiple runs here. This is related to #41. I'm torn between having it here versus in the main simulation functions because users may not want to explore scenarios using this function but may want to run the main functions multiple times for one parameter set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll follow your lead on where you think this is best to implement.

scenarios <- expand.grid(
offspring_dist = offspring_dist,
statistic = statistic,
R = R_seq,
k = k_seq,
stringsAsFactors = FALSE
)

# remove any variation in k for rpois as parameter not used
if ("rpois" %in% offspring_dist && length(unique(scenarios$k)) > 1) {
drop_rows <- c(
scenarios$offspring_dist == "rpois" &
scenarios$k != unique(scenarios$k)[1]
)
scenarios <- scenarios[!drop_rows, ]
# convert rpois k to NA to make clear it isn't used
scenarios$k[scenarios$offspring_dist == "rpois"] <- NA_real_
}

df_list <- vector(mode = "list", length = nrow(scenarios))
for (i in seq_len(nrow(scenarios))) {
args <- list(
n_chains = 1000,
offspring_dist = match.fun(scenarios[i, "offspring_dist"]),
Copy link
Member

@jamesmbaazam jamesmbaazam Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to consider allowing string function names so that we don't have to do match.fun as it's not ideal. I'll go back through previous considerations in #167 to remind myself why we moved away from character function names as there was a lot of discussion in the early days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would ask the other way round, why not use functions here instead of character strings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It relates the the expand.grid() issue stated above. We didn't spend very long trying to find work arounds, so it may be the case that we can use the same function interface in simulate_scenarios().

statistic = scenarios[i, "statistic"]
)
if (scenarios[i, "offspring_dist"] == "rpois") {
args$lambda <- scenarios[i, "R"]
} else {
# TODO: work out how to handle cases of functions that can be matched from
# base or in the users global environment but are parameterised
# differently
args_ <- list(mu = scenarios[i, "R"], size = scenarios[i, "k"])
args <- c(args, args_)
}
args <- utils::modifyList(x = args, val = list(...))
x <- do.call(epichains::simulate_chain_stats, args = args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one point of generalisation where we can pass either simulate_chains() or simulate_chain_stats() and the class of result will depend on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

interval <- cut(x, breaks = breaks)
prop <- table(interval) / sum(table(interval))
df_ <- as.data.frame(prop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only fulfills a single use case and we may want to not do this here and leave it to the user. This function could just be about simulating the outbreak over a grid of scenarios and returning the results for downstream analyses/post-processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

df_$R <- scenarios[i, "R"]
df_$k <- scenarios[i, "k"]
df_$offspring_dist <- scenarios[i, "offspring_dist"]
df_$statistic <- scenarios[i, "statistic"]
df_list[[i]] <- df_
}
df <- do.call(rbind, df_list)
return(df)
}
75 changes: 75 additions & 0 deletions man/simulate_scenarios.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

39 changes: 39 additions & 0 deletions tests/testthat/test-simulate.R
Original file line number Diff line number Diff line change
Expand Up @@ -379,3 +379,42 @@ test_that("simulate_chain_stats is numerically correct", {
c(1.00, 3.00)
)
})

test_that("simulate_scenarios work as expected", {
df <- simulate_scenarios(
offspring_dist = c("rpois", "rnbinom"),
statistic = c("size", "length"),
R_seq = seq(0.1, 0.5, 0.1),
k_seq = seq(0.1, 0.5, 0.1),
joshwlambert marked this conversation as resolved.
Show resolved Hide resolved
breaks = c(0, 5, 10, 20, Inf)
joshwlambert marked this conversation as resolved.
Show resolved Hide resolved
)
expect_s3_class(df, class = "data.frame")
expect_identical(dim(df), c(240L, 6L))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the hardcoding here and rather find the dimensions of the result of an expand.grid().

expect_identical(
colnames(df),
c("interval", "Freq", "R", "k", "offspring_dist", "statistic")
)
expect_identical(unique(df$statistic), c("size", "length"))
expect_identical(unique(df$offspring_dist), c("rpois", "rnbinom"))
expect_s3_class(df$interval, class = "factor")
})

test_that("simulate_scenarios work as expected using `...`", {
df <- simulate_scenarios(
offspring_dist = c("rpois", "rnbinom"),
statistic = c("size", "length"),
R_seq = seq(0.1, 0.5, 0.1),
k_seq = seq(0.1, 0.5, 0.1),
breaks = c(0, 5, 10, 20, Inf),
stat_threshold = 15
)
expect_s3_class(df, class = "data.frame")
expect_identical(dim(df), c(240L, 6L))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before on removing the hardcoding to make the scenarios amenable to changes in the inputs.

expect_identical(
colnames(df),
c("interval", "Freq", "R", "k", "offspring_dist", "statistic")
)
expect_identical(unique(df$statistic), c("size", "length"))
expect_identical(unique(df$offspring_dist), c("rpois", "rnbinom"))
expect_s3_class(df$interval, class = "factor")
})