-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simulate_scenarios()
to easily calculate outbreak size distribution
#275
base: main
Are you sure you want to change the base?
Changes from 2 commits
e1b6891
94674e6
0e87e5e
6237c9b
ce7c1c9
4811ac3
d5d1d38
9348ed7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -477,3 +477,107 @@ simulate_chain_stats <- function(n_chains, | |
|
||
return(out) | ||
} | ||
|
||
#' Calculate transmission chain statistics across a range of scenarios and | ||
#' divide into intervals | ||
#' | ||
#' @details | ||
#' The `offspring_dist` argument is not equivalent to the `offspring_dist` | ||
#' argument in [simulate_chains()] and [simulate_chain_stats()], in those | ||
#' functions `offspring_dist` should be a `<function>`, whereas in | ||
#' `simulate_scenarios()` `offspring_dist` should be a vector of `character` | ||
#' strings which match function names. | ||
Comment on lines
+485
to
+489
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the rationale for deviating from the package architecture here? If this is to become part of the package I think we should follow a consistent design. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was due to an issue with R_seq = seq(0.1, 1, 0.1)
k_seq = seq(0.1, 0.5, 0.1)
offspring_dist = rpois
statistic = c("size", "length")
scenarios <- expand.grid(
offspring_dist = offspring_dist,
statistic = statistic,
R = R_seq,
k = k_seq,
stringsAsFactors = FALSE
)
#> Error in paste0(nmc[i], "=", if (is.numeric(x)) format(x) else x): cannot coerce type 'closure' to vector of type 'character' Created on 2024-09-06 with reprex v2.1.0 |
||
#' | ||
#' @inheritParams simulate_chains | ||
#' @param offspring_dist Offspring distribution: a `character` string with the | ||
#' name of the random number generator function. For example those | ||
#' provided by R to generate random numbers from given distributions (e.g., | ||
#' [rpois()] for Poisson). If the `character` string does not match | ||
#' a function then the function will error (see [match.fun()] for details). | ||
#' Examples that can be provided here are `"rpois"` for Poisson distributed | ||
#' offspring, `"rnbinom"` for negative binomial offspring. | ||
#' @param R_seq A `numeric` vector of reproduction number values to simulate | ||
#' the branching process for. | ||
#' @param k_seq A `numeric` vector of dispersion values to simulate the | ||
#' branching process for. Only applicable for `offspring_dist = "rnbinom"`. | ||
#' @inheritParams base::cut | ||
Comment on lines
+499
to
+503
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why not call them There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, they can be renamed. |
||
#' @param ... [dots] Extra arguments to be passed to [simulate_chain_stats()]. | ||
#' If argument does not match one from [simulate_chain_stats()] the function | ||
#' will error. | ||
#' | ||
#' @return A `<data.frame>`. | ||
#' @export | ||
#' | ||
#' @examples | ||
#' simulate_scenarios( | ||
#' statistic = c("size", "length"), | ||
#' offspring_dist = c("rpois", "rnbinom"), | ||
#' R_seq = seq(0.1, 0.5, 0.1), | ||
#' k_seq = seq(0.1, 0.5, 0.1), | ||
#' breaks = c(0, 2, 5, 10, 20, 50, 100, Inf) | ||
#' ) | ||
#' | ||
#' # when R > 1 pass a stopping criteria to simulate_chain_stats() | ||
#' simulate_scenarios( | ||
#' statistic = c("size", "length"), | ||
#' offspring_dist = c("rpois", "rnbinom"), | ||
#' R_seq = seq(0.5, 2.0, 0.5), | ||
#' k_seq = seq(0.1, 0.5, 0.1), | ||
#' breaks = c(0, 2, 5, 10, 20, 50, 100, Inf), | ||
#' stat_threshold = 10 | ||
#' ) | ||
simulate_scenarios <- function(statistic, | ||
offspring_dist, | ||
R_seq, | ||
k_seq, | ||
breaks, | ||
...) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, this function simulates only one run for each scenario, so we will need to allow for multiple runs here. This is related to #41. I'm torn between having it here versus in the main simulation functions because users may not want to explore scenarios using this function but may want to run the main functions multiple times for one parameter set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I'll follow your lead on where you think this is best to implement. |
||
scenarios <- expand.grid( | ||
offspring_dist = offspring_dist, | ||
statistic = statistic, | ||
R = R_seq, | ||
k = k_seq, | ||
stringsAsFactors = FALSE | ||
) | ||
|
||
# remove any variation in k for rpois as parameter not used | ||
if ("rpois" %in% offspring_dist && length(unique(scenarios$k)) > 1) { | ||
drop_rows <- c( | ||
scenarios$offspring_dist == "rpois" & | ||
scenarios$k != unique(scenarios$k)[1] | ||
) | ||
scenarios <- scenarios[!drop_rows, ] | ||
# convert rpois k to NA to make clear it isn't used | ||
scenarios$k[scenarios$offspring_dist == "rpois"] <- NA_real_ | ||
} | ||
|
||
df_list <- vector(mode = "list", length = nrow(scenarios)) | ||
for (i in seq_len(nrow(scenarios))) { | ||
args <- list( | ||
n_chains = 1000, | ||
offspring_dist = match.fun(scenarios[i, "offspring_dist"]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll have to consider allowing string function names so that we don't have to do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would ask the other way round, why not use functions here instead of character strings? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It relates the the |
||
statistic = scenarios[i, "statistic"] | ||
) | ||
if (scenarios[i, "offspring_dist"] == "rpois") { | ||
args$lambda <- scenarios[i, "R"] | ||
} else { | ||
# TODO: work out how to handle cases of functions that can be matched from | ||
# base or in the users global environment but are parameterised | ||
# differently | ||
args_ <- list(mu = scenarios[i, "R"], size = scenarios[i, "k"]) | ||
args <- c(args, args_) | ||
} | ||
args <- utils::modifyList(x = args, val = list(...)) | ||
x <- do.call(epichains::simulate_chain_stats, args = args) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is one point of generalisation where we can pass either There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. |
||
interval <- cut(x, breaks = breaks) | ||
prop <- table(interval) / sum(table(interval)) | ||
df_ <- as.data.frame(prop) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This only fulfills a single use case and we may want to not do this here and leave it to the user. This function could just be about simulating the outbreak over a grid of scenarios and returning the results for downstream analyses/post-processing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. |
||
df_$R <- scenarios[i, "R"] | ||
df_$k <- scenarios[i, "k"] | ||
df_$offspring_dist <- scenarios[i, "offspring_dist"] | ||
df_$statistic <- scenarios[i, "statistic"] | ||
df_list[[i]] <- df_ | ||
} | ||
df <- do.call(rbind, df_list) | ||
return(df) | ||
} |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -379,3 +379,42 @@ test_that("simulate_chain_stats is numerically correct", { | |
c(1.00, 3.00) | ||
) | ||
}) | ||
|
||
test_that("simulate_scenarios work as expected", { | ||
df <- simulate_scenarios( | ||
offspring_dist = c("rpois", "rnbinom"), | ||
statistic = c("size", "length"), | ||
R_seq = seq(0.1, 0.5, 0.1), | ||
k_seq = seq(0.1, 0.5, 0.1), | ||
joshwlambert marked this conversation as resolved.
Show resolved
Hide resolved
|
||
breaks = c(0, 5, 10, 20, Inf) | ||
joshwlambert marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
expect_s3_class(df, class = "data.frame") | ||
expect_identical(dim(df), c(240L, 6L)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should remove the hardcoding here and rather find the dimensions of the result of an |
||
expect_identical( | ||
colnames(df), | ||
c("interval", "Freq", "R", "k", "offspring_dist", "statistic") | ||
) | ||
expect_identical(unique(df$statistic), c("size", "length")) | ||
expect_identical(unique(df$offspring_dist), c("rpois", "rnbinom")) | ||
expect_s3_class(df$interval, class = "factor") | ||
}) | ||
|
||
test_that("simulate_scenarios work as expected using `...`", { | ||
df <- simulate_scenarios( | ||
offspring_dist = c("rpois", "rnbinom"), | ||
statistic = c("size", "length"), | ||
R_seq = seq(0.1, 0.5, 0.1), | ||
k_seq = seq(0.1, 0.5, 0.1), | ||
breaks = c(0, 5, 10, 20, Inf), | ||
stat_threshold = 15 | ||
) | ||
expect_s3_class(df, class = "data.frame") | ||
expect_identical(dim(df), c(240L, 6L)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as before on removing the hardcoding to make the scenarios amenable to changes in the inputs. |
||
expect_identical( | ||
colnames(df), | ||
c("interval", "Freq", "R", "k", "offspring_dist", "statistic") | ||
) | ||
expect_identical(unique(df$statistic), c("size", "length")) | ||
expect_identical(unique(df$offspring_dist), c("rpois", "rnbinom")) | ||
expect_s3_class(df$interval, class = "factor") | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a comment below on the fact that the part about dividing into intervals is just one use case and so this function should probably return the raw results.