Add `simulate_scenarios()` to easily calculate outbreak size distribution #275

joshwlambert · 2024-08-23T10:30:29Z

This PR adds the `simulate_scenarios()` function, which explores the parameter space of $R$ and $k$ values for different offspring distributions. It bins the resulting transmission chain statistics into user-specified intervals so that they can be plotted easily.

Note: This is a work-in-progress feature (hence the draft pull request). Before merging we need to:

- Generalise the ability to pass different distribution parameters to `simulate_chain_stats()`.
- Consider whether more input checking is required. (I think that if we can align the function signature of `simulate_scenarios()` with that of `simulate_chain_stats()`, then most, if not all, of the input checking can be deferred to the latter.)

codecov-commenter · 2024-08-23T10:33:53Z

Codecov Report

Attention: Patch coverage is 98.30508% with 1 line in your changes missing coverage. Please review.

Project coverage is 99.87%. Comparing base (dc53c2c) to head (9348ed7).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/simulate.R | 98.30% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##              main     #275      +/-   ##
===========================================
- Coverage   100.00%   99.87%   -0.13%     
===========================================
  Files            8        8              
  Lines          755      814      +59     
===========================================
+ Hits           755      813      +58     
- Misses           0        1       +1


jamesmbaazam · 2024-08-23T12:52:23Z

tests/testthat/test-simulate.R

+    breaks = c(0, 5, 10, 20, Inf)
+  )
+  expect_s3_class(df, class = "data.frame")
+  expect_identical(dim(df), c(240L, 6L))


We should remove the hardcoding here and rather find the dimensions of the result of an expand.grid().
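A minimal sketch of what this could look like, assuming the output has one row per scenario and interval. The input vectors below are hypothetical and are not the ones used in the PR's tests.

```r
# Hypothetical scenario inputs (the values used in the PR's tests are not shown).
offspring_dist <- c("rpois", "rnbinom")
statistic <- c("size", "length")
R_seq <- seq(0.1, 1, 0.1)
k_seq <- seq(0.1, 0.5, 0.1)
breaks <- c(0, 5, 10, 20, Inf)

# Build the same grid of scenarios that the function is expected to loop over.
scenarios <- expand.grid(
  offspring_dist = offspring_dist,
  statistic = statistic,
  R = R_seq,
  k = k_seq,
  stringsAsFactors = FALSE
)

# cut() with these breaks yields length(breaks) - 1 intervals.
# Assumption: the result has one row per scenario-interval pair.
expected_rows <- nrow(scenarios) * (length(breaks) - 1L)

# In the test, the expectation could then read something like:
# expect_identical(nrow(df), expected_rows)
```

With this approach, changing the inputs changes the expected dimensions automatically.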

jamesmbaazam · 2024-08-23T12:53:23Z

tests/testthat/test-simulate.R

+    stat_threshold = 15
+  )
+  expect_s3_class(df, class = "data.frame")
+  expect_identical(dim(df), c(240L, 6L))


Same comment as before on removing the hardcoding to make the scenarios amenable to changes in the inputs.

jamesmbaazam · 2024-08-23T12:58:28Z

R/simulate.R

+  for (i in seq_len(nrow(scenarios))) {
+    args <- list(
+      n_chains = 1000,
+      offspring_dist = match.fun(scenarios[i, "offspring_dist"]),


I'll have to consider allowing string function names so that we don't have to do match.fun as it's not ideal. I'll go back through previous considerations in #167 to remind myself why we moved away from character function names as there was a lot of discussion in the early days.

I would ask the other way round, why not use functions here instead of character strings?

It relates to the expand.grid() issue described in the thread on the `offspring_dist` documentation. We didn't spend very long trying to find workarounds, so it may be the case that we can use the same function interface in simulate_scenarios().

jamesmbaazam · 2024-08-23T12:59:33Z

R/simulate.R

+      args <- c(args, args_)
+    }
+    args <- utils::modifyList(x = args, val = list(...))
+    x <- do.call(epichains::simulate_chain_stats, args = args)


This is one point of generalisation where we can pass either simulate_chains() or simulate_chain_stats() and the class of result will depend on this.
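A rough sketch of this generalisation, assuming the simulation function is passed in as an argument. `run_scenario()`, `sim_stats_stub()` and `sim_chains_stub()` are hypothetical names; the two stubs stand in for `simulate_chain_stats()` and `simulate_chains()` so that the example runs without the package.

```r
# Hypothetical helper: call whichever simulation function is supplied.
run_scenario <- function(sim_fn, args, ...) {
  # Let user-supplied arguments in ... override the scenario defaults.
  args <- utils::modifyList(x = args, val = list(...))
  do.call(sim_fn, args = args)
}

# Stand-in for simulate_chain_stats(): returns a numeric vector.
sim_stats_stub <- function(n_chains, lambda) {
  stats::rpois(n_chains, lambda = lambda) + 1
}

# Stand-in for simulate_chains(): returns a data frame.
sim_chains_stub <- function(n_chains, lambda) {
  data.frame(
    chain = seq_len(n_chains),
    size = stats::rpois(n_chains, lambda = lambda) + 1
  )
}

args <- list(n_chains = 10, lambda = 0.9)
stats_out <- run_scenario(sim_stats_stub, args)
chains_out <- run_scenario(sim_chains_stub, args, n_chains = 5)

# The class of the result depends on the function that was passed in.
class(stats_out)
class(chains_out)
```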

jamesmbaazam · 2024-08-23T13:02:25Z

R/simulate.R

+    interval <- cut(x, breaks = breaks)
+    prop <- table(interval) / sum(table(interval))
+    df_ <- as.data.frame(prop)


This only fulfills a single use case and we may want to not do this here and leave it to the user. This function could just be about simulating the outbreak over a grid of scenarios and returning the results for downstream analyses/post-processing.
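For reference, the binning step is small enough for a user to run on raw results. The sketch below applies the same three lines to a made-up vector of outbreak sizes; `x` and `breaks` are illustrative values only.

```r
# Made-up outbreak sizes standing in for raw simulation output.
x <- c(1, 3, 7, 12, 25, 2, 18, 40)
breaks <- c(0, 5, 10, 20, Inf)

# Assign each size to an interval; cut() uses right-closed intervals by default.
interval <- cut(x, breaks = breaks)

# Proportion of chains falling in each interval.
prop <- table(interval) / sum(table(interval))

# Two columns: `interval` and `Freq` (the proportion).
df_ <- as.data.frame(prop)
df_
```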

jamesmbaazam · 2024-08-23T13:03:54Z

R/simulate.R

@@ -477,3 +477,107 @@ simulate_chain_stats <- function(n_chains,

  return(out)
 }
+
+#' Calculate transmission chain statistics across a range of scenarios and
+#' divide into intervals


I've left a comment below on the fact that the part about dividing into intervals is just one use case and so this function should probably return the raw results.

jamesmbaazam

Thanks for submitting this, Josh. I've left some quick comments for now as this is a living PR for the work in https://github.com/jamesmbaazam/h5n1_uk_scenario_modelling/. As discussed in person, we will have to generalise this beyond just Poisson and negative binomial. Looking forward to collaborating on this.

jamesmbaazam · 2024-08-27T14:57:03Z

R/simulate.R

+                               R_seq,
+                               k_seq,
+                               breaks,
+                               ...) {


Currently, this function simulates only one run for each scenario, so we will need to allow for multiple runs here. This is related to #41. I'm torn between having it here versus in the main simulation functions because users may not want to explore scenarios using this function but may want to run the main functions multiple times for one parameter set.

Good point. I'll follow your lead on where you think this is best to implement.
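One possible pattern for multiple runs, independent of where it ends up living, is to wrap a single simulation call in `replicate()`. In the sketch below `sim_chain_sizes()` is a hypothetical stand-in for `simulate_chain_stats()` with a Poisson offspring distribution, written in base R so that the example is self-contained.

```r
# Hypothetical stand-in for simulate_chain_stats(): simulates chain sizes from
# a branching process with Poisson offspring, capped at `stat_threshold`.
sim_chain_sizes <- function(n_chains, R, stat_threshold = 100) {
  vapply(seq_len(n_chains), function(i) {
    size <- 1L    # start from a single index case
    active <- 1L  # cases in the current generation
    while (active > 0L && size < stat_threshold) {
      active <- sum(stats::rpois(active, lambda = R))
      size <- size + active
    }
    min(size, stat_threshold)
  }, numeric(1))
}

set.seed(1)

# Three independent runs of the same scenario, kept as a list of vectors.
runs <- replicate(
  3,
  sim_chain_sizes(n_chains = 50, R = 0.8),
  simplify = FALSE
)

lengths(runs)
```

Keeping the runs in a list leaves any summarising across runs to the caller.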

sbfnk · 2024-09-04T08:01:14Z

Thanks @joshwlambert, this looks like useful functionality!

I think there's potentially a broader question (to which I don't know the answer) of whether this kind of functionality should be offered by a package as it's essentially a loop over package function calls. In {socialmixr} we made a decision the other way round, i.e. to remove the ability to create replicates: epiforecasts/socialmixr#63

sbfnk · 2024-09-04T07:56:21Z

R/simulate.R

+#' The `offspring_dist` argument is not equivalent to the `offspring_dist`
+#' argument in [simulate_chains()] and [simulate_chain_stats()], in those
+#' functions `offspring_dist` should be a `<function>`, whereas in
+#' `simulate_scenarios()` `offspring_dist` should be a vector of `character`
+#' strings which match function names.


What's the rationale for deviating from the package architecture here? If this is to become part of the package I think we should follow a consistent design.

It was due to an issue with expand.grid() and functions. I can't remember the exact issue when we were working on it, but it was something like this:

R_seq = seq(0.1, 1, 0.1)
k_seq = seq(0.1, 0.5, 0.1)
offspring_dist = rpois
statistic = c("size", "length")
scenarios <- expand.grid(
  offspring_dist = offspring_dist,
  statistic = statistic,
  R = R_seq,
  k = k_seq,
  stringsAsFactors = FALSE
)
#> Error in paste0(nmc[i], "=", if (is.numeric(x)) format(x) else x): cannot coerce type 'closure' to vector of type 'character'

Created on 2024-09-06 with reprex v2.1.0
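One possible workaround, sketched here as an assumption rather than something tried in the PR, is to keep the functions in a named list and expand the grid over the list names. The function for each scenario is then looked up in the list, so `match.fun()` is not needed. `offspring_dists` and `offspring_fn` are hypothetical names.

```r
# Named list of offspring distribution functions (illustrative choice).
offspring_dists <- list(rpois = stats::rpois, rnbinom = stats::rnbinom)

# Expand over the names, which are plain character strings.
scenarios <- expand.grid(
  offspring_dist = names(offspring_dists),
  statistic = c("size", "length"),
  R = seq(0.1, 1, 0.1),
  stringsAsFactors = FALSE
)

# Look up the function for a given scenario row by name.
i <- 1L
offspring_fn <- offspring_dists[[scenarios[i, "offspring_dist"]]]
is.function(offspring_fn)
```

This keeps the user-facing argument a list of functions, in line with `simulate_chains()` and `simulate_chain_stats()`, while the grid itself only holds strings.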

sbfnk · 2024-09-04T07:56:55Z

R/simulate.R

+#' @param R_seq A `numeric` vector of reproduction number values to simulate
+#' the branching process for.
+#' @param k_seq A `numeric` vector of dispersion values to simulate the
+#' branching process for. Only applicable for `offspring_dist = "rnbinom"`.
+#' @inheritParams base::cut


Why not call them R and k? They don't generally have to be a sequence here.

Yes, they can be renamed.

sbfnk · 2024-09-04T07:58:10Z

R/simulate.R

+  for (i in seq_len(nrow(scenarios))) {
+    args <- list(
+      n_chains = 1000,
+      offspring_dist = match.fun(scenarios[i, "offspring_dist"]),


I would ask the other way round, why not use functions here instead of character strings?

sbfnk · 2024-09-04T07:58:34Z

R/simulate.R

+    interval <- cut(x, breaks = breaks)
+    prop <- table(interval) / sum(table(interval))
+    df_ <- as.data.frame(prop)


Co-authored-by: James Azam <[email protected]>

joshwlambert · 2024-09-06T14:49:33Z

> I think there's potentially a broader question (to which I don't know the answer) of whether this kind of functionality should be offered by a package as it's essentially a loop over package function calls. In {socialmixr} we made a decision the other way round, i.e. to remove the ability to create replicates: epiforecasts/socialmixr#63

I think before continuing development of this function it would be good to decide if we want this functionality as a function, or via documentation. I think the code in the function body could be used to write a vignette demonstrating how to simulate scenarios and then group them. This also relates to @jamesmbaazam's point:

> This only fulfills a single use case and we may want to not do this here and leave it to the user. This function could just be about simulating the outbreak over a grid of scenarios and returning the results for downstream analyses/post-processing.

To me half the utility of this function comes from it wrapping cut() instead of requiring the user to know how to use it. But at the same time, a vignette is likely easier to maintain and might be more flexible to a variety of use cases we haven't foreseen.

As I'm just contributing, I'm happy to go with what you both (@jamesmbaazam & @sbfnk) prefer and then I'm happy to work on it.

joshwlambert · 2024-10-10T13:28:17Z

I think this function should be converted into a vignette. @jamesmbaazam & @sbfnk please let me know if you agree and how you'd like to proceed. I'm happy to draft the vignette and then can assign you both to review, edit and then merge if deemed suitable.

sbfnk · 2024-10-10T14:18:38Z

Thanks @joshwlambert, that sounds like a good idea to me.

jamesmbaazam · 2024-10-10T15:34:13Z

@joshwlambert I believe it's a good idea, but I would recommend pivoting the focus to highlight the application of epichains for estimating outbreak sizes based on offspring distribution scenarios. This way, the emphasis will be on the application rather than the function itself. The multi-simulation pattern is already outlined in the COVID-19 vignette, so we can build on that pattern as an application. We can in fact combine the idea here with that outlined in #77. The same simulation can be used to achieve both outcomes.

Let me know what you think.

joshwlambert · 2024-10-14T10:00:32Z

> I would recommend pivoting the focus to highlight the application of epichains for estimating outbreak sizes based on offspring distribution scenarios. This way, the emphasis will be on the application rather than the function itself.

Yes, I agree. I plan to remove the simulate_scenarios() function and instead use the code from the function body as a script to explain to the reader how to explore distributions of outbreak size and length.

I will make the changes in a new branch and open a new PR and then this PR can be closed without merging.

jamesmbaazam · 2024-10-14T10:15:47Z

> > I would recommend pivoting the focus to highlight the application of epichains for estimating outbreak sizes based on offspring distribution scenarios. This way, the emphasis will be on the application rather than the function itself.
>
> Yes, I agree. I plan to remove the simulate_scenarios() function and instead use the code from the function body as a script to explain to the reader how to explore distributions of outbreak size and length.
>
> I will make the changes in a new branch and open a new PR and then this PR can be closed without merging.

Perfect!

joshwlambert added 2 commits August 23, 2024 11:24

add simulate_scenarios function

e1b6891

add unit tests for simulate_scenarios

94674e6

joshwlambert requested a review from jamesmbaazam August 23, 2024 10:30

joshwlambert added the enhancement New feature or request label Aug 23, 2024

jamesmbaazam reviewed Aug 23, 2024

jamesmbaazam reviewed Aug 23, 2024

jamesmbaazam reviewed Aug 23, 2024

jamesmbaazam requested changes Aug 23, 2024

jamesmbaazam reviewed Aug 27, 2024

sbfnk reviewed Sep 4, 2024

sbfnk mentioned this pull request Sep 4, 2024

Allow multiple runs of simulations #41

Open

Reduce parameter space in simulate_scenarios() tests

0e87e5e

Co-authored-by: James Azam <[email protected]>

joshwlambert added 5 commits September 6, 2024 17:07

add include_index_case argument in simulate_scenarios

6237c9b

rm browser()

ce7c1c9

allow arguments to be passed to cut() via ... in simulate_scenarios

4811ac3

update test expectation

d5d1d38

calculate and set stat_threshold and add progress bar to simulate_scenarios

9348ed7

revert back to n_chains = 1000 in simulate_scenarios

cabc0e4
