-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simulate_scenarios()
to easily calculate outbreak size distribution
#275
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #275 +/- ##
===========================================
- Coverage 100.00% 99.87% -0.13%
===========================================
Files 8 8
Lines 755 814 +59
===========================================
+ Hits 755 813 +58
- Misses 0 1 +1 ☔ View full report in Codecov by Sentry. |
tests/testthat/test-simulate.R
Outdated
breaks = c(0, 5, 10, 20, Inf) | ||
) | ||
expect_s3_class(df, class = "data.frame") | ||
expect_identical(dim(df), c(240L, 6L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove the hardcoding here and rather find the dimensions of the result of an expand.grid()
.
stat_threshold = 15 | ||
) | ||
expect_s3_class(df, class = "data.frame") | ||
expect_identical(dim(df), c(240L, 6L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before on removing the hardcoding to make the scenarios amenable to changes in the inputs.
for (i in seq_len(nrow(scenarios))) { | ||
args <- list( | ||
n_chains = 1000, | ||
offspring_dist = match.fun(scenarios[i, "offspring_dist"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to consider allowing string function names so that we don't have to do match.fun
as it's not ideal. I'll go back through previous considerations in #167 to remind myself why we moved away from character function names as there was a lot of discussion in the early days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ask the other way round, why not use functions here instead of character strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It relates the the expand.grid()
issue stated above. We didn't spend very long trying to find work arounds, so it may be the case that we can use the same function interface in simulate_scenarios()
.
args <- c(args, args_) | ||
} | ||
args <- utils::modifyList(x = args, val = list(...)) | ||
x <- do.call(epichains::simulate_chain_stats, args = args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one point of generalisation where we can pass either simulate_chains()
or simulate_chain_stats()
and the class of result will depend on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
R/simulate.R
Outdated
interval <- cut(x, breaks = breaks) | ||
prop <- table(interval) / sum(table(interval)) | ||
df_ <- as.data.frame(prop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only fulfills a single use case and we may want to not do this here and leave it to the user. This function could just be about simulating the outbreak over a grid of scenarios and returning the results for downstream analyses/post-processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
@@ -477,3 +477,107 @@ simulate_chain_stats <- function(n_chains, | |||
|
|||
return(out) | |||
} | |||
|
|||
#' Calculate transmission chain statistics across a range of scenarios and | |||
#' divide into intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a comment below on the fact that the part about dividing into intervals is just one use case and so this function should probably return the raw results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this, Josh. I've left some quick comments for now as this is a living PR for the work in https://github.com/jamesmbaazam/h5n1_uk_scenario_modelling/. As discussed in person, we will have to generalise this beyond just poison and negative binomial. Looking forward to collaborating on this.
R/simulate.R
Outdated
R_seq, | ||
k_seq, | ||
breaks, | ||
...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this function simulates only one run for each scenario, so we will need to allow for multiple runs here. This is related to #41. I'm torn between having it here versus in the main simulation functions because users may not want to explore scenarios using this function but may want to run the main functions multiple times for one parameter set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll follow your lead on where you think this is best to implement.
Thanks @joshwlambert, this looks like useful functionality! I think there's potentially a broader question (to which I don't know the answer) of whether this kind of functionality should be offered by a package as it's essentially a loop over package function calls. In |
#' The `offspring_dist` argument is not equivalent to the `offspring_dist` | ||
#' argument in [simulate_chains()] and [simulate_chain_stats()], in those | ||
#' functions `offspring_dist` should be a `<function>`, whereas in | ||
#' `simulate_scenarios()` `offspring_dist` should be a vector of `character` | ||
#' strings which match function names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for deviating from the package architecture here? If this is to become part of the package I think we should follow a consistent design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was due to an issue with expand.grid()
and functions. I can't remember the exact issue when we were working on it, but it was something like this:
R_seq = seq(0.1, 1, 0.1)
k_seq = seq(0.1, 0.5, 0.1)
offspring_dist = rpois
statistic = c("size", "length")
scenarios <- expand.grid(
offspring_dist = offspring_dist,
statistic = statistic,
R = R_seq,
k = k_seq,
stringsAsFactors = FALSE
)
#> Error in paste0(nmc[i], "=", if (is.numeric(x)) format(x) else x): cannot coerce type 'closure' to vector of type 'character'
Created on 2024-09-06 with reprex v2.1.0
#' @param R_seq A `numeric` vector of reproduction number values to simulate | ||
#' the branching process for. | ||
#' @param k_seq A `numeric` vector of dispersion values to simulate the | ||
#' branching process for. Only applicable for `offspring_dist = "rnbinom"`. | ||
#' @inheritParams base::cut |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not call them R
and k
? They don't generally have to be a sequence here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they can be renamed.
for (i in seq_len(nrow(scenarios))) { | ||
args <- list( | ||
n_chains = 1000, | ||
offspring_dist = match.fun(scenarios[i, "offspring_dist"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ask the other way round, why not use functions here instead of character strings?
R/simulate.R
Outdated
interval <- cut(x, breaks = breaks) | ||
prop <- table(interval) / sum(table(interval)) | ||
df_ <- as.data.frame(prop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
Co-authored-by: James Azam <[email protected]>
I think before continuing development of this function it would be good to decide if we want this functionality as a function, or via documentation. I think the code in the function body could be used to write a vignette demonstrating how to simulate scenarios and then group them. This also relates to @jamesmbaazam's point:
To me half the utility of this function comes from it wrapping As I'm just contributing, I'm happy to go with what you both (@jamesmbaazam & @sbfnk) prefer and then I'm happy to work on it. |
I think this function should be converted into a vignette. @jamesmbaazam & @sbfnk please let me know if you agree and how you'd like to proceed. I'm happy to draft the vignette and then can assign you both to review, edit and then merge if deemed suitable. |
Thanks @joshwlambert, that sounds like a good idea to me. |
@joshwlambert I believe it's a good idea, but I would recommend pivoting the focus to highlight the application of epichains for estimating outbreak sizes based on offspring distribution scenarios. This way, the emphasis will be on the application rather than the function itself. The multi-simulation pattern is already outlined in the COVID-19 vignette, so we can build on that pattern as an application. We can in fact combine the idea here with that outlined in #77. The same simulation can be used to achieve both outcomes. Let me know what you think. |
Yes, I agree, I plan to remove the I will make the changes in a new branch and open a new PR and then this PR can be closed without merging. |
Perfect! |
This PR adds the$R$ and $k$ values for different offspring distributions and bins the resulting transmission chain statistics into user-specified intervals to enable them to be easily plotted.
simulate_scenarios()
function which enables the exploration of parameter space ofNote: This is a work-in-progress feature (hence the draft pull request). Before merging we need to:
simulate_chain_stats()
simulate_scenarios()
withsimulate_chain_stats()
then most, if not all, of the input checking can be deferred to that function).