-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i2: support multiple categorical covariates #4
Conversation
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
R/scova.R
Outdated
@@ -63,41 +63,13 @@ scova <- R6::R6Class( | |||
# Identify columns with no variance and remove them | |||
variance_per_column <- apply(mm, 2, var) | |||
relevant_columns <- which(variance_per_column != 0) | |||
mm_reduced <- mm[, relevant_columns] | |||
mm_reduced <- mm[, relevant_columns, drop = FALSE] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case this becomes a 1 column matrix, stop it from being converted to a vector with colnames removed
tests/testthat/plot.R
Outdated
@@ -0,0 +1,71 @@ | |||
library(ggplot2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this file just for manual testing
Re the issue I mentioned with the resampling hitting R's maximum vector limit: I think it was just a coincidence that I saw this for the first time when running without a regression model. I must have just taken more draws or something when I saw the error. I'll raise a separate issue for testing the resampling more thoroughly, and re-writing in a way that uses less memory if neccessary (e.g. splitting up the vector and doing the resampling one group at a time). Based on our chat earlier it seems like we might want to look into the resampling anyway, as it might not be working exactly as intended. |
R/scova.R
Outdated
# Reorder columns to have 'i' first | ||
data.table::setcolorder(dt, "p") | ||
private$covariate_lookup_table <- dt | ||
private$covariate_lookup_table <- build_covariate_lookup_table(private$data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved logic out to own function, mostly for ease of unit testing (but also could be re-used if/when we add CT model)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall! You would expect the results to be slightly different between the models with and without covariates, as they have different statistical model structures. But they look different in exactly the way you'd expect, so I think its working perfectly.
R/scova.R
Outdated
@@ -228,7 +216,8 @@ scova <- R6::R6Class( | |||
#' for required columns: \code{vignette("data", package = "epikinetics")}. | |||
#' @param file_path Optional file path to model inputs in CSV format. One of data or file must be provided. | |||
#' @param priors Object of type \link[epikinetics]{scova_priors}. Default scova_priors(). | |||
#' @param covariate_formula Formula specifying hierarchical structure of model. Default ~0. | |||
#' @param covariate_formula Formula specifying linear regression model. Note all variables in the formula | |||
#' will be treated as categorical variables. Default ~0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I might be missing something, but in principle there is no need to treat covariates as categorical. There is certainly a need for continuous variables for the Ct version of the model and I can imagine others wanting continuous variables for this model possibly. It shouldn't change the overall regression parameters. It might make the covariate name recovery a bit more strange, and perhaps the "bespoke" code I wrote for this model might need some tweaking. But we should aim to include continuous variables at some point too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, totally agree that we should accept these in principle, will just be a bit fiddly to get that working so figured this was a reasonable intermediate step. Will add an issue 👍
Closes #2
This PR:
I've also added a couple of files to the
testthat
directory which can be used to manually run the model with no and multiple covariates and plot the results, to sanity check the results. I got the following plots using the Delta wave data, but @thimotei will be better placed to judge whether these look reasonable or not:With regression model ~0 + infection_history:last_vax_type
With no regression model:
The graphs of the individual trajectories look a bit different betwen the no and the multiple covariate models. I would expect them to be almost identical, but I assume this is just because I didn't run the models for very long (see
manual-test-x.R
files, both were run withOtherwise these do look reasonable to my untrained eye...