Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i2: support multiple categorical covariates #4

Merged
merged 10 commits into from
Oct 3, 2024
Merged

i2: support multiple categorical covariates #4

merged 10 commits into from
Oct 3, 2024

Conversation

hillalex
Copy link
Member

@hillalex hillalex commented Jul 29, 2024

Closes #2

This PR:

  • rewrites the logic to build the covariate lookup table to support multiple regression covariates
  • adds test cases of the above logic
  • amends fit processing logic to support the case where there are no regression variables

I've also added a couple of files to the testthat directory which can be used to manually run the model with no and multiple covariates and plot the results, to sanity check the results. I got the following plots using the Delta wave data, but @thimotei will be better placed to judge whether these look reasonable or not:

With regression model ~0 + infection_history:last_vax_type

multipleplot1
multipleplot2
multiple3

With no regression model:

nocovariates1
plot2
nocovariates3

The graphs of the individual trajectories look a bit different betwen the no and the multiple covariate models. I would expect them to be almost identical, but I assume this is just because I didn't run the models for very long (see manual-test-x.R files, both were run with

mod$fit(chains = 4,
        parallel_chains = 4,
        iter_warmup = 50,
        iter_sampling = 200,
        threads_per_chain = 4)

Otherwise these do look reasonable to my untrained eye...

Copy link

codecov bot commented Jul 29, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@hillalex hillalex changed the title i2: support multiple covariates i2: support multiple categorical covariates Jul 30, 2024
R/scova.R Outdated
@@ -63,41 +63,13 @@ scova <- R6::R6Class(
# Identify columns with no variance and remove them
variance_per_column <- apply(mm, 2, var)
relevant_columns <- which(variance_per_column != 0)
mm_reduced <- mm[, relevant_columns]
mm_reduced <- mm[, relevant_columns, drop = FALSE]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case this becomes a 1 column matrix, stop it from being converted to a vector with colnames removed

@@ -0,0 +1,71 @@
library(ggplot2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this file just for manual testing

@hillalex
Copy link
Member Author

hillalex commented Jul 31, 2024

Re the issue I mentioned with the resampling hitting R's maximum vector limit: I think it was just a coincidence that I saw this for the first time when running without a regression model. I must have just taken more draws or something when I saw the error.

I'll raise a separate issue for testing the resampling more thoroughly, and re-writing in a way that uses less memory if neccessary (e.g. splitting up the vector and doing the resampling one group at a time). Based on our chat earlier it seems like we might want to look into the resampling anyway, as it might not be working exactly as intended.

@hillalex hillalex marked this pull request as ready for review July 31, 2024 22:28
@hillalex hillalex requested a review from thimotei July 31, 2024 22:28
R/scova.R Outdated
# Reorder columns to have 'i' first
data.table::setcolorder(dt, "p")
private$covariate_lookup_table <- dt
private$covariate_lookup_table <- build_covariate_lookup_table(private$data,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved logic out to own function, mostly for ease of unit testing (but also could be re-used if/when we add CT model)

Copy link
Contributor

@thimotei thimotei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! You would expect the results to be slightly different between the models with and without covariates, as they have different statistical model structures. But they look different in exactly the way you'd expect, so I think its working perfectly.

R/scova.R Outdated
@@ -228,7 +216,8 @@ scova <- R6::R6Class(
#' for required columns: \code{vignette("data", package = "epikinetics")}.
#' @param file_path Optional file path to model inputs in CSV format. One of data or file must be provided.
#' @param priors Object of type \link[epikinetics]{scova_priors}. Default scova_priors().
#' @param covariate_formula Formula specifying hierarchical structure of model. Default ~0.
#' @param covariate_formula Formula specifying linear regression model. Note all variables in the formula
#' will be treated as categorical variables. Default ~0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I might be missing something, but in principle there is no need to treat covariates as categorical. There is certainly a need for continuous variables for the Ct version of the model and I can imagine others wanting continuous variables for this model possibly. It shouldn't change the overall regression parameters. It might make the covariate name recovery a bit more strange, and perhaps the "bespoke" code I wrote for this model might need some tweaking. But we should aim to include continuous variables at some point too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, totally agree that we should accept these in principle, will just be a bit fiddly to get that working so figured this was a reasonable intermediate step. Will add an issue 👍

@hillalex hillalex merged commit 3960395 into main Oct 3, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple covariates not working
2 participants