i2: support multiple categorical covariates #4

hillalex · 2024-07-29T15:06:08Z

Closes #2

This PR:

rewrites the logic to build the covariate lookup table to support multiple regression covariates
adds test cases of the above logic
amends fit processing logic to support the case where there are no regression variables

I've also added a couple of files to the testthat directory which can be used to manually run the model with no and multiple covariates and plot the results, to sanity check the results. I got the following plots using the Delta wave data, but @thimotei will be better placed to judge whether these look reasonable or not:

With regression model ~0 + infection_history:last_vax_type

With no regression model:

The graphs of the individual trajectories look a bit different betwen the no and the multiple covariate models. I would expect them to be almost identical, but I assume this is just because I didn't run the models for very long (see manual-test-x.R files, both were run with

mod$fit(chains = 4,
        parallel_chains = 4,
        iter_warmup = 50,
        iter_sampling = 200,
        threads_per_chain = 4)

Otherwise these do look reasonable to my untrained eye...

codecov · 2024-07-29T15:17:37Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

hillalex · 2024-07-31T19:03:43Z

R/scova.R

@@ -63,41 +63,13 @@ scova <- R6::R6Class(
      # Identify columns with no variance and remove them
      variance_per_column <- apply(mm, 2, var)
      relevant_columns <- which(variance_per_column != 0)
-      mm_reduced <- mm[, relevant_columns]
+      mm_reduced <- mm[, relevant_columns, drop = FALSE]


just in case this becomes a 1 column matrix, stop it from being converted to a vector with colnames removed

hillalex · 2024-07-31T19:04:41Z

tests/testthat/plot.R

@@ -0,0 +1,71 @@
+library(ggplot2)


added this file just for manual testing

hillalex · 2024-07-31T21:58:11Z

Re the issue I mentioned with the resampling hitting R's maximum vector limit: I think it was just a coincidence that I saw this for the first time when running without a regression model. I must have just taken more draws or something when I saw the error.

I'll raise a separate issue for testing the resampling more thoroughly, and re-writing in a way that uses less memory if neccessary (e.g. splitting up the vector and doing the resampling one group at a time). Based on our chat earlier it seems like we might want to look into the resampling anyway, as it might not be working exactly as intended.

hillalex · 2024-08-01T12:55:48Z

R/scova.R

-      # Reorder columns to have 'i' first
-      data.table::setcolorder(dt, "p")
-      private$covariate_lookup_table <- dt
+      private$covariate_lookup_table <- build_covariate_lookup_table(private$data,


moved logic out to own function, mostly for ease of unit testing (but also could be re-used if/when we add CT model)

thimotei

Looks good overall! You would expect the results to be slightly different between the models with and without covariates, as they have different statistical model structures. But they look different in exactly the way you'd expect, so I think its working perfectly.

thimotei · 2024-10-03T09:44:09Z

R/scova.R

@@ -228,7 +216,8 @@ scova <- R6::R6Class(
    #' for required columns: \code{vignette("data", package = "epikinetics")}.
    #' @param file_path Optional file path to model inputs in CSV format. One of data or file must be provided.
    #' @param priors Object of type \link[epikinetics]{scova_priors}. Default scova_priors().
-    #' @param covariate_formula Formula specifying hierarchical structure of model. Default ~0.
+    #' @param covariate_formula Formula specifying linear regression model. Note all variables in the formula
+    #' will be treated as categorical variables. Default ~0.


So, I might be missing something, but in principle there is no need to treat covariates as categorical. There is certainly a need for continuous variables for the Ct version of the model and I can imagine others wanting continuous variables for this model possibly. It shouldn't change the overall regression parameters. It might make the covariate name recovery a bit more strange, and perhaps the "bespoke" code I wrote for this model might need some tweaking. But we should aim to include continuous variables at some point too.

Yeah, totally agree that we should accept these in principle, will just be a bit fiddly to get that working so figured this was a reasonable intermediate step. Will add an issue 👍

working with multiple covariates

161da12

hillalex changed the title ~~i2: support multiple covariates~~ i2: support multiple categorical covariates Jul 30, 2024

hillalex added 4 commits July 30, 2024 13:42

don't reduce matrix to vector

f1fa3df

support no covariates

bfaaa9e

plot functions for manual model checking

673a36b

fix col ordering

f0e2cae

hillalex commented Jul 31, 2024

View reviewed changes

tests/testthat/plot.R Outdated

@@ -0,0 +1,71 @@

library(ggplot2)

Copy link

Member Author

hillalex Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this file just for manual testing

manual test files

07df22f

hillalex added 2 commits July 31, 2024 23:04

update docs and remove test method

9280ceb

add log

44541ea

hillalex marked this pull request as ready for review July 31, 2024 22:28

hillalex requested a review from thimotei July 31, 2024 22:28

hillalex commented Aug 1, 2024

View reviewed changes

typo

08c4c38

thimotei approved these changes Oct 3, 2024

View reviewed changes

merged main

35fbfe0

hillalex mentioned this pull request Oct 3, 2024

Support continuous covariates #12

Open

hillalex merged commit 3960395 into main Oct 3, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i2: support multiple categorical covariates #4

i2: support multiple categorical covariates #4

hillalex commented Jul 29, 2024 •

edited

Loading

codecov bot commented Jul 29, 2024

hillalex Jul 31, 2024

hillalex Jul 31, 2024

hillalex commented Jul 31, 2024 •

edited

Loading

hillalex Aug 1, 2024

thimotei left a comment

thimotei Oct 3, 2024

hillalex Oct 3, 2024

i2: support multiple categorical covariates #4

i2: support multiple categorical covariates #4

Conversation

hillalex commented Jul 29, 2024 • edited Loading

codecov bot commented Jul 29, 2024

Welcome to Codecov 🎉

hillalex Jul 31, 2024

Choose a reason for hiding this comment

hillalex Jul 31, 2024

Choose a reason for hiding this comment

hillalex commented Jul 31, 2024 • edited Loading

hillalex Aug 1, 2024

Choose a reason for hiding this comment

thimotei left a comment

Choose a reason for hiding this comment

thimotei Oct 3, 2024

Choose a reason for hiding this comment

hillalex Oct 3, 2024

Choose a reason for hiding this comment

hillalex commented Jul 29, 2024 •

edited

Loading

hillalex commented Jul 31, 2024 •

edited

Loading