Functionality for time varying CFR and different delays depending on outcome #36

Closed
adamkucharski opened this issue Dec 14, 2023 · 14 comments
Labels
enhancement New feature or request

@adamkucharski
Member

adamkucharski commented Dec 14, 2023

Recent discussion with @ntncmch and @PaulC91 identified a couple of situations where having simulated data could be useful to test different approaches to fatality risk estimation, with a focus on simple/fast methods that could be included in packages like epishiny.

A. Cholera-like outbreak.
Typical situation: Recoveries not recorded, but deaths are, so have to make an assumption (e.g. missing = recovered).
Methodological problem: May not be accurate if not all missing entries are recovered, and in real time outcomes are not yet known for all cases. Also, onset-to-outcome delays may not be known for a new outbreak (this is related to the more detailed analysis work in epidist).
Methodological solutions to test: compare the 'naive' estimate, calculated as deaths/cases among individuals with known outcomes, with 1) a 'filtered' approach that constructs a cohort with the same onset times and filters out the most recent onset times (e.g. removes individuals with onset dates in the last 1-2 weeks), which could also be used to calculate the delays themselves; 2) an estimate calculated as deaths/cases assuming missing = recovered; and 3) inputting the data into CFR as incidence (because this makes it possible to use the functionality that estimates the proportion of cases with known outcomes at the present day).
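As a minimal sketch of how the naive estimate and approaches (1) and (2) differ, the base R code below uses a small made-up line list. The column names `onset_day` and `outcome`, the 14-day cutoff, and the data themselves are assumptions for illustration only, not simulist output.

```r
# Hypothetical line list: onset day and outcome (NA = outcome not recorded)
linelist <- data.frame(
  onset_day = c(1, 3, 5, 8, 12, 15, 20, 24, 27, 29),
  outcome = c("died", "recovered", "recovered", "died", "recovered",
              "recovered", "died", NA, NA, NA)
)
today <- 30

# Naive: deaths / cases among individuals with a known outcome
known <- linelist[!is.na(linelist$outcome), ]
cfr_naive <- mean(known$outcome == "died")

# (1) Filtered: keep only the cohort with onset at least `cutoff_days` ago,
# then take deaths / all cases in that cohort
cutoff_days <- 14
cohort <- linelist[linelist$onset_day <= today - cutoff_days, ]
cfr_filtered <- sum(cohort$outcome == "died", na.rm = TRUE) / nrow(cohort)

# (2) Missing = recovered: deaths / all cases
cfr_missing_recovered <-
  sum(linelist$outcome == "died", na.rm = TRUE) / nrow(linelist)

print(c(naive = cfr_naive,
        filtered = cfr_filtered,
        missing_recovered = cfr_missing_recovered))
```

With simulated data the true risk is known, so each of these quantities can be compared against it directly.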

B. Ebola-like outbreak.
Typical situation: delay to death shorter than delay to recovery, which can bias recent estimates.
Problem: Recent CFR estimates will be biased, because restricting line-list data to individuals with a known outcome implicitly assumes that the onset-to-death and onset-to-recovery delays are equal.
Methodological solutions to test: compare the 'naive' estimate with (1) and (3) above.
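The direction of this bias can be illustrated with a deterministic toy example. The sketch below assumes fixed delays (5 days to death, 15 days to recovery), 10 cases per onset day, and a true CFR of 0.2; all of these values are invented for illustration.

```r
# 10 cases per onset day over 30 days; 2 in every 10 die (true CFR = 0.2)
onset_day <- rep(1:30, each = 10)
dies <- rep(c(TRUE, TRUE, rep(FALSE, 8)), times = 30)

# Assumed fixed delays: death is faster than recovery
delay <- ifelse(dies, 5, 15)
outcome_day <- onset_day + delay

# Outcomes are only known if they occurred on or before the analysis day
today <- 30
observed <- outcome_day <= today

cfr_true <- mean(dies)

# Naive: deaths / cases among individuals with a known outcome.
# Deaths are observed sooner, so they are over-represented.
cfr_naive <- sum(dies & observed) / sum(observed)

# Filtered cohort: only onsets old enough for every outcome to be known
cfr_filtered <- mean(dies[onset_day <= today - 15])

print(c(true = cfr_true, naive = cfr_naive, filtered = cfr_filtered))
```

In this setup the naive estimate overshoots the true value, while the filtered cohort recovers it because every outcome in the cohort has had time to occur.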

Summary of simulist functionality that would be required for both of the above use cases: 1) the ability to simulate using different onset-to-death and onset-to-recovery delays and 2) the ability for the 'true' CFR to vary over time in the simulation, to check that a simple estimation method can recover the correct values.

@joshwlambert
Member

Currently the outcome is given in the date_death column, meaning the outcomes are either death, in which case a date is provided, or recovered, in which case NA is stated (thus no date of recovery).

Given the features outlined above require the time of recovery to be explicitly given in the line list, what is the preferred format of the data?

  1. One column called outcome containing dead, recovered, etc. and one called date_outcome containing the date of the corresponding event.
  2. One column called date_death (already in the current {simulist} version) and another called date_recovered each with a date or NA for each individual.
  3. Another format.

Some comparisons to other line lists:

  • Global.health Ebola and Mpox line lists have Outcome, Date_death, Date_recovered columns
  • The Ebola line list in {outbreaks} has date_of_outcome and outcome columns
  • The MERS line list in {outbreaks} has outcome and dt_death (where recovered individuals are NA)
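For comparison, formats (1) and (2) carry the same information for a binary outcome and can be converted into one another. The sketch below builds a made-up line list in format (2) and derives format (1) from it in base R; the `id` column and the outcome labels `"died"` and `"recovered"` are assumptions.

```r
# Format (2): one date column per outcome, NA where the event did not occur
wide <- data.frame(
  id = 1:3,
  date_death = as.Date(c("2023-01-10", NA, NA)),
  date_recovered = as.Date(c(NA, "2023-01-20", NA))
)

# Format (1): a single outcome label plus the date of that outcome
outcome <- ifelse(
  !is.na(wide$date_death), "died",
  ifelse(!is.na(wide$date_recovered), "recovered", NA_character_)
)

# Fill the outcome date from whichever date column is present,
# using subassignment so the Date class is preserved
date_outcome <- wide$date_death
use_recovery <- is.na(date_outcome)
date_outcome[use_recovery] <- wide$date_recovered[use_recovery]

long <- data.frame(id = wide$id, outcome = outcome, date_outcome = date_outcome)
print(long)
```

The third individual has neither date, so both `outcome` and `date_outcome` stay `NA`.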

Please let me know your preference. Once we have a general consensus I will start implementing the updates.

@adamkucharski
Member Author

Most real line lists I've seen use format (2) (although @ntncmch and @PaulC91 may have better perspective on this!)

Can't think of major benefit of choosing (1) if (2) is easier to add to current format, given it's a binary outcome (dead/recovered) and both approaches require two columns.

@PaulC91

PaulC91 commented Mar 29, 2024

Our linelists usually use the 1st format with outcome and date_of_outcome variables. I think the benefit is that you can easily filter or group by outcome, which would be less straightforward with just 2 date variables for death and recovery. We also often have many outcome options so having a separate date column for each is less efficient.
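To illustrate the grouping point, here is a short base R sketch with a made-up line list in the first format. The third outcome value `"transferred"` and the `date_onset` column name are hypothetical, chosen only to show that an extra outcome needs no extra column.

```r
# Hypothetical line list with more than two outcome categories
linelist <- data.frame(
  date_onset = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03",
                         "2024-01-04", "2024-01-05")),
  outcome = c("died", "recovered", "transferred", "recovered", "died"),
  date_outcome = as.Date(c("2024-01-06", "2024-01-16", "2024-01-08",
                           "2024-01-20", "2024-01-12"))
)

# Onset-to-outcome delay in days
linelist$delay <- as.numeric(linelist$date_outcome - linelist$date_onset)

# Grouping by outcome is a single call on one column
counts <- table(linelist$outcome)
mean_delay <- tapply(linelist$delay, linelist$outcome, mean)

print(counts)
print(mean_delay)
```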

@joshwlambert
Member

PR #99 adds the onset-to-recovery delays and the $outcome and $date_outcome columns to the resulting line list data.

I will leave this PR open for a few days for comments and then will merge it into the main branch.

@joshwlambert
Member

I'm currently implementing the time-varying case fatality risk. One question is how to set up the death risk. There are a couple of options:

  1. The time-varying death risk is the risk of dying given the time you were infected
  2. The time-varying death risk is the risk of dying given the time you would have died

Both options can feasibly be implemented. So let me know which is preferable in most use cases.
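The difference between the two options can be sketched in a few lines of base R. The function `risk_over_time()`, the column names, and the values are all hypothetical, and infection time is treated as the start of the delay to death purely to keep the example short.

```r
# Hypothetical time-varying death risk: high early on, lower after day 50
risk_over_time <- function(t) ifelse(t > 50, 0.1, 0.5)

cases <- data.frame(
  infection_time = c(10, 45, 60),
  delay_to_death = c(8, 12, 8)
)
cases$potential_death_time <- cases$infection_time + cases$delay_to_death

# Option 1: risk evaluated at the time of infection
risk_option_1 <- risk_over_time(cases$infection_time)

# Option 2: risk evaluated at the time the individual would have died
risk_option_2 <- risk_over_time(cases$potential_death_time)

print(data.frame(cases, risk_option_1, risk_option_2))
```

The second case is infected before the change on day 50 but would die after it, so the two options assign it different risks.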

@joshwlambert
Member

The first implementation of time-varying case fatality risk uses the first option which can be found in the time-varying-cfr branch: https://github.com/epiverse-trace/simulist/blob/time-varying-cfr/R/add_cols.R#L147-L170. This can be changed so feedback is still welcome.

@adamkucharski
Member Author

Thanks for sharing. On the above edit, how does the normalised function interact with the overall CFR? It seems like an intuitive option for a user would be to directly define the CFR over time as a model input (e.g. cfr <- function(t) ifelse(t>100,0.01,0.2)). Maybe I've misunderstood, but in the current implementation would they need to do some additional work to consider how the function will interact with the normalisation (e.g. the code would effectively implement cfr_norm <- function(t) ifelse(t>100,0.05,1))?
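To make the equivalence concrete: assuming the normalised function simply multiplies a maximum risk of 0.2 (the name `max_risk` is hypothetical), the two definitions give the same values at every time point.

```r
# CFR defined directly as a function of time
cfr <- function(t) ifelse(t > 100, 0.01, 0.2)

# Equivalent definition: a maximum risk scaled by a normalised function
max_risk <- 0.2
cfr_norm <- function(t) ifelse(t > 100, 0.05, 1)

t <- c(0, 50, 100, 101, 200)

# all.equal() is used rather than == to allow for floating-point rounding
same <- isTRUE(all.equal(cfr(t), max_risk * cfr_norm(t)))
print(same)
```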

On the question of time-varying death risk at point of infection or death, I could see a use case for either end of the spectrum, e.g. vaccination pre-defining risk at point of infection, like an Ebola campaign, or treatment quality influencing the outcome later in infection. However, the latter would typically have the intervention working before the point of death (e.g. treatment once ill or hospitalised). So it is perhaps cleaner to use the time of infection assumption, as this has a direct link to vaccination or early treatment quality.

Generally if delay from onset-to-death is quite long, it would be tricky to estimate sudden changes in CFR (just as it's hard to estimate changes in $R$ on timescales shorter than generation time). So maybe of most interest is ability to have 'eras' of different severity, such as periods of weeks of a given severity during a months long epidemic?

@joshwlambert
Member

> how does the normalised function interact with the overall CFR? It seems like an intuitive option for a user would be to directly define the CFR over time as a model input (e.g. cfr <- function(t) ifelse(t>100,0.01,0.2))

The overall CFR results from combining the hospitalised death risk (hosp_death_risk argument) and the non-hospitalised death risk (non_hosp_death_risk argument) with the time-varying CFR function, if one is provided. If none is provided, the CFR is assumed constant through time, with risks equal to hosp_death_risk and non_hosp_death_risk for the respective groups. Providing a time-varying function changes the interpretation of the death risks from constant values to maximum values, which are reduced whenever the function is below its maximum (e.g. with an exponential decline, the risk equals the user input at $t = 0$ and declines for $t > 0$).
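The exponential-decline example can be sketched as follows. This is not the code on the branch: `scaling()` is a hypothetical stand-in for the normalised time-varying function, and the risk values and decay rate are invented.

```r
# Maximum risks, as they would be supplied by the user (illustrative values)
hosp_death_risk <- 0.5
non_hosp_death_risk <- 0.05

# Hypothetical normalised function: equals 1 at t = 0 and declines afterwards
scaling <- function(t) exp(-0.02 * t)

infection_time <- c(0, 10, 50, 100)

# Effective risks: the user inputs act as maxima scaled by the function
hosp_risk_t <- hosp_death_risk * scaling(infection_time)
non_hosp_risk_t <- non_hosp_death_risk * scaling(infection_time)

print(round(data.frame(infection_time, hosp_risk_t, non_hosp_risk_t), 4))
```

At `infection_time = 0` both effective risks equal the user inputs, and they fall below them for every later time.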

The other possible implementation for this feature is to allow the hosp_death_risk and non_hosp_death_risk arguments to accept either numbers (as they currently do) or anonymous functions. I could then normalise these functions internally to ensure the risk is in $[0, 1]$. If I understand correctly, this second implementation would be more similar to your cfr <- function(t) ifelse(t>100,0.01,0.2) idea?
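One way such an argument could be handled is sketched below. The helper `as_risk_function()` is hypothetical and not part of simulist, and it clamps values to $[0, 1]$ rather than rescaling them, which is a simpler substitute for the normalisation mentioned above.

```r
# Hypothetical helper: turn a number or a function into a risk function of time
as_risk_function <- function(risk) {
  if (is.function(risk)) {
    f <- risk
  } else {
    stopifnot(is.numeric(risk), length(risk) == 1)
    # A constant risk becomes a function that ignores time
    f <- function(t) rep(risk, length(t))
  }
  # Clamp the output so it is always a valid probability
  function(t) pmin(pmax(f(t), 0), 1)
}

constant_risk <- as_risk_function(0.3)
stepped_risk <- as_risk_function(function(t) ifelse(t > 100, 0.01, 0.2))
clamped_risk <- as_risk_function(function(t) 1.5 - t / 100)

print(constant_risk(c(1, 2)))
print(stepped_risk(c(50, 150)))
print(clamped_risk(c(0, 100, 200)))
```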

A fuller explanation of the first implementation can be found here: https://github.com/epiverse-trace/simulist/blob/time-varying-cfr/vignettes/time-varying-cfr.Rmd (it is best read by cloning the package, running pkgdown::build_site() and opening the Time-varying case fatality risk vignette).

> So perhaps cleaner for use the time of infection assumption, as this has a direct link to vaccination or early treatment quality.

Okay, I'll keep with the time-varying function $f(t)$ using $t$ as the time of infection for now, but we can change this in later versions or give users the option to choose if the need arises.

> So maybe of most interest is ability to have 'eras' of different severity, such as periods of weeks of a given severity during a months long epidemic?

There is some early exploration in https://github.com/epiverse-trace/simulist/blob/time-varying-cfr/vignettes/time-varying-cfr.Rmd of "eras" of severity using stepwise CFR functions, which seems to produce sensible results.
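A stepwise function of this kind can be written compactly with `findInterval()`. The era boundaries and risks below are made up and are not taken from the vignette; times are assumed to be non-negative.

```r
# Hypothetical eras: each starts on the given day and lasts until the next start
era_starts <- c(0, 30, 90)
era_risk <- c(0.3, 0.1, 0.05)

# Look up which era each time falls in, then return that era's risk
# (assumes t >= 0, so every time belongs to an era)
stepwise_risk <- function(t) era_risk[findInterval(t, era_starts)]

risk <- stepwise_risk(c(0, 29, 30, 89, 90, 200))
print(risk)
```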

@adamkucharski
Member Author

That makes sense – given risks are separately defined (i.e. P(hospitalisation) and hospitalisation death risk), a defined CFR would probably abstract too much away from user. So just to understand, if there is a time varying function defined, then at the peak normalised value, the risk would equal hosp_death_risk etc? If so, that seems sensible.

@joshwlambert
Member

joshwlambert commented Apr 12, 2024

> So just to understand, if there is a time varying function defined, then at the peak normalised value, the risk would equal hosp_death_risk etc?

Yes exactly. The same is true for non_hosp_death_risk.

However, your earlier comment got me thinking about the possibility of passing functions directly to hosp_death_risk and non_hosp_death_risk. This would have the added benefit of being able to specify different time-varying CFR functions for the fatality risk inside and outside of hospitals (quite an advanced use case but might still be useful). Let me know what you think. Also tagging @Bisaloo & @CarmenTamayo as this is mainly a usability decision.

Edit: keeping this discussion open, but deciding to go with the first implementation for now to get this feature integrated into the package. I had overlooked that the hosp_death_risk and non_hosp_death_risk can also take <data.frame>s with the age-stratified risks, so the time-varying function needs to be in config (for now at least) to work with age-stratified risks.
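As a rough illustration of why a separate time-varying function combines easily with age-stratified risks, the sketch below looks up each case's baseline risk by age and then scales it by time of infection. The data frame layout (`age_limit` as the lower bound of each age group, plus `risk`), the `scaling()` function, and all values are assumptions for this example rather than the package's internals.

```r
# Assumed layout for age-stratified risks: lower age limit and risk per group
hosp_death_risk <- data.frame(
  age_limit = c(1, 20, 60),
  risk = c(0.1, 0.05, 0.4)
)

# Hypothetical time-varying scaling: risk halves after day 60
scaling <- function(t) ifelse(t > 60, 0.5, 1)

cases <- data.frame(
  age = c(5, 35, 72, 72),
  infection_time = c(10, 10, 10, 80)
)

# Baseline risk from the age group, then scaled by the time of infection
age_group <- findInterval(cases$age, hosp_death_risk$age_limit)
cases$death_risk <- hosp_death_risk$risk[age_group] * scaling(cases$infection_time)

print(cases)
```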

@joshwlambert
Member

The second PR, introducing time-varying CFR to the package, is now live: #101. Comments and suggestions are welcome.

@joshwlambert
Member

Once PRs #99 and #101 are merged I will close this issue. The two case studies that can now be carried out with these new features will continue to be tracked in epiverse-trace/howto#37.

joshwlambert added a commit that referenced this issue Apr 25, 2024
…sim_input to return compound warnings or errors, WIP #36
@joshwlambert
Member

Closing as PRs #99 and #101 have been merged. Thanks for all the input from everyone on this issue and to @adamkucharski, @ntncmch and @PaulC91 for raising the feature requests.

@adamkucharski
Member Author

Also adding this PR for reference, that shows how these two situations can to some extent be addressed with existing {cfr} functionality: epiverse-trace/cfr#170
