From 2ceeff461313341f1e3f5cf81870231a0df05d23 Mon Sep 17 00:00:00 2001 From: Thomas Debray <117118104+tdebray123@users.noreply.github.com> Date: Mon, 20 Nov 2023 18:32:50 +0100 Subject: [PATCH] update code --- _freeze/chapter_09/execute-results/html.json | 10 +- .../figure-html/hom miss exploration-1.png | Bin 0 -> 19323 bytes .../figure-html/hom miss exploration-2.png | Bin 0 -> 24072 bytes chapter_03.qmd | 17 ++ chapter_06.qmd | 17 ++ chapter_09.qmd | 240 +++++++++++++++++- .../figure-html/hom miss exploration-1.png | Bin 0 -> 19323 bytes .../figure-html/hom miss exploration-2.png | Bin 0 -> 24072 bytes .../figure-html/homimp convergency-1.png | Bin 0 -> 61260 bytes .../figure-html/homimp convergency-2.png | Bin 0 -> 39042 bytes docs/chapter_09.html | 223 +++++++++++++++- .../figure-html/hom miss exploration-1.png | Bin 0 -> 19323 bytes .../figure-html/hom miss exploration-2.png | Bin 0 -> 24072 bytes docs/search.json | 30 +++ docs/site_libs/bootstrap/bootstrap.min.css | 2 +- 15 files changed, 531 insertions(+), 8 deletions(-) create mode 100644 _freeze/chapter_09/figure-html/hom miss exploration-1.png create mode 100644 _freeze/chapter_09/figure-html/hom miss exploration-2.png create mode 100644 chapter_09_files/figure-html/hom miss exploration-1.png create mode 100644 chapter_09_files/figure-html/hom miss exploration-2.png create mode 100644 chapter_09_files/figure-html/homimp convergency-1.png create mode 100644 chapter_09_files/figure-html/homimp convergency-2.png create mode 100644 docs/chapter_09_files/figure-html/hom miss exploration-1.png create mode 100644 docs/chapter_09_files/figure-html/hom miss exploration-2.png create mode 100644 docs/search.json diff --git a/_freeze/chapter_09/execute-results/html.json b/_freeze/chapter_09/execute-results/html.json index 7fb77aa..64e6c84 100644 --- a/_freeze/chapter_09/execute-results/html.json +++ b/_freeze/chapter_09/execute-results/html.json @@ -1,12 +1,16 @@ { - "hash": "a39ee86535897e320528de0cd49c1a8f", + "hash": "ad94d5bc07cb343bd0ced1408f0b3e7a", "result": { - "markdown": "---\ntitle: \"Dealing with missing data\"\nauthors: \n - name: Johanna Munoz\n affiliations:\n - ref: julius\n - name: Thomas Debray\n orcid: 0000-0002-1790-2719\n affiliations:\n - ref: smartdas\naffiliations:\n - id: smartdas\n name: Smart Data Analysis and Statistics B.V.\n city: Utrecht\n - id: julius\n name: Julius Center for Health Sciences and Primary Care\n city: Utrecht\nformat:\n html:\n toc: true\n number-sections: true\nexecute:\n cache: true\n---\n\n\n\n\n## Main Analysis\n\nThe main objective of this analysis is to assess whether the number of episodes (y) occurring within specific time periods (years) differs between the treatment groups (1: DMF and 0: TERI). To address potential confounding factors, the researchers consider variables such as patient age, the log of premedical cost (`logPremedicalcost`), previous DMT efficacy (`prevDMTefficacy`), and the number of episodes in previous relapses (prerelapseNum).\n\nWhen estimating treatment effects from observational data, an assumption is made that the patient populations in both treatment groups are as similar as possible. Various methods for balancing data across treatment groups are proposed, including matching, inverse propensity weighting, stratification, and regression adjustment.\n\nIn this case, the focus is specifically on the matching method, which offers advantages over regression adjustment by potentially alleviating issues related to model mis-specification. 
This includes addressing non-linear relationships between certain confounders and the outcome variable and accounting for treatment effects that may depend on specific confounders (treatment-confounder interaction terms). Propensity scores are used to match subjects in the treatment groups.\n\nMoreover, intentionally introducing incomplete covariate variables in this example adds complexity to the propensity score estimation. Depending on the propensity score estimation technique employed, it may be necessary to incorporate an imputation step. For instance, logistic regression requires complete data for all observations, whereas XGBoost is robust to missing data \\cite{zhao_propensity_2021}.\n\nTo estimate marginal treatment effects, the g-computation method is employed \\cite{snowden_implementation_2011}. This method involves specifying a model for the outcome dependent on the treatment and covariates. The potential outcomes, i.e., the predicted values of the outcome on treatment ($y_i^1$) and control ($y_i^0$) for each sample unit $i$, are estimated. The marginal treatment effect is then calculated by contrasting the averaged estimated potential outcomes.\n\n\nIn this example, we consider the estimation of comparative treatment effects in the absence of treatment-effect heterogeneity.\n\n\n## Estimation workflow\n\nThe proposed workflow consists of the following steps:\n\n\n\n1. **Data Exploration:** In this step, we examine the observed data to understand the variables within the dataset. Our primary focus is on identifying missingness patterns and relationships among the observed variables, including missingness indicator variables. This exploration aids in discerning the most plausible missingness mechanisms and suitable imputation techniques. Additionally, insights from field experts may be incorporated to enhance understanding of the missingness process, potentially considering MNAR assumptions.\n2. **Imputation:** It is essential to evaluate whether an imputation procedure is necessary or whether simpler methods, such as complete case analysis, are more suitable. When imputation is required, it is crucial to select imputation methods that align with the main analysis model. This involves choosing an imputation method for each incomplete variable and determining the predictor variables of each imputation model. Pre-imputation steps (where imputed values can be derived deterministically from other variables) and post-imputation steps (e.g., ensuring that imputed values fall within a plausible range) may also be considered.\n3. **Data Balancing:** Several methods can be used, including propensity score (PS) matching and inverse propensity score weighting (IPW). The resulting balance should be evaluated, for example via visual inspection (e.g., with the cobalt R package). In this example, we estimate propensity scores using logistic regression. For most balancing procedures in R, counterparts specifically designed for imputed datasets are available, such as those in the MatchThem R package, which provides the PS matching and IPW functionality of the MatchIt R package for imputed data.\n4. **Model Fit:** A model is fitted to predict the outcome of each sample unit under each possible treatment value (DMF and TERI); predictors include the treatment and, optionally, the baseline covariates and the propensity score.\n5. 
**Treatment Estimation & Pooling:** For simplicity, in this tutorial we will use the comparison functions from the R **marginaleffects** package \\cite{arel_marginaleffects_2023}, which can be applied to completed datasets as well as to the outputs of the imputation process. In the latter case, the functions internally calculate the treatment effect within each imputed dataset and pool the estimates using Rubin's rules.\n\nLet's start by preparing the R environment. All the functions used in this tutorial can be found in the resource file `functions.r`.\n\n\n::: {.cell hash='chapter_09_cache/html/unnamed-chunk-2_abd7974b84b47b8689227ead0f93d1e2'}\n\n```{.r .cell-code}\n# Load the required packages and additional functions\nsource(\"resources/chapter 09/functions.r\") \n```\n:::\n\n\n## Homogeneous Treatment Effect\n\nIn this example, we focus on estimating comparative treatment effects in the absence of heterogeneous treatment effects (HTE).\n\n\n### Generating an Observational Dataset\n\nWe can simulate an observational dataset of $N = 3000$ patients as follows:\n\n\n::: {.cell hash='chapter_09_cache/html/hom gendata_70debf96318908085a01fdf6f56df0ff'}\n\n```{.r .cell-code}\ndata_hom <- generate_data(n = 3000, seed = 1234) \n```\n:::\n\n\nThe **generate_data()** function allows the specification of various treatment effect options, easily adjustable by modifying the `beta` parameter. In this instance, we assume a consistent treatment effect across the entire target population. This dataset currently contains no missing values.\n\nThe simulated dataset comprises two treatment groups with variations in baseline characteristics. For example, the figure below illustrates baseline imbalances in covariates such as age.\n\n\n::: {.cell layout-align=\"center\" hash='chapter_09_cache/html/hom plotdata_d4575521427b03abd94cd895c6a9de08'}\n::: {.cell-output-display}\n{fig-align='center' width=576}\n:::\n:::\n\n\nWe can calculate the treatment effect on the complete observed dataset. To do this, we start by balancing the dataset using propensity score matching. In this case, the propensity score model uses confounder variables only: `age`, `gender`, `prevDMTefficacy`, `logPremedicalcost`, and `prerelapseNum`.\n\n\n::: {.cell hash='chapter_09_cache/html/full1_875cd9ca24a5d04e24790c881d2669b0'}\n\n```{.r .cell-code}\n## Apply Matching on the PS for the ATE\nmF <- matchit( treatment ~ age + gender + prevDMTefficacy + logPremedicalcost + prerelapseNum, \n data = data_hom,\n family = binomial,\n method = \"full\",\n caliper = 0.2,\n estimand = \"ATE\",\n replace = FALSE) \n\n## Extract matched data\nmdata <- match.data(mF)\n```\n:::\n\n\nThen, we proceed to model estimation. In this case, a Poisson model is used with the form:\n\n$$count_i \\\\sim Poisson(\\\\lambda_i)$$\n\n$$\\\\begin{eqnarray} log(\\\\lambda_i) &=& \\\\beta_0 + \\\\beta_1 treatment_i + \\\\beta_2 age_i + \\\\beta_3 gender_i \\\\\\\\ &+& \\\\beta_4 prevDMTefficacy_i + \\\\beta_5 logPremedicalcost_i \\\\\\\\ &+& \\\\beta_6 prerelapseNum_i + \\\\beta_7 numSymptoms_i + offset(log(years_i)) \\\\end{eqnarray}$$\n\nSince patient measurements were recorded over varying time frames, the natural logarithm of the years of evaluation is incorporated as an offset in the model. The model is fitted with the glm() function, and we include the treatment and the baseline covariates as predictors (the covariates are optional if the data are well balanced). 
Additionally, it is necessary to specify the matching weights in the glm() function.\n\n\n::: {.cell hash='chapter_09_cache/html/full2_d3b47e8ad45febc02ab4765383d4770e'}\n\n```{.r .cell-code}\n# Model fit\nfit_mod <- glm(as.formula(\"y ~ treatment + gender + age + logPremedicalcost + prerelapseNum + prevDMTefficacy + numSymptoms + offset(log(years))\"),\n family = poisson(link = \"log\"),\n data = mdata,\n weights = weights)\n```\n:::\n\n\nStandard errors of Poisson models are typically adjusted using robust (sandwich) estimators to protect against violations of the equidispersion assumption. This correction can be applied directly to the model using the vcovCL() function \\cite{zeileis_sandwich_2022}. However, given that we will calculate the treatment effect using the functions of the **marginaleffects** package, this step becomes redundant: the package allows specifying HC3 sandwich standard errors during treatment estimation.\n\nFinally, we calculate the Average Treatment Effect (ATE). The ATE is defined as $$\\\\tau_{ATE}=E(y_i^1-y_i^0)$$\n\nThis cannot be extracted directly from the $\\\\beta_1$ parameter, as the model has $log(\\\\lambda)$ as the response. We therefore estimate it as $$\\\\tau_{ATE}=E(\\\\lambda^1_i-\\\\lambda^0_i)$$ This can be done with the **avg_comparisons()** function from the R package **marginaleffects**, which calculates the potential outcomes for each sample unit and then combines them to summarize the average effect.\n\n\n::: {.cell hash='chapter_09_cache/html/full3_d2794d19f736d902a2956bbebe7cf2bc'}\n\n```{.r .cell-code}\n# Estimation of treatment effects with robust standard errors\nATE <- avg_comparisons(fit_mod, \n variables = list(treatment = c(\"TERI\",\"DMF\")),\n vcov = \"HC3\",\n newdata = mdata,\n wts = \"weights\")\n\nresult_ATE <- data.frame( ATE,\n analysis = \"Full Data\")\n```\n:::\n\n\nHenceforth, for ease of exposition, we will use the **TE_estimation()** function provided in `functions.r`, which performs all the previous estimation steps at once.\n\n\n## Version info {.unnumbered}\nThis chapter was rendered using the following version of R and its packages:\n\n\n::: {.cell hash='chapter_09_cache/html/unnamed-chunk-3_d3160f04f58ad096cc92a47f4048b9d2'}\n::: {.cell-output .cell-output-stdout}\n```\nR version 4.2.3 (2023-03-15 ucrt)\nPlatform: x86_64-w64-mingw32/x64 (64-bit)\nRunning under: Windows 10 x64 (build 19045)\n\nMatrix products: default\n\nlocale:\n[1] LC_COLLATE=Dutch_Netherlands.utf8 LC_CTYPE=Dutch_Netherlands.utf8 \n[3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C \n[5] LC_TIME=Dutch_Netherlands.utf8 \n\nattached base packages:\n[1] grid stats graphics grDevices utils datasets methods \n[8] base \n\nother attached packages:\n [1] marginaleffects_0.15.0 ggplot2_3.4.3 missForest_1.5 \n [4] sandwich_3.0-2 PSweight_1.1.8 cobalt_4.5.1 \n [7] WeightIt_0.14.2 MatchIt_4.5.4 optmatch_0.10.6 \n[10] truncnorm_1.0-9 MASS_7.3-60 survey_4.2-1 \n[13] survival_3.5-5 Matrix_1.5-4.1 data.table_1.14.8 \n[16] tidyr_1.3.0 MatchThem_1.1.0 ggmice_0.1.0 \n[19] dplyr_1.1.2 mice_3.16.0 table1_1.4.3 \n[22] kableExtra_1.3.4 \n\nloaded via a namespace (and not attached):\n [1] nlme_3.1-162 webshot_0.5.5 httr_1.4.7 \n [4] numDeriv_2016.8-1.1 doRNG_1.8.6 tools_4.2.3 \n [7] backports_1.4.1 utf8_1.2.3 R6_2.5.1 \n[10] rpart_4.1.19 DBI_1.1.3 colorspace_2.1-0 \n[13] jomo_2.7-6 nnet_7.3-19 withr_2.5.0 \n[16] gbm_2.1.8.1 tidyselect_1.2.0 compiler_4.2.3 \n[19] glmnet_4.1-7 cli_3.6.1 rvest_1.0.3 \n[22] xml2_1.3.4 scales_1.2.1 nnls_1.5 \n[25] randomForest_4.7-1.1 systemfonts_1.0.4 
stringr_1.5.0 \n[28] digest_0.6.31 minqa_1.2.5 rmarkdown_2.24 \n[31] svglite_2.1.1 pkgconfig_2.0.3 htmltools_0.5.5 \n[34] lme4_1.1-33 itertools_0.1-3 fastmap_1.1.1 \n[37] htmlwidgets_1.6.2 rlang_1.1.1 rstudioapi_0.15.0 \n[40] shape_1.4.6 generics_0.1.3 zoo_1.8-12 \n[43] jsonlite_1.8.5 magrittr_2.0.3 Formula_1.2-5 \n[46] Rcpp_1.0.10 munsell_0.5.0 fansi_1.0.4 \n[49] lifecycle_1.0.3 stringi_1.7.12 yaml_2.3.7 \n[52] parallel_4.2.3 mitml_0.4-5 crayon_1.5.2 \n[55] lattice_0.21-8 splines_4.2.3 knitr_1.44 \n[58] pillar_1.9.0 boot_1.3-28.1 rngtools_1.5.2 \n[61] codetools_0.2-19 pan_1.6 glue_1.6.2 \n[64] evaluate_0.21 mitools_2.4 vctrs_0.6.3 \n[67] nloptr_2.0.3 foreach_1.5.2 gtable_0.3.4 \n[70] purrr_1.0.1 xfun_0.39 SuperLearner_2.0-28.1\n[73] broom_1.0.5 viridisLite_0.4.2 tibble_3.2.1 \n[76] iterators_1.0.14 gam_1.22-2 \n```\n:::\n:::\n", + "markdown": "---\ntitle: \"Dealing with missing data\"\nauthors: \n - name: Johanna Munoz\n affiliations:\n - ref: julius\n - name: Thomas Debray\n orcid: 0000-0002-1790-2719\n affiliations:\n - ref: smartdas\naffiliations:\n - id: smartdas\n name: Smart Data Analysis and Statistics B.V.\n city: Utrecht\n - id: julius\n name: Julius Center for Health Sciences and Primary Care\n city: Utrecht\nformat:\n html:\n toc: true\n number-sections: true\nexecute:\n cache: true\n---\n\n\n\n\n## Main Analysis\n\nThe main objective of this analysis is to assess whether the number of episodes (y) occurring within specific time periods (years) differs between the treatment groups (1: DMF and 0: TERI). To address potential confounding factors, the researchers consider variables such as patient age, the log of premedical cost (`logPremedicalcost`), previous DMT efficacy (`prevDMTefficacy`), and the number of episodes in previous relapses (prerelapseNum).\n\nWhen estimating treatment effects from observational data, an assumption is made that the patient populations in both treatment groups are as similar as possible. Various methods for balancing data across treatment groups are proposed, including matching, inverse propensity weighting, stratification, and regression adjustment.\n\nIn this case, the focus is specifically on the matching method, which offers advantages over regression adjustment by potentially alleviating issues related to model mis-specification. This includes addressing non-linear relationships between certain confounders and the outcome variable and accounting for treatment effects that may depend on specific confounders (treatment-confounder interaction terms). Propensity scores are used to match subjects in the treatment groups.\n\nMoreover, intentionally introducing incomplete covariate variables in this example adds complexity to the propensity score estimation. Depending on the propensity score estimation technique employed, it may be necessary to incorporate an imputation step. For instance, logistic regression estimation requires complete data for all observations, while XGBoost is robust to missing data \\cite{zhao_propensity_2021}.\n\nTo estimate marginal treatment effects, the g-computation method is employed \\cite{snowden_implementation_2011}. This method involves specifying a model for the outcome dependent on the treatment and covariates. The potential outcomes, i.e., the predicted values of the outcome on treatment ($y_i^1$) and control ($y_i^0$) for each sample unit $i$, are estimated. 
The marginal treatment effect is then calculated by contrasting the averaged estimated potential outcomes.\n\n\nIn this example, we consider the estimation of comparative treatment effects in the absence of treatment-effect heterogeneity.\n\n\n## Estimation workflow\n\nThe proposed workflow consists of the following steps:\n\n\n\n1. **Data Exploration:** In this step, we examine the observed data to understand the variables within the dataset. Our primary focus is on identifying missingness patterns and relationships among the observed variables, including missingness indicator variables. This exploration aids in discerning the most plausible missingness mechanisms and suitable imputation techniques. Additionally, insights from field experts may be incorporated to enhance understanding of the missingness process, potentially considering MNAR assumptions.\n2. **Imputation:** It is essential to evaluate whether an imputation procedure is necessary or whether simpler methods, such as complete case analysis, are more suitable. When imputation is required, it is crucial to select imputation methods that align with the main analysis model. This involves choosing an imputation method for each incomplete variable and determining the predictor variables of each imputation model. Pre-imputation steps (where imputed values can be derived deterministically from other variables) and post-imputation steps (e.g., ensuring that imputed values fall within a plausible range) may also be considered.\n3. **Data Balancing:** Several methods can be used, including propensity score (PS) matching and inverse propensity score weighting (IPW). The resulting balance should be evaluated, for example via visual inspection (e.g., with the cobalt R package). In this example, we estimate propensity scores using logistic regression. For most balancing procedures in R, counterparts specifically designed for imputed datasets are available, such as those in the MatchThem R package, which provides the PS matching and IPW functionality of the MatchIt R package for imputed data.\n4. **Model Fit:** A model is fitted to predict the outcome of each sample unit under each possible treatment value (DMF and TERI); predictors include the treatment and, optionally, the baseline covariates and the propensity score.\n5. **Treatment Estimation & Pooling:** For simplicity, in this tutorial we will use the comparison functions from the R **marginaleffects** package \\cite{arel_marginaleffects_2023}, which can be applied to completed datasets as well as to the outputs of the imputation process. In the latter case, the functions internally calculate the treatment effect within each imputed dataset and pool the estimates using Rubin's rules. A minimal R sketch of steps 4 and 5 is shown directly below this list.\n\n
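The following sketch makes steps 4 and 5 concrete via g-computation on a single completed dataset. It is illustrative only: the dataset `dat`, the single covariate `age`, and the matching `weights` column are assumptions for this sketch, not objects created by the chapter code.

```r
# Minimal g-computation sketch (illustration only).
# Assumes a completed, balanced dataset `dat` with columns
# y, treatment (factor: DMF/TERI), age, years, and weights.

# Step 4: fit an outcome model with an offset for follow-up time
fit <- glm(y ~ treatment + age + offset(log(years)),
           family = poisson(link = "log"),
           data = dat, weights = weights)

# Step 5: predict each patient's expected count under both treatments
d1 <- dat; d1$treatment <- factor("DMF",  levels = levels(dat$treatment))
d0 <- dat; d0$treatment <- factor("TERI", levels = levels(dat$treatment))
lambda1 <- predict(fit, newdata = d1, type = "response")
lambda0 <- predict(fit, newdata = d0, type = "response")

# Average the individual contrasts to obtain the marginal effect
ATE_manual <- weighted.mean(lambda1 - lambda0, w = dat$weights)
```

In the analyses below, **avg_comparisons()** from the marginaleffects package automates this contrast (with appropriate standard errors) and, for multiply imputed data, pools the per-dataset estimates.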
Let's start by preparing the R environment. All the functions used in this tutorial can be found in the resource file `functions.r`.\n\n\n::: {.cell hash='chapter_09_cache/html/unnamed-chunk-2_abd7974b84b47b8689227ead0f93d1e2'}\n\n```{.r .cell-code}\n# Load the required packages and additional functions\nsource(\"resources/chapter 09/functions.r\") \n```\n:::\n\n\n## Homogeneous Treatment Effect\n\nIn this example, we focus on estimating comparative treatment effects in the absence of heterogeneous treatment effects (HTE).\n\n\n### Generating an Observational Dataset\n\nWe can simulate an observational dataset of $N = 3000$ patients as follows:\n\n\n::: {.cell hash='chapter_09_cache/html/hom gendata_70debf96318908085a01fdf6f56df0ff'}\n\n```{.r .cell-code}\ndata_hom <- generate_data(n = 3000, seed = 1234) \n```\n:::\n\n\nThe **generate_data()** function allows the specification of various treatment effect options, easily adjustable by modifying the `beta` parameter. In this instance, we assume a consistent treatment effect across the entire target population. This dataset currently contains no missing values.\n\nThe simulated dataset comprises two treatment groups with variations in baseline characteristics. For example, the figure below illustrates baseline imbalances in covariates such as age.\n\n\n::: {.cell layout-align=\"center\" hash='chapter_09_cache/html/hom plotdata_d4575521427b03abd94cd895c6a9de08'}\n::: {.cell-output-display}\n{fig-align='center' width=576}\n:::\n:::\n\n\nWe can calculate the treatment effect on the complete observed dataset. To do this, we start by balancing the dataset using propensity score matching. In this case, the propensity score model uses confounder variables only: `age`, `gender`, `prevDMTefficacy`, `logPremedicalcost`, and `prerelapseNum`.\n\n\n::: {.cell hash='chapter_09_cache/html/full1_875cd9ca24a5d04e24790c881d2669b0'}\n\n```{.r .cell-code}\n## Apply Matching on the PS for the ATE\nmF <- matchit( treatment ~ age + gender + prevDMTefficacy + logPremedicalcost + prerelapseNum, \n data = data_hom,\n family = binomial,\n method = \"full\",\n caliper = 0.2,\n estimand = \"ATE\",\n replace = FALSE) \n\n## Extract matched data\nmdata <- match.data(mF)\n```\n:::\n\n\nThen, we proceed to model estimation. In this case, a Poisson model is used with the form:\n\n$$count_i \\\\sim Poisson(\\\\lambda_i)$$\n\n$$\\\\begin{eqnarray} log(\\\\lambda_i) &=& \\\\beta_0 + \\\\beta_1 treatment_i + \\\\beta_2 age_i + \\\\beta_3 gender_i \\\\\\\\ &+& \\\\beta_4 prevDMTefficacy_i + \\\\beta_5 logPremedicalcost_i \\\\\\\\ &+& \\\\beta_6 prerelapseNum_i + \\\\beta_7 numSymptoms_i + offset(log(years_i)) \\\\end{eqnarray}$$\n\nSince patient measurements were recorded over varying time frames, the natural logarithm of the years of evaluation is incorporated as an offset in the model. The model is fitted with the glm() function, and we include the treatment and the baseline covariates as predictors (the covariates are optional if the data are well balanced). Additionally, it is necessary to specify the matching weights in the glm() function.\n\n\n::: {.cell hash='chapter_09_cache/html/full2_d3b47e8ad45febc02ab4765383d4770e'}\n\n```{.r .cell-code}\n# Model fit\nfit_mod <- glm(as.formula(\"y ~ treatment + gender + age + logPremedicalcost + prerelapseNum + prevDMTefficacy + numSymptoms + offset(log(years))\"),\n family = poisson(link = \"log\"),\n data = mdata,\n weights = weights)\n```\n:::\n\n
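Before adjusting the standard errors, we can get a rough sense of whether the equidispersion assumption holds. The check below is a sketch (not part of the original analysis); it uses only base R and the fitted `fit_mod` from the chunk above.

```r
# Rough overdispersion diagnostic: Pearson chi-squared / residual df.
# Values well above 1 suggest the variance exceeds the mean (overdispersion).
pearson_chisq <- sum(residuals(fit_mod, type = "pearson")^2)
pearson_chisq / df.residual(fit_mod)
```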
Standard errors of Poisson models are typically adjusted using robust (sandwich) estimators to protect against violations of the equidispersion assumption. This correction can be applied directly to the model using the vcovCL() function \\cite{zeileis_sandwich_2022}. However, given that we will calculate the treatment effect using the functions of the **marginaleffects** package, this step becomes redundant: the package allows specifying HC3 sandwich standard errors during treatment estimation.\n\nFinally, we calculate the Average Treatment Effect (ATE). The ATE is defined as $$\\\\tau_{ATE}=E(y_i^1-y_i^0)$$\n\nThis cannot be extracted directly from the $\\\\beta_1$ parameter, as the model has $log(\\\\lambda)$ as the response. We therefore estimate it as $$\\\\tau_{ATE}=E(\\\\lambda^1_i-\\\\lambda^0_i)$$ This can be done with the **avg_comparisons()** function from the R package **marginaleffects**, which calculates the potential outcomes for each sample unit and then combines them to summarize the average effect.\n\n\n::: {.cell hash='chapter_09_cache/html/full3_d2794d19f736d902a2956bbebe7cf2bc'}\n\n```{.r .cell-code}\n# Estimation of treatment effects with robust standard errors\nATE <- avg_comparisons(fit_mod, \n variables = list(treatment = c(\"TERI\",\"DMF\")),\n vcov = \"HC3\",\n newdata = mdata,\n wts = \"weights\")\n\nresult_ATE <- data.frame( ATE,\n analysis = \"Full Data\")\n```\n:::\n\n\nHenceforth, for ease of exposition, we will use the **TE_estimation()** function provided in `functions.r`, which performs all the previous estimation steps at once.\n\n### Generating Missing Values\n\nIn this example, we focus on the case where confounder variables are incomplete.\n\nMissing values can be generated using the **get_missdata()** function, considering the following patterns of missingness for the log of prior medical cost (`logPremedicalcost`):\n\n1. MAR: missingness depends on `age` and `sex`\n2. MART: missingness depends on `age`, `sex`, and the treatment variable `treatment`\n3. MARTY: missingness depends on `age`, `sex`, `treatment`, and the outcome variable `y`\n4. MNAR: missingness depends on `age`, `sex`, and `logPremedicalcost` itself\n\nLet's select the missing data pattern \"MART\":\n\n\n::: {.cell hash='chapter_09_cache/html/hom miss_e7e5a0f66dd546f3b0a8422a6ec73d0b'}\n\n```{.r .cell-code}\nm_data_hom <- get_missdata(data = data_hom, scenario = \"MART\")\n```\n:::\n\n\nAfter introducing missing values, we only have complete data for $N=$ 1015 patients.\n
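This number can be verified directly from the incomplete dataset; the check below is a minimal sketch using base R only.

```r
# Verify the number of fully observed patients and summarize missingness
sum(complete.cases(m_data_hom))              # patients with no missing values
colSums(is.na(m_data_hom))                   # missing count per variable
round(100 * colMeans(is.na(m_data_hom)), 1)  # percent missing per variable
```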
::: {.cell hash='chapter_09_cache/html/hom misstable_a07fd9937593cd5b68de116d98fd7315'}
::: {.cell-output-display}

|                              | DMF (N=2265)      | TERI (N=735)      | Overall (N=3000)  |
|------------------------------|-------------------|-------------------|-------------------|
| **Age (years)**              |                   |                   |                   |
| Mean (SD)                    | 44.4 (9.95)       | 51.5 (8.59)       | 46.2 (10.1)       |
| Median [Min, Max]            | 45.0 [18.0, 64.0] | 53.0 [23.0, 64.0] | 47.0 [18.0, 64.0] |
| Missing                      | 303 (13.4%)       | 54 (7.3%)         | 357 (11.9%)       |
| **Gender**                   |                   |                   |                   |
| Female                       | 1740 (76.8%)      | 526 (71.6%)       | 2266 (75.5%)      |
| Male                         | 525 (23.2%)       | 209 (28.4%)       | 734 (24.5%)       |
| **Efficacy of previous DMT** |                   |                   |                   |
| None                         | 725 (32.0%)       | 318 (43.3%)       | 1043 (34.8%)      |
| Low_efficacy                 | 190 (8.4%)        | 52 (7.1%)         | 242 (8.1%)        |
| Medium_high_efficacy         | 800 (35.3%)       | 225 (30.6%)       | 1025 (34.2%)      |
| Missing                      | 550 (24.3%)       | 140 (19.0%)       | 690 (23.0%)       |
| **Log prior medical costs**  |                   |                   |                   |
| Mean (SD)                    | 8.71 (1.03)       | 9.45 (1.20)       | 8.98 (1.15)       |
| Median [Min, Max]            | 8.75 [5.10, 11.3] | 9.48 [5.56, 12.7] | 8.98 [5.10, 12.7] |
| Missing                      | 1145 (50.6%)      | 109 (14.8%)       | 1254 (41.8%)      |
| **Number of prior symptoms** |                   |                   |                   |
| 0                            | 158 (7.0%)        | 53 (7.2%)         | 211 (7.0%)        |
| 1                            | 1142 (50.4%)      | 401 (54.6%)       | 1543 (51.4%)      |
| >=2                          | 432 (19.1%)       | 151 (20.5%)       | 583 (19.4%)       |
| Missing                      | 533 (23.5%)       | 130 (17.7%)       | 663 (22.1%)       |
| **Number of prior relapses** |                   |                   |                   |
| Mean (SD)                    | 0.438 (0.650)     | 0.414 (0.653)     | 0.432 (0.650)     |
| Median [Min, Max]            | 0 [0, 4.00]       | 0 [0, 3.00]       | 0 [0, 4.00]       |
| Missing                      | 305 (13.5%)       | 71 (9.7%)         | 376 (12.5%)       |

:::
:::
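Given the nontrivial fraction of incomplete covariates in the table above, inspecting the joint missingness pattern helps guide the imputation strategy (step 1 of the workflow). The sketch below uses the mice package, which is among the packages loaded via `functions.r`; the call is illustrative rather than part of the original analysis.

```r
# Tabulate joint missingness patterns across all variables;
# each row is one observed pattern, with counts of affected observations
library(mice)
md.pattern(m_data_hom, rotate.names = TRUE)

# A ggplot2 alternative from the ggmice package:
# ggmice::plot_pattern(m_data_hom)
```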