diff --git a/_freeze/chapter_06/execute-results/html.json b/_freeze/chapter_06/execute-results/html.json index 7e7d40e..4281337 100644 --- a/_freeze/chapter_06/execute-results/html.json +++ b/_freeze/chapter_06/execute-results/html.json @@ -1,7 +1,8 @@ { - "hash": "07d894518697e975228c87340b5426bf", + "hash": "550a867d7102c4fa82a8ea710527c853", "result": { - "markdown": "---\ntitle: \"Confounding adjustment using propensity score methods\"\nauthors: \n - name: Tammy Jiang\n affiliations:\n - ref: biogen\n - name: Thomas Debray\n orcid: 0000-0002-1790-2719\n affiliations:\n - ref: smartdas\naffiliations:\n - id: smartdas\n name: Smart Data Analysis and Statistics B.V.\n city: Utrecht\n - id: biogen\n name: Biogen\n city: Cambridge, MA, USA\nformat:\n html:\n toc: true\n number-sections: true\nexecute:\n cache: true\nbibliography: 'https://api.citedrive.com/bib/0d25b38b-db8f-43c4-b934-f4e2f3bd655a/references.bib?x=eyJpZCI6ICIwZDI1YjM4Yi1kYjhmLTQzYzQtYjkzNC1mNGUyZjNiZDY1NWEiLCAidXNlciI6ICIyNTA2IiwgInNpZ25hdHVyZSI6ICI0MGFkYjZhMzYyYWE5Y2U0MjQ2NWE2ZTQzNjlhMWY3NTk5MzhhNzUxZDNjYWIxNDlmYjM4NDgwOTYzMzY5YzFlIn0=/bibliography.bib'\n---\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-1_a15cf0d3ef1861af5bff906ae3518025'}\n\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-2_1bdb61147f662d12f4498126976e5a94'}\n\n:::\n\n\n\n\n\n\n## Introduction\n\nThe purpose of this document is to provide example R code that demonstrates how to estimate the propensity score and implement matching, stratification, weighting, and regression adjustment for the continuous propensity score. In this example using simulated data, we have two disease modifying therapies (DMT1 and DMT0) and the outcome is the number of post-treatment multiple sclerosis relapses during follow-up. We will estimate the average treatment effect in the treated (ATT) using propensity score matching, stratification, and weighting. We will estimate the average treatment effect in the population (ATE) using regression adjustment for the continuous propensity score. The treatment effects can be interpreted as annualized relapse rate ratios (ARR).\n\nWe consider an example dataset with the following characteristics:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-5_a2453ce71a38166a1d3e201899ad012a'}\n\n```{.r .cell-code}\nhead(dat)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n age female prevDMTefficacy premedicalcost numSymptoms prerelapse_num\n1: 50 1 None 3899.61 1 1\n2: 51 0 None 9580.51 1 0\n3: 56 0 None 4785.89 1 0\n4: 44 1 None 8696.80 1 1\n5: 63 0 None 2588.03 1 0\n6: 28 1 None 5435.57 1 0\n treatment y years Iscore\n1: DMT1 0 1.78507871 Moderate A1\n2: DMT1 0 0.01368925 High A1\n3: DMT1 2 3.25530459 High A1\n4: DMT1 2 5.73853525 Neutral\n5: DMT1 0 1.31143053 High A1\n6: DMT1 0 0.59137577 Moderate A0\n```\n:::\n:::\n\n\n## Comparing baseline characteristics\n\n- `DMT1` is the treatment group and `DMT0` is the control group\n- `prevDMTefficacy` is previous DMT efficacy (none, low efficacy, and medium/high efficacy)\n- `prerelapse_num` is the number of previous MS relapses\n\n\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-7_c6371261383f2543c8ca86f662f229cf'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |\n|:--------------------------|:------------|:------------|\n|n |2300 |7700 |\n|age (mean (SD)) |51.39 (8.32) |44.25 (9.79) |\n|female = 1 (%) |1671 (72.65) |5915 (76.82) |\n|prevDMTefficacy (%) | | |\n|None |1247 (54.22) |3171 (41.18) |\n|Low_efficacy |261 (11.35) |858 (11.14) |\n|Medium_high_efficacy |792 (34.43) |3671 (47.68) |\n|prerelapse_num (mean (SD)) |0.39 (0.62) |0.46 (0.68) |\n:::\n:::\n\n\n## Estimating the propensity score\n\n### Logistic regression\n\nWe sought to restore balance in the distribution of baseline covariates in patients treated with DMT1 (index treatment) and DMT0 (control tratment). We fit a multivariable logistic regression model in which treatment was regressed on baseline characteristics including age, sex, previous DMT efficacy, and previous number of relapses.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-8_f4e02787f761617316996db8165e03b6'}\n\n```{.r .cell-code}\n# Fit logistic regression model\nps.model <- glm(treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n data = dat, family = binomial())\n\n# Summary of logistic regression model\nsummary(ps.model)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n family = binomial(), data = dat)\n\nDeviance Residuals: \n Min 1Q Median 3Q Max \n-2.7949 0.2585 0.5220 0.7478 1.5033 \n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 4.809473 0.157127 30.609 < 2e-16 ***\nage -0.086708 0.002996 -28.939 < 2e-16 ***\nfemale1 0.253611 0.057664 4.398 1.09e-05 ***\nprevDMTefficacyLow_efficacy 0.310394 0.083022 3.739 0.000185 ***\nprevDMTefficacyMedium_high_efficacy 0.660266 0.054393 12.139 < 2e-16 ***\nprerelapse_num 0.156318 0.039288 3.979 6.93e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for binomial family taken to be 1)\n\n Null deviance: 10786 on 9999 degrees of freedom\nResidual deviance: 9597 on 9994 degrees of freedom\nAIC: 9609\n\nNumber of Fisher Scoring iterations: 5\n```\n:::\n\n```{.r .cell-code}\n# Extract propensity scores\ndat$ps <- predict(ps.model, data = dat, type = \"response\")\n```\n:::\n\n\n### Assessing overlap\n\nWe examined the degree of overlap in the distribution of propensity scores across treatment groups using histograms and side-by-side box plots.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-9_39ff109988ae313b39dfc313c8e1e104'}\n\n```{.r .cell-code}\n# Histogram\nggplot(dat, aes(x = ps, fill = as.factor(treatment), color = as.factor(treatment))) + \n geom_histogram(alpha = 0.3, position='identity', bins = 15) + \n facet_grid(as.factor(treatment) ~ .) + \n xlab(\"Probability of Treatment\") + \n ylab(\"Count\") +\n ggtitle(\"Propensity Score Distribution by Treatment Group\") +\n theme(legend.position = \"bottom\", legend.direction = \"vertical\")\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Side-by-side box plots\nggplot(dat, aes(x=as.factor(treatment), y=ps, fill=as.factor(treatment))) +\n geom_boxplot() + \n ggtitle(\"Propensity Score Distribution by Treatment Group\") +\n ylab(\"Probability of Treatment\") + \n xlab(\"Treatment group\") +\n theme(legend.position = \"none\")\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-9-2.png){width=672}\n:::\n\n```{.r .cell-code}\n# Distribution of propensity scores by treatment groups\nsummary(dat$ps[dat$treatment == \"DMT1\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 0.3230 0.7214 0.8265 0.7970 0.9010 0.9854 \n```\n:::\n\n```{.r .cell-code}\nsummary(dat$ps[dat$treatment == \"DMT0\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 0.3230 0.5730 0.6894 0.6795 0.7975 0.9799 \n```\n:::\n:::\n\n\n## Propensity score matching\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-10_0763bb0e79ff2ca5ed0dc8c271cc17da'}\n\n:::\n\n\n### 1:1 Optimal full matching without replacement\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-11_3a7fc53d71f5f9e2b671eb685a9eac9b'}\n\n```{.r .cell-code}\nlibrary(MatchIt)\n\n# Use MatchIt package for PS matching\nopt <- matchit(treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n data = dat, \n method = \"full\",\n estimand = \"ATT\")\n\nopt\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nA matchit object\n - method: Optimal full matching\n - distance: Propensity score\n - estimated with logistic regression\n - number of obs.: 10000 (original), 10000 (matched)\n - target estimand: ATT\n - covariates: age, female, prevDMTefficacy, prerelapse_num\n```\n:::\n:::\n\n\n### Assess balance after matching\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-12_35bc9e3ee34c5b03978866fa4e2d5f2f'}\n\n```{.r .cell-code}\nsummary(opt)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nmatchit(formula = treatment ~ age + female + prevDMTefficacy + \n prerelapse_num, data = dat, method = \"full\", estimand = \"ATT\")\n\nSummary of Balance for All Data:\n Means Treated Means Control Std. Mean Diff.\ndistance 0.7970 0.6795 0.8943\nage 44.2496 51.3883 -0.7289\nfemale0 0.2318 0.2735 -0.0987\nfemale1 0.7682 0.7265 0.0987\nprevDMTefficacyNone 0.4118 0.5422 -0.2649\nprevDMTefficacyLow_efficacy 0.1114 0.1135 -0.0065\nprevDMTefficacyMedium_high_efficacy 0.4768 0.3443 0.2651\nprerelapse_num 0.4595 0.3930 0.0976\n Var. Ratio eCDF Mean eCDF Max\ndistance 0.7873 0.1917 0.3379\nage 1.3868 0.1519 0.3085\nfemale0 . 0.0417 0.0417\nfemale1 . 0.0417 0.0417\nprevDMTefficacyNone . 0.1304 0.1304\nprevDMTefficacyLow_efficacy . 0.0020 0.0020\nprevDMTefficacyMedium_high_efficacy . 0.1324 0.1324\nprerelapse_num 1.1990 0.0133 0.0383\n\nSummary of Balance for Matched Data:\n Means Treated Means Control Std. Mean Diff.\ndistance 0.7970 0.7970 0.0003\nage 44.2496 44.3185 -0.0070\nfemale0 0.2318 0.2275 0.0101\nfemale1 0.7682 0.7725 -0.0101\nprevDMTefficacyNone 0.4118 0.4130 -0.0024\nprevDMTefficacyLow_efficacy 0.1114 0.0893 0.0703\nprevDMTefficacyMedium_high_efficacy 0.4768 0.4977 -0.0419\nprerelapse_num 0.4595 0.4399 0.0288\n Var. Ratio eCDF Mean eCDF Max\ndistance 0.9976 0.0005 0.0075\nage 1.0392 0.0038 0.0153\nfemale0 . 0.0043 0.0043\nfemale1 . 0.0043 0.0043\nprevDMTefficacyNone . 0.0012 0.0012\nprevDMTefficacyLow_efficacy . 0.0221 0.0221\nprevDMTefficacyMedium_high_efficacy . 0.0209 0.0209\nprerelapse_num 1.1319 0.0060 0.0229\n Std. Pair Dist.\ndistance 0.0008\nage 0.0667\nfemale0 0.1775\nfemale1 0.1775\nprevDMTefficacyNone 0.1100\nprevDMTefficacyLow_efficacy 0.1846\nprevDMTefficacyMedium_high_efficacy 0.1614\nprerelapse_num 0.2170\n\nSample Sizes:\n Control Treated\nAll 2300. 7700\nMatched (ESS) 307.06 7700\nMatched 2300. 7700\nUnmatched 0. 0\nDiscarded 0. 0\n```\n:::\n\n```{.r .cell-code}\nplot(summary(opt))\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# black line is treated group, grey line is control group\nplot(opt, type = \"density\", which.xs = vars) \n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-2.png){width=672}\n:::\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-3.png){width=672}\n:::\n:::\n\n\n### Estimating the ATT\n\nWe can estimate the ATT in the matched sample using Poisson regression in which the number of post-treatment relapses is regressed on treatment status and follow-up time for each patient (captured by the variable `years`). More details are provided at \\url{https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html}.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-13_defdae964fedc6fbbf5421b4d30be8a3'}\n\n```{.r .cell-code}\n# Matched data\nmatched.data <- match.data(opt)\n\n# Poisson regression model\nopt.fit <- glm(y ~ treatment + offset(log(years)), \n family = poisson(link = \"log\"),\n data = matched.data, \n weights = weights)\n\n# Treatment effect estimation\nopt.comp <- comparisons(opt.fit,\n variables = \"treatment\",\n vcov = ~subclass,\n newdata = subset(matched.data, treatment == \"DMT1\"),\n wts = \"weights\",\n transform_pre = \"ratio\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: The `transform_pre` argument is deprecated. Use `comparison` instead.\n```\n:::\n\n```{.r .cell-code}\nopt.comp |> tidy()\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWarning: The `transform_pre` argument is deprecated. Use `comparison` instead.\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 1 × 8\n term contrast estimate std.error statistic p.value conf.low conf.high\n \n1 treatment mean(DMT1)… 0.804 0.102 7.88 3.25e-15 0.604 1.00\n```\n:::\n:::\n\n\n\n\nAs indicated in the summary output above, the annualized relapse rate ratio for DMT1 vs DMT0 among patients treated with DMT0 (ATT) is given as 0.8 with a 95% confidence interval ranging from 0.6 to 1.\n\n## Propensity score stratification\n\n### Divide sample into quintiles of propensity scores\n\nWe will form five mutually exclusive groups of the estimated propensity score.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-15_0fa1ff843b0ef791a43ee1c53794d087'}\n\n```{.r .cell-code}\n# Divide the PS scores into five strata of roughly equal size\nbreaks <- quantile(dat$ps, probs = seq(0, 1, by = 0.2))\n\ndat <- dat %>% mutate(ps.strata = cut(ps, \n breaks = breaks,\n labels = seq(1:5),\n include.lowest = TRUE))\n\n# Number of patients in each stratum\ntable(dat$ps.strata)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n 1 2 3 4 5 \n2002 2015 1991 1997 1995 \n```\n:::\n:::\n\n\n### Assess balance within each propensity score stratum\n\nWithin each propensity score stratum, treated and control patients should have similar values of the propensity score and the distribution of baseline covariates should be approximately balanced between treatment groups.\n\n#### Propensity Score Stratum #1\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-16_74ea54cf1d64bd789e355c5e65475a00'}\n\n```{.r .cell-code}\ntab1.strata1 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 1), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", \n test = FALSE)\n\ntab1.strata1.print <- print(tab1.strata1, \n catDigits = 2, \n contDigits = 2, \n smd = TRUE)\n\ntab1.strata1.print\n```\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-17_c8e6b93a48546975150d9d10970e7f42'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |901 |1101 | |\n|age (mean (SD)) |58.38 (3.67) |57.45 (3.73) |0.251 |\n|female = 1 (%) |605 (67.15) |775 (70.39) |0.070 |\n|prevDMTefficacy (%) | | |0.056 |\n|None |650 (72.14) |771 (70.03) | |\n|Low_efficacy |106 (11.76) |130 (11.81) | |\n|Medium_high_efficacy |145 (16.09) |200 (18.17) | |\n|prerelapse_num (mean (SD)) |0.29 (0.53) |0.33 (0.56) |0.074 |\n:::\n:::\n\n\n#### Propensity Score Stratum #2\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-18_a8212d8d6c78ccd9b923ca1f65f40cd5'}\n\n```{.r .cell-code}\ntab1.strata2 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 2), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata2.print <- print(tab1.strata2, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-19_8298901e968c2103f48cce0572760d9b'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |617 |1398 | |\n|age (mean (SD)) |52.18 (4.35) |51.97 (4.22) |0.049 |\n|female = 1 (%) |458 (74.23) |1048 (74.96) |0.017 |\n|prevDMTefficacy (%) | | |0.054 |\n|None |292 (47.33) |624 (44.64) | |\n|Low_efficacy |69 (11.18) |162 (11.59) | |\n|Medium_high_efficacy |256 (41.49) |612 (43.78) | |\n|prerelapse_num (mean (SD)) |0.40 (0.64) |0.41 (0.66) |0.004 |\n:::\n:::\n\n\n#### Propensity Score Stratum #3\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-20_2ab203cb4471818949b2f989c7b09686'}\n\n```{.r .cell-code}\ntab1.strata3 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 3), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata3.print <- print(tab1.strata3, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-21_f3442d2af4abafa33eca02f6324d3c02'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |392 |1599 | |\n|age (mean (SD)) |46.73 (4.06) |46.36 (4.08) |0.092 |\n|female = 1 (%) |305 (77.81) |1193 (74.61) |0.075 |\n|prevDMTefficacy (%) | | |0.041 |\n|None |168 (42.86) |687 (42.96) | |\n|Low_efficacy |52 (13.27) |191 (11.94) | |\n|Medium_high_efficacy |172 (43.88) |721 (45.09) | |\n|prerelapse_num (mean (SD)) |0.49 (0.68) |0.47 (0.66) |0.031 |\n:::\n:::\n\n\n#### Propensity Score Stratum #4\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-22_c5eb0a7850d7db05e7c155d549696e6a'}\n\n```{.r .cell-code}\ntab1.strata4 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 4), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata4.print <- print(tab1.strata4, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-23_cacdf2ebb52436f7904ab8dfdec0d0ad'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |269 |1728 | |\n|age (mean (SD)) |41.07 (4.11) |40.88 (4.29) |0.046 |\n|female = 1 (%) |203 (75.46) |1356 (78.47) |0.071 |\n|prevDMTefficacy (%) | | |0.084 |\n|None |105 (39.03) |634 (36.69) | |\n|Low_efficacy |22 ( 8.18) |181 (10.47) | |\n|Medium_high_efficacy |142 (52.79) |913 (52.84) | |\n|prerelapse_num (mean (SD)) |0.50 (0.69) |0.51 (0.71) |0.012 |\n:::\n:::\n\n\n#### Propensity Score Stratum #5\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-24_1c462c0ac7732aa3e24634c3e3a32f0f'}\n\n```{.r .cell-code}\ntab1.strata5 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 5), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata5.print <- print(tab1.strata5, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-25_bade15351661aeeb7638487ff5fd7a32'}\n::: {.cell-output-display}\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |121 |1874 | |\n|age (mean (SD)) |33.26 (4.95) |32.04 (5.58) |0.233 |\n|female = 1 (%) |100 (82.64) |1543 (82.34) |0.008 |\n|prevDMTefficacy (%) | | |0.050 |\n|None |32 (26.45) |455 (24.28) | |\n|Low_efficacy |12 ( 9.92) |194 (10.35) | |\n|Medium_high_efficacy |77 (63.64) |1225 (65.37) | |\n|prerelapse_num (mean (SD)) |0.52 (0.66) |0.52 (0.73) |0.004 |\n:::\n:::\n\n\n### Estimating and pooling of stratum-specific treatment effects\n\nThe overall ATT across strata can be estimated by weighting stratum-specific estimates by the proportion of treated patients in each stratum over all treated patients in the sample.\n\nWe first define a function `att.strata.function()` to calculate stratum-specific estimates of the treatment effect:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-26_04142c4102d61337c60bd002dab2f6fd'}\n\n```{.r .cell-code}\natt.strata.function <- function(data, stratum, confint = TRUE) {\n\n fit <- glm(\"y ~ treatment + offset(log(years))\",\n family = poisson(link = \"log\"),\n data = data %>% filter(ps.strata == stratum))\n\n arr <- round(as.numeric(exp(coef(fit)[\"treatmentDMT1\"])), digits = 3)\n ll <- ul <- NA\n \n if (confint) {\n ll <- round(exp(confint(fit))[\"treatmentDMT1\",1], digits = 3)\n ul <- round(exp(confint(fit))[\"treatmentDMT1\",2], digits = 3)\n }\n \n return(c(\"stratum\" = stratum,\n \"arr\" = arr,\n \"ci_lower\" = ll,\n \"ci_upper\" = ul))\n}\n\narr.strata <- as.data.frame(t(sapply(1:5, att.strata.function, data = dat)))\narr.strata\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n stratum arr ci_lower ci_upper\n1 1 0.904 0.760 1.076\n2 2 0.822 0.696 0.975\n3 3 0.798 0.666 0.961\n4 4 0.716 0.587 0.881\n5 5 0.589 0.463 0.761\n```\n:::\n:::\n\n\nSubsequently, we define a function `weights.strata.function()` to calculate the weights for each stratum. The weight is the proportion of treated patients in each stratum over all treated patients in the sample:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-27_b8939c3f4adc7c8e2e5a4fccbd14f9a5'}\n\n```{.r .cell-code}\nweights.strata.function <- function(data, stratum) {\n n_DMT1_stratum <- nrow(data %>% filter(ps.strata == stratum & treatment == \"DMT1\"))\n n_DMT1_all <- nrow(data %>% filter(treatment == \"DMT1\"))\n weight <- n_DMT1_stratum/n_DMT1_all\n return(c(\"stratum\" = stratum, \"weight\" = weight))\n}\n\nweights.strata <- as.data.frame(t(sapply(1:5, weights.strata.function, data = dat)))\nweights.strata\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n stratum weight\n1 1 0.1429870\n2 2 0.1815584\n3 3 0.2076623\n4 4 0.2244156\n5 5 0.2433766\n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-28_54d6b1a867e07ad4571056ecd54fecf2'}\n\n```{.r .cell-code}\n# Create table with ARRs and weights for each PS stratum\narr.weights.merged <- merge(arr.strata, weights.strata, by = \"stratum\")\n\n# Calculate the weighted ARR for each stratum\narr.weights.merged <- arr.weights.merged %>%\n mutate(weighted.arr = as.numeric(arr) * weight)\n\n# Sum the weighted ARRs across strata to get the overall ATT\nsum(arr.weights.merged$weighted.arr)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.7482462\n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-29_97ffa0dccd5d2576f33ced4eee15b39a'}\n\n:::\n\n\nWe now define a new function `ps.stratification.bootstrap()` that integrates estimation of the ATT and the PS weights for bootstrapping purposes:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-30_11e3397f516871071de083028334e45c'}\n\n```{.r .cell-code}\nps.stratification.bootstrap <- function(data, inds) {\n d <- data[inds,]\n \n d$ps.strata <- cut(d$ps, \n breaks = c(quantile(dat$ps, probs = seq(0, 1, by = 0.2))),\n labels = seq(5),\n include.lowest = TRUE)\n \n arr.strata <- as.data.frame(t(sapply(1:5, att.strata.function, \n data = d, confint = FALSE)))\n \n weights.strata <- as.data.frame(t(sapply(1:5, weights.strata.function, data = d)))\n \n return(arr.strata$arr[1] * weights.strata$weight[1] + \n arr.strata$arr[2] * weights.strata$weight[2] +\n arr.strata$arr[3] * weights.strata$weight[3] + \n arr.strata$arr[4] * weights.strata$weight[4] +\n arr.strata$arr[5] * weights.strata$weight[5]) \n}\n```\n:::\n\n\nWe can now estimate the treatment effect and its confidence interval using the bootstrap procedure:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-31_5667cf2d7c55626933820ae8ac50691f'}\n\n```{.r .cell-code}\nlibrary(boot)\n\nset.seed(1854)\narr.stratification.boot <- boot(data = dat, \n statistic = ps.stratification.bootstrap, \n R = 1000)\n```\n:::\n\n\nWe can summarize the bootstrap samples as follows:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-32_5dea8b91bbcda9c8bba8b2b176826aa4'}\n\n```{.r .cell-code}\n# Bootstrap estimate of the ARR\nmedian(arr.stratification.boot$t)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.7558609\n```\n:::\n\n```{.r .cell-code}\n# Bootstrap 95% CI of the ARR\nboot.ci(arr.stratification.boot, conf = 0.95, type = \"perc\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS\nBased on 1000 bootstrap replicates\n\nCALL : \nboot.ci(boot.out = arr.stratification.boot, conf = 0.95, type = \"perc\")\n\nIntervals : \nLevel Percentile \n95% ( 0.6833, 0.8366 ) \nCalculations and Intervals on Original Scale\n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-33_a59240988eb8ee5b46c07a6be8ae8679'}\n\n:::\n\n\n## Propensity score weighting\n\n### Calculate propensity score weights for ATT\n\nPropensity score weighting reweights the study sample to generate an artificial population (i.e., pseudo-population) in which the covariates are no longer associated with treatment, thereby removing confounding by measured covariates. For the ATT, the weight for all treated patients is set to one. Conversely, the weight for patients in the control group is set to the propensity score divided by one minus the propensity score, that is, (PS/(1 − PS)). We estimated stabilized weights to address extreme weights.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-34_346b5add1f599dfeab9c42f7f503bec9'}\n\n```{.r .cell-code}\nlibrary(WeightIt)\n\nw.out <- weightit(treatment ~ age + female + prevDMTefficacy + prerelapse_num,\n data = dat,\n method = \"ps\",\n estimand = \"ATT\")\n #stabilize = TRUE)\n\nw.out\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nA weightit object\n - method: \"glm\" (propensity score weighting with GLM)\n - number of obs.: 10000\n - sampling weights: none\n - treatment: 2-category\n - estimand: ATT (focal: DMT1)\n - covariates: age, female, prevDMTefficacy, prerelapse_num\n```\n:::\n\n```{.r .cell-code}\nsummary(w.out)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Summary of weights\n\n- Weight ranges:\n\n Min Max\nDMT0 0.4772 |---------------------------| 48.6856\nDMT1 1.0000 || 1.0000\n\n- Units with the 5 most extreme weights by group:\n \n 9492 8836 6544 9610 4729\n DMT0 32.1027 32.1027 34.3126 38.1817 48.6856\n 8 7 4 2 1\n DMT1 1 1 1 1 1\n\n- Weight statistics:\n\n Coef of Var MAD Entropy # Zeros\nDMT0 1.098 0.673 0.383 0\nDMT1 0.000 0.000 -0.000 0\n\n- Effective Sample Sizes:\n\n DMT0 DMT1\nUnweighted 2300. 7700\nWeighted 1043.16 7700\n```\n:::\n\n```{.r .cell-code}\nplot(summary(w.out))\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-34-1.png){width=672}\n:::\n:::\n\n\n### Assess balance in the weighted sample\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-35_aa90cebe75f040f6018954795203d760'}\n\n```{.r .cell-code}\nbal.tab(w.out, stats = c(\"m\", \"v\"), thresholds = c(m = .05))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBalance Measures\n Type Diff.Adj M.Threshold\nprop.score Distance -0.0045 Balanced, <0.05\nage Contin. 0.0054 Balanced, <0.05\nfemale Binary 0.0005 Balanced, <0.05\nprevDMTefficacy_None Binary -0.0003 Balanced, <0.05\nprevDMTefficacy_Low_efficacy Binary 0.0023 Balanced, <0.05\nprevDMTefficacy_Medium_high_efficacy Binary -0.0020 Balanced, <0.05\nprerelapse_num Contin. -0.0034 Balanced, <0.05\n V.Ratio.Adj\nprop.score 0.9926\nage 1.0102\nfemale .\nprevDMTefficacy_None .\nprevDMTefficacy_Low_efficacy .\nprevDMTefficacy_Medium_high_efficacy .\nprerelapse_num 1.0941\n\nBalance tally for mean differences\n count\nBalanced, <0.05 7\nNot Balanced, >0.05 0\n\nVariable with the greatest mean difference\n Variable Diff.Adj M.Threshold\n age 0.0054 Balanced, <0.05\n\nEffective sample sizes\n DMT0 DMT1\nUnadjusted 2300. 7700\nAdjusted 1043.16 7700\n```\n:::\n:::\n\n\n### Estimate the ATT\n\nOne way to estimate the ATT is to use the survey package. The function `svyglm()` generates model-robust (Horvitz-Thompson-type) standard errors by default, and thus does not require additional adjustments.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-36_b908d6b681a774ca42ffc55aa14a0ba9'}\n\n```{.r .cell-code}\nlibrary(survey)\n\nweighted.data <- svydesign(ids = ~1, data = dat, weights = ~w.out$weights)\n\nweighted.fit <- svyglm(y ~ treatment + offset(log(years)),\n family = poisson(link = \"log\"),\n design = weighted.data)\n\nexp(coef(weighted.fit)[\"treatmentDMT1\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntreatmentDMT1 \n 0.7083381 \n```\n:::\n\n```{.r .cell-code}\nexp(confint(weighted.fit))[\"treatmentDMT1\",] \n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 % \n0.6245507 0.8033662 \n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-37_cff261099e9cdd74847bd049274d6636'}\n\n:::\n\n\nAs indicated above, propensity score weighting yielded an ATT estimate of 0.71 (95% CI: 0.62; 0.8).\n\nAn alternative approach is to use `glm()` to estimate the treatment effect and calculate robust standard errors.\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-38_1488e2d259f77e9e17d6a7a158960e9e'}\n\n```{.r .cell-code}\n# Alternative way to estimate treatment effect\nweighted.fit2 <- glm(y ~ treatment + offset(log(years)),\n family = poisson(link = \"log\"),\n data = dat,\n weights = w.out$weights)\n\n# Extract the estimated ARR\nexp(coef(weighted.fit2))[\"treatmentDMT1\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntreatmentDMT1 \n 0.7083381 \n```\n:::\n\n```{.r .cell-code}\n# Calculate robust standard error and p-value of the log ARR\ncoeftest(weighted.fit2, vcov. = vcovHC)[\"treatmentDMT1\",]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n Estimate Std. Error z value Pr(>|z|) \n-3.448337e-01 6.442745e-02 -5.352280e+00 8.685284e-08 \n```\n:::\n\n```{.r .cell-code}\n# Derive 95% confidence interval of the ARR\nexp(lmtest::coefci(weighted.fit2, \n level = 0.95, # 95% confidence interval\n vcov. = vcovHC)[\"treatmentDMT1\",])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 % \n0.6243094 0.8036767 \n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-39_f85cdf8b48c522f66950df0896049d2c'}\n\n:::\n\n\nUsing this approach, the ATT estimate was 0.71 (95% CI: 0.62; 0.8).\n\n## Regression adjustment for the propensity score for the ATE\n\nIn this approach, a regression model is fitted to describe the observed outcome as a function of the received treatment and the estimated propensity score:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-40_1fda26ad359820a84b7855efffc7c13a'}\n\n```{.r .cell-code}\nps.reg.fit <- glm(y ~ treatment + ps + offset(log(years)),\n family = poisson(link = \"log\"),\n data = dat)\n\nsummary(ps.reg.fit)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nglm(formula = y ~ treatment + ps + offset(log(years)), family = poisson(link = \"log\"), \n data = dat)\n\nDeviance Residuals: \n Min 1Q Median 3Q Max \n-2.0160 -0.7336 -0.4441 -0.1352 4.2634 \n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) -1.99585 0.10359 -19.266 < 2e-16 ***\ntreatmentDMT1 -0.25598 0.04431 -5.777 7.60e-09 ***\nps 1.07521 0.13878 7.748 9.36e-15 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for poisson family taken to be 1)\n\n Null deviance: 7514.7 on 9999 degrees of freedom\nResidual deviance: 7443.0 on 9997 degrees of freedom\nAIC: 12378\n\nNumber of Fisher Scoring iterations: 6\n```\n:::\n\n```{.r .cell-code}\n# ATE\nexp(coef(ps.reg.fit))[\"treatmentDMT1\"] \n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntreatmentDMT1 \n 0.7741606 \n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-41_7a19b53bebc8ea655af5b66fc055884b'}\n::: {.cell-output .cell-output-stderr}\n```\nWaiting for profiling to be done...\nWaiting for profiling to be done...\n```\n:::\n:::\n\n\nBootstrapped confidence intervals can be obtained as follows:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-42_737790f91954dd7020fb6da322bb58f5'}\n\n```{.r .cell-code}\n# Function to bootstrap for 95% CIs\nps.reg.bootstrap <- function(data, inds) {\n d <- data[inds,]\n \n fit <- glm(y ~ treatment + ps + offset(log(years)),\n family = poisson(link = \"log\"),\n data = d)\n \n return(exp(coef(fit))[\"treatmentDMT1\"])\n}\n\nset.seed(1854)\n\n# Generate 1000 bootstrap replicates\narr.boot <- boot(dat, statistic = ps.reg.bootstrap, R = 1000) \n\n# Extract the median annualized relapse rate across 1000 bootstrap replicates\nmedian(arr.boot$t) \n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.7750426\n```\n:::\n\n```{.r .cell-code}\n# Bootstrap 95% CI of the ARR\nboot.ci(arr.boot, conf = 0.95, type = \"perc\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nBOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS\nBased on 1000 bootstrap replicates\n\nCALL : \nboot.ci(boot.out = arr.boot, conf = 0.95, type = \"perc\")\n\nIntervals : \nLevel Percentile \n95% ( 0.7007, 0.8547 ) \nCalculations and Intervals on Original Scale\n```\n:::\n:::\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-43_eefe32f304598c1ae134cee7355dad3b'}\n\n:::\n\n\n## Overview\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-44_9d84c2b88bfd5d71079d463850d6ae15'}\n::: {.cell-output-display}\n|Method |Estimand | Estimate| 95% CI (lower)| 95% CI (upper)|\n|:----------------------------------------------------|:--------|---------:|--------------:|--------------:|\n|Optimal full matching |ATT | 0.8039901| 0.6040436| 1.0039366|\n|Propensity score stratification |ATT | 0.7482462| NA| NA|\n|Propensity score stratification (with bootstrapping) |ATT | 0.7558609| 0.6832955| 0.8365798|\n|Propensity score weighting |ATT | 0.7083381| 0.6245507| 0.8033662|\n|Propensity score weighting (robust SE) |ATT | 0.7083381| 0.6243094| 0.8036767|\n|PS regression adjustment |ATE | 0.7741606| 0.7101080| 0.8448218|\n|PS regression adjustment (bootstrapping) |ATE | 0.7750426| 0.7006837| 0.8546834|\n:::\n:::\n\n\n## Version info {.unnumbered}\nThis chapter was rendered using the following version of R and its packages:\n\n\n::: {.cell hash='chapter_06_cache/html/unnamed-chunk-45_2867b20e33a2cb50a556cc371f27c648'}\n::: {.cell-output .cell-output-stdout}\n```\nR version 4.2.3 (2023-03-15 ucrt)\nPlatform: x86_64-w64-mingw32/x64 (64-bit)\nRunning under: Windows 10 x64 (build 19045)\n\nMatrix products: default\n\nlocale:\n[1] LC_COLLATE=Dutch_Netherlands.utf8 LC_CTYPE=Dutch_Netherlands.utf8 \n[3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C \n[5] LC_TIME=Dutch_Netherlands.utf8 \n\nattached base packages:\n[1] grid stats graphics grDevices utils datasets methods \n[8] base \n\nother attached packages:\n [1] WeightIt_0.14.2 boot_1.3-28.1 MatchIt_4.5.5 \n [4] sandwich_3.0-2 truncnorm_1.0-9 tableone_0.13.2 \n [7] survey_4.2-1 survival_3.5-5 Matrix_1.5-4.1 \n[10] MASS_7.3-60 marginaleffects_0.15.0 lmtest_0.9-40 \n[13] zoo_1.8-12 knitr_1.45 ggplot2_3.4.4 \n[16] data.table_1.14.8 cobalt_4.5.2 dplyr_1.1.2 \n\nloaded via a namespace (and not attached):\n [1] tidyselect_1.2.0 xfun_0.39 mitools_2.4 splines_4.2.3 \n [5] lattice_0.21-8 colorspace_2.1-0 vctrs_0.6.3 generics_0.1.3 \n [9] htmltools_0.5.5 yaml_2.3.7 utf8_1.2.3 rlang_1.1.1 \n[13] pillar_1.9.0 glue_1.6.2 withr_2.5.2 DBI_1.1.3 \n[17] lifecycle_1.0.4 munsell_0.5.0 gtable_0.3.4 htmlwidgets_1.6.2\n[21] codetools_0.2-19 evaluate_0.23 labeling_0.4.3 fastmap_1.1.1 \n[25] fansi_1.0.4 Rcpp_1.0.10 backports_1.4.1 scales_1.2.1 \n[29] jsonlite_1.8.7 farver_2.1.1 chk_0.9.1 digest_0.6.31 \n[33] cli_3.6.1 tools_4.2.3 magrittr_2.0.3 tibble_3.2.1 \n[37] crayon_1.5.2 pkgconfig_2.0.3 rmarkdown_2.25 rstudioapi_0.15.0\n[41] R6_2.5.1 compiler_4.2.3 \n```\n:::\n:::\n\n\n## References {.unnumbered}\n", + "engine": "knitr", + "markdown": "---\ntitle: \"Confounding adjustment using propensity score methods\"\nauthors: \n - name: Tammy Jiang\n affiliations:\n - ref: biogen\n - name: Thomas Debray\n orcid: 0000-0002-1790-2719\n affiliations:\n - ref: smartdas\naffiliations:\n - id: smartdas\n name: Smart Data Analysis and Statistics B.V.\n city: Utrecht\n - id: biogen\n name: Biogen\n city: Cambridge, MA, USA\nformat:\n html:\n toc: true\n number-sections: true\nexecute:\n cache: true\nbibliography: 'https://api.citedrive.com/bib/0d25b38b-db8f-43c4-b934-f4e2f3bd655a/references.bib?x=eyJpZCI6ICIwZDI1YjM4Yi1kYjhmLTQzYzQtYjkzNC1mNGUyZjNiZDY1NWEiLCAidXNlciI6ICIyNTA2IiwgInNpZ25hdHVyZSI6ICI0MGFkYjZhMzYyYWE5Y2U0MjQ2NWE2ZTQzNjlhMWY3NTk5MzhhNzUxZDNjYWIxNDlmYjM4NDgwOTYzMzY5YzFlIn0=/bibliography.bib'\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n\n\n\n\n## Introduction\n\nThe purpose of this document is to provide example R code that demonstrates how to estimate the propensity score and implement matching, stratification, weighting, and regression adjustment for the continuous propensity score. In this example using simulated data, we have two disease modifying therapies (DMT1 and DMT0) and the outcome is the number of post-treatment multiple sclerosis relapses during follow-up. We will estimate the average treatment effect in the treated (ATT) using propensity score matching, stratification, and weighting. We will estimate the average treatment effect in the population (ATE) using regression adjustment for the continuous propensity score. The treatment effects can be interpreted as annualized relapse rate ratios (ARR).\n\nWe consider an example dataset with the following characteristics:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(dat)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n age female prevDMTefficacy premedicalcost numSymptoms prerelapse_num\n \n1: 50 1 None 3899.61 1 1\n2: 51 0 None 9580.51 1 0\n3: 56 0 None 4785.89 1 0\n4: 44 1 None 8696.80 1 1\n5: 63 0 None 2588.03 1 0\n6: 28 1 None 5435.57 1 0\n treatment y years Iscore\n \n1: DMT1 0 1.78507871 Moderate A1\n2: DMT1 0 0.01368925 High A1\n3: DMT1 2 3.25530459 High A1\n4: DMT1 2 5.73853525 Neutral\n5: DMT1 0 1.31143053 High A1\n6: DMT1 0 0.59137577 Moderate A0\n```\n\n\n:::\n:::\n\n\n## Comparing baseline characteristics\n\n- `DMT1` is the treatment group and `DMT0` is the control group\n- `prevDMTefficacy` is previous DMT efficacy (none, low efficacy, and medium/high efficacy)\n- `prerelapse_num` is the number of previous MS relapses\n\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |\n|:--------------------------|:------------|:------------|\n|n |2300 |7700 |\n|age (mean (SD)) |51.39 (8.32) |44.25 (9.79) |\n|female = 1 (%) |1671 (72.65) |5915 (76.82) |\n|prevDMTefficacy (%) | | |\n|None |1247 (54.22) |3171 (41.18) |\n|Low_efficacy |261 (11.35) |858 (11.14) |\n|Medium_high_efficacy |792 (34.43) |3671 (47.68) |\n|prerelapse_num (mean (SD)) |0.39 (0.62) |0.46 (0.68) |\n\n\n:::\n:::\n\n\n## Estimating the propensity score\n\n### Logistic regression\n\nWe sought to restore balance in the distribution of baseline covariates in patients treated with DMT1 (index treatment) and DMT0 (control tratment). We fit a multivariable logistic regression model in which treatment was regressed on baseline characteristics including age, sex, previous DMT efficacy, and previous number of relapses.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Fit logistic regression model\nps.model <- glm(treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n data = dat, family = binomial())\n\n# Summary of logistic regression model\nsummary(ps.model)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nglm(formula = treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n family = binomial(), data = dat)\n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 4.809473 0.157127 30.609 < 2e-16 ***\nage -0.086708 0.002996 -28.939 < 2e-16 ***\nfemale1 0.253611 0.057664 4.398 1.09e-05 ***\nprevDMTefficacyLow_efficacy 0.310394 0.083022 3.739 0.000185 ***\nprevDMTefficacyMedium_high_efficacy 0.660266 0.054393 12.139 < 2e-16 ***\nprerelapse_num 0.156318 0.039288 3.979 6.93e-05 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for binomial family taken to be 1)\n\n Null deviance: 10786 on 9999 degrees of freedom\nResidual deviance: 9597 on 9994 degrees of freedom\nAIC: 9609\n\nNumber of Fisher Scoring iterations: 5\n```\n\n\n:::\n\n```{.r .cell-code}\n# Extract propensity scores\ndat$ps <- predict(ps.model, data = dat, type = \"response\")\n```\n:::\n\n\n### Assessing overlap\n\nWe examined the degree of overlap in the distribution of propensity scores across treatment groups using histograms and side-by-side box plots.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Histogram\nggplot(dat, aes(x = ps, fill = as.factor(treatment), color = as.factor(treatment))) + \n geom_histogram(alpha = 0.3, position='identity', bins = 15) + \n facet_grid(as.factor(treatment) ~ .) + \n xlab(\"Probability of Treatment\") + \n ylab(\"Count\") +\n ggtitle(\"Propensity Score Distribution by Treatment Group\") +\n theme(legend.position = \"bottom\", legend.direction = \"vertical\")\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-9-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# Side-by-side box plots\nggplot(dat, aes(x=as.factor(treatment), y=ps, fill=as.factor(treatment))) +\n geom_boxplot() + \n ggtitle(\"Propensity Score Distribution by Treatment Group\") +\n ylab(\"Probability of Treatment\") + \n xlab(\"Treatment group\") +\n theme(legend.position = \"none\")\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-9-2.png){width=672}\n:::\n\n```{.r .cell-code}\n# Distribution of propensity scores by treatment groups\nsummary(dat$ps[dat$treatment == \"DMT1\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 0.3230 0.7214 0.8265 0.7970 0.9010 0.9854 \n```\n\n\n:::\n\n```{.r .cell-code}\nsummary(dat$ps[dat$treatment == \"DMT0\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 0.3230 0.5730 0.6894 0.6795 0.7975 0.9799 \n```\n\n\n:::\n:::\n\n\n## Propensity score matching\n\n\n::: {.cell}\n\n:::\n\n\n### 1:1 Optimal full matching without replacement\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(MatchIt)\n\n# Use MatchIt package for PS matching\nopt <- matchit(treatment ~ age + female + prevDMTefficacy + prerelapse_num, \n data = dat, \n method = \"full\",\n estimand = \"ATT\")\n\nopt\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nA matchit object\n - method: Optimal full matching\n - distance: Propensity score\n - estimated with logistic regression\n - number of obs.: 10000 (original), 10000 (matched)\n - target estimand: ATT\n - covariates: age, female, prevDMTefficacy, prerelapse_num\n```\n\n\n:::\n:::\n\n\n### Assess balance after matching\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(opt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nmatchit(formula = treatment ~ age + female + prevDMTefficacy + \n prerelapse_num, data = dat, method = \"full\", estimand = \"ATT\")\n\nSummary of Balance for All Data:\n Means Treated Means Control Std. Mean Diff.\ndistance 0.7970 0.6795 0.8943\nage 44.2496 51.3883 -0.7289\nfemale0 0.2318 0.2735 -0.0987\nfemale1 0.7682 0.7265 0.0987\nprevDMTefficacyNone 0.4118 0.5422 -0.2649\nprevDMTefficacyLow_efficacy 0.1114 0.1135 -0.0065\nprevDMTefficacyMedium_high_efficacy 0.4768 0.3443 0.2651\nprerelapse_num 0.4595 0.3930 0.0976\n Var. Ratio eCDF Mean eCDF Max\ndistance 0.7873 0.1917 0.3379\nage 1.3868 0.1519 0.3085\nfemale0 . 0.0417 0.0417\nfemale1 . 0.0417 0.0417\nprevDMTefficacyNone . 0.1304 0.1304\nprevDMTefficacyLow_efficacy . 0.0020 0.0020\nprevDMTefficacyMedium_high_efficacy . 0.1324 0.1324\nprerelapse_num 1.1990 0.0133 0.0383\n\nSummary of Balance for Matched Data:\n Means Treated Means Control Std. Mean Diff.\ndistance 0.7970 0.7970 0.0001\nage 44.2496 44.1749 0.0076\nfemale0 0.2318 0.2327 -0.0022\nfemale1 0.7682 0.7673 0.0022\nprevDMTefficacyNone 0.4118 0.4391 -0.0555\nprevDMTefficacyLow_efficacy 0.1114 0.0873 0.0766\nprevDMTefficacyMedium_high_efficacy 0.4768 0.4736 0.0064\nprerelapse_num 0.4595 0.4759 -0.0241\n Var. Ratio eCDF Mean eCDF Max\ndistance 0.9968 0.0007 0.0100\nage 0.9979 0.0047 0.0300\nfemale0 . 0.0009 0.0009\nfemale1 . 0.0009 0.0009\nprevDMTefficacyNone . 0.0273 0.0273\nprevDMTefficacyLow_efficacy . 0.0241 0.0241\nprevDMTefficacyMedium_high_efficacy . 0.0032 0.0032\nprerelapse_num 1.0230 0.0060 0.0229\n Std. Pair Dist.\ndistance 0.0012\nage 0.1158\nfemale0 0.2334\nfemale1 0.2334\nprevDMTefficacyNone 0.2183\nprevDMTefficacyLow_efficacy 0.3742\nprevDMTefficacyMedium_high_efficacy 0.3758\nprerelapse_num 0.3690\n\nSample Sizes:\n Control Treated\nAll 2300. 7700\nMatched (ESS) 253.16 7700\nMatched 2300. 7700\nUnmatched 0. 0\nDiscarded 0. 0\n```\n\n\n:::\n\n```{.r .cell-code}\nplot(summary(opt))\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-1.png){width=672}\n:::\n\n```{.r .cell-code}\n# black line is treated group, grey line is control group\nplot(opt, type = \"density\", which.xs = vars) \n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-2.png){width=672}\n:::\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-12-3.png){width=672}\n:::\n:::\n\n\n### Estimating the ATT\n\nWe can estimate the ATT in the matched sample using Poisson regression in which the number of post-treatment relapses is regressed on treatment status and follow-up time for each patient (captured by the variable `years`). More details are provided at [this link](https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Matched data\nmatched.data <- match.data(opt)\n\n# Poisson regression model\nopt.fit <- glm(y ~ treatment + offset(log(years)), \n family = poisson(link = \"log\"),\n data = matched.data, \n weights = weights)\n\n# Treatment effect estimation\nopt.comp <- avg_comparisons(opt.fit,\n variables = \"treatment\",\n vcov = ~subclass,\n newdata = subset(matched.data, treatment == \"DMT1\"),\n wts = \"weights\",\n comparison = \"ratio\")\n\nopt.comp |> tidy()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 1 × 12\n term contrast estimate std.error statistic p.value s.value conf.low\n \n1 treatment mean(DMT1) /… 0.772 0.105 7.37 1.68e-13 42.4 0.566\n# ℹ 4 more variables: conf.high , predicted_lo , predicted_hi ,\n# predicted \n```\n\n\n:::\n:::\n\n\n\n\nAs indicated in the summary output above, the annualized relapse rate ratio for DMT1 vs DMT0 among patients treated with DMT0 (ATT) is given as 0.77 with a 95% confidence interval ranging from 0.57 to 0.98.\n\n## Propensity score stratification\n\n### Divide sample into quintiles of propensity scores\n\nWe will form five mutually exclusive groups of the estimated propensity score.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Divide the PS scores into five strata of roughly equal size\nbreaks <- quantile(dat$ps, probs = seq(0, 1, by = 0.2))\n\ndat <- dat %>% mutate(ps.strata = cut(ps, \n breaks = breaks,\n labels = seq(1:5),\n include.lowest = TRUE))\n\n# Number of patients in each stratum\ntable(dat$ps.strata)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\n 1 2 3 4 5 \n2002 2015 1991 1997 1995 \n```\n\n\n:::\n:::\n\n\n### Assess balance within each propensity score stratum\n\nWithin each propensity score stratum, treated and control patients should have similar values of the propensity score and the distribution of baseline covariates should be approximately balanced between treatment groups.\n\n#### Propensity Score Stratum #1\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntab1.strata1 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 1), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", \n test = FALSE)\n\ntab1.strata1.print <- print(tab1.strata1, \n catDigits = 2, \n contDigits = 2, \n smd = TRUE)\n\ntab1.strata1.print\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |901 |1101 | |\n|age (mean (SD)) |58.38 (3.67) |57.45 (3.73) |0.251 |\n|female = 1 (%) |605 (67.15) |775 (70.39) |0.070 |\n|prevDMTefficacy (%) | | |0.056 |\n|None |650 (72.14) |771 (70.03) | |\n|Low_efficacy |106 (11.76) |130 (11.81) | |\n|Medium_high_efficacy |145 (16.09) |200 (18.17) | |\n|prerelapse_num (mean (SD)) |0.29 (0.53) |0.33 (0.56) |0.074 |\n\n\n:::\n:::\n\n\n#### Propensity Score Stratum #2\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntab1.strata2 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 2), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata2.print <- print(tab1.strata2, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |617 |1398 | |\n|age (mean (SD)) |52.18 (4.35) |51.97 (4.22) |0.049 |\n|female = 1 (%) |458 (74.23) |1048 (74.96) |0.017 |\n|prevDMTefficacy (%) | | |0.054 |\n|None |292 (47.33) |624 (44.64) | |\n|Low_efficacy |69 (11.18) |162 (11.59) | |\n|Medium_high_efficacy |256 (41.49) |612 (43.78) | |\n|prerelapse_num (mean (SD)) |0.40 (0.64) |0.41 (0.66) |0.004 |\n\n\n:::\n:::\n\n\n#### Propensity Score Stratum #3\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntab1.strata3 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 3), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata3.print <- print(tab1.strata3, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |392 |1599 | |\n|age (mean (SD)) |46.73 (4.06) |46.36 (4.08) |0.092 |\n|female = 1 (%) |305 (77.81) |1193 (74.61) |0.075 |\n|prevDMTefficacy (%) | | |0.041 |\n|None |168 (42.86) |687 (42.96) | |\n|Low_efficacy |52 (13.27) |191 (11.94) | |\n|Medium_high_efficacy |172 (43.88) |721 (45.09) | |\n|prerelapse_num (mean (SD)) |0.49 (0.68) |0.47 (0.66) |0.031 |\n\n\n:::\n:::\n\n\n#### Propensity Score Stratum #4\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntab1.strata4 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 4), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata4.print <- print(tab1.strata4, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |269 |1728 | |\n|age (mean (SD)) |41.07 (4.11) |40.88 (4.29) |0.046 |\n|female = 1 (%) |203 (75.46) |1356 (78.47) |0.071 |\n|prevDMTefficacy (%) | | |0.084 |\n|None |105 (39.03) |634 (36.69) | |\n|Low_efficacy |22 ( 8.18) |181 (10.47) | |\n|Medium_high_efficacy |142 (52.79) |913 (52.84) | |\n|prerelapse_num (mean (SD)) |0.50 (0.69) |0.51 (0.71) |0.012 |\n\n\n:::\n:::\n\n\n#### Propensity Score Stratum #5\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntab1.strata5 <- CreateTableOne(vars, data = dat %>% filter(ps.strata == 5), \n factorVars = c(\"female\", \"prevDMTefficacy\"), \n strata = \"treatment\", test = FALSE)\n\ntab1.strata5.print <- print(tab1.strata5, catDigits = 2, contDigits = 2, \n smd = TRUE)\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n| |DMT0 |DMT1 |SMD |\n|:--------------------------|:------------|:------------|:-----|\n|n |121 |1874 | |\n|age (mean (SD)) |33.26 (4.95) |32.04 (5.58) |0.233 |\n|female = 1 (%) |100 (82.64) |1543 (82.34) |0.008 |\n|prevDMTefficacy (%) | | |0.050 |\n|None |32 (26.45) |455 (24.28) | |\n|Low_efficacy |12 ( 9.92) |194 (10.35) | |\n|Medium_high_efficacy |77 (63.64) |1225 (65.37) | |\n|prerelapse_num (mean (SD)) |0.52 (0.66) |0.52 (0.73) |0.004 |\n\n\n:::\n:::\n\n\n### Estimating and pooling of stratum-specific treatment effects\n\nThe overall ATT across strata can be estimated by weighting stratum-specific estimates by the proportion of treated patients in each stratum over all treated patients in the sample.\n\nWe first define a function `att.strata.function()` to calculate stratum-specific estimates of the treatment effect:\n\n\n::: {.cell}\n\n```{.r .cell-code}\natt.strata.function <- function(data, stratum, confint = TRUE) {\n\n fit <- glm(\"y ~ treatment + offset(log(years))\",\n family = poisson(link = \"log\"),\n data = data %>% filter(ps.strata == stratum))\n\n arr <- round(as.numeric(exp(coef(fit)[\"treatmentDMT1\"])), digits = 3)\n ll <- ul <- NA\n \n if (confint) {\n ll <- round(exp(confint(fit))[\"treatmentDMT1\",1], digits = 3)\n ul <- round(exp(confint(fit))[\"treatmentDMT1\",2], digits = 3)\n }\n \n return(c(\"stratum\" = stratum,\n \"arr\" = arr,\n \"ci_lower\" = ll,\n \"ci_upper\" = ul))\n}\n\narr.strata <- as.data.frame(t(sapply(1:5, att.strata.function, data = dat)))\narr.strata\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n stratum arr ci_lower ci_upper\n1 1 0.904 0.760 1.076\n2 2 0.822 0.696 0.975\n3 3 0.798 0.666 0.961\n4 4 0.716 0.587 0.881\n5 5 0.589 0.463 0.761\n```\n\n\n:::\n:::\n\n\nSubsequently, we define a function `weights.strata.function()` to calculate the weights for each stratum. The weight is the proportion of treated patients in each stratum over all treated patients in the sample:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nweights.strata.function <- function(data, stratum) {\n n_DMT1_stratum <- nrow(data %>% filter(ps.strata == stratum & treatment == \"DMT1\"))\n n_DMT1_all <- nrow(data %>% filter(treatment == \"DMT1\"))\n weight <- n_DMT1_stratum/n_DMT1_all\n return(c(\"stratum\" = stratum, \"weight\" = weight))\n}\n\nweights.strata <- as.data.frame(t(sapply(1:5, weights.strata.function, data = dat)))\nweights.strata\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n stratum weight\n1 1 0.1429870\n2 2 0.1815584\n3 3 0.2076623\n4 4 0.2244156\n5 5 0.2433766\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create table with ARRs and weights for each PS stratum\narr.weights.merged <- merge(arr.strata, weights.strata, by = \"stratum\")\n\n# Calculate the weighted ARR for each stratum\narr.weights.merged <- arr.weights.merged %>%\n mutate(weighted.arr = as.numeric(arr) * weight)\n\n# Sum the weighted ARRs across strata to get the overall ATT\nsum(arr.weights.merged$weighted.arr)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.7482462\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\nWe now define a new function `ps.stratification.bootstrap()` that integrates estimation of the ATT and the PS weights for bootstrapping purposes:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nps.stratification.bootstrap <- function(data, inds) {\n d <- data[inds,]\n \n d$ps.strata <- cut(d$ps, \n breaks = c(quantile(dat$ps, probs = seq(0, 1, by = 0.2))),\n labels = seq(5),\n include.lowest = TRUE)\n \n arr.strata <- as.data.frame(t(sapply(1:5, att.strata.function, \n data = d, confint = FALSE)))\n \n weights.strata <- as.data.frame(t(sapply(1:5, weights.strata.function, data = d)))\n \n return(arr.strata$arr[1] * weights.strata$weight[1] + \n arr.strata$arr[2] * weights.strata$weight[2] +\n arr.strata$arr[3] * weights.strata$weight[3] + \n arr.strata$arr[4] * weights.strata$weight[4] +\n arr.strata$arr[5] * weights.strata$weight[5]) \n}\n```\n:::\n\n\nWe can now estimate the treatment effect and its confidence interval using the bootstrap procedure:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(boot)\n\nset.seed(1854)\narr.stratification.boot <- boot(data = dat, \n statistic = ps.stratification.bootstrap, \n R = 1000)\n```\n:::\n\n\nWe can summarize the bootstrap samples as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Bootstrap estimate of the ARR\nmedian(arr.stratification.boot$t)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.7558609\n```\n\n\n:::\n\n```{.r .cell-code}\n# Bootstrap 95% CI of the ARR\nboot.ci(arr.stratification.boot, conf = 0.95, type = \"perc\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS\nBased on 1000 bootstrap replicates\n\nCALL : \nboot.ci(boot.out = arr.stratification.boot, conf = 0.95, type = \"perc\")\n\nIntervals : \nLevel Percentile \n95% ( 0.6833, 0.8366 ) \nCalculations and Intervals on Original Scale\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\n## Propensity score weighting\n\n### Calculate propensity score weights for ATT\n\nPropensity score weighting reweights the study sample to generate an artificial population (i.e., pseudo-population) in which the covariates are no longer associated with treatment, thereby removing confounding by measured covariates. For the ATT, the weight for all treated patients is set to one. Conversely, the weight for patients in the control group is set to the propensity score divided by one minus the propensity score, that is, (PS/(1 − PS)). We estimated stabilized weights to address extreme weights.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(WeightIt)\n\nw.out <- weightit(treatment ~ age + female + prevDMTefficacy + prerelapse_num,\n data = dat,\n method = \"ps\",\n estimand = \"ATT\")\n #stabilize = TRUE)\n\nw.out\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nA weightit object\n - method: \"glm\" (propensity score weighting with GLM)\n - number of obs.: 10000\n - sampling weights: none\n - treatment: 2-category\n - estimand: ATT (focal: DMT1)\n - covariates: age, female, prevDMTefficacy, prerelapse_num\n```\n\n\n:::\n\n```{.r .cell-code}\nsummary(w.out)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n Summary of weights\n\n- Weight ranges:\n\n Min Max\nDMT0 0.4772 |---------------------------| 48.6856\nDMT1 1.0000 || 1.0000\n\n- Units with the 5 most extreme weights by group:\n \n 9492 8836 6544 9610 4729\n DMT0 32.1027 32.1027 34.3126 38.1817 48.6856\n 5 4 3 2 1\n DMT1 1 1 1 1 1\n\n- Weight statistics:\n\n Coef of Var MAD Entropy # Zeros\nDMT0 1.098 0.673 0.383 0\nDMT1 0.000 0.000 0.000 0\n\n- Effective Sample Sizes:\n\n DMT0 DMT1\nUnweighted 2300. 7700\nWeighted 1043.16 7700\n```\n\n\n:::\n\n```{.r .cell-code}\nplot(summary(w.out))\n```\n\n::: {.cell-output-display}\n![](chapter_06_files/figure-html/unnamed-chunk-34-1.png){width=672}\n:::\n:::\n\n\n### Assess balance in the weighted sample\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbal.tab(w.out, stats = c(\"m\", \"v\"), thresholds = c(m = .05))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBalance Measures\n Type Diff.Adj M.Threshold\nprop.score Distance -0.0045 Balanced, <0.05\nage Contin. 0.0054 Balanced, <0.05\nfemale Binary 0.0005 Balanced, <0.05\nprevDMTefficacy_None Binary -0.0003 Balanced, <0.05\nprevDMTefficacy_Low_efficacy Binary 0.0023 Balanced, <0.05\nprevDMTefficacy_Medium_high_efficacy Binary -0.0020 Balanced, <0.05\nprerelapse_num Contin. -0.0034 Balanced, <0.05\n V.Ratio.Adj\nprop.score 0.9926\nage 1.0102\nfemale .\nprevDMTefficacy_None .\nprevDMTefficacy_Low_efficacy .\nprevDMTefficacy_Medium_high_efficacy .\nprerelapse_num 1.0941\n\nBalance tally for mean differences\n count\nBalanced, <0.05 7\nNot Balanced, >0.05 0\n\nVariable with the greatest mean difference\n Variable Diff.Adj M.Threshold\n age 0.0054 Balanced, <0.05\n\nEffective sample sizes\n DMT0 DMT1\nUnadjusted 2300. 7700\nAdjusted 1043.16 7700\n```\n\n\n:::\n:::\n\n\n### Estimate the ATT\n\nOne way to estimate the ATT is to use the survey package. The function `svyglm()` generates model-robust (Horvitz-Thompson-type) standard errors by default, and thus does not require additional adjustments.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(survey)\n\nweighted.data <- svydesign(ids = ~1, data = dat, weights = ~w.out$weights)\n\nweighted.fit <- svyglm(y ~ treatment + offset(log(years)),\n family = poisson(link = \"log\"),\n design = weighted.data)\n\nexp(coef(weighted.fit)[\"treatmentDMT1\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntreatmentDMT1 \n 0.7083381 \n```\n\n\n:::\n\n```{.r .cell-code}\nexp(confint(weighted.fit))[\"treatmentDMT1\",] \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n 2.5 % 97.5 % \n0.6245507 0.8033662 \n```\n\n\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\nAs indicated above, propensity score weighting yielded an ATT estimate of 0.71 (95% CI: 0.62; 0.8).\n\nAn alternative approach is to use `glm()` to estimate the treatment effect and calculate robust standard errors.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Alternative way to estimate treatment effect\nweighted.fit2 <- glm(y ~ treatment + offset(log(years)),\n family = poisson(link = \"log\"),\n data = dat,\n weights = w.out$weights)\n\n# Extract the estimated ARR\nexp(coef(weighted.fit2))[\"treatmentDMT1\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntreatmentDMT1 \n 0.7083381 \n```\n\n\n:::\n\n```{.r .cell-code}\n# Calculate robust standard error and p-value of the log ARR\ncoeftest(weighted.fit2, vcov. = vcovHC)[\"treatmentDMT1\",]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n Estimate Std. Error z value Pr(>|z|) \n-3.448337e-01 6.442745e-02 -5.352280e+00 8.685284e-08 \n```\n\n\n:::\n\n```{.r .cell-code}\n# Derive 95% confidence interval of the ARR\nexp(lmtest::coefci(weighted.fit2, \n level = 0.95, # 95% confidence interval\n vcov. = vcovHC)[\"treatmentDMT1\",])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n 2.5 % 97.5 % \n0.6243094 0.8036767 \n```\n\n\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\nUsing this approach, the ATT estimate was 0.71 (95% CI: 0.62; 0.8).\n\n## Regression adjustment for the propensity score for the ATE\n\nIn this approach, a regression model is fitted to describe the observed outcome as a function of the received treatment and the estimated propensity score:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nps.reg.fit <- glm(y ~ treatment + ps + offset(log(years)),\n family = poisson(link = \"log\"),\n data = dat)\n\nsummary(ps.reg.fit)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n\nCall:\nglm(formula = y ~ treatment + ps + offset(log(years)), family = poisson(link = \"log\"), \n data = dat)\n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) -1.99585 0.10359 -19.266 < 2e-16 ***\ntreatmentDMT1 -0.25598 0.04431 -5.777 7.60e-09 ***\nps 1.07521 0.13878 7.748 9.36e-15 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for poisson family taken to be 1)\n\n Null deviance: 7514.7 on 9999 degrees of freedom\nResidual deviance: 7443.0 on 9997 degrees of freedom\nAIC: 12378\n\nNumber of Fisher Scoring iterations: 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# ATE\nexp(coef(ps.reg.fit))[\"treatmentDMT1\"] \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntreatmentDMT1 \n 0.7741606 \n```\n\n\n:::\n:::\n\n::: {.cell}\n::: {.cell-output .cell-output-stderr}\n\n```\nWaiting for profiling to be done...\nWaiting for profiling to be done...\n```\n\n\n:::\n:::\n\n\nBootstrapped confidence intervals can be obtained as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Function to bootstrap for 95% CIs\nps.reg.bootstrap <- function(data, inds) {\n d <- data[inds,]\n \n fit <- glm(y ~ treatment + ps + offset(log(years)),\n family = poisson(link = \"log\"),\n data = d)\n \n return(exp(coef(fit))[\"treatmentDMT1\"])\n}\n\nset.seed(1854)\n\n# Generate 1000 bootstrap replicates\narr.boot <- boot(dat, statistic = ps.reg.bootstrap, R = 1000) \n\n# Extract the median annualized relapse rate across 1000 bootstrap replicates\nmedian(arr.boot$t) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.7750426\n```\n\n\n:::\n\n```{.r .cell-code}\n# Bootstrap 95% CI of the ARR\nboot.ci(arr.boot, conf = 0.95, type = \"perc\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS\nBased on 1000 bootstrap replicates\n\nCALL : \nboot.ci(boot.out = arr.boot, conf = 0.95, type = \"perc\")\n\nIntervals : \nLevel Percentile \n95% ( 0.7007, 0.8547 ) \nCalculations and Intervals on Original Scale\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n:::\n\n\n## Overview\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n\n|Method |Estimand | Estimate| 95% CI (lower)| 95% CI (upper)|\n|:----------------------------------------------------|:--------|---------:|--------------:|--------------:|\n|Optimal full matching |ATT | 0.7715105| 0.5663852| 0.9766358|\n|Propensity score stratification |ATT | 0.7482462| NA| NA|\n|Propensity score stratification (with bootstrapping) |ATT | 0.7558609| 0.6832955| 0.8365798|\n|Propensity score weighting |ATT | 0.7083381| 0.6245507| 0.8033662|\n|Propensity score weighting (robust SE) |ATT | 0.7083381| 0.6243094| 0.8036767|\n|PS regression adjustment |ATE | 0.7741606| 0.7101080| 0.8448218|\n|PS regression adjustment (bootstrapping) |ATE | 0.7750426| 0.7006837| 0.8546834|\n\n\n:::\n:::\n\n\n## Version info {.unnumbered}\nThis chapter was rendered using the following version of R and its packages:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\nR version 4.4.1 (2024-06-14 ucrt)\nPlatform: x86_64-w64-mingw32/x64\nRunning under: Windows 10 x64 (build 19045)\n\nMatrix products: default\n\n\nlocale:\n[1] LC_COLLATE=English_United Kingdom.utf8 \n[2] LC_CTYPE=English_United Kingdom.utf8 \n[3] LC_MONETARY=English_United Kingdom.utf8\n[4] LC_NUMERIC=C \n[5] LC_TIME=English_United Kingdom.utf8 \n\ntime zone: Europe/Paris\ntzcode source: internal\n\nattached base packages:\n[1] grid stats graphics grDevices utils datasets methods \n[8] base \n\nother attached packages:\n [1] WeightIt_1.3.1 boot_1.3-31 MatchIt_4.5.5 \n [4] sandwich_3.1-1 truncnorm_1.0-9 tableone_0.13.2 \n [7] survey_4.4-2 survival_3.7-0 Matrix_1.7-0 \n[10] MASS_7.3-61 marginaleffects_0.23.0 lmtest_0.9-40 \n[13] zoo_1.8-12 knitr_1.48 ggplot2_3.5.1 \n[16] data.table_1.15.4 cobalt_4.5.5 dplyr_1.1.4 \n\nloaded via a namespace (and not attached):\n [1] gtable_0.3.5 xfun_0.45 htmlwidgets_1.6.4 insight_0.20.4 \n [5] lattice_0.22-6 vctrs_0.6.5 tools_4.4.1 generics_0.1.3 \n [9] tibble_3.2.1 proxy_0.4-27 fansi_1.0.6 pkgconfig_2.0.3 \n[13] checkmate_2.3.2 lifecycle_1.0.4 compiler_4.4.1 farver_2.1.2 \n[17] munsell_0.5.1 mitools_2.4 codetools_0.2-20 htmltools_0.5.8.1\n[21] class_7.3-22 yaml_2.3.8 pillar_1.9.0 crayon_1.5.3 \n[25] rlemon_0.2.1 tidyselect_1.2.1 digest_0.6.36 labeling_0.4.3 \n[29] forcats_1.0.0 splines_4.4.1 labelled_2.13.0 fastmap_1.2.0 \n[33] colorspace_2.1-0 cli_3.6.3 magrittr_2.0.3 optmatch_0.10.8 \n[37] utf8_1.2.4 e1071_1.7-14 withr_3.0.1 scales_1.3.0 \n[41] backports_1.5.0 rmarkdown_2.28 chk_0.9.2 hms_1.1.3 \n[45] evaluate_0.24.0 haven_2.5.4 rlang_1.1.4 Rcpp_1.0.13 \n[49] glue_1.7.0 DBI_1.2.3 rstudioapi_0.16.0 jsonlite_1.8.8 \n[53] R6_2.5.1 \n```\n\n\n:::\n:::\n\n\n## References {.unnumbered}\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/chapter_06/figure-html/unnamed-chunk-12-1.png b/_freeze/chapter_06/figure-html/unnamed-chunk-12-1.png index b772845..4dd3261 100644 Binary files a/_freeze/chapter_06/figure-html/unnamed-chunk-12-1.png and b/_freeze/chapter_06/figure-html/unnamed-chunk-12-1.png differ diff --git a/_freeze/chapter_06/figure-html/unnamed-chunk-12-2.png b/_freeze/chapter_06/figure-html/unnamed-chunk-12-2.png index 4983d16..0a7dbf8 100644 Binary files a/_freeze/chapter_06/figure-html/unnamed-chunk-12-2.png and b/_freeze/chapter_06/figure-html/unnamed-chunk-12-2.png differ diff --git a/_freeze/chapter_06/figure-html/unnamed-chunk-12-3.png b/_freeze/chapter_06/figure-html/unnamed-chunk-12-3.png index 889a0ed..f3d12ef 100644 Binary files a/_freeze/chapter_06/figure-html/unnamed-chunk-12-3.png and b/_freeze/chapter_06/figure-html/unnamed-chunk-12-3.png differ diff --git a/_freeze/chapter_06/figure-html/unnamed-chunk-9-1.png b/_freeze/chapter_06/figure-html/unnamed-chunk-9-1.png index 2b9c26c..62dc8e3 100644 Binary files a/_freeze/chapter_06/figure-html/unnamed-chunk-9-1.png and b/_freeze/chapter_06/figure-html/unnamed-chunk-9-1.png differ diff --git a/chapter_06.qmd b/chapter_06.qmd index 0b05896..7fa75ec 100644 --- a/chapter_06.qmd +++ b/chapter_06.qmd @@ -297,7 +297,7 @@ plot(opt, type = "density", which.xs = vars) ### Estimating the ATT -We can estimate the ATT in the matched sample using Poisson regression in which the number of post-treatment relapses is regressed on treatment status and follow-up time for each patient (captured by the variable `years`). More details are provided at \url{https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html}. +We can estimate the ATT in the matched sample using Poisson regression in which the number of post-treatment relapses is regressed on treatment status and follow-up time for each patient (captured by the variable `years`). More details are provided at [this link](https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html). ```{r} # Matched data @@ -310,12 +310,12 @@ opt.fit <- glm(y ~ treatment + offset(log(years)), weights = weights) # Treatment effect estimation -opt.comp <- comparisons(opt.fit, - variables = "treatment", - vcov = ~subclass, - newdata = subset(matched.data, treatment == "DMT1"), - wts = "weights", - transform_pre = "ratio") +opt.comp <- avg_comparisons(opt.fit, + variables = "treatment", + vcov = ~subclass, + newdata = subset(matched.data, treatment == "DMT1"), + wts = "weights", + comparison = "ratio") opt.comp |> tidy() ``` diff --git a/chapter_06_files/figure-html/unnamed-chunk-12-1.png b/chapter_06_files/figure-html/unnamed-chunk-12-1.png index b772845..4dd3261 100644 Binary files a/chapter_06_files/figure-html/unnamed-chunk-12-1.png and b/chapter_06_files/figure-html/unnamed-chunk-12-1.png differ diff --git a/chapter_06_files/figure-html/unnamed-chunk-12-2.png b/chapter_06_files/figure-html/unnamed-chunk-12-2.png index 4983d16..0a7dbf8 100644 Binary files a/chapter_06_files/figure-html/unnamed-chunk-12-2.png and b/chapter_06_files/figure-html/unnamed-chunk-12-2.png differ diff --git a/chapter_06_files/figure-html/unnamed-chunk-12-3.png b/chapter_06_files/figure-html/unnamed-chunk-12-3.png index 889a0ed..f3d12ef 100644 Binary files a/chapter_06_files/figure-html/unnamed-chunk-12-3.png and b/chapter_06_files/figure-html/unnamed-chunk-12-3.png differ diff --git a/chapter_06_files/figure-html/unnamed-chunk-9-1.png b/chapter_06_files/figure-html/unnamed-chunk-9-1.png index 2b9c26c..62dc8e3 100644 Binary files a/chapter_06_files/figure-html/unnamed-chunk-9-1.png and b/chapter_06_files/figure-html/unnamed-chunk-9-1.png differ diff --git a/docs/chapter_06.html b/docs/chapter_06.html index 29c0447..8bc97f7 100644 --- a/docs/chapter_06.html +++ b/docs/chapter_06.html @@ -2,7 +2,7 @@ - + @@ -22,7 +22,7 @@ } /* CSS for syntax highlighting */ pre > code.sourceCode { white-space: pre; position: relative; } -pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } +pre > code.sourceCode > span { line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode > span { color: inherit; text-decoration: inherit; } @@ -81,7 +81,12 @@ "collapse-after": 3, "panel-placement": "start", "type": "textbox", - "limit": 20, + "limit": 50, + "keyboard-shortcut": [ + "f", + "/", + "s" + ], "language": { "search-no-results-text": "No results", "search-matching-documents-text": "matching documents", @@ -90,6 +95,7 @@ "search-more-match-text": "more match in this document", "search-more-matches-text": "more matches in this document", "search-clear-button-title": "Clear", + "search-text-placeholder": "", "search-detached-cancel-button-title": "Cancel", "search-submit-button-title": "Submit", "search-label": "Search" @@ -103,7 +109,7 @@
-