diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..e5c3b28 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,29 @@ +# How to Contribute + +We'd love to accept your patches and contributions to this project. There are +just a few small guidelines you need to follow. + +## Contributor License Agreement + +Contributions to this project must be accompanied by a Contributor License +Agreement (CLA). You (or your employer) retain the copyright to your +contribution; this simply gives us permission to use and redistribute your +contributions as part of the project. Head over to + to see your current agreements on file or +to sign a new one. + +You generally only need to submit a CLA once, so if you've already submitted one +(even if it was for a different project), you probably don't need to do it +again. + +## Code Reviews + +All submissions, including submissions by project members, require review. We +use GitHub pull requests for this purpose. Consult +[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more +information on using pull requests. + +## Community Guidelines + +This project follows +[Google's Open Source Community Guidelines](https://opensource.google/conduct/). diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..c1a4fca --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,46 @@ +Type: Package +Package: bsynth +Title: Bayesian Synthetic Control +Version: 0.0.0.9000 +Authors@R: + person(given = "Ignacio", + family = "Martinez", + role = c("aut", "cre"), + email = "martinezig@google.com", + comment = c(ORCID = "0000-0002-3721-8172")) +Description: Provides causal inference with a Bayesian synthetic control method. +License: file LICENSE +URL: https://github.com/google/bsynth +BugReports: https://github.com/google/bsynth/issues +Depends: + R (>= 3.4.0) +Imports: + R6, + Rcpp, + RcppParallel, + cubelyr, + dplyr, + ggplot2, + glue, + magrittr, + methods, + purrr, + rstan, + rstantools, + scales, + tibble, + tidyr, + vizdraws +Suggests: + knitr, + rmarkdown, + roxygen2, + gsynth +VignetteBuilder: knitr +Encoding: UTF-8 +LazyData: true +Roxygen: list(markdown = TRUE) +RoxygenNote: 7.1.1 +Config/testthat/edition: 3 +Biarch: true +Date/Publication: 2021-08-13 21:14 UTC diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..ecc4cfd --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,9 @@ +# Generated by roxygen2: do not edit by hand + +export("%>%") +export(bayesianFactor) +export(bayesianSynth) +import(Rcpp) +import(methods) +importFrom(magrittr,"%>%") +importFrom(rstan,sampling) diff --git a/R/bfactor.R b/R/bfactor.R new file mode 100644 index 0000000..bbe8db7 --- /dev/null +++ b/R/bfactor.R @@ -0,0 +1,603 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' Create a Bayesian Synthetic Control Object Using Panel Data +#' +#' @description +#' +#' A Bayesian Factor Model has raw data and draws from the posterior +#' distribution. This is represented by an R6 Class. +#' +#' Code and theory based on +#' [Pinkney 2021](https://arxiv.org/abs/2103.16244). +#' +#' public methods: +#' +#' * `initialize()` initializes the variables and model parameters +#' * `fit()` fits the stan model and returns a fit object +#' * `updateWidth` updates the width of the credible interval +#' * `placeboPlot` generates a counterfactual placebo plot +#' * `effectPlot` returns a plot of the treatment effect over time +#' * `summarizeLift`returns descriptive statistics of the lift estimate +#' * `biasDraws` returns a plot of the relative bias in a LFM +#' * `liftDraws` returns a plot of the posterior lift distribution +#' * `liftBias` returns a plot of the relative bias given a lift offset +#' +#' @param time Name of the variable in the data frame that +# ` identifies the time period (e.g. year, month, week etc). +#' @param id Name of the variable in the data frame that +#' identifies the units (e.g. country, region etc). +#' @param treated Name of the variable in the data frame that contains the +#' treatment assignment of the intervention. +#' @param data Long data.frame object with fields outcome, time, id, +#' and treatment indicator. +#' @param outcome Name of the outcome variable. +#' @param ci_width Credible interval's width. This number is in the +#' (0,1) interval. +#' @param covariates Dataframe with a column for {id} and the other columns +#' Defaults to NULL if no covariates should be included in the model. +#' @export +bayesianFactor <- R6::R6Class( + classname = "bayesianFactor", + private = list( + time = NULL, + id = NULL, + treated = NULL, + data = NULL, + treated_ids = NULL, + time_tiles = NULL, + intervention = NULL, + outcome = NULL, + ci_width = NULL, + stan_model = NULL, + covariates = NULL, + fitted = NULL, + plot_data = NULL, + y_synth_draws = NULL + ), + active = list( + #' @field timeTiles ggplot2 object that shows when + #' the intervention happened. + timeTiles = function() { + return(private$time_tiles) + }, + + #' @field plotData tibble with the observed outcome and the + #' counterfactual data. + plotData = function() { + return(private$plot_data) + }, + + #' @field interventionTime returns the intervention time period. + interventionTime = function() { + return(private$intervention) + }, + + #' @field synthetic ggplot2 object that shows the + #' observed and counterfactual outcomes over time. + synthetic = function() { + df_plot <- private$plot_data %>% + dplyr::rename( + Observed = !!private$outcome, + Synthetic = y_synth + ) %>% + dplyr::select(!!private$time, Observed, Synthetic, LB, UB) %>% + tidyr::pivot_longer(cols = c(Observed, Synthetic)) + + synthetic_plot <- ggplot2::ggplot( + data = df_plot, + ggplot2::aes(x = !!private$time) + ) + + ggplot2::geom_line(ggplot2::aes(y = value, linetype = name)) + + ggplot2::geom_ribbon(ggplot2::aes(ymin = LB, ymax = UB), + color = "gray", + alpha = 0.2 + ) + + ggplot2::theme_bw(base_size = 14) + + ggplot2::theme( + legend.title = ggplot2::element_blank(), + legend.position = c(0.9, .1), + legend.background = ggplot2::element_rect( + fill = + ggplot2::alpha("white", 0) + ), + panel.border = ggplot2::element_blank(), + axis.line = ggplot2::element_line() + ) + + ggplot2::geom_vline( + xintercept = private$intervention, + linetype = "dashed" + ) + + return(synthetic_plot) + } + ), + public = list( + #' @description + #' Create a new bayesianFactor object. + #' @details params described in the data structure section of the + #' documentation of the R6 class at the top of the file. + #' @return A new `bayesianFactor` object. + initialize = function(data, + time, + id, + treated, + outcome, + ci_width = 0.75, + covariates) { + stopifnot((ci_width > 0 & ci_width < 1)) + private$time <- rlang::enquo(time) + private$id <- rlang::enquo(id) + private$treated <- rlang::enquo(treated) + if (!missing(covariates)) { + # TODO(jvives): ADD SOME CHECKS (no missing data, sorted, etc) + # Check that covariates contain no missing values. + if (covariates %>% dplyr::select(-{{id}}) %>% any(is.na())) { + stop("Covariates have missing data.") + } + + # Check that covariates are ordered the same way as input data. + # if (covariates %>% dplyr::select({{id}}) + # stopif(covariates %>% + # dplyr::select(-{{id}}) %>% + # any(is.na())) + + private$stan_model <- bsynth.models::factor_model_with_covariates + } else { + private$stan_model <- bsynth.models::factor_model_without_covariates + } + + private$data <- data %>% + dplyr::mutate( + status = dplyr::case_when( + {{ treated }} == 1 ~ "Treated", + {{ treated }} == 0 ~ "Untreated", + is.na({{ treated }}) ~ "N/A" + ), + status = factor(status, levels = c( + "Treated", + "Untreated", + "N/A" + )) + ) + if (!(data %>% dplyr::pull({{ id }}) %>% is.factor())) { + private$data <- private$data %>% + dplyr::mutate(!!rlang::quo_name(private$id) := factor({{ id }})) + } + + private$treated_ids <- private$data %>% + dplyr::filter(!!private$treated == 1) %>% + dplyr::select({{ id }}) %>% + dplyr::distinct() %>% + dplyr::pull({{ id }}) + + private$time_tiles <- time_tiles( + data = private$data, + time = !!private$time, + id = !!private$id, + status = status + ) + + private$intervention <- private$data %>% + dplyr::filter(status == "Treated") %>% + dplyr::summarise(!!rlang::as_label(private$time) := min(!!private$time)) %>% + dplyr::pull(!!private$time) + + private$outcome <- rlang::enquo(outcome) + private$updateWidth(ci_width) + + # TODO(jvives): Add support to handle more than one treated unit. + if (length(private$treated_ids) != 1) { + stop("Support for more than one treated unit is not yet avaialble.") + } + }, + + #' @description + #' Fit Stan model. + #' @param L Number of factors. + #' @param ... other arguments passed to [rstan::sampling()]. + fit = function(L = 8, ...) { + y_treated_pre <- private$data %>% + dplyr::filter( + !!private$time < private$intervention, + !!private$id %in% private$treated_ids + ) %>% + dplyr::pull(!!private$outcome) + + y_donors_pre <- private$data %>% + dplyr::filter( + !!private$time < private$intervention, + !(!!private$id %in% private$treated_ids) + ) %>% + dplyr::select(!!private$id, !!private$outcome, !!private$time) %>% + tidyr::pivot_wider( + names_from = !!private$id, + values_from = !!private$outcome + ) %>% + dplyr::arrange(!!private$time) %>% + dplyr::select(-!!private$time) %>% + dplyr::select(sort(tidyselect::peek_vars())) %>% + as.matrix() %>% + t() + + y_donors_post <- private$data %>% + dplyr::filter( + !!private$time >= private$intervention, + !(!!private$id %in% private$treated_ids) + ) %>% + dplyr::select(!!private$id, !!private$outcome, !!private$time) %>% + tidyr::pivot_wider( + names_from = !!private$id, + values_from = !!private$outcome + ) %>% + dplyr::arrange(!!private$time) %>% + dplyr::select(-!!private$time) %>% + dplyr::select(sort(tidyselect::peek_vars())) %>% + as.matrix() %>% + t() + + if (!is.null(private$covariates)) { + stan_data <- list( + L = L, + N = length(y_treated_pre), + y_treated_pre = y_treated_pre, + J = nrow(y_donors_pre), + y_donors_pre = y_donors_pre, + N_pred = ncol(y_donors_post), + y_donors_post = y_donors_post, + P = nrow(private$covariates), + X = private$covariates + ) + } else { + stan_data <- list( + L = L, + N = length(y_treated_pre), + y_treated_pre = y_treated_pre, + J = nrow(y_donors_pre), + y_donors_pre = y_donors_pre, + N_pred = ncol(y_donors_post), + y_donors_post = y_donors_post + ) + } + + private$fitted <- + rstan::sampling(private$stan_model, + data = stan_data, + ... + ) + + timeXwalk <- private$data %>% + dplyr::select(!!private$time) %>% + dplyr::distinct() %>% + dplyr::arrange(!!private$time) %>% + dplyr::mutate(idx = seq_len(dplyr::n())) + + private$y_synth_draws <- + .get_par_long(fit = private$fitted, par = synth_out) %>% + dplyr::inner_join(timeXwalk, by = "idx") %>% + dplyr::rename(y_synth = synth_out) + + # TODO(jvives): Rename synth_out to y_sim. + # private$fitted <- private$fitted %>% + # dplyr::rename_all( + # funs(stringr::str_replace_all(., "synth_out", "y_sim")) + # ) + + wide_df <- .makeWide( + data = private$data, + id = private$id, + time = private$time, + outcome = private$outcome, + treatment = private$treated + ) + + pre_data <- wide_df %>% + dplyr::filter(!!private$time < private$intervention) + + post_data <- wide_df %>% + dplyr::filter(!!private$time >= private$intervention) + + # Add outcome variable to y_synth_draws. + # TODO(jvives): Integrate this step in get_synth_draws function. + pre_outcome <- pre_data %>% + dplyr::select(!!private$outcome, !!private$time) + post_outcome <- post_data %>% + dplyr::select(!!private$outcome, !!private$time) + + y <- dplyr::bind_rows(pre_outcome, post_outcome) + private$y_synth_draws <- dplyr::full_join( + private$y_synth_draws, y, + by = rlang::as_name(private$time)) + + private$plot_data <- + .get_plot_df( + y_synth_draws = private$y_synth_draws, + pre_data = pre_data, + post_data = post_data, + ci = private$ci_width, + time = private$time, + outcome = private$outcome + ) + }, + + #' @description + #' Update the width of the credible interval. + #' @param ci_width New width for the credible interval. This number + #' should be in the (0,1) interval. + updateWidth = function(ci_width = 0.75) { + stopifnot((ci_width > 0 & ci_width < 1)) + private$ci_width <- ci_width + + wide_df <- .makeWide( + data = private$data, + id = private$id, + time = private$time, + outcome = private$outcome, + treatment = private$treated + ) + + pre_data <- wide_df %>% + dplyr::filter(!!private$time < private$intervention) + + post_data <- wide_df %>% + dplyr::filter(!!private$time >= private$intervention) + + private$plot_data <- + .get_plot_df( + y_synth_draws = private$y_synth_draws, + pre_data = pre_data, + post_data = post_data, + ci = private$ci_width, + time = private$time, + outcome = private$outcome + ) + }, + + #' @description summarizeLift returns descriptive statistics of + #' the lift estimate. + summarizeLift = function() { + if (is.null(private$lift_draws)) { + stop("You first need to run the `liftDraws()` method.") + } + + return({ + c( + point = mean(private$lift_draws$lift), + lower_bound = quantile( + private$lift_draws$lift, + (1 - private$ci_width) / 2 + ), + upper_bound = quantile( + private$lift_draws$lift, + 1 - (1 - private$ci_width) / 2 + ) + ) + }) + }, + + #' @description effectPlot returns ggplot2 object that shows the + #' effect of the intervention over time. + effectPlot = function() { + tau_plot <- .plot_tau( + data = private$plot_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = private$intervention + ) + + return(tau_plot) + }, + + #' @description + #' Plots lift. + #' @param from First period to consider when calculating lift. If infinite, + #' set to the time of the intervention. + #' @param to Last period to consider when calculating lift. If infinite, set + #' to the last period. + #' @param ... other arguments passed to vizdraws::vizdraws(). + #' @return vizdraws object with the posterior distribution of the lift. + liftDraws = function(from, to, ...) { + if (inherits(from, "Date")) { + if (is.infinite(from)) { + from <- private$intervention + } + if (is.infinite(to)) { + to <- private$data %>% + pull(!!private$time) %>% + max() + } + } + # TODO(jvives): Add check for small values of sum_y0. + lift <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= from, !!private$time <= to) %>% + dplyr::group_by(draw) %>% + dplyr::summarise( + sum_y0 = sum(y_synth), + sum_y1 = sum(!!private$outcome), + lift = (sum_y1 - sum_y0) / sum_y0 + ) %>% pull(lift) + + vizdraws::vizdraws( + posterior = lift, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = "Contrafactual Lift", + xlab = "Effect of the Intervention", + units = "the contrafacutal lift", + ... + ) %>% return() + }, + + #' @description + #' Plot bias magnitude in terms of lift for period [firstT, lastT] + #' @param firstT start of the time period to compute relative bias + #' over. They must be after the intervention. + #' @param lastT end of the Time period to compute relative bias + #' over. They must be after the intervention. + #' @param offset Target lift %. + #' @param ... other arguments passed to vizdraws::vizdraws(). + #' @returns vizdraws object with the relative bias with offset. + liftBias = function(firstT, lastT, offset, ...) { + # Calculate the gaps. + # TODO(jvives): Add capability for this to work with covariates. + mad_pre_gaps <- private$y_synth_draws %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::mutate(gap = y_synth - !!private$outcome) %>% + dplyr::group_by(draw) %>% + dplyr::summarise(mad = stats::mad(gap)) + + # Get all time periods in TL. + Ts <- private$y_synth_draws %>% + dplyr::filter(!!private$time <= lastT, !!private$time >= firstT) %>% + dplyr::select(!!private$time) %>% + unique() %>% + dplyr::pull() + + # Get time periods between intervention and first_T. + Ts_from_T0 <- private$y_synth_draws %>% + dplyr::filter(!!private$time <= firstT, + !!private$time >= private$intervention) %>% + dplyr::select(!!private$time) %>% + dplyr::n_distinct() %>% + as.numeric() + T_from_T0 <- Ts_from_T0 + 1 + + # Calculate the estimated treatment effects (taus). + # TODO(jvives): Consolidate functions into one. + taus_TL <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= private$intervention) %>% + dplyr::filter(!!private$time == Ts) %>% + dplyr::group_by(draw) %>% + dplyr::mutate(gapTauTL = abs(sum(y_synth - !!private$outcome))) %>% + dplyr::mutate(sumTauTL = abs(sum(!!private$outcome))) + + # Divide by first period tau. For now just first period. + TL <- length(Ts) + if (TL >= 2) { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate( + bias_ratio = abs(T_from_T0 * 0.5 * TL * (TL - 1) * mad - sumTauTL * + offset) / gapTauTL) %>% + dplyr::filter(bias_ratio < 2.) + } else { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate( + bias_ratio = abs(T_from_T0 * mad - sumTauTL * offset) / gapTauTL) %>% + dplyr::filter(bias_ratio < 2.) + } + + # Return the relative bias vizdraw plot. + biasDrawsViz <- vizdraws::vizdraws( + posterior = rel_bias$bias_ratio, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = sprintf( + "Upper bias changing a %#.2f %% Lift", + offset * 100 + ), + xlab = "Bias Relative to Lift", + units = "Bias Relative to Lift", + breaks = c(0.3, 1.), + break_names = c( + "Unlikely", + "Close to change", + "Changes bucket" + ), + colors = c("#4daf4a", "#377eb8", "#e41a1c"), + ... + ) + return(biasDrawsViz) + }, + + #' @description + #' Plots relative upper bias / tau for a time period [firstT, lastT]. + #' @param small_bias Threshold value for considering the bias "small". + #' @param firstT,lastT Time periods to compute relative bias over, they must + #' after the intervention. + #' @return vizdraw object with the posterior distribution of relative bias. + #' Bias is scaled by the time periods. + biasDraws = function(small_bias = 0.3, firstT, lastT) { + # Calculate the gaps. + # TODO(jvives): Add capability for it to work with covariates. + mad_pre_gaps <- private$y_synth_draws %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::mutate(gap = y_synth - !!private$outcome) %>% + dplyr::group_by(draw) %>% + dplyr::summarise(mad = stats::mad(gap)) + + # Get all time periods in TL. + Ts <- private$y_synth_draws %>% + dplyr::filter(!!private$time <= lastT, !!private$time >= firstT) %>% + dplyr::select(!!private$time) %>% + unique() %>% + dplyr::pull() + + # Get time periods between intervention and first_T. + Ts_from_T0 <- private$y_synth_draws %>% + dplyr::filter( + !!private$time <= firstT, + !!private$time >= private$intervention + ) %>% + dplyr::select(!!private$time) %>% + dplyr::n_distinct() %>% + as.numeric() + T_from_T0 <- Ts_from_T0 + 1 + + # Calculate the estimated treatment effects (taus). + taus_TL <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= private$intervention) %>% + dplyr::filter(!!private$time == Ts) %>% + dplyr::group_by(draw) %>% + dplyr::mutate(gapTauTL = abs(sum(y_synth - !!private$outcome))) + + # Divide by first period tau. For now just first period. + TL <- length(Ts) + if (TL >= 2) { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = T_from_T0 * 0.5 * TL * (TL - 1) * mad / gapTauTL) %>% + dplyr::filter(bias_ratio < 5.) + } else { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = T_from_T0 * mad / gapTauTL) %>% + dplyr::filter(bias_ratio < 5.) + } + + # Return the relative bias vizdraw plot. + biasDrawsViz <- vizdraws::vizdraws( + posterior = rel_bias$bias_ratio, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = "Counterfactual Upper Bias", + xlab = "Relative Bias", + units = "Relative Bias", + breaks = c(small_bias, 1.), + break_names = c( + "Small Bias", + "Some Bias", + "Sign Change" + ), + colors = c("#4daf4a", "#377eb8", "#e41a1c") + ) + return(biasDrawsViz) + } + ) # end public methods +) # end class diff --git a/R/bsynth-package.R b/R/bsynth-package.R new file mode 100644 index 0000000..c41316f --- /dev/null +++ b/R/bsynth-package.R @@ -0,0 +1,31 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' The 'bsynth' package. +#' +#' @description Provides causal inference with a Bayesian synthetic +#' control method. +#' +#' @docType package +#' @name bsynth-package +#' @aliases bsynth +#' @useDynLib bsynth, .registration = TRUE +#' @import methods +#' @import Rcpp +#' @importFrom rstan sampling +#' +#' @references +#' Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org +#' +NULL diff --git a/R/factory.R b/R/factory.R new file mode 100644 index 0000000..4e434cd --- /dev/null +++ b/R/factory.R @@ -0,0 +1,949 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' Create a Bayesian Synthetic Control Object Using Panel Data +#' +#' @description +#' A Bayesian Synthetic Control has raw data and draws from the posterior +#' distribution. This is represented by an R6 Class. +#' +#' public methods: +#' +#' * `initialize()` initializes the variables and model parameters +#' * `fit()` fits the stan model and returns a fit object +#' * `updateWidth` updates the width of the credible interval +#' * `placeboPlot` generates a counterfactual placebo plot +#' * `effectPlot` returns a plot of the treatment effect over time +#' * `summarizeLift`returns descriptive statistics of the lift estimate +#' * `biasDraws` returns a plot of the relative bias in a LFM +#' * `liftDraws` returns a plot of the posterior lift distribution +#' * `liftBias` returns a plot of the relative bias given a lift offset +#' Data structure: +#' @param data Long data.frame object with fields outcome, time, id, +#' and treatment indicator. +#' @param covariates Data.frame with time dependent covariates for +#' for each unit and time field. +#' Defaults to NULL if no covariates should be included in the model. +#' @param predictor_match_covariates0 data.frame with time independent +#' covariates on each row and column indicating the control unit names +#' (dim k x J+1). +#' @param predictor_match_covariates1 Vector with time independent +#' covariates for the treated unit (dim k x 1). +#' @param predictor_match Logical that indicates whether or not to run +#' the matching version of the Bayesian Synthetic Control. This option can +#' not be used with gp, covariates or multiple treated units. +#' @param vs Vector of weights for the importance of the predictors used +#' in creating the synthetic control. +#' Defaults to equal weight for all predictors. +#' @param time Name of the variable in the data frame that +#' identifies the time period (e.g. year, month, week etc). +#' @param id Name of the variable in the data frame that +#' identifies the units (e.g. country, region etc). +#' @param treated Name of the variable in the data frame that contains +#' the treatment assignment of the intervention. +#' @param outcome Name of the outcome variable. +#' @param ci_width Credible interval's width. This number is in the +#' [0,1] interval. +#' @param intervention Intervention time period (e.g., year) +#' in which the treatment occurred. +#' @param plot_data Tibble with the observed outcome and the +#' counterfactual data. +#' @param time_Tiles ggplot2 object with a timeline for the intervention. +#' @export +bayesianSynth <- R6::R6Class( + classname = "bayesianSynth", + private = list( + data = NULL, + covariates = NULL, + predictor_match_covariates0 = NULL, + predictor_match_covariates1 = NULL, + predictor_match = NULL, + vs = NULL, + time = NULL, + id = NULL, + treated = NULL, + outcome = NULL, + ci_width = NULL, + intervention = NULL, + fitted = NULL, + plot_data = NULL, + time_tiles = NULL, + stan_model = NULL, + y_synth_draws = NULL, + lift_draws = NULL, + treated_ids = NULL, + mcmc_checks = NULL # a mcmc_checks object + ), + active = list( + + #' @field timeTiles ggplot2 object that shows when + #' the intervention happened. + timeTiles = function() { + return(private$time_tiles) + }, + + #' @field plotData returns tibble with the observed outcome and the + #' counterfactual data. + plotData = function() { + return(private$plot_data) + }, + + #' @field interventionTime returns intervention time period (e.g., year) + #' in which the treatment occurred. + interventionTime = function() { + return(private$intervention) + }, + + #' @field synthetic returns ggplot2 object that shows the + #' observed and counterfactual outcomes over time. + synthetic = function() { + df_plot <- private$plot_data %>% + dplyr::rename( + Observed = !!private$outcome, + Synthetic = y_synth + ) + + if (length(private$treated_ids) == 1) { + df_plot <- df_plot %>% + dplyr::select(!!private$time, Observed, Synthetic, LB, UB) + } else { + df_plot <- df_plot %>% + dplyr::select( + !!private$id, !!private$time, Observed, + Synthetic, LB, UB + ) %>% + dplyr::filter(!!private$id != "Average") + } + df_plot <- df_plot %>% + tidyr::pivot_longer(cols = c(Observed, Synthetic)) + + synthetic_plot <- ggplot2::ggplot( + data = df_plot, + ggplot2::aes(x = !!private$time) + ) + + ggplot2::geom_line( + ggplot2::aes(y = value, linetype = name) + ) + + ggplot2::geom_ribbon( + ggplot2::aes(ymin = LB, ymax = UB), + color = "gray", + alpha = 0.2 + ) + + ggplot2::theme_bw(base_size = 14) + + ggplot2::theme( + legend.title = ggplot2::element_blank(), + legend.position = c(0.9, .1), + legend.background = ggplot2::element_rect( + fill = ggplot2::alpha("white", 0) + ), + panel.border = ggplot2::element_blank(), + axis.line = ggplot2::element_line() + ) + + ggplot2::geom_vline( + xintercept = private$intervention, + linetype = "dashed" + ) + + if (length(private$treated_ids) > 1) { + synthetic_plot <- synthetic_plot + + ggplot2::facet_wrap(dplyr::vars(!!private$id)) + } + + return(synthetic_plot) + }, + + #' @field checks returns MCMC checks. + checks = function() { + private$mcmc_checks + }, + + #' @field lift draws from the posterior distribution of the lift. + lift = function() { + private$lift_draws + } + ), + public = list( + #' @description + #' Create a new bayesianSynth object. + #' @param gp Logical that indicates whether or not to include a + #' Gaussian Process as part of the model. + #' @return A new `bayesianSynth` object. + initialize = function(data, time, id, treated, outcome, ci_width = 0.75, + gp = FALSE, covariates = NULL, + predictor_match = FALSE, + predictor_match_covariates0 = NULL, + predictor_match_covariates1 = NULL, vs = NULL) { + stopifnot((ci_width > 0 & ci_width < 1)) + private$time <- rlang::enquo(time) + private$id <- rlang::enquo(id) + private$treated <- rlang::enquo(treated) + private$predictor_match <- predictor_match + private$predictor_match_covariates0 <- predictor_match_covariates0 + private$predictor_match_covariates1 <- predictor_match_covariates1 + private$vs <- vs # Predictor weights + + message("Transforming data") + + # Check treated has the right input. + if (!setequal( + data %>% + dplyr::select(!!private$treated) %>% dplyr::distinct() %>% + dplyr::pull({{ treated }}), + c(1, 0) + )) { + stop("Treated identifier not binary 1 - 0.") + } + + private$data <- data %>% + dplyr::mutate( + status = dplyr::case_when( + {{ treated }} == 1 ~ "Treated", + {{ treated }} == 0 ~ "Untreated", + is.na({{ treated }}) ~ "N/A" + ), + status = factor(status, levels = c( + "Treated", + "Untreated", + "N/A" + )) + ) + + if (!(data %>% dplyr::pull({{ id }}) %>% is.factor())) { + private$data <- private$data %>% + dplyr::mutate(!!rlang::quo_name(private$id) := factor({{ id }})) + } + + private$treated_ids <- private$data %>% + dplyr::filter(!!private$treated == 1) %>% + dplyr::select({{ id }}) %>% + dplyr::distinct() %>% + dplyr::pull({{ id }}) + + private$time_tiles <- time_tiles( + data = private$data, + time = !!private$time, + id = !!private$id, + status = status + ) + + private$intervention <- private$data %>% + dplyr::filter(status == "Treated") %>% + dplyr::summarise(!!rlang::as_label(private$time) := min(!!private$time)) %>% + dplyr::pull(!!private$time) + + private$outcome <- rlang::enquo(outcome) + private$ci_width <- ci_width + + if (length(private$treated_ids) == 1) { + if (is.null(covariates)) { + if (gp) { + private$stan_model <- bsynth.models::model2 # stanmodels$model2 + } else { + if (predictor_match) { + private$stan_model <- bsynth.models::model1_gammaOmega # gamma BCS + } else { + private$stan_model <- bsynth.models::model1 # stanmodels$model1 + } + } + } else { + if (gp) { + private$stan_model <- bsynth.models::model4 # stanmodels$model4 + } else { + private$stan_model <- bsynth.models::model3 # stanmodels$model3 + } + private$covariates <- covariates %>% + dplyr::arrange(!!private$time) + } + } else { + if (is.null(covariates)) { + if (gp) { + private$stan_model <- bsynth.models::model8 # stanmodels$model8 + } else { + private$stan_model <- bsynth.models::model5 # stanmodels$model5 + } + } else { + if (gp) { + private$stan_model <- bsynth.models::model7 # stanmodels$model7 + } else { + private$stan_model <- bsynth.models::model6 # stanmodels$model6 + } + private$covariates <- covariates %>% + dplyr::arrange(!!private$time) + } + } + }, + + #' @description + #' Fit Stan model. + #' @param ... other arguments passed to [rstan::sampling()]. + fit = function(...) { + if (length(private$treated_ids) == 1) { ## Only one treated unit + # TODO(jvives): Create utility function to make pre/post data frames. + wide_df <- .makeWide( + data = private$data, + id = private$id, + time = private$time, + outcome = private$outcome, + treatment = private$treated + ) + pre_data <- wide_df %>% + dplyr::filter(!!private$time < private$intervention) + + post_data <- wide_df %>% + dplyr::filter(!!private$time >= private$intervention) + + X <- pre_data %>% + dplyr::select(-!!private$time, -!!private$treated, -!!private$outcome) + X_pred <- post_data %>% + dplyr::select(-!!private$time, -!!private$treated, -!!private$outcome) + + if (!is.null(private$covariates)) { + M <- private$covariates %>% + dplyr::filter(!!private$time < private$intervention) + + M_pred <- private$covariates %>% + dplyr::filter(!!private$time >= private$intervention) + + stan_data <- list( + N = nrow(pre_data), + y = pre_data %>% dplyr::pull(!!private$outcome), + K = ncol(X), + X = X, + M_K = ncol(M), + M = M, + N_pred = nrow(X_pred), + X_pred = X_pred, + M_pred = M_pred + ) + } else { + if (private$predictor_match) { + X1 <- pre_data %>% dplyr::pull(!!private$outcome) + if (!is.null(private$predictor_match_covariates0)) { + # covariates are a k x J matrix + # TODO(jvives): Make it general and add check for input order. + X <- rbind(X, setNames( + private$predictor_match_covariates0, + names(X) + )) + X1 <- c( + pre_data %>% dplyr::pull(!!private$outcome), + private$predictor_match_covariates1 + ) + } + # TODO(jvives): Move to initialize. + # Set predictor weights if not given. + if (is.null(private$vs)) { + private$vs <- rep(1, nrow(X)) + } + + stan_data <- list( + K = nrow(X), + X1 = X1, + J = ncol(X), + X0 = X, + T_post = nrow(X_pred), + X0_pred = X_pred, + vs = private$vs + ) + } else { + stan_data <- list( + N = nrow(pre_data), + y = pre_data %>% dplyr::pull(!!private$outcome), + K = ncol(X), + X = X, + N_pred = nrow(X_pred), + X_pred = X_pred + ) + } + } + } else { ## More than 1 treated unit + + Y <- private$data %>% + dplyr::filter(!!private$id %in% private$treated_ids) %>% + dplyr::select( + !!private$id, + !!private$time, + !!private$outcome + ) %>% + dplyr::filter(!!private$time < private$intervention) %>% + tidyr::pivot_wider( + names_from = !!private$id, + values_from = !!private$outcome + ) %>% + dplyr::arrange(!!private$time) %>% + dplyr::select(-!!private$time) %>% + as.matrix() %>% + t() + + donors <- private$data %>% + dplyr::filter(!(!!private$id %in% private$treated_ids)) %>% + dplyr::select(time, Y, id) + + X <- donors %>% + dplyr::filter(!!private$time < !!private$intervention) %>% + dplyr::arrange(!!private$time) %>% + tidyr::pivot_wider( + names_from = !!private$id, + values_from = !!private$outcome + ) %>% + dplyr::select(-!!private$time) + + X_pred <- donors %>% + dplyr::filter(!!private$time >= !!private$intervention) %>% + dplyr::arrange(!!private$time) %>% + tidyr::pivot_wider( + names_from = !!private$id, + values_from = !!private$outcome + ) %>% + dplyr::select(-!!private$time) + + if (!is.null(private$covariates)) { + M <- private$covariates %>% + dplyr::filter( + !!private$id %in% private$treated_ids, + !!private$time < private$intervention + ) %>% + dplyr::arrange(!!private$id, !!private$time) %>% + dplyr::select(-!!private$time) %>% + dplyr::group_by(!!private$id) %>% + tidyr::nest() %>% + dplyr::pull(data) %>% + purrr::map(as.matrix) + + M_pred <- private$covariates %>% + dplyr::filter( + !!private$id %in% private$treated_ids, + !!private$time >= private$intervention + ) %>% + dplyr::arrange(!!private$id, !!private$time) %>% + dplyr::select(-!!private$time) %>% + dplyr::group_by(!!private$id) %>% + tidyr::nest() %>% + dplyr::pull(data) %>% + purrr::map(as.matrix) + + stan_data <- list( + N = ncol(Y), + I = nrow(Y), + y = Y, + K = ncol(X), + X = X, + M_K = ncol(M[[1]]), + M = M, + N_pred = nrow(X_pred), + X_pred = X_pred, + M_pred = M_pred + ) + } else { + stan_data <- list( + N = ncol(Y), + I = nrow(Y), + y = Y, + K = ncol(X), + X = X, + N_pred = nrow(X_pred), + X_pred = X_pred + ) + } + } + + private$fitted <- + rstan::sampling( + private$stan_model, + data = stan_data, + ... + ) + + if (length(private$treated_ids) == 1) { + if (private$predictor_match) { + private$y_synth_draws <- .get_synth_draws_predictor_match( + fit = private$fitted, + pre_data = pre_data, + post_data = post_data, + time = private$time, + outcome = private$outcome + ) + } else { + private$y_synth_draws <- .get_synth_draws( + fit = private$fitted, + pre_data = pre_data, + post_data = post_data, + time = private$time, + outcome = private$outcome + ) + } + + private$plot_data <- + .get_plot_df( + y_synth_draws = private$y_synth_draws, + pre_data = pre_data, + post_data = post_data, + ci = private$ci_width, + time = private$time, + outcome = private$outcome + ) + } else { + private$y_synth_draws <- .get_synth_draws3d( + fit = private$fitted, + data = private$data, + id = private$id, + treated_ids = private$treated_ids, + time = private$time, + outcome = private$outcome, + intervention = private$intervention + ) + + # add plot data + private$plot_data <- .get_plot_df2( + y_synth_draws = private$y_synth_draws, + data = private$data, + treated_ids = private$treated_ids, + id = private$id, + time = private$time, + outcome = private$outcome, + ci = private$ci_width + ) + } + }, + + #' @description + #' Update the width of the credible interval. + #' @param ci_width New width for the credible interval. This number + #' should be in the (0,1) interval. + updateWidth = function(ci_width = 0.75) { + stopifnot(exprs = { + ci_width > 0 + ci_width < 1 + }) + private$ci_width <- ci_width + + wide_df <- .makeWide( + data = private$data, + id = private$id, + time = private$time, + outcome = private$outcome, + treatment = private$treated + ) + + pre_data <- wide_df %>% + dplyr::filter(!!private$time < private$intervention) + + post_data <- wide_df %>% + dplyr::filter(!!private$time >= private$intervention) + + private$plot_data <- + .get_plot_df( + y_synth_draws = private$y_synth_draws, + pre_data = pre_data, + post_data = post_data, + ci = private$ci_width, + time = private$time, + outcome = private$outcome + ) + }, + #' @description returns descriptive statistics of the lift estimate. + #' + summarizeLift = function() { + if (is.null(private$lift_draws)) { + stop("You first need to run the `liftDraws()` method.") + } + + return({ + c( + point = mean(private$lift_draws$lift), + lower_bound = quantile( + private$lift_draws$lift, + (1 - private$ci_width) / 2 + ), + upper_bound = quantile( + private$lift_draws$lift, + 1 - (1 - private$ci_width) / 2 + ) + ) + }) + }, + #' @description effect ggplot2 object that shows the + #' effect of the intervention over time. + #' @param facet Boolean that is TRUE if we want to divide the plot for each + #' unit. + #' @param subset Set of units to use in the effect plot. + effectPlot = function(facet = TRUE, subset = NULL) { + if (length(private$treated_ids) == 1) { + tau_plot <- .plot_tau( + data = private$plot_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = private$intervention + ) + } else { + if (is.null(subset)) { + tau_plot <- .plot_tau( + data = private$plot_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = private$intervention, + facet = private$id + ) + } else { + if (facet) { + tau_plot <- .plot_tau( + data = private$plot_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = private$intervention, + facet = private$id, + id = private$id, + subset = subset + ) + } else { + tau_plot <- .plot_tau( + data = private$plot_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = private$intervention, + id = private$id, + subset = subset + ) + } + } + } + + return(tau_plot) + }, + + #' @description + #' Plot placebo intervention. + #' @param periods Positive number of periods for the placebo intervention. + #' @param ... other arguments passed to [rstan::sampling()]. + #' @return ggplot2 object for placebo treatment effect. + placeboPlot = function(periods, ...) { + stopifnot(periods > 0) + + treated <- private$data %>% + dplyr::filter(status == "Treated") %>% + dplyr::select(!!private$id) %>% + dplyr::distinct() %>% + dplyr::pull(!!private$id) + + keep <- private$data %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::select(!!private$time) %>% + dplyr::distinct() %>% + dplyr::slice_max(n = periods, order_by = !!private$time) %>% + dplyr::pull(!!private$time) + + wide_df <- .makeWide( + data = private$data, + id = private$id, + time = private$time, + outcome = private$outcome, + treatment = private$treated + ) %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::mutate( + !!rlang::as_label(private$treated) := + dplyr::case_when(!!private$time %in% keep ~ 1, TRUE ~ 0) + ) + + pre_data <- wide_df %>% + dplyr::filter(!!private$treated == 0) + post_data <- wide_df %>% + dplyr::filter(!!private$treated == 1) + + X <- pre_data %>% + dplyr::select(-!!private$time, -!!private$treated, -!!private$outcome) + X_pred <- post_data %>% + dplyr::select(-!!private$time, -!!private$treated, -!!private$outcome) + + if (!is.null(private$covariates)) { + max_t_pre <- pre_data %>% + dplyr::pull(!!private$time) %>% + max() + max_t_post <- post_data %>% + dplyr::pull(!!private$time) %>% + max() + + M <- private$covariates %>% + dplyr::filter(!!private$time <= max_t_pre) + M_pred <- private$covariates %>% + dplyr::filter( + !!private$time > max_t_pre, + !!private$time <= max_t_post + ) + + stan_data <- list( + N = nrow(pre_data), + y = pre_data %>% dplyr::pull(!!private$outcome), + K = ncol(X), + X = X, + M_K = ncol(M), + M = M, + N_pred = nrow(X_pred), + X_pred = X_pred, + M_pred = M_pred + ) + } else { + stan_data <- list( + N = nrow(pre_data), + y = pre_data %>% dplyr::pull(!!private$outcome), + K = ncol(X), + X = X, + N_pred = nrow(X_pred), + X_pred = X_pred + ) + } + + placebo_fitted <- + rstan::sampling( + private$stan_model, + data = stan_data, + ... + ) + + placebo_y_synth_draws <- .get_synth_draws( + fit = placebo_fitted, + pre_data = pre_data, + post_data = post_data, + time = private$time, + outcome = private$outcome + ) + + plot_placebo_data <- + .get_plot_df( + y_synth_draws = placebo_y_synth_draws, + pre_data = pre_data, + post_data = post_data, + ci = private$ci_width, + time = private$time, + outcome = private$outcome + ) + + tau_plot <- .plot_tau( + data = plot_placebo_data, + x = private$time, + y = tau, + ymin = tau_LB, + ymax = tau_UB, + xintercept = min(keep) + ) + return(tau_plot) + }, + #' @description + #' Plots relative upper bias / tau for a time period [firstT, lastT]. + #' + #' @return vizdraw object with the posterior distribution of relative bias. + #' Bias is scaled by the time periods. + #' + #' @param small_bias Threshold value for considering the bias "small". + #' @param firstT,lastT Time periods to compute relative bias over, they + #' must be after the intervention. + biasDraws = function(small_bias = 0.3, firstT, lastT) { + # Calculate the gaps. This won't work with covariates. + mad_pre_gaps <- private$y_synth_draws %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::mutate(gap = y_synth - !!private$outcome) %>% + dplyr::group_by(draw) %>% + dplyr::summarise(mad = stats::mad(gap)) + + # Get all time periods in TL. + Ts <- private$y_synth_draws %>% + dplyr::filter(!!private$time <= lastT, !!private$time >= firstT) %>% + dplyr::select(!!private$time) %>% + unique() %>% + dplyr::pull() + + # Get time periods between intervention and first_T. + Ts_from_T0 <- private$y_synth_draws %>% + dplyr::filter( + !!private$time <= firstT, + !!private$time >= private$intervention + ) %>% + dplyr::select(!!private$time) %>% + dplyr::n_distinct() %>% + as.numeric() + T_from_T0 <- Ts_from_T0 + 1 + + # Calculate the estimated Treatment Effects (taus). + taus_TL <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= private$intervention) %>% + dplyr::filter(!!private$time == Ts) %>% + dplyr::group_by(draw) %>% + dplyr::mutate(gapTauTL = abs(sum(y_synth - !!private$outcome))) + + # Adjust the relative bias by the number of time periods that passed + # since the intervention + TL <- length(Ts) + if (TL >= 2) { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = T_from_T0 * 0.5 * TL * (TL - 1) * mad / gapTauTL) %>% + dplyr::filter(bias_ratio < 5.) + } else { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = T_from_T0 * mad / gapTauTL) %>% + dplyr::filter(bias_ratio < 5.) + } + + # Return the relative bias vizdraw plot. + vizdraws::vizdraws( + posterior = rel_bias$bias_ratio, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = "Counterfactual Upper Bias", + xlab = "Relative Bias", + units = "Relative Bias", + breaks = c(small_bias, 1.), + break_names = c( + "Small Bias", + "Some Bias", + "Sign Change" + ), + colors = c("#4daf4a", "#377eb8", "#e41a1c") + ) %>% return() + }, + #' @description + #' Plots lift. + #' @param from First period to consider when calculating lift. If infinite, + #' set to the time of the intervention. + #' @param to Last period to consider when calculating lift. If infinite, set + #' to the last period. + #' @return vizdraws object with the posterior distribution of the lift. + #' @param ... other arguments passed to vizdraws::vizdraws(). + liftDraws = function(from, to, ...) { + if (inherits(from, "Date")) { + if (is.infinite(abs(from))) { + from <- private$intervention + } + if (is.infinite(to)) { + to <- private$data %>% + pull(!!private$time) %>% + max() + } + } + # TODO(jvives): Add check for small values of sum_y0. + private$lift_draws <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= from, !!private$time <= to) %>% + dplyr::group_by(draw) %>% + dplyr::summarise( + sum_y0 = sum(y_synth), + sum_y1 = sum(!!private$outcome), + lift = ifelse(abs(sum_y0) > 1e-9, (sum_y1 - sum_y0) / sum_y0, 0.) + ) + + vizdraws::vizdraws( + posterior = private$lift_draws$lift, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = "Contrafactual Lift", + xlab = "Effect of the Intervention", + units = "the contrafactual lift", + ... + ) %>% return() + + return(posterior) + }, + # TODO(jvives): Add utility functions to make liftBias and Draws + # more compact. + #' @description + #' Plot Bias magnitude in terms of lift for period [firstT, lastT] + #' pre_MADs / y0 relative to lift thresholds. + #' @param firstT start of the time period to compute relative bias + #' over. They must be after the intervention. + #' @param lastT end of the Time period to compute relative bias + #' over. They must be after the intervention. + #' @param offset Target lift %. + #' @param ... other arguments passed to vizdraws::vizdraws(). + #' @returns vizdraws object with the relative bias with offset. + liftBias = function(firstT, lastT, offset, ...) { + # Calculate the gaps. // This wont work with covariates + mad_pre_gaps <- private$y_synth_draws %>% + dplyr::filter(!!private$time < private$intervention) %>% + dplyr::mutate(gap = y_synth - !!private$outcome) %>% + dplyr::group_by(draw) %>% + dplyr::summarise(mad = stats::mad(gap)) + + # Get all time periods in TL + Ts <- private$y_synth_draws %>% + dplyr::filter(!!private$time <= lastT, !!private$time >= firstT) %>% + dplyr::select(!!private$time) %>% + unique() %>% + dplyr::pull() + + # Get time periods between intervention and first_T + Ts_from_T0 <- private$y_synth_draws %>% + dplyr::filter( + !!private$time <= firstT, + !!private$time >= private$intervention + ) %>% + dplyr::select(!!private$time) %>% + dplyr::n_distinct() %>% + as.numeric() + T_from_T0 <- Ts_from_T0 + 1 + + # Calculate the estimated Treatment Effects (taus) + taus_TL <- private$y_synth_draws %>% + dplyr::filter(!!private$time >= private$intervention) %>% + dplyr::filter(!!private$time == Ts) %>% + dplyr::group_by(draw) %>% + dplyr::mutate(gapTauTL = abs(sum(y_synth - !!private$outcome))) %>% + dplyr::mutate(sumTauTL = abs(sum(!!private$outcome))) + + # Divide by first period tau. For now just first period. + TL <- length(Ts) + if (TL >= 2) { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = abs(T_from_T0 * 0.5 * TL * (TL - 1) * mad - sumTauTL * offset) / gapTauTL) %>% + dplyr::filter(bias_ratio < 2.) + } else { + rel_bias <- dplyr::inner_join(mad_pre_gaps, taus_TL, by = "draw") %>% + dplyr::mutate(bias_ratio = abs(T_from_T0 * mad - sumTauTL * offset) / gapTauTL) %>% + dplyr::filter(bias_ratio < 2.) + } + + # Return the relative bias vizdraw plot + vizdraws::vizdraws( + posterior = rel_bias$bias_ratio, + percentage = TRUE, + quantity = TRUE, + tense = "past", + display_mode_name = TRUE, + title = sprintf( + "Upper bias changing a %#.2f %% Lift", + offset * 100 + ), + xlab = "Bias Relative to Lift", + units = "Bias Relative to Lift", + breaks = c(0.3, 1.), + break_names = c( + "Unlikely", + "Close to change", + "Changes bucket" + ), + colors = c("#4daf4a", "#377eb8", "#e41a1c"), + ... + ) %>% return() + } + ) +) diff --git a/R/get_par_long.R b/R/get_par_long.R new file mode 100644 index 0000000..6591e90 --- /dev/null +++ b/R/get_par_long.R @@ -0,0 +1,35 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' @description +#' Helper function to get the long dataset of draws given a stan fit object. +#' @param fit Stan object with the fitted model. +#' @param par Variable to do the long table for. +.get_par_long <- function(fit, par) { + par <- rlang::enquo(par) + long_tlb <- fit %>% + as.data.frame(pars = rlang::quo_text(par)) %>% + dplyr::mutate(draw = 1:dplyr::n()) %>% + tidyr::pivot_longer( + names_to = "idx", + values_to = rlang::quo_text(par), -draw + ) %>% + dplyr::mutate(idx = as.numeric(gsub( + glue::glue("{rlang::quo_text(par)}\\[(\\d+)\\]"), + "\\1", + idx + ))) %>% + dplyr::as_tibble() + return(long_tlb) +} diff --git a/R/get_plot_df.R b/R/get_plot_df.R new file mode 100644 index 0000000..a876a25 --- /dev/null +++ b/R/get_plot_df.R @@ -0,0 +1,97 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' +#' @description +#' Returns data.frame ready for plotting with confidence intervals. +#' @param y_synth_draws Data.frame with each draw from the stan fit object. +#' @param pre_data Data.frame with data before the intervention. +#' @param post_data Data.frame with data after the intervention. +#' @param time Name of the time period variable. +#' @param oucome Name of the outcome variable. +#' @param ci Width of the credible confidence interval. +.get_plot_df <- function(y_synth_draws, pre_data, + post_data, time, outcome, ci = 0.75) { + y_synth <- y_synth_draws %>% + dplyr::group_by(!!time) %>% + dplyr::summarise( + LB = quantile(y_synth, (1 - ci) / 2), + UB = quantile(y_synth, 1 - (1 - ci) / 2), + y_synth = mean(y_synth) + ) + + all_data <- dplyr::bind_rows(pre_data, post_data) + + df_plot_all <- dplyr::inner_join(y_synth, all_data, + by = rlang::as_name(time) + ) %>% + dplyr::mutate( + tau = !!outcome - y_synth, + tau_LB = !!outcome - UB, + tau_UB = !!outcome - LB + ) %>% + dplyr::select(!!time, !!outcome, y_synth, LB, UB, tau, tau_LB, tau_UB) + return(df_plot_all) +} + +#' @description +#' Helper function to transform the data into a data.frame that can be +#' plotted for the case with more than one treated unit. +#' @param treated_ids Identifiers for the treated units. +#' @param data Data.frame with the input data. +#' @details other params same as get_plot_df() function. +.get_plot_df2 <- function(y_synth_draws, data, treated_ids, + id, time, outcome, ci = 0.75) { + data_treated <- data %>% + dplyr::filter(!!id %in% treated_ids) %>% + dplyr::select(!!id, !!time, !!outcome) %>% + dplyr::mutate(!!rlang::as_label(id) := as.character(!!id)) + + ate <- data_treated %>% + dplyr::inner_join(y_synth_draws, by = c( + rlang::as_name(id), + rlang::as_name(time) + )) %>% + dplyr::mutate(diff = !!outcome - y_hat) %>% + dplyr::group_by(!!time, draw) %>% + dplyr::summarise(diff_draw = mean(diff)) %>% + dplyr::group_by(!!time) %>% + dplyr::summarise( + tau = mean(diff_draw), + tau_LB = quantile(diff_draw, (1 - ci) / 2), + tau_UB = quantile(diff_draw, 1 - (1 - ci) / 2) + ) %>% + dplyr::mutate(id = "Average") + + y_synth_i <- y_synth_draws %>% + dplyr::group_by(!!time, !!id) %>% + dplyr::summarise( + LB = quantile(y_hat, (1 - ci) / 2), + UB = quantile(y_hat, 1 - (1 - ci) / 2), + y_synth = mean(y_hat) + ) + + df_plot_i <- y_synth_i %>% + dplyr::inner_join(data_treated, by = c( + rlang::as_name(id), + rlang::as_name(time) + )) %>% + dplyr::mutate( + tau = !!outcome - y_synth, + tau_LB = !!outcome - UB, + tau_UB = !!outcome - LB + ) + df_plot <- dplyr::bind_rows(df_plot_i, ate) + return(df_plot) +} diff --git a/R/get_synth_draws.R b/R/get_synth_draws.R new file mode 100644 index 0000000..5977419 --- /dev/null +++ b/R/get_synth_draws.R @@ -0,0 +1,181 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' @description +#' Helper function (not exported) to get the draws as a tidy data frame when +#' you only have a single treated unit. +#' @param fit Stan object with the fitted model. +#' @param pre_data Data.frame with data before the intervention. +#' @param post_data Data.frame with data after the intervention. +#' @param time Name of the time period variable. +#' @param oucome Name of the outcome variable. +.get_synth_draws <- function(fit, pre_data, post_data, time, outcome) { + y_sim_draws <- .get_par_long(fit = fit, par = y_sim) + dateXwalk <- pre_data %>% + dplyr::mutate(idx = 1:dplyr::n()) %>% + dplyr::select(idx, !!time) + y_hat <- dplyr::inner_join(y_sim_draws, dateXwalk, by = "idx") %>% + dplyr::rename(y_synth = y_sim) + + y_pred_draws <- .get_par_long(fit = fit, par = y_pred) + + dateXwalk <- post_data %>% + dplyr::mutate(idx = 1:dplyr::n()) %>% + dplyr::select(idx, !!time) + y_pred_hat <- dplyr::inner_join(y_pred_draws, dateXwalk, by = "idx") %>% + dplyr::rename(y_synth = y_pred) + y_synth <- dplyr::bind_rows(y_hat, y_pred_hat) %>% + dplyr::select(-idx) + + pre_outcome <- pre_data %>% + dplyr::select(!!outcome, !!time) + post_outcome <- post_data %>% + dplyr::select(!!outcome, !!time) + + y <- dplyr::bind_rows(pre_outcome, post_outcome) + y_synth <- dplyr::full_join(y_synth, y, by = rlang::as_name(time)) + return(y_synth) +} + +# TODO(jvives): Unify get_synth_draws functions into one. +#' @description +#' Helper function (not exported) to get the draws as a tidy data frame when +#' you only have a single treated unit. +#' This function uses the variable definitions of the predictor_match +# stan model. Otherwise it is the same as get_synth_draws. +#' @details params are the same as in get_synth_draws(). +.get_synth_draws_predictor_match <- function(fit, pre_data, + post_data, time, outcome) { + X1_sim_draws <- .get_par_long(fit = fit, par = X1_sim) + dateXwalk <- pre_data %>% + dplyr::mutate(idx = 1:dplyr::n()) %>% + dplyr::select(idx, !!time) + y_hat <- dplyr::inner_join(X1_sim_draws, dateXwalk, by = "idx") %>% + dplyr::rename(y_synth = X1_sim) + + X1_pred_draws <- .get_par_long(fit = fit, par = X1_pred) + dateXwalk <- post_data %>% + dplyr::mutate(idx = 1:dplyr::n()) %>% + dplyr::select(idx, !!time) + X1_pred_hat <- dplyr::inner_join(X1_pred_draws, dateXwalk, by = "idx") %>% + dplyr::rename(y_synth = X1_pred) + + y_synth <- dplyr::bind_rows(y_hat, X1_pred_hat) %>% + dplyr::select(-idx) + + pre_outcome <- pre_data %>% + dplyr::select(!!outcome, !!time) + post_outcome <- post_data %>% + dplyr::select(!!outcome, !!time) + + y <- dplyr::bind_rows(pre_outcome, post_outcome) + y_synth <- dplyr::full_join(y_synth, y, by = rlang::as_name(time)) + return(y_synth) +} + +#' @description +#' Helper function (not exported) to get the draws as a tidy data frame when +#' you have multiple treated units. +#' @param data Data.frame with the input data. +#' @param intervention Name of the variable that identifies the intervention +#' time. +#' @param treated_ids Name of the variable that identifies the treated units. +#' @details other params are the same as in get_synth_draws(). +.get_synth_draws3d <- function(fit, data, id, treated_ids, time, outcome, + intervention) { + y_sim_draws <- + .get_draws3d( + fit = fit, + data = data, + id = id, + treated_ids = treated_ids, + time = time, + outcome = outcome, + intervention = intervention, + period = "pre" + ) + + y_pred_draws <- + .get_draws3d( + fit = fit, + data = data, + id = id, + treated_ids = treated_ids, + time = time, + outcome = outcome, + intervention = intervention, + period = "post" + ) + + y_draws <- dplyr::bind_rows(y_sim_draws, y_pred_draws) + + return(y_draws) +} + +.get_draws3d <- function(fit, data, id, treated_ids, time, outcome, + intervention, period = c("pre", "post")) { + period <- match.arg(period) + if (period == "pre") { + y_sim_draws <- rstan::extract(fit, + pars = "y_sim" + )[[1]] + data <- data %>% + dplyr::filter(!!time < intervention) + } else { + y_sim_draws <- rstan::extract(fit, + pars = "y_pred" + )[[1]] + data <- data %>% + dplyr::filter(!!time >= intervention) + } + + dimnames(y_sim_draws) <- list( + "draw" = seq(1, dim(y_sim_draws)[[1]]), + "i_idx" = seq(1, dim(y_sim_draws)[[2]]), + "t_idx" = seq(1, dim(y_sim_draws)[[3]]) + ) + + y_sim_draws <- y_sim_draws %>% + cubelyr::as.tbl_cube() %>% + tibble::as_tibble() %>% + dplyr::rename(y_hat = 4) + + wide_df_treated <- data %>% + dplyr::filter(!!id %in% treated_ids) %>% + dplyr::select( + !!id, + !!time, + !!outcome + ) %>% + tidyr::pivot_wider(names_from = !!id, values_from = !!outcome) %>% + dplyr::arrange(time) + + + iXwalk <- wide_df_treated %>% + dplyr::select(-!!time) %>% + colnames() %>% + dplyr::tibble(id = .) %>% + dplyr::mutate(i_idx = 1:dplyr::n()) + + tXwalk <- wide_df_treated %>% + dplyr::select(!!time) %>% + dplyr::mutate(t_idx = 1:dplyr::n()) + + y_sim_draws <- y_sim_draws %>% + dplyr::inner_join(iXwalk, by = "i_idx") %>% + dplyr::inner_join(tXwalk, by = "t_idx") %>% + dplyr::select(-i_idx, -t_idx) + + return(y_sim_draws) +} diff --git a/R/makeWide.R b/R/makeWide.R new file mode 100644 index 0000000..b5f80c9 --- /dev/null +++ b/R/makeWide.R @@ -0,0 +1,46 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' @description +#' Converts data to wide format. +#' @param data Data.frame with the input data. +#' @param time Variable name for the time period. +#' @param id Variable name for the units. +#' @param oucome Name of the outcome variable. +.makeWide <- function(data, id, time, outcome, treatment) { + data <- data %>% + dplyr::mutate(.tmp_id = as.integer(as.factor(!!id))) + + treated <- data %>% + dplyr::filter(status == "Treated") %>% + dplyr::select(.tmp_id) %>% + dplyr::distinct() %>% + dplyr::pull(.tmp_id) + + wide_df_treated <- data %>% + dplyr::filter(.tmp_id %in% treated) %>% + dplyr::select(!!time, !!treatment, !!outcome) + + wide_df_untreated <- data %>% + dplyr::filter(!(.tmp_id %in% treated)) %>% + dplyr::select(!!time, !!outcome, !!id) %>% + tidyr::pivot_wider( + names_from = !!id, + values_from = !!outcome + ) + + dplyr::inner_join(wide_df_treated, wide_df_untreated, + by = rlang::as_name(time) + ) %>% return() +} diff --git a/R/plot_tau.R b/R/plot_tau.R new file mode 100644 index 0000000..55b823b --- /dev/null +++ b/R/plot_tau.R @@ -0,0 +1,49 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' @description +#' Plots the treatment effect estimate given the model. +#' @param data Data.frame with the input data. +#' @param x Name of the x axis variable (e.g. time period). +#' @param y Name of the y axis (e.g. treatement effect). +#' @param ymin, ymax Range values of the y variable. +#' @param xintercept Value of the time of the intervention to plot dashed line. +#' @param facet Variable to split the plots by. +#' @param id Variable name of the units. +#' @param subset Set of units to use for plot, all if NULL. +.plot_tau <- function(data, x, y, ymin, ymax, xintercept, + facet, id, subset = NULL) { + if (!is.null(subset)) { + data <- data %>% + dplyr::filter(!!id %in% subset) + } + tau_plot <- ggplot2::ggplot(data = data, ggplot2::aes(x = !!x)) + + ggplot2::geom_line(ggplot2::aes(y = {{ y }})) + + ggplot2::geom_ribbon(ggplot2::aes(ymin = {{ ymin }}, ymax = {{ ymax }}), + color = "gray", + alpha = 0.2 + ) + + ggplot2::theme_bw(base_size = 14) + + ggplot2::theme( + legend.position = "none", + panel.border = ggplot2::element_blank(), + axis.line = ggplot2::element_line() + ) + + ggplot2::geom_vline(xintercept = xintercept, linetype = "dashed") + + if (!missing(facet)) { + tau_plot <- tau_plot + ggplot2::facet_grid(cols = dplyr::vars(!!facet)) + } + return(tau_plot) +} diff --git a/R/time_tiles.R b/R/time_tiles.R new file mode 100644 index 0000000..9260dbd --- /dev/null +++ b/R/time_tiles.R @@ -0,0 +1,38 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' @description function to return time tiles plot summarizing when and which +#' units are affected by the intervention. +#' @param data Data.frame with the input data. +#' @param time Variable name for the time period. +#' @param id Variable name for the units. +#' @param status Variable that identifies the treatment status. +time_tiles <- function(data, time, id, status) { + tiles_plot <- ggplot2::ggplot(data, ggplot2::aes( + x = {{ time }}, + y = {{ id }}, + fill = {{ status }} + )) + + ggplot2::geom_tile(color = "white", size = 1) + + ggplot2::scale_y_discrete(limits = rev) + + ggplot2::scale_fill_manual(values = c("#4285F4", "#F4B400", "#DB4437")) + + ggplot2::theme_classic(base_size = 14) + + ggplot2::theme( + legend.position = "bottom", + panel.border = ggplot2::element_blank(), + panel.grid.major = ggplot2::element_blank(), + legend.title = ggplot2::element_blank() + ) + return(tiles_plot) +} diff --git a/R/utils-pipe.R b/R/utils-pipe.R new file mode 100644 index 0000000..b58984a --- /dev/null +++ b/R/utils-pipe.R @@ -0,0 +1,25 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +#' Pipe operator +#' +#' See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. +#' +#' @name %>% +#' @rdname pipe +#' @keywords internal +#' @export +#' @importFrom magrittr %>% +#' @usage lhs \%>\% rhs +NULL diff --git a/README.md b/README.md new file mode 100644 index 0000000..d4941f1 --- /dev/null +++ b/README.md @@ -0,0 +1,19 @@ +# bsynth + +`bsynth` provides causal inference with a Bayesian synthetic control method. + +## Contributing + +See [`CONTRIBUTING.md`](CONTRIBUTING.md) for details. + +## License + +Apache 2.0; see [`LICENSE`](LICENSE) for details. + +## Disclaimer + +This project is not an official Google project. It is not supported by Google +and Google specifically disclaims all warranties as to its quality, +merchantability, or fitness for a particular purpose. + +As of 2021-08-21, this package is under development. Please report bugs. diff --git a/data/germany.rda b/data/germany.rda new file mode 100644 index 0000000..459eed3 Binary files /dev/null and b/data/germany.rda differ diff --git a/data/prop99_df.rda b/data/prop99_df.rda new file mode 100644 index 0000000..7fcf195 Binary files /dev/null and b/data/prop99_df.rda differ diff --git a/inst/stan/factor_functions.stan b/inst/stan/factor_functions.stan new file mode 100644 index 0000000..140341e --- /dev/null +++ b/inst/stan/factor_functions.stan @@ -0,0 +1,72 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Auxiliary functions for stan factor model. +// Functions: +// make_F Creates the factor matrix F of size T x L. +// make_beta Creates the unit loadings of size L x 1 for each J+1 unit. +functions{ + // @description Given the factor loadings creates the T x L factor matrix. + // @param T Number of time periods. + // @param diagonal_loadings Loadings for each factor. + // @param lower_tri_loadings Cross-factor loadings. + matrix make_F(int T, + vector diagonal_loadings, + vector lower_tri_loadings) { + int L = num_elements(diagonal_loadings); + int M = num_elements(lower_tri_loadings); + matrix[T, L] F; + + int idx = 0; // Index for the lower diagonal + + for (j in 1:L) { + F[j, j] = diagonal_loadings[j]; + for (i in (j + 1):T) { + idx += 1; + F[i, j] = lower_tri_loadings[idx]; + } + } + for (j in 1:(L - 1)) { + for (i in (j + 1):L) F[j, i] = 0; + } + + return F; + } + + // @description Given prior parameters generates the J+1xL matrix of unit + // weights for each factor from the prior model. + // @param J Number of donor units. + // @param off Initial values of beta. + // @param off Beta matrix to be updated. + // @param eta Mixing parameter of half-cauchy distribution for each unit. + // @param lambda Local shrinkage parameter common for each unit. + // @param tau Global shrinkage parameter. + matrix make_beta (int J, + matrix off, + vector lambda, + real eta, + vector tau) { + int L = cols(off); + vector[L] cache = (tan(0.5 * pi() * lambda) * + tan(0.5 * pi() * eta)); + + vector[J] tau_ = tan(0.5 * pi() * tau); + matrix[J, L] out; + + for (j in 1:J) + out[j] = off[j] * tau_[j]; + + return diag_pre_multiply(cache, out'); + } +} diff --git a/inst/stan/factor_model_with_covariates.stan b/inst/stan/factor_model_with_covariates.stan new file mode 100644 index 0000000..8dc6f5d --- /dev/null +++ b/inst/stan/factor_model_with_covariates.stan @@ -0,0 +1,138 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Bayesian Factor Model with observed covariates +// Follows from https://arxiv.org/abs/2103.16244 (Pinkney 2021) +// Counterfactual Model: y = F'beta + X'gamma + Delta + kappa +// - y Observed outcome +// - F Latent factor components +// - beta Latent factor unit traits +// - Delta Time specific component +// - kappa Unit specific component + +// Load factor functions +#include factor_functions.stan + +data{ + int L; // number of factors + int N; // number of pre-intervention periods + row_vector[N] y_treated_pre; // pre-intervention outcome for the + // treated unit + int J; // number of donors + row_vector[N] y_donors_pre[J]; // matrix of order [J,N] with + // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + row_vector[N_pred] y_donors_post[J]; // post-intervention outcome for donors +} + +transformed data{ + // Define dimensions of the factor model. + int T = N + N_pred; + int j_plus_1 = J+1; + int M = L * (T - L) + L * (L - 1) / 2; + row_vector[j_plus_1] j_ones = rep_row_vector(1, j_plus_1); + vector[T] t_ones = rep_vector(1.0, T); + + row_vector[T] y_donors[J]; // Outcome variable for donor units. + + // Standardize the outcome variables. + row_vector[N] y_donors_pre_std[J]; + row_vector[N_pred] y_donors_post_std[J]; + vector[J] mean_y_donors_pre; + vector[J] sd_y_donors_pre; + real mean_y = mean(y_treated_pre); + real sd_y = sd(y_treated_pre); + row_vector[N] y_std = (y_treated_pre - mean_y) / sd_y; + + for (j in 1:J) { + mean_y_donors_pre[j] = mean(y_donors_pre[j]); + sd_y_donors_pre[j] = sd(y_donors_pre[j]); + y_donors_pre_std[j] = + (y_donors_pre[j] - mean_y_donors_pre[j]) / sd_y_donors_pre[j]; + y_donors_post_std[j] = + (y_donors_post[j] - mean_y_donors_pre[j]) / sd_y_donors_pre[j]; + y_donors[j] = append_col(y_donors_pre_std[j], y_donors_post_std[j]); + } +} + +parameters{ + // Initialize model parameters. + vector[T] raw_b; + real sigma_b; + row_vector [j_plus_1] raw_c; + real sigma_c; + + matrix[j_plus_1, L] beta_off; + vector[L] lambda; + real eta; + vector[j_plus_1] tau; + + row_vector[N_pred] y_missing; + + real sigma; + + vector[L] F_diag; + vector[M] F_lower; +} + +transformed parameters{ + // Create the beta factor weights for each unit. + matrix[L, j_plus_1] beta = make_beta(j_plus_1, + beta_off, + lambda, + eta, + tau); + vector[T] b = raw_b * sigma_b; // time random effects + row_vector [j_plus_1] c = raw_c * sigma_c; // unit random effects +} + +model{ + // Priors. + to_vector(beta_off) ~ std_normal(); + F_diag ~ std_normal(); + F_lower ~ normal(0, 2); + raw_b ~ std_normal(); + sigma_b ~ std_normal(); + raw_c ~ std_normal(); + sigma_c ~ std_normal(); + sigma ~ std_normal(); + { + matrix[T, L] F = make_F(T, F_diag, F_lower); // Create the Factor matrix. + row_vector[T] Y_target[1]; + row_vector[T] Y_temp[j_plus_1]; + Y_target[1] = append_col(y_std, + y_missing); // Outcome data to be updated. + + Y_temp = append_array(Y_target, // Target data with a prior. + y_donors); // Donor data used for updating. + for (j in 1:j_plus_1) + Y_temp[j]' ~ normal_id_glm(F, + b + c[j], + beta[ , j], + sigma); + } +} + +generated quantities{ + vector[T] synth_out; + { + matrix[T, L] F_ = make_F(T, F_diag, F_lower); + matrix[T, j_plus_1] Synth_ = F_ * beta + + b * j_ones + + t_ones * c ; + + for (t in 1:T) + synth_out[t] = normal_rng(Synth_[t,1], sigma) * sd_y + mean_y ; + } +} diff --git a/inst/stan/factor_model_without_covariates.stan b/inst/stan/factor_model_without_covariates.stan new file mode 100644 index 0000000..f1d713d --- /dev/null +++ b/inst/stan/factor_model_without_covariates.stan @@ -0,0 +1,134 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Bayesian Factor Model without observed covariates +// Follows from https://arxiv.org/abs/2103.16244 (Pinkney 2021) +// Counterfactual Model: y = F'beta + X'gamma + Delta + kappa +// - y Observed outcome +// - F Latent factor components +// - beta Latent factor unit traits +// - Delta Time specific component +// - kappa Unit specific component + +// Load factor functions +#include factor_functions.stan + +data{ + int L; // number of factors + int N; // number of pre-intervention periods + row_vector[N] y_treated_pre; // pre intervention outcome for the + // treated unit + int J; // number of donors + row_vector[N] y_donors_pre[J]; // matrix of order [J,N] with + // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + row_vector[N_pred] y_donors_post[J]; // post-intervention outcome for donors +} + +transformed data{ + int T = N + N_pred; + int j_plus_1 = J+1; + int M = L * (T - L) + L * (L - 1) / 2; + row_vector[j_plus_1] j_ones = rep_row_vector(1, j_plus_1); + vector[T] t_ones = rep_vector(1.0, T); + + row_vector[T] y_donors[J]; + + row_vector[N] y_donors_pre_std[J]; + row_vector[N_pred] y_donors_post_std[J]; + vector[J] mean_y_donors_pre; + vector[J] sd_y_donors_pre; + real mean_y = mean(y_treated_pre); + real sd_y = sd(y_treated_pre); + row_vector[N] y_std = (y_treated_pre - mean_y) / sd_y; + + for (j in 1:J) { + mean_y_donors_pre[j] = mean(y_donors_pre[j]); + sd_y_donors_pre[j] = sd(y_donors_pre[j]); + y_donors_pre_std[j] = + (y_donors_pre[j] - mean_y_donors_pre[j]) / sd_y_donors_pre[j]; + y_donors_post_std[j] = + (y_donors_post[j] - mean_y_donors_pre[j]) / sd_y_donors_pre[j]; + y_donors[j] = append_col(y_donors_pre_std[j], y_donors_post_std[j]); + } +} + +parameters{ + vector[T] raw_b; + real sigma_b; + row_vector [j_plus_1] raw_c; + real sigma_c; + + matrix[j_plus_1, L] beta_off; + vector[L] lambda; + real eta; + vector[j_plus_1] tau; + + row_vector[N_pred] y_missing; + + real sigma; + + vector[L] F_diag; + vector[M] F_lower; +} + +transformed parameters{ + matrix[L, j_plus_1] beta = make_beta(j_plus_1, + beta_off, + lambda, + eta, + tau); + vector[T] b = raw_b * sigma_b; // time random effects + row_vector [j_plus_1] c = raw_c * sigma_c; // unit random effects +} + +model{ + to_vector(beta_off) ~ std_normal(); + F_diag ~ std_normal(); + F_lower ~ normal(0, 2); + raw_b ~ std_normal(); + sigma_b ~ std_normal(); + raw_c ~ std_normal(); + sigma_c ~ std_normal(); + sigma ~ std_normal(); + { + matrix[T, L] F = make_F(T, F_diag, F_lower); + row_vector[T] Y_target[1]; + row_vector[T] Y_temp[j_plus_1]; + Y_target[1] = append_col(y_std, + y_missing); + + Y_temp = append_array(Y_target, + y_donors); + + for (j in 1:j_plus_1) + Y_temp[j]' ~ normal_id_glm(F, + b + c[j], + beta[ , j], + sigma); + } +} + +generated quantities{ + vector[T] synth_out; + { + matrix[T, L] F_ = make_F(T, F_diag, F_lower); + matrix[T, j_plus_1] Synth_ = F_ * beta + + b * j_ones + + t_ones * c ; + + for (t in 1:T) + synth_out[t] = normal_rng(Synth_[t,1], sigma) * sd_y + mean_y ; + } +} diff --git a/inst/stan/model1.stan b/inst/stan/model1.stan new file mode 100644 index 0000000..d086a97 --- /dev/null +++ b/inst/stan/model1.stan @@ -0,0 +1,67 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model. +// Model follows this paper: ... +// TODO(jvives): Document the paper. +// This implementation does not include multiple treated units or +// staggered adoption or additional covariates. +data { + int N; // number of pre-intervention periods + vector[N] y; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors +} + +transformed data{ // normalize using pre-treatment values + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[K] mean_X; + vector[K] sd_X; + real mean_y = mean(y); + real sd_y = sd(y); + vector[N] y_std = (y - mean_y) / sd_y; + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } +} + +parameters { + real sigma; + simplex[K] beta; +} + +model { + // Priors. + sigma ~ normal(0,1); + target += normal_lpdf(y_std | X_std*beta, + sigma); +} + +generated quantities { + vector[N] y_sim; + vector[N_pred] y_pred; + for (i in 1:N) { + y_sim[i] = normal_rng(X_std[i,]*beta, sigma) * sd_y + mean_y; // + } + for (j in 1:N_pred) { + y_pred[j] = normal_rng(X_pred_std[j,]*beta, sigma) * sd_y + mean_y; // + } +} diff --git a/inst/stan/model1_gammaOmega.stan b/inst/stan/model1_gammaOmega.stan new file mode 100644 index 0000000..32b9622 --- /dev/null +++ b/inst/stan/model1_gammaOmega.stan @@ -0,0 +1,86 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with predictor match. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include multiple treated units or +// staggered adoption, but allows for predictors/covariates. +data { + // Data requirements + int K; // number of predictors + vector[K] X1; // treated unit outcome predictors + int J; // number of donors + matrix[K,J] X0; // donor units outcome predictors + int T_post; // number of post-intervention periods + matrix[T_post,J] X0_pred; // post-intervention outcome for donors + vector[K] vs; // predictor weights +} + +transformed data{ + // Normalize design matrices using pre-treatment values. + matrix[K, J] X0_std; + matrix[T_post, J] X0_pred_std; + vector[J] mean_X0; + vector[J] sd_X0; + real mean_X1 = mean(X1); + real sd_X1 = sd(X1); + vector[K] X1_std = (X1 - mean_X1) / sd_X1; // standarized X1 + vector[K] vs_std; // standarize the V weights + for (k in 1:K) { + vs_std[k] = pow(sd(X0[k,]), -1); + } + for (j in 1:J) { + mean_X0[j] = mean(X0[,j]); + sd_X0[j] = sd(X0[,j]); + X0_std[,j] = (X0[,j] - mean_X0[j]) / sd_X0[j]; // standarized X0 + X0_pred_std[,j] = (X0_pred[,j] - mean_X0[j]) / sd_X0[j]; // standarized X0pred + } +} + +parameters { + real sigma; // std. dev predictors + simplex[J] w; // synthetic control weights, Dir(1) prior + simplex[K] gamma; // scaling vector for predictor weights +} + +transformed parameters{ + // reparametrization + // Transform variance parameters + vector[K] Omega; + vector[K] Gamma; + for (k in 1:K) { + Gamma[k] = pow(gamma[k], -1); + } + Omega = sigma*Gamma; +} + +model { + // Priors. + sigma ~ normal(0,1); + gamma ~ dirichlet(vs_std); + target += normal_lpdf(X1_std | X0_std*w, Omega); +} + +generated quantities { + vector[K] X1_sim; // conditional posterior for X1 + vector[T_post] X1_pred; // prediction for post-treatment period + for (i in 1:K) { + X1_sim[i] = normal_rng(X0_std[i,]*w, sigma) * sd_X1 + mean_X1; // + } + // TODO(jvives): Decide whether to use vs also in generated quantities. + for (j in 1:T_post) { + X1_pred[j] = normal_rng(X0_pred_std[j,]*w, sigma) * sd_X1 + mean_X1; // + } +} diff --git a/inst/stan/model2.stan b/inst/stan/model2.stan new file mode 100644 index 0000000..74734c4 --- /dev/null +++ b/inst/stan/model2.stan @@ -0,0 +1,89 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with a Gaussian Process. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// staggered adoption, but allows for predictors/covariates. +data { + int N; // number of pre-intervention periods + vector[N] y; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[K] mean_X; + vector[K] sd_X; + real mean_y = mean(y); + real sd_y = sd(y); + real time[N + N_pred] ; + vector[N] y_std = (y - mean_y) / sd_y; + int sumN = N + N_pred; + + for(t in 1:sumN){ + time[t] = t; + } + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } +} + +parameters { + real sigma; + simplex[K] beta; + real rho; + real alpha; + vector[sumN] eta; +} + +transformed parameters { + vector[sumN] f; + { + matrix[sumN,sumN] K_matrix = cov_exp_quad(time,alpha,rho) + + diag_matrix(rep_vector(1e-9, sumN)); + matrix[sumN,sumN] L_K = cholesky_decompose(K_matrix); + f = L_K * eta; + } +} + +model { + // Priors. + rho ~ normal(0,3); + alpha ~ normal(0,1); + sigma ~ normal(0,1); + eta ~ normal(0,1); + target += normal_lpdf(y_std | X_std*beta + f[1:N], + sigma); +} + +generated quantities { + vector[N] y_sim; + vector[N_pred] y_pred; + for (i in 1:N) { + y_sim[i] = normal_rng(X_std[i,]*beta + f[i], sigma) * sd_y + mean_y; // + } + for (j in 1:N_pred) { + y_pred[j] = normal_rng(X_pred_std[j,]*beta + f[N + j], sigma) * sd_y + mean_y; // + } +} + diff --git a/inst/stan/model3.stan b/inst/stan/model3.stan new file mode 100644 index 0000000..5e16543 --- /dev/null +++ b/inst/stan/model3.stan @@ -0,0 +1,106 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with covariates. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include multiple treated units ors +// staggered adoption, but allows for predictors/covariates. +data { + int N; // number of pre-intervention periods + vector[N] y; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int M_K; // number of covariates + matrix[N,M_K] M; // covariates + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors + matrix[N_pred,M_K] M_pred; // covariates +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[K] mean_X; + vector[K] sd_X; + matrix[N, M_K] M_std; + matrix[N_pred, M_K] M_pred_std; + vector[M_K] mean_M; + vector[M_K] sd_M; + real mean_y = mean(y); + real sd_y = sd(y); + // real time[N + N_pred] ; + vector[N] y_std = (y - mean_y) / sd_y; + int sumN = N + N_pred; + + // for(t in 1:sumN){ + // time[t] = t; + // } + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + + for (j in 1:M_K) { + mean_M[j] = mean(M[,j]); + sd_M[j] = sd(M[,j]); + M_std[,j] = (M[,j] - mean_M[j]) / sd_M[j]; + M_pred_std[,j] = (M_pred[,j] - mean_M[j]) / sd_M[j]; + } + +} + +parameters { + real sigma; + simplex[K] beta; + // real rho; + // real alpha; + // vector[sumN] eta; + vector[M_K] gamma; +} + +// transformed parameters { +// vector[sumN] f; +// { +// matrix[sumN,sumN] K_matrix = cov_exp_quad(time,alpha,rho) + +// diag_matrix(rep_vector(1e-9, sumN)); +// matrix[sumN,sumN] L_K = cholesky_decompose(K_matrix); +// f = L_K * eta; +// } +// } + +model { + // Priors. + // rho ~ normal(0,3); + // alpha ~ normal(0,1); + sigma ~ normal(0,1); + // eta ~ normal(0,1); + gamma ~ normal(0,1); + target += normal_lpdf(y_std | X_std*beta + M_std*gamma, + sigma); +} + +generated quantities { + vector[N] y_sim; + vector[N_pred] y_pred; + for (i in 1:N) { + y_sim[i] = normal_rng(X_std[i,]*beta + M_std[i,]*gamma, sigma) * sd_y + mean_y; // + } + for (j in 1:N_pred) { + y_pred[j] = normal_rng(X_pred_std[j,]*beta + M_pred_std[j,]*gamma, sigma) * sd_y + mean_y; // + } +} diff --git a/inst/stan/model4.stan b/inst/stan/model4.stan new file mode 100644 index 0000000..86a75b9 --- /dev/null +++ b/inst/stan/model4.stan @@ -0,0 +1,109 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with a Gaussian Process and +// covariates. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include multiple treated units or +// staggered adoption. +data { + int N; // number of pre-intervention periods + vector[N] y; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int M_K; // number of covariates + matrix[N,M_K] M; // covariates + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors + matrix[N_pred,M_K] M_pred; // covariates +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[K] mean_X; + vector[K] sd_X; + matrix[N, M_K] M_std; + matrix[N_pred, M_K] M_pred_std; + vector[M_K] mean_M; + vector[M_K] sd_M; + real mean_y = mean(y); + real sd_y = sd(y); + real time[N + N_pred] ; + vector[N] y_std = (y - mean_y) / sd_y; + int sumN = N + N_pred; + + for(t in 1:sumN){ + time[t] = t; + } + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + + for (j in 1:M_K) { + mean_M[j] = mean(M[,j]); + sd_M[j] = sd(M[,j]); + M_std[,j] = (M[,j] - mean_M[j]) / sd_M[j]; + M_pred_std[,j] = (M_pred[,j] - mean_M[j]) / sd_M[j]; + } + +} + +parameters { + real sigma; + simplex[K] beta; + real rho; + real alpha; + vector[sumN] eta; + vector[M_K] gamma; +} + +transformed parameters { + vector[sumN] f; + { + matrix[sumN,sumN] K_matrix = cov_exp_quad(time,alpha,rho) + + diag_matrix(rep_vector(1e-9, sumN)); + matrix[sumN,sumN] L_K = cholesky_decompose(K_matrix); + f = L_K * eta; + } +} + +model { + // Priors. + rho ~ normal(0,3); + alpha ~ normal(0,1); + sigma ~ normal(0,1); + eta ~ normal(0,1); + gamma ~ normal(0,1); + target += normal_lpdf(y_std | X_std*beta + M_std*gamma + f[1:N], + sigma); +} + +generated quantities { + vector[N] y_sim; + vector[N_pred] y_pred; + for (i in 1:N) { + y_sim[i] = normal_rng(X_std[i,]*beta + M_std[i,]*gamma + + f[i], sigma) * sd_y + mean_y; // + } + for (j in 1:N_pred) { + y_pred[j] = normal_rng(X_pred_std[j,]*beta + M_pred_std[j,]*gamma + + f[N + j], sigma) * sd_y + mean_y; // + } +} diff --git a/inst/stan/model5.stan b/inst/stan/model5.stan new file mode 100644 index 0000000..b50e902 --- /dev/null +++ b/inst/stan/model5.stan @@ -0,0 +1,79 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with multiple treated units. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include staggered adoption or covariates. +data { + int N; // number of pre-intervention periods + int I; // number of treated units + vector[N] y[I]; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors + +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[N] y_std[I]; + real mean_y[I]; + real sd_y[I]; + { + vector[K] mean_X; + vector[K] sd_X; + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + for (i in 1:I){ + mean_y[i] = mean(y[i]); + sd_y[i] = sd(y[i]); + y_std[i] = (y[i] - mean_y[i]) / sd_y[i]; + } + } +} + +parameters { + real sigma[I]; + simplex[K] beta[I]; +} + +model { + for(i in 1:I){ + sigma[i] ~ normal(0,1); + target += normal_lpdf(y_std[i] | X_std*beta[i], + sigma[i]); + } +} + +generated quantities { + vector[N] y_sim[I]; + vector[N_pred] y_pred[I]; + for(i in 1:I){ + for (n in 1:N) { + y_sim[i][n] = normal_rng(X_std[n,]*beta[i], sigma[i]) * sd_y[i] + + mean_y[i]; // + } + for (j in 1:N_pred) { + y_pred[i][j] = normal_rng(X_pred_std[j,]*beta[i], sigma[i]) * sd_y[i] + + mean_y[i]; // + } + } +} diff --git a/inst/stan/model6.stan b/inst/stan/model6.stan new file mode 100644 index 0000000..d9cedb2 --- /dev/null +++ b/inst/stan/model6.stan @@ -0,0 +1,99 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with multiple treated units and +// covariates. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include staggered adoption. +data { + int N; // number of pre-intervention periods + int I; // number of treated units + vector[N] y[I]; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int M_K; // number of covariates + matrix[N,M_K] M[I]; // covariates + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors + matrix[N_pred,M_K] M_pred[I]; // covariates + +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[N] y_std[I]; + real mean_y[I]; + real sd_y[I]; + matrix[N, M_K] M_std[I]; + matrix[N_pred, M_K] M_pred_std[I]; + { + vector[K] mean_X; + vector[K] sd_X; + vector[M_K] mean_M[I]; + vector[M_K] sd_M[I]; + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + + for (i in 1:I){ + mean_y[i] = mean(y[i]); + sd_y[i] = sd(y[i]); + y_std[i] = (y[i] - mean_y[i]) / sd_y[i]; + + for (j in 1:M_K) { + mean_M[i][j] = mean(M[i][,j]); + sd_M[i][j] = sd(M[i][,j]); + M_std[i][,j] = (M[i][,j] - mean_M[i][j]) / sd_M[i][j]; + M_pred_std[i][,j] = (M_pred[i][,j] - mean_M[i][j]) / sd_M[i][j]; + } + } + } +} + +parameters { + real sigma[I]; + simplex[K] beta[I]; + vector[M_K] gamma[I]; +} + +model { + for(i in 1:I){ + sigma[i] ~ normal(0,1); + gamma[i] ~ normal(0,1); + target += normal_lpdf(y_std[i] | X_std*beta[i] + M_std[i]*gamma[i], + sigma[i]); + } +} + +generated quantities { + vector[N] y_sim[I]; + vector[N_pred] y_pred[I]; + for(i in 1:I){ + for (n in 1:N) { + y_sim[i][n] = normal_rng(X_std[n,]*beta[i] + M_std[i][n,]*gamma[i], + sigma[i]) * sd_y[i] + mean_y[i]; // + } + for (j in 1:N_pred) { + y_pred[i][j] = normal_rng(X_pred_std[j,]*beta[i] + + M_pred_std[i][j,]*gamma[i], sigma[i]) * sd_y[i] + + mean_y[i]; // + } + } +} diff --git a/inst/stan/model7.stan b/inst/stan/model7.stan new file mode 100644 index 0000000..1c83296 --- /dev/null +++ b/inst/stan/model7.stan @@ -0,0 +1,123 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with multiple treated units, a +// Gaussian process and covariates. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include staggered adoption. +data { + int N; // number of pre-intervention periods + int I; // number of treated units + vector[N] y[I]; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int M_K; // number of covariates + matrix[N,M_K] M[I]; // covariates + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors + matrix[N_pred,M_K] M_pred[I]; // covariates + +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[N] y_std[I]; + real mean_y[I]; + real sd_y[I]; + matrix[N, M_K] M_std[I]; + matrix[N_pred, M_K] M_pred_std[I]; + real time[N + N_pred] ; + int sumN = N + N_pred; + + for(t in 1:sumN){ + time[t] = t; + } + + { + vector[K] mean_X; + vector[K] sd_X; + vector[M_K] mean_M[I]; + vector[M_K] sd_M[I]; + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + + for (i in 1:I){ + mean_y[i] = mean(y[i]); + sd_y[i] = sd(y[i]); + y_std[i] = (y[i] - mean_y[i]) / sd_y[i]; + + for (j in 1:M_K) { + mean_M[i][j] = mean(M[i][,j]); + sd_M[i][j] = sd(M[i][,j]); + M_std[i][,j] = (M[i][,j] - mean_M[i][j]) / sd_M[i][j]; + M_pred_std[i][,j] = (M_pred[i][,j] - mean_M[i][j]) / sd_M[i][j]; + } + } + } +} + +parameters { + real sigma[I]; + simplex[K] beta[I]; + vector[M_K] gamma[I]; + real rho[I]; + real alpha[I]; + vector[sumN] eta[I]; +} + +transformed parameters { + vector[sumN] f[I]; + for(i in 1:I){ + matrix[sumN,sumN] K_matrix = cov_exp_quad(time,alpha[i],rho[i]) + + diag_matrix(rep_vector(1e-9, sumN)); + matrix[sumN,sumN] L_K = cholesky_decompose(K_matrix); + f[i] = L_K * eta[i]; + } +} + +model { + for(i in 1:I){ + sigma[i] ~ normal(0,1); + gamma[i] ~ normal(0,1); + rho[i] ~ normal(0,3); + alpha[i] ~ normal(0,1); + eta[i] ~ normal(0,1); + target += normal_lpdf(y_std[i] | X_std*beta[i] + M_std[i]*gamma[i] + + f[i][1:N], sigma[i]); + } +} + +generated quantities { + vector[N] y_sim[I]; + vector[N_pred] y_pred[I]; + for(i in 1:I){ + for (n in 1:N) { + y_sim[i][n] = normal_rng(X_std[n,]*beta[i] + M_std[i][n,]*gamma[i] + + f[i][n], sigma[i]) * sd_y[i] + mean_y[i]; // + } + for (j in 1:N_pred) { + y_pred[i][j] = normal_rng(X_pred_std[j,]*beta[i] + + M_pred_std[i][j,]*gamma[i] + + f[i][N + j], sigma[i]) * sd_y[i] + + mean_y[i]; // + } + } +} diff --git a/inst/stan/model8.stan b/inst/stan/model8.stan new file mode 100644 index 0000000..10f71da --- /dev/null +++ b/inst/stan/model8.stan @@ -0,0 +1,105 @@ +// Copyright 2021 Google LLC +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// https://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// +// Simple Bayesian Synthetic control model with multiple treated units and +// a Gaussian Process. +// Model follows this paper: ... +// TODO(jvives): Specify the paper. +// This implementation does not include staggered adoption or covariates. +data { + int N; // number of pre-intervention periods + int I; // number of treated units + vector[N] y[I]; // outcome + int K; // number of donors + matrix[N,K] X; // pre-intervention outcome for donors + int N_pred; // number of post-intervention periods + matrix[N_pred,K] X_pred; // post-intervention outcome for donors +} + +transformed data{ + matrix[N, K] X_std; + matrix[N_pred, K] X_pred_std; + vector[N] y_std[I]; + real mean_y[I]; + real sd_y[I]; + real time[N + N_pred] ; + int sumN = N + N_pred; + + for(t in 1:sumN){ + time[t] = t; + } + + { + vector[K] mean_X; + vector[K] sd_X; + + for (k in 1:K) { + mean_X[k] = mean(X[,k]); + sd_X[k] = sd(X[,k]); + X_std[,k] = (X[,k] - mean_X[k]) / sd_X[k]; + X_pred_std[,k] = (X_pred[,k] - mean_X[k]) / sd_X[k]; + } + + for (i in 1:I){ + mean_y[i] = mean(y[i]); + sd_y[i] = sd(y[i]); + y_std[i] = (y[i] - mean_y[i]) / sd_y[i]; + } + } +} + +parameters { + real sigma[I]; + simplex[K] beta[I]; + real rho[I]; + real alpha[I]; + vector[sumN] eta[I]; +} + +transformed parameters { + vector[sumN] f[I]; + for(i in 1:I){ + matrix[sumN,sumN] K_matrix = cov_exp_quad(time,alpha[i],rho[i]) + + diag_matrix(rep_vector(1e-9, sumN)); + matrix[sumN,sumN] L_K = cholesky_decompose(K_matrix); + f[i] = L_K * eta[i]; + } +} + +model { + for(i in 1:I){ + sigma[i] ~ normal(0,1); + rho[i] ~ normal(0,3); + alpha[i] ~ normal(0,1); + eta[i] ~ normal(0,1); + target += normal_lpdf(y_std[i] | X_std*beta[i] + + f[i][1:N], sigma[i]); + } +} + +generated quantities { + vector[N] y_sim[I]; + vector[N_pred] y_pred[I]; + for(i in 1:I){ + for (n in 1:N) { + y_sim[i][n] = normal_rng(X_std[n,]*beta[i] + + f[i][n], sigma[i]) * sd_y[i] + mean_y[i]; // + } + for (j in 1:N_pred) { + y_pred[i][j] = normal_rng(X_pred_std[j,]*beta[i] + + f[i][N + j], sigma[i]) * sd_y[i] + + mean_y[i]; // + } + } +} diff --git a/man/bayesianFactor.Rd b/man/bayesianFactor.Rd new file mode 100644 index 0000000..0d7a761 --- /dev/null +++ b/man/bayesianFactor.Rd @@ -0,0 +1,260 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/bfactor.R +\name{bayesianFactor} +\alias{bayesianFactor} +\title{Create a Bayesian Synthetic Control Object Using Panel Data} +\value{ +vizdraws object with the relative bias with offset. +} +\description{ +A Bayesian Factor Model has raw data and draws from the posterior +distribution. This is represented by an R6 Class. + +Code and theory based on +\href{https://arxiv.org/abs/2103.16244}{Pinkney 2021}. + +public methods: +\itemize{ +\item \code{initialize()} initializes the variables and model parameters +\item \code{fit()} fits the stan model and returns a fit object +\item \code{updateWidth} updates the width of the credible interval +\item \code{placeboPlot} generates a counterfactual placebo plot +\item \code{effectPlot} returns a plot of the treatment effect over time +\item \code{summarizeLift}returns descriptive statistics of the lift estimate +\item \code{biasDraws} returns a plot of the relative bias in a LFM +\item \code{liftDraws} returns a plot of the posterior lift distribution +\item \code{liftBias} returns a plot of the relative bias given a lift offset +} +} +\section{Active bindings}{ +\if{html}{\out{
}} +\describe{ +\item{\code{timeTiles}}{ggplot2 object that shows when +the intervention happened.} + +\item{\code{plotData}}{tibble with the observed outcome and the +counterfactual data.} + +\item{\code{interventionTime}}{returns the intervention time period.} + +\item{\code{synthetic}}{ggplot2 object that shows the +observed and counterfactual outcomes over time.} +} +\if{html}{\out{
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-new}{\code{bayesianFactor$new()}} +\item \href{#method-fit}{\code{bayesianFactor$fit()}} +\item \href{#method-updateWidth}{\code{bayesianFactor$updateWidth()}} +\item \href{#method-summarizeLift}{\code{bayesianFactor$summarizeLift()}} +\item \href{#method-effectPlot}{\code{bayesianFactor$effectPlot()}} +\item \href{#method-liftDraws}{\code{bayesianFactor$liftDraws()}} +\item \href{#method-liftBias}{\code{bayesianFactor$liftBias()}} +\item \href{#method-biasDraws}{\code{bayesianFactor$biasDraws()}} +\item \href{#method-clone}{\code{bayesianFactor$clone()}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-new}{}}} +\subsection{Method \code{new()}}{ +Create a new bayesianFactor object. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$new( + data, + time, + id, + treated, + outcome, + ci_width = 0.75, + covariates +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{data}}{Long data.frame object with fields outcome, time, id, +and treatment indicator.} + +\item{\code{time}}{Name of the variable in the data frame that} + +\item{\code{id}}{Name of the variable in the data frame that +identifies the units (e.g. country, region etc).} + +\item{\code{treated}}{Name of the variable in the data frame that contains the +treatment assignment of the intervention.} + +\item{\code{outcome}}{Name of the outcome variable.} + +\item{\code{ci_width}}{Credible interval's width. This number is in the +(0,1) interval.} + +\item{\code{covariates}}{Dataframe with a column for {id} and the other columns +Defaults to NULL if no covariates should be included in the model.} +} +\if{html}{\out{
}} +} +\subsection{Details}{ +params described in the data structure section of the +documentation of the R6 class at the top of the file. +} + +\subsection{Returns}{ +A new \code{bayesianFactor} object. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-fit}{}}} +\subsection{Method \code{fit()}}{ +Fit Stan model. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$fit(L = 8, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{L}}{Number of factors.} + +\item{\code{...}}{other arguments passed to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-updateWidth}{}}} +\subsection{Method \code{updateWidth()}}{ +Update the width of the credible interval. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$updateWidth(ci_width = 0.75)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{ci_width}}{New width for the credible interval. This number +should be in the (0,1) interval.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-summarizeLift}{}}} +\subsection{Method \code{summarizeLift()}}{ +summarizeLift returns descriptive statistics of +the lift estimate. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$summarizeLift()}\if{html}{\out{
}} +} + +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-effectPlot}{}}} +\subsection{Method \code{effectPlot()}}{ +effectPlot returns ggplot2 object that shows the +effect of the intervention over time. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$effectPlot()}\if{html}{\out{
}} +} + +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-liftDraws}{}}} +\subsection{Method \code{liftDraws()}}{ +Plots lift. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$liftDraws(from, to, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{from}}{First period to consider when calculating lift. If infinite, +set to the time of the intervention.} + +\item{\code{to}}{Last period to consider when calculating lift. If infinite, set +to the last period.} + +\item{\code{...}}{other arguments passed to vizdraws::vizdraws().} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +vizdraws object with the posterior distribution of the lift. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-liftBias}{}}} +\subsection{Method \code{liftBias()}}{ +Plot bias magnitude in terms of lift for period \link{firstT, lastT} +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$liftBias(firstT, lastT, offset, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{firstT}}{start of the time period to compute relative bias +over. They must be after the intervention.} + +\item{\code{lastT}}{end of the Time period to compute relative bias +over. They must be after the intervention.} + +\item{\code{offset}}{Target lift \%.} + +\item{\code{...}}{other arguments passed to vizdraws::vizdraws().} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-biasDraws}{}}} +\subsection{Method \code{biasDraws()}}{ +Plots relative upper bias / tau for a time period \link{firstT, lastT}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$biasDraws(small_bias = 0.3, firstT, lastT)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{small_bias}}{Threshold value for considering the bias "small".} + +\item{\code{firstT, lastT}}{Time periods to compute relative bias over, they must +after the intervention.} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +vizdraw object with the posterior distribution of relative bias. +Bias is scaled by the time periods. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianFactor$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/bayesianSynth.Rd b/man/bayesianSynth.Rd new file mode 100644 index 0000000..32bc8ba --- /dev/null +++ b/man/bayesianSynth.Rd @@ -0,0 +1,314 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/factory.R +\name{bayesianSynth} +\alias{bayesianSynth} +\title{Create a Bayesian Synthetic Control Object Using Panel Data} +\value{ +vizdraws object with the relative bias with offset. +} +\description{ +A Bayesian Synthetic Control has raw data and draws from the posterior +distribution. This is represented by an R6 Class. + +public methods: +\itemize{ +\item \code{initialize()} initializes the variables and model parameters +\item \code{fit()} fits the stan model and returns a fit object +\item \code{updateWidth} updates the width of the credible interval +\item \code{placeboPlot} generates a counterfactual placebo plot +\item \code{effectPlot} returns a plot of the treatment effect over time +\item \code{summarizeLift}returns descriptive statistics of the lift estimate +\item \code{biasDraws} returns a plot of the relative bias in a LFM +\item \code{liftDraws} returns a plot of the posterior lift distribution +\item \code{liftBias} returns a plot of the relative bias given a lift offset +Data structure: +} +} +\section{Active bindings}{ +\if{html}{\out{
}} +\describe{ +\item{\code{timeTiles}}{ggplot2 object that shows when +the intervention happened.} + +\item{\code{plotData}}{returns tibble with the observed outcome and the +counterfactual data.} + +\item{\code{interventionTime}}{returns intervention time period (e.g., year) +in which the treatment occurred.} + +\item{\code{synthetic}}{returns ggplot2 object that shows the +observed and counterfactual outcomes over time.} + +\item{\code{checks}}{returns MCMC checks.} + +\item{\code{lift}}{draws from the posterior distribution of the lift.} +} +\if{html}{\out{
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-new}{\code{bayesianSynth$new()}} +\item \href{#method-fit}{\code{bayesianSynth$fit()}} +\item \href{#method-updateWidth}{\code{bayesianSynth$updateWidth()}} +\item \href{#method-summarizeLift}{\code{bayesianSynth$summarizeLift()}} +\item \href{#method-effectPlot}{\code{bayesianSynth$effectPlot()}} +\item \href{#method-placeboPlot}{\code{bayesianSynth$placeboPlot()}} +\item \href{#method-biasDraws}{\code{bayesianSynth$biasDraws()}} +\item \href{#method-liftDraws}{\code{bayesianSynth$liftDraws()}} +\item \href{#method-liftBias}{\code{bayesianSynth$liftBias()}} +\item \href{#method-clone}{\code{bayesianSynth$clone()}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-new}{}}} +\subsection{Method \code{new()}}{ +Create a new bayesianSynth object. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$new( + data, + time, + id, + treated, + outcome, + ci_width = 0.75, + gp = FALSE, + covariates = NULL, + predictor_match = FALSE, + predictor_match_covariates0 = NULL, + predictor_match_covariates1 = NULL, + vs = NULL +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{data}}{Long data.frame object with fields outcome, time, id, +and treatment indicator.} + +\item{\code{time}}{Name of the variable in the data frame that +identifies the time period (e.g. year, month, week etc).} + +\item{\code{id}}{Name of the variable in the data frame that +identifies the units (e.g. country, region etc).} + +\item{\code{treated}}{Name of the variable in the data frame that contains +the treatment assignment of the intervention.} + +\item{\code{outcome}}{Name of the outcome variable.} + +\item{\code{ci_width}}{Credible interval's width. This number is in the +\link{0,1} interval.} + +\item{\code{gp}}{Logical that indicates whether or not to include a +Gaussian Process as part of the model.} + +\item{\code{covariates}}{Data.frame with time dependent covariates for +for each unit and time field. +Defaults to NULL if no covariates should be included in the model.} + +\item{\code{predictor_match}}{Logical that indicates whether or not to run +the matching version of the Bayesian Synthetic Control. This option can +not be used with gp, covariates or multiple treated units.} + +\item{\code{predictor_match_covariates0}}{data.frame with time independent +covariates on each row and column indicating the control unit names +(dim k x J+1).} + +\item{\code{predictor_match_covariates1}}{Vector with time independent +covariates for the treated unit (dim k x 1).} + +\item{\code{vs}}{Vector of weights for the importance of the predictors used +in creating the synthetic control. +Defaults to equal weight for all predictors.} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +A new \code{bayesianSynth} object. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-fit}{}}} +\subsection{Method \code{fit()}}{ +Fit Stan model. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$fit(...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{...}}{other arguments passed to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-updateWidth}{}}} +\subsection{Method \code{updateWidth()}}{ +Update the width of the credible interval. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$updateWidth(ci_width = 0.75)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{ci_width}}{New width for the credible interval. This number +should be in the (0,1) interval.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-summarizeLift}{}}} +\subsection{Method \code{summarizeLift()}}{ +returns descriptive statistics of the lift estimate. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$summarizeLift()}\if{html}{\out{
}} +} + +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-effectPlot}{}}} +\subsection{Method \code{effectPlot()}}{ +effect ggplot2 object that shows the +effect of the intervention over time. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$effectPlot(facet = TRUE, subset = NULL)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{facet}}{Boolean that is TRUE if we want to divide the plot for each +unit.} + +\item{\code{subset}}{Set of units to use in the effect plot.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-placeboPlot}{}}} +\subsection{Method \code{placeboPlot()}}{ +Plot placebo intervention. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$placeboPlot(periods, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{periods}}{Positive number of periods for the placebo intervention.} + +\item{\code{...}}{other arguments passed to \code{\link[rstan:stanmodel-method-sampling]{rstan::sampling()}}.} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +ggplot2 object for placebo treatment effect. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-biasDraws}{}}} +\subsection{Method \code{biasDraws()}}{ +Plots relative upper bias / tau for a time period \link{firstT, lastT}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$biasDraws(small_bias = 0.3, firstT, lastT)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{small_bias}}{Threshold value for considering the bias "small".} + +\item{\code{firstT, lastT}}{Time periods to compute relative bias over, they +must be after the intervention.} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +vizdraw object with the posterior distribution of relative bias. +Bias is scaled by the time periods. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-liftDraws}{}}} +\subsection{Method \code{liftDraws()}}{ +Plots lift. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$liftDraws(from, to, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{from}}{First period to consider when calculating lift. If infinite, +set to the time of the intervention.} + +\item{\code{to}}{Last period to consider when calculating lift. If infinite, set +to the last period.} + +\item{\code{...}}{other arguments passed to vizdraws::vizdraws().} +} +\if{html}{\out{
}} +} +\subsection{Returns}{ +vizdraws object with the posterior distribution of the lift. +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-liftBias}{}}} +\subsection{Method \code{liftBias()}}{ +Plot Bias magnitude in terms of lift for period \link{firstT, lastT} +pre_MADs / y0 relative to lift thresholds. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$liftBias(firstT, lastT, offset, ...)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{firstT}}{start of the time period to compute relative bias +over. They must be after the intervention.} + +\item{\code{lastT}}{end of the Time period to compute relative bias +over. They must be after the intervention.} + +\item{\code{offset}}{Target lift \%.} + +\item{\code{...}}{other arguments passed to vizdraws::vizdraws().} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{bayesianSynth$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/bsynth-package.Rd b/man/bsynth-package.Rd new file mode 100644 index 0000000..75d84e5 --- /dev/null +++ b/man/bsynth-package.Rd @@ -0,0 +1,14 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/bsynth-package.R +\docType{package} +\name{bsynth-package} +\alias{bsynth-package} +\alias{bsynth} +\title{The 'bsynth' package.} +\description{ +Provides causal inference with a Bayesian synthetic +control method. +} +\references{ +Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org +} diff --git a/man/pipe.Rd b/man/pipe.Rd new file mode 100644 index 0000000..0eec752 --- /dev/null +++ b/man/pipe.Rd @@ -0,0 +1,12 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils-pipe.R +\name{\%>\%} +\alias{\%>\%} +\title{Pipe operator} +\usage{ +lhs \%>\% rhs +} +\description{ +See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. +} +\keyword{internal} diff --git a/src/Makevars.in b/src/Makevars.in new file mode 100644 index 0000000..e48b4c9 --- /dev/null +++ b/src/Makevars.in @@ -0,0 +1,23 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +# Generated by rstantools. Do not edit by hand. + +STANHEADERS_SRC = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "message()" -e "cat(system.file('include', 'src', package = 'StanHeaders', mustWork = TRUE))" -e "message()" | grep "StanHeaders") + +PKG_CPPFLAGS = -I"../inst/include" -I"$(STANHEADERS_SRC)" -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -DBOOST_MATH_OVERFLOW_ERROR_POLICY=errno_on_error +PKG_CXXFLAGS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::CxxFlags()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::CxxFlags()") +PKG_LIBS = $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "RcppParallel::RcppParallelLibs()") $(shell "$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e "StanHeaders:::LdFlags()") + +CXX_STD = CXX14 diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 0000000..20f9d63 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,18 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +library(testthat) +library(bsynth) + +test_check("bsynth") diff --git a/tests/testthat/test-factory.R b/tests/testthat/test-factory.R new file mode 100644 index 0000000..f667160 --- /dev/null +++ b/tests/testthat/test-factory.R @@ -0,0 +1,26 @@ +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +test_that("the factory works", { + # Create a new factory + germany_synth <- bayesianSynth$new( + data = germany, + time = year, + id = country, + treated = D, + outcome = gdp, + ci_width = 0.95 + ) + expect_is(germany_synth, "bayesianSynth") +}) diff --git a/vignettes/.gitignore b/vignettes/.gitignore new file mode 100644 index 0000000..097b241 --- /dev/null +++ b/vignettes/.gitignore @@ -0,0 +1,2 @@ +*.html +*.R diff --git a/vignettes/germany.Rmd b/vignettes/germany.Rmd new file mode 100644 index 0000000..951e1de --- /dev/null +++ b/vignettes/germany.Rmd @@ -0,0 +1,284 @@ +--- +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +title: "Germany Reunification" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Germany Reunification} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + echo = TRUE, + warning = FALSE, + message = FALSE, + out.width = '90%', + out.height = '600px', + fig.align='center' +) +set.seed(123) +``` + +## Background + +In this vignette I show how a simple Bayesian model can be used to generate a synthetic control to estimate the effect of the 1990 German reunification on per-capita GDP in West Germany. Figure 1 (Figure 1a, [Abadie, Diamond, and Hainmueller 2015](https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12116)) shows the trajectory of GDP per-capita for West Germany and for a synthetic control generated using a the canonical approach described in ([Abadie and Gardeazabal, 2003](https://www.aeaweb.org/articles?id=10.1257/000282803321455188); [Abadie et al., 2010](https://www.aeaweb.org/articles?id=10.1257/000282803321455188)). + + +```{r original_figure, echo = FALSE, fig.cap = "Figure 1: Trends in per Capita GDP: West Germany versus Synthetic West Germany"} +knitr::include_graphics(path = "https://ignacio.martinez.fyi/synthetic_control/german_reunification.png") + +``` + +## Bayesian Synthetic Control + +Let $y^\star_t$ be the standardized per Capita GDP of West Germany in period $t$, +and $x^\star_t$ be a vector of the standardized per Capita GDP of +the donor countries in period $t$. + +$$ +\begin{aligned} +y^\star_t &\sim N(x^\star_t\beta , \sigma^2) \\ +\beta &\sim \text{Dir}(1)\\ +\sigma &\sim N^+(0,1) +\end{aligned} +$$ + +Notice that $\beta$ is a simplex. Therefore, the weights ares bound to be +positive and sum to 1. Further more, notice that I use a non-informative prior +for the weights. + + +```{r time_tiles} +library(dplyr) +library(ggplot2) +library(bsynth) +# Create a new factory +germany_synth <- bayesianSynth$new(data = germany, + time = year, + id = country, + treated = D, + outcome = gdp, + ci_width = 0.95) +# Visualize the timeline +germany_synth$timeTiles + + ggplot2::xlab("Year") + + ggplot2::ylab("Country") + +``` + +```{r results="hide"} +# Fit the model +germany_synth$fit() + +# Visualize the synthetic control +germany_synth$synthetic + + ggplot2::xlab("Year") + + ggplot2::ylab("Per Capita GDP (PPP, 2002 USD)") + + ggplot2::scale_y_continuous(labels=scales::dollar_format()) + +# Visualize the treatment effect +germany_synth$effect() + + ggplot2::xlab("Year") + + ggplot2::ylab("Gap in Per Capita GDP (PPP, 2002 USD)") + + ggplot2::scale_y_continuous(labels = scales::dollar_format()) + +``` + +## GSynth + +In this section of the vignette I use the generalized synthetic control method based on interactive fixed effect models and implemented in [{gsynth}](https://github.com/xuyiqing/gsynth) to provide an additional comparison. + +```{r gsynth} + +library(gsynth) + +out <- + gsynth( + gdp ~ D, + data = germany, + index = c("country", "year"), + force = "two-way", + CV = TRUE, + r = c(0, 5), + se = TRUE, + inference = "parametric", + nboots = 10000, + parallel = FALSE, + cores = 4 + ) + +gsynth_plot <- plot(out, + xlab = "Period", + main = "GSynth: German Reunification", + ylab = "Effect") + +``` + +## Side by Side + +```{r side_by_side} + +intervention <- germany %>% + filter(D == 1) %>% + summarise(year = min(year)) %>% + pull(year) + + +bsynth_plot_data <- + germany_synth$plotData %>% + select(x = year, y = tau, ymin = tau_LB, ymax = tau_UB) %>% + mutate(method = "bsynth") + +timeXwalk <- germany %>% + select(year) %>% + distinct() %>% + arrange(year) %>% + mutate(t = 1:n()) + +gsynth_plot_data <- gsynth_plot$data %>% + mutate(t = time + 30) %>% + inner_join(timeXwalk, by = "t") %>% + select(x = year, y = ATT, ymin = CI.lower, ymax = CI.upper) %>% + mutate(method = "GSynth") + +plot_data <- bind_rows(gsynth_plot_data, bsynth_plot_data) + +ggplot(data = plot_data, aes(x = x)) + + geom_line(aes(y = y)) + + geom_ribbon(aes(ymin = ymin, ymax = ymax), + color = "gray", + alpha = 0.2) + + theme_bw(base_size = 14) + + theme( + legend.position = "none", + panel.border = element_blank(), + axis.line = element_line() + ) + + geom_vline(xintercept = intervention, linetype = "dashed") + + facet_grid(cols = dplyr::vars(method)) + + xlab("Year") + + ylab("Gap in Per Capita GDP (PPP, 2002 USD)") + + scale_y_continuous(labels = scales::dollar_format()) + + +``` + +## Placebo + +### GSynth + +```{r} + +placebo_germany <- germany %>% + filter(year < intervention) %>% + mutate(D = case_when( + (country == "West Germany" & + year >= (intervention - lubridate::years(5)) )~ 1, + TRUE ~ 0 + )) + +out <- + gsynth( + gdp ~ D, + data = placebo_germany, + index = c("country", "year"), + force = "two-way", + CV = TRUE, + r = c(0, 5), + se = TRUE, + inference = "parametric", + nboots = 10000, + parallel = FALSE, + cores = 4 + ) + +gsynth_plot <- plot(out, + xlab = "Period", + main = "GSynth: Placebo German Reunification", + ylab = "Effect") +``` + + +### bsynth + +```{r results="hide"} +germany_synth_placebo <- bayesianSynth$new( + data = placebo_germany, + time = year, + id = country, + treated = D, + outcome = gdp, + ci_width = 0.95 +) + +germany_synth_placebo$fit() + +germany_synth_placebo$timeTiles + + +germany_synth_placebo$effect() + + ggplot2::xlab("Year") + + ggplot2::ylab("Gap in Per Capita GDP (PPP, 2002 USD)") + + ggplot2::scale_y_continuous(labels = scales::dollar_format()) + +``` + + +### Side by Side + +```{r} +bsynth_plot_data <- + germany_synth_placebo$plotData %>% + select(x = year, y = tau, ymin = tau_LB, ymax = tau_UB) %>% + mutate(method = "bsynth") + +timeXwalk <- germany %>% + select(year) %>% + distinct() %>% + arrange(year) %>% + mutate(t = 1:n()) + +gsynth_plot_data <- gsynth_plot$data %>% + mutate(t = time + 25) %>% + inner_join(timeXwalk, by = "t") %>% + select(x = year, y = ATT, ymin = CI.lower, ymax = CI.upper) %>% + mutate(method = "GSynth") + +plot_data <- bind_rows(gsynth_plot_data, bsynth_plot_data) + +ggplot(data = plot_data, aes(x = x)) + + geom_line(aes(y = y)) + + geom_ribbon(aes(ymin = ymin, ymax = ymax), + color = "gray", + alpha = 0.2) + + theme_bw(base_size = 14) + + theme( + legend.position = "none", + panel.border = element_blank(), + axis.line = element_line() + ) + + geom_vline(xintercept = intervention - lubridate::years(5), + linetype = "dashed") + + facet_grid(cols = dplyr::vars(method)) + + xlab("Year") + + ylab("Gap in Per Capita GDP (PPP, 2002 USD)") + + scale_y_continuous(labels = scales::dollar_format()) +``` + diff --git a/vignettes/multiple_treated.Rmd b/vignettes/multiple_treated.Rmd new file mode 100644 index 0000000..4240d62 --- /dev/null +++ b/vignettes/multiple_treated.Rmd @@ -0,0 +1,99 @@ +--- +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +title: "Multiple Treated" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Multiple Treated} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + echo = TRUE, + warning = FALSE, + message = FALSE, + out.width = '90%', + out.height = '600px', + fig.align='center' +) +set.seed(123) +``` + +## Background + +In this vignette I show how {bsynth} can be used for causal estimation when you + have multiple treated accounts. In particular, I use the synthetic data exaple + from [Xu, Yiqing, 2017](http://dx.doi.org/10.1017/pan.2016.2). In this example, + the treatment starts in period 21 and increases by one each period + (e.g. the effect is 5 period 25 and 10 in period 30). + +```{r original_figure, echo = FALSE, fig.cap = "Figure 1: Xu, Yiqing, 2017", out.height = '400px'} +knitr::include_graphics(path = "https://ignacio.martinez.fyi/synthetic_control/gsynth.png") + +``` + +## Bayesian Synthetic Control with Covariates + +```{r setup, out.height = '800px'} +library(bsynth) +ci_width <- 0.95 +data(gsynth, package = "gsynth") +dplyr::glimpse(simdata) + +outcome_data <- simdata %>% + dplyr::select(time, id, D, Y) + +covariates <- simdata %>% + dplyr::select(time, id, X1, X2) + +synth <- + bsynth::bayesianSynth$new( + data = outcome_data, + time = time, + id = id, + treated = D, + outcome = Y, + ci_width = ci_width, + covariates = covariates + ) + +synth$timeTiles + + ggplot2::theme(text = ggplot2::element_text(size=6)) + +``` + +### Fit + +```{r fit, results="hide"} +synth$fit(cores = 4) +``` + +### Visualize the synthetic controls for each treated unit + +```{r} +synth$synthetic +``` +### Visualize the treatment effect + +```{r} +synth$effect(subset = c("Average"), facet = FALSE) + + ggplot2::scale_y_continuous(breaks=seq(-2,12,2)) +``` + diff --git a/vignettes/prop99.Rmd b/vignettes/prop99.Rmd new file mode 100644 index 0000000..7b56f43 --- /dev/null +++ b/vignettes/prop99.Rmd @@ -0,0 +1,110 @@ +--- +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +title: "Proposition 99" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Proposition 99} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + echo = TRUE, + warning = FALSE, + message = FALSE, + out.width = '90%', + out.height = '600px', + fig.align='center' +) +set.seed(123) +``` + + +```{r time_tiles, out.height = '900px'} +library(bsynth) + +prop99_synth <- + bayesianSynth$new( + data = prop99_df, + time = year, + id = state, + treated = D, + outcome = packs, + GP = FALSE, + ci_width = 0.95 + ) + +prop99_synth$timeTiles + + ggplot2::xlab("Year") + + ggplot2::ylab("State") + + ggplot2::theme(text = ggplot2::element_text(size=12)) + + +``` + +## Fit without Gaussian Process + +```{r results="hide"} +prop99_synth$fit(control = list(adapt_delta = 0.999, max_treedepth = 15), + cores = 4, + refresh = 0, + seed = 1982, + warmup = 4000, + iter = 6000) + +prop99_synth$synthetic + + ggplot2::xlab("Year") + + ggplot2::ylab("Per-Capita Cigarrette Sales (in Packs)") + +prop99_synth$effect() + + ggplot2::xlab("Year") + + ggplot2::ylab("Change in Per-Capita Cigarrette Sales (in Packs)") +``` + +## Fit with Gaussian Process + +```{r results="hide"} +prop99_synth_gp <- + bayesianSynth$new( + data = prop99_df, + time = year, + id = state, + treated = D, + outcome = packs, + GP = TRUE, + ci_width = 0.95 + ) + +prop99_synth_gp$fit(control = list(adapt_delta = 0.999, max_treedepth = 15), + cores = 4, + refresh = 0, + seed = 1982, + warmup = 4000, + iter = 6000) + +prop99_synth_gp$synthetic + + ggplot2::xlab("Year") + + ggplot2::ylab("Per-Capita Cigarrette Sales (in Packs)") + +prop99_synth_gp$effect() + + ggplot2::xlab("Year") + + ggplot2::ylab("Change in Per-Capita Cigarrette Sales (in Packs)") +``` + diff --git a/vignettes/with_covariates.Rmd b/vignettes/with_covariates.Rmd new file mode 100644 index 0000000..24c77a1 --- /dev/null +++ b/vignettes/with_covariates.Rmd @@ -0,0 +1,216 @@ +--- +## Copyright 2021 Google LLC +## +## Licensed under the Apache License, Version 2.0 (the "License"); +## you may not use this file except in compliance with the License. +## You may obtain a copy of the License at +## +## https://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. +## +--- +title: "With Covariates" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{With Covariates} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + echo = TRUE, + warning = FALSE, + message = FALSE, + out.width = '90%', + out.height = '600px', + fig.align='center' +) +``` + +## Generate Synthetic Data + +```{r} +set.seed(1) +library(dplyr) +library(ggplot2) + +true_lift <- tibble(t = 1:100) %>% + mutate( + x = t - 71, + lift = case_when(x < 0 ~ 0, + TRUE ~ dgamma(x = x, shape = 2, scale = 10)), + lift = 3*lift) %>% + select(-x) + + + + +fake_data <- + tibble(x1 = 100 + arima.sim(model = list(ar = 0.99), n = 100), + x2 = 300 + arima.sim(model = list(ar = 0.57), n = 100), + x3 = 50 + arima.sim(model = list(ar = 0.79, ma = 0.8), n = 100), + m1 = rnorm(n = 100, mean = -2, sd = 1), + m2 = rnorm(n = 100, mean = 0.5, sd = 1), + m3 = rnorm(n = 100, mean = 0.25, sd = 1), + m4 = rnorm(n = 100, mean = 0.15, sd = 1)) %>% + mutate( + y = as.numeric(0.2 * x1 + + 0.3 * x2 + + 0.5 * x3 - + 6.5*m1 - + 3.1*m2 - + 2.3*m3 - + 1.1*m4 + + rnorm(100)), + t = 1:n() + ) %>% + inner_join(true_lift, by = "t") %>% + mutate( + y0 = y, + d = y * lift, + y = y * (1+lift) + ) + +true_lift <- fake_data %>% + select(t, lift, d) + +true_lift_table <- fake_data %>% + filter(t >= 71) %>% + summarise(sum_y0 = sum(y0), + sum_y1 = sum(y), + lift = (sum_y1 - sum_y0)/sum_y0) + +truth <- true_lift_table %>% + pull(lift) + +fake_data <- fake_data%>% + select(-lift, -d, -y0) + +ggplot(data = fake_data, aes(x=t, y=y)) + + geom_line() + + xlab("Time") + + ylab("y") + + theme_bw() + + geom_vline(xintercept = 71, linetype = "dashed") + + ggtitle(label = "Time Series of Interest", + subtitle = "Intervention happens in period 71") + +ggplot(data = true_lift, aes(x=t,y=lift)) + + geom_line() + + xlab("Time") + + ylab("Lift") + + theme_bw() + + scale_y_continuous(labels = scales::percent_format(accuracy = 1)) + + geom_vline(xintercept = 71, linetype = "dashed") + + ggtitle(label = "True Lift", + subtitle = "Intervention happens in period 71") +``` + +* In these synthetic data the true lift between periods 71 and 100 is `r scales::percent(truth, accuracy = 0.1)`. + + +## Simplest Model + + +```{r} +library(bsynth) + +ci_width <- 0.95 + +long_data <- fake_data %>% + select(-m1, -m2, -m3, -m4) %>% + tidyr::pivot_longer(-t, names_to = "id", values_to = "y") %>% + mutate(D = case_when(id == "y" & t >= 71 ~ 1, + TRUE ~ 0)) + +synth <- bsynth::bayesianSynth$new( + data = long_data, + time = t, + id = id, + treated = D, + outcome = y, + ci_width = ci_width +) + + +synth$fit(cores = 4) + +synth$effect() + +synth$synthetic + +synth$liftDraws( + from = 71, + to = 100, + breaks = c(0.01, 0.06, 0.09), + break_names = c("Not worth it", + "Worth it", + "Very worth it", + "Amazing") +) + +point <- synth$summarizeLift() + + +``` +* We estimate a counterfactual lift between periods 71 and 100 of +`r scales::percent(point[[1]], accuracy = 0.1)` with a +`r scales::percent(ci_width, accuracy = 1)` probability that is as low as +`r scales::percent(point[[2]], accuracy = 0.1)` and as high as +`r scales::percent(point[[3]], accuracy = 0.1)`. + + +## With Covariates + +```{r} +covariates <- fake_data %>% + select(t, m1, m2, m3, m4) + +synth <- bsynth::bayesianSynth$new( + data = long_data, + time = t, + id = id, + treated = D, + outcome = y, + ci_width = ci_width, + covariates = covariates +) + + +synth$fit(cores = 4) + +synth$effect() + +synth$synthetic + +synth$liftDraws( + from = 71, + to = 100, + breaks = c(0.01, 0.06, 0.09), + break_names = c("Not worth it", + "Worth it", + "Very worth it", + "Amazing") +) + +point <- synth$summarizeLift() + +``` + +* We estimate a counterfactual lift between periods 71 and 100 of +`r scales::percent(point[[1]], accuracy = 0.1)` with a +`r scales::percent(ci_width, accuracy = 1)` probability that is as low as +`r scales::percent(point[[2]], accuracy = 0.1)` and as high as +`r scales::percent(point[[3]], accuracy = 0.1)`. + + + +