Fix S-Learner's leakage #79
@@ -32,12 +32,12 @@ on ground truth CATEs:

| S-learner | causalml_in_sample | causalml_oos | econml_in_sample | econml_oos | metalearners_in_sample | metalearners_oos |
| :----------------------------------------------------------- | -----------------: | -----------: | ---------------: | ---------: | ---------------------: | ---------------: |
| synthetic_data_continuous_outcome_binary_treatment_linear_te | 14.5706 | 14.6248 | 14.5706 | 14.6248 | 14.5729 | 14.6248 |
| synthetic_data_binary_outcome_binary_treatment_linear_te | 0.229101 | 0.228616 | nan | nan | 0.229231 | 0.2286 |
| twins_pandas | 0.314253 | 0.318554 | nan | nan | 0.371613 | 0.319028 |
Note that the benchmarks were already quite indicative: before this change, we were doing noticeably worse than causalml in the in-sample scenario.
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage diff:

| | main | #79 | +/- |
| :------- | -----: | -----: | -----: |
| Coverage | 94.43% | 94.41% | -0.02% |
| Files | 15 | 15 | |
| Lines | 1779 | 1774 | -5 |
| Hits | 1680 | 1675 | -5 |
| Misses | 99 | 99 | |
This PR addresses @ArseniyZvyagintsevQC's finding that the current S-Learner implementation does not estimate the conditional average outcomes quite correctly in the in-sample scenario.
Concretely, having observed $X_i$, $Y_i$ and $W_i = k$, we currently consider $i$ to be unseen when estimating $\mathbb{E}[Y_i \mid X_i, W_i = k']$ for $k' \neq k$. Yet the estimator has seen $Y_i$, which may lead to leakage.
Checklist
- `CHANGELOG.rst` entry