Fix S-Learner's leakage #79

kklein · 2024-08-10T16:55:08Z

This PR seeks to address @ArseniyZvyagintsevQC's finding that the S-Learner's current estimation of conditional average outcomes is not quite kosher in the in-sample scenario.

Concretely, having observed $X_i, Y_i, W_i=k$, we currently treat $i$ as unseen when estimating $\mathbb{E}[Y_i \mid X_i, W_i=k']$ for $k' \neq k$. Yet the model producing that estimate has been trained on $(X_i, W_i=k, Y_i)$, i.e. it has seen $Y_i$, which can leak information into the in-sample estimate.

Checklist

- [x] Added a `CHANGELOG.rst` entry

kklein · 2024-08-10T16:57:45Z

benchmarks/readme.md

```diff
@@ -32,12 +32,12 @@ on ground truth CATEs:

 | S-learner                                                     | causalml_in_sample | causalml_oos | econml_in_sample | econml_oos | metalearners_in_sample | metalearners_oos |
 | :------------------------------------------------------------ | -----------------: | -----------: | ---------------: | ---------: | ---------------------: | ---------------: |
-| synthetic_data_continuous_outcome_binary_treatment_linear_te  |            14.5706 |      14.6248 |          14.5706 |    14.6248 |                14.5729 |          14.6248 |
-| synthetic_data_binary_outcome_binary_treatment_linear_te      |           0.229101 |     0.228616 |              nan |        nan |               0.229231 |           0.2286 |
-| twins_pandas                                                  |           0.314253 |     0.318554 |              nan |        nan |               0.371613 |         0.319028 |
```


Note that the benchmarks were already quite indicative beforehand! In the in-sample scenario we were doing noticeably worse than causalml before this change (e.g. 0.371613 vs. 0.314253 on twins_pandas).

codecov · 2024-08-10T17:16:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.41%. Comparing base (d00947a) to head (7124492).
Report is 11 commits behind head on main.


```diff
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
- Coverage   94.43%   94.41%   -0.02%     
==========================================
  Files          15       15              
  Lines        1779     1774       -5     
==========================================
- Hits         1680     1675       -5     
  Misses         99       99
```


kklein added 3 commits August 10, 2024 18:47

- Update benchmark values. (802c2b7)
- Fix S-Learner's leakage. (84e8c3d)
- Add changelog entry. (5bedd71)


kklein marked this pull request as ready for review August 10, 2024 16:59

Commit 7124492: Fix date in changelog.

kklein requested review from ArseniyZvyagintsevQC and MatthiasLoefflerQC August 10, 2024 17:12

MatthiasLoefflerQC approved these changes Aug 10, 2024


ArseniyZvyagintsevQC approved these changes Aug 12, 2024


kklein merged commit 4409cc5 into main Aug 12, 2024
16 checks passed
