-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Fixed propensity example * Fix import * Fix import * Update docs/examples/example_propensity.ipynb Co-authored-by: Kevin Klein <[email protected]> * Update docs/examples/example_propensity.ipynb Co-authored-by: Kevin Klein <[email protected]> * Fix formating * Update docs/examples/example_propensity.ipynb Co-authored-by: Kevin Klein <[email protected]> * Update docs/examples/example_propensity.ipynb Co-authored-by: Kevin Klein <[email protected]> * Fix formating * Modify note * Update docs/examples/example_propensity.ipynb Co-authored-by: Kevin Klein <[email protected]> --------- Co-authored-by: Kevin Klein <[email protected]>
- Loading branch information
1 parent
a1ee27c
commit 2f428fb
Showing
2 changed files
with
215 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# What if I know the propensity score?\n", | ||
"\n", | ||
"In some experiment settings we may know beforehand the probabilities of treatment assignments,\n", | ||
"e.g. if we have data from a {term}`RCT<Randomized Control Trial (RCT)>` with known treatment\n", | ||
"probabilities.\n", | ||
"\n", | ||
"In that case we may not want to learn a propensity model rather just use the known probabilities.\n", | ||
"\n", | ||
"Loading the data\n", | ||
"----------------\n", | ||
"\n", | ||
"Just like in our {ref}`example on estimating CATEs with a MetaLearner\n", | ||
"<example-basic>`, we will first load some experiment data:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"from pathlib import Path\n", | ||
"from git_root import git_root\n", | ||
"\n", | ||
"df = pd.read_csv(git_root(\"data/learning_mindset.zip\"))\n", | ||
"outcome_column = \"achievement_score\"\n", | ||
"treatment_column = \"intervention\"\n", | ||
"feature_columns = [\n", | ||
" column for column in df.columns if column not in [outcome_column, treatment_column]\n", | ||
"]\n", | ||
"categorical_feature_columns = [\n", | ||
" \"ethnicity\",\n", | ||
" \"gender\",\n", | ||
" \"frst_in_family\",\n", | ||
" \"school_urbanicity\",\n", | ||
" \"schoolid\",\n", | ||
"]\n", | ||
"# Note that explicitly setting the dtype of these features to category\n", | ||
"# allows both lightgbm as well as shap plots to\n", | ||
"# 1. Operate on features which are not of type int, bool or float\n", | ||
"# 2. Correctly interpret categoricals with int values to be\n", | ||
"# interpreted as categoricals, as compared to ordinals/numericals.\n", | ||
"for categorical_feature_column in categorical_feature_columns:\n", | ||
" df[categorical_feature_column] = df[categorical_feature_column].astype(\"category\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Creating our own estimator\n", | ||
"--------------------------\n", | ||
"\n", | ||
"In this tutorial we will assume that we know that all observations were assigned to the\n", | ||
"treatment with a fixed probability of 0.3, which is close to the fraction of the observations\n", | ||
"assigned to the treatment group:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"df[treatment_column].mean()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"```{note}\n", | ||
"The fact that we have a fixed propensity score for all observations is not true for this\n", | ||
"dataset, we just use it for illustrational purposes.\n", | ||
"```\n", | ||
"\n", | ||
"Now we can define our custom ``sklearn``-like classifier. We recommend inheriting from\n", | ||
"the ``sklearn`` base classes and following the rules explained in the\n", | ||
"[sklearn documentation](https://scikit-learn.org/stable/developers/develop.html) to avoid\n", | ||
"having to define helper functions and ensure the correct functionality of the ``metalearners``\n", | ||
"library." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from sklearn.base import BaseEstimator, ClassifierMixin\n", | ||
"from typing import Any\n", | ||
"from typing_extensions import Self\n", | ||
"import numpy as np\n", | ||
"import pandas as pd\n", | ||
"\n", | ||
"\n", | ||
"class FixedPropensityModel(ClassifierMixin, BaseEstimator):\n", | ||
" def __init__(self, propensity_score: float) -> None:\n", | ||
" self.propensity_score = propensity_score\n", | ||
"\n", | ||
" def fit(self, X: pd.DataFrame, y: pd.Series) -> Self:\n", | ||
" self.classes_ = np.unique(y.to_numpy()) # sklearn requires this\n", | ||
" return self\n", | ||
"\n", | ||
" def predict(self, X: pd.DataFrame) -> np.ndarray[Any, Any]:\n", | ||
" return np.argmax(self.predict_proba(X), axis=1)\n", | ||
"\n", | ||
" def predict_proba(self, X: pd.DataFrame) -> np.ndarray[Any, Any]:\n", | ||
" return np.full((len(X), 2), [1 - self.propensity_score, self.propensity_score])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Fitting the MetaLearner\n", | ||
"-----------------------\n", | ||
"\n", | ||
"Finally we can instantiate and fit our MetaLearner using our own custom propensity model:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from metalearners import RLearner\n", | ||
"from lightgbm import LGBMRegressor\n", | ||
"\n", | ||
"rlearner = RLearner(\n", | ||
" nuisance_model_factory=LGBMRegressor,\n", | ||
" propensity_model_factory=FixedPropensityModel,\n", | ||
" treatment_model_factory=LGBMRegressor,\n", | ||
" nuisance_model_params={\"verbose\": -1},\n", | ||
" propensity_model_params={\"propensity_score\": 0.3},\n", | ||
" treatment_model_params={\"verbose\": -1},\n", | ||
" is_classification=False,\n", | ||
" n_variants=2,\n", | ||
")\n", | ||
"rlearner.fit(\n", | ||
" X=df[feature_columns],\n", | ||
" y=df[outcome_column],\n", | ||
" w=df[treatment_column],\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can check that the propensity estimates correspond to our expectation:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"rlearner.predict_nuisance(\n", | ||
" X=df[feature_columns], model_kind=\"propensity_model\", model_ord=0, is_oos=False\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Further comments\n", | ||
"----------------\n", | ||
"\n", | ||
"* This example shows how we can use the same propensity score for all observations in the\n", | ||
" binary treatment setting, the class could be easily extended for multiple treatment\n", | ||
" variants a. Moreover, customizing the propensity score according to some simple \n", | ||
" extracted from the input features could easily be accommodated analogously." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters