Fixed propensity example (#70)
* Fixed propensity example

* Fix import

* Fix import

* Update docs/examples/example_propensity.ipynb

Co-authored-by: Kevin Klein <[email protected]>

* Update docs/examples/example_propensity.ipynb

Co-authored-by: Kevin Klein <[email protected]>

* Fix formatting

* Update docs/examples/example_propensity.ipynb

Co-authored-by: Kevin Klein <[email protected]>

* Update docs/examples/example_propensity.ipynb

Co-authored-by: Kevin Klein <[email protected]>

* Fix formatting

* Modify note

* Update docs/examples/example_propensity.ipynb

Co-authored-by: Kevin Klein <[email protected]>

---------

Co-authored-by: Kevin Klein <[email protected]>
FrancescMartiEscofetQC and kklein authored Jul 24, 2024
1 parent a1ee27c commit 2f428fb
Showing 2 changed files with 215 additions and 0 deletions.
214 changes: 214 additions & 0 deletions docs/examples/example_propensity.ipynb
@@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What if I know the propensity score?\n",
"\n",
"In some experimental settings we may know the probabilities of treatment assignment beforehand,\n",
"e.g. if we have data from a {term}`RCT<Randomized Control Trial (RCT)>` with known treatment\n",
"probabilities.\n",
"\n",
"In that case we may not want to learn a propensity model, but rather just use the known probabilities.\n",
"\n",
"Loading the data\n",
"----------------\n",
"\n",
"Just like in our {ref}`example on estimating CATEs with a MetaLearner\n",
"<example-basic>`, we will first load some experiment data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from pathlib import Path\n",
"from git_root import git_root\n",
"\n",
"df = pd.read_csv(git_root(\"data/learning_mindset.zip\"))\n",
"outcome_column = \"achievement_score\"\n",
"treatment_column = \"intervention\"\n",
"feature_columns = [\n",
"    column for column in df.columns if column not in [outcome_column, treatment_column]\n",
"]\n",
"categorical_feature_columns = [\n",
"    \"ethnicity\",\n",
"    \"gender\",\n",
"    \"frst_in_family\",\n",
"    \"school_urbanicity\",\n",
"    \"schoolid\",\n",
"]\n",
"# Note that explicitly setting the dtype of these features to category\n",
"# allows both lightgbm as well as shap plots to\n",
"# 1. Operate on features which are not of type int, bool or float\n",
"# 2. Correctly interpret int-valued categoricals as categoricals,\n",
"#    rather than as ordinals/numericals.\n",
"for categorical_feature_column in categorical_feature_columns:\n",
"    df[categorical_feature_column] = df[categorical_feature_column].astype(\"category\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating our own estimator\n",
"--------------------------\n",
"\n",
"In this tutorial we will assume that we know that all observations were assigned to the\n",
"treatment with a fixed probability of 0.3, which is close to the fraction of the observations\n",
"assigned to the treatment group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"df[treatment_column].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"The assumption of a fixed propensity score for all observations does not actually hold for\n",
"this dataset; we use it only for illustrative purposes.\n",
"```\n",
"\n",
"Now we can define our custom ``sklearn``-like classifier. We recommend inheriting from\n",
"the ``sklearn`` base classes and following the rules explained in the\n",
"[sklearn documentation](https://scikit-learn.org/stable/developers/develop.html). Doing so\n",
"avoids having to define helper functions and ensures that the classifier works correctly with\n",
"the ``metalearners`` library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from sklearn.base import BaseEstimator, ClassifierMixin\n",
"from typing import Any\n",
"from typing_extensions import Self\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"class FixedPropensityModel(ClassifierMixin, BaseEstimator):\n",
"    def __init__(self, propensity_score: float) -> None:\n",
"        self.propensity_score = propensity_score\n",
"\n",
"    def fit(self, X: pd.DataFrame, y: pd.Series) -> Self:\n",
"        self.classes_ = np.unique(y.to_numpy())  # sklearn requires this\n",
"        return self\n",
"\n",
"    def predict(self, X: pd.DataFrame) -> np.ndarray[Any, Any]:\n",
"        return np.argmax(self.predict_proba(X), axis=1)\n",
"\n",
"    def predict_proba(self, X: pd.DataFrame) -> np.ndarray[Any, Any]:\n",
"        return np.full((len(X), 2), [1 - self.propensity_score, self.propensity_score])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fitting the MetaLearner\n",
"-----------------------\n",
"\n",
"Finally we can instantiate and fit our MetaLearner using our own custom propensity model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from metalearners import RLearner\n",
"from lightgbm import LGBMRegressor\n",
"\n",
"rlearner = RLearner(\n",
"    nuisance_model_factory=LGBMRegressor,\n",
"    propensity_model_factory=FixedPropensityModel,\n",
"    treatment_model_factory=LGBMRegressor,\n",
"    nuisance_model_params={\"verbose\": -1},\n",
"    propensity_model_params={\"propensity_score\": 0.3},\n",
"    treatment_model_params={\"verbose\": -1},\n",
"    is_classification=False,\n",
"    n_variants=2,\n",
")\n",
"rlearner.fit(\n",
"    X=df[feature_columns],\n",
"    y=df[outcome_column],\n",
"    w=df[treatment_column],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can check that the propensity estimates correspond to our expectation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"rlearner.predict_nuisance(\n",
"    X=df[feature_columns], model_kind=\"propensity_model\", model_ord=0, is_oos=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Further comments\n",
"----------------\n",
"\n",
"* This example shows how we can use the same propensity score for all observations in the\n",
"  binary treatment setting. The class could easily be extended to multiple treatment\n",
"  variants. Moreover, customizing the propensity score according to some simple rule\n",
"  extracted from the input features could easily be accommodated analogously."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
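
The notebook's "Further comments" cell mentions two extensions without showing them: fixed propensities for more than two treatment variants, and propensities derived from the input features. A minimal sketch of both follows, under stated assumptions. The class names `FixedMultiVariantPropensityModel` and `RuleBasedPropensityModel` and the `score_fn` parameter are hypothetical and not part of this commit or of the `metalearners` library. To keep the sketch dependent on NumPy only, the `sklearn` base classes are omitted; the notebook recommends inheriting from `ClassifierMixin` and `BaseEstimator`, and whether these classes plug into a MetaLearner as-is has not been verified here.

```python
from typing import Callable, Sequence

import numpy as np


class FixedMultiVariantPropensityModel:
    """Hypothetical: one fixed assignment probability per treatment variant."""

    def __init__(self, propensity_scores: Sequence[float]) -> None:
        self.propensity_scores = propensity_scores

    def fit(self, X, y):
        scores = np.asarray(self.propensity_scores, dtype=float)
        if scores.ndim != 1 or not np.isclose(scores.sum(), 1.0):
            raise ValueError("propensity_scores must be one-dimensional and sum to 1.")
        self.classes_ = np.unique(np.asarray(y))
        if len(self.classes_) != len(scores):
            raise ValueError("Need exactly one propensity score per observed variant.")
        return self

    def predict_proba(self, X) -> np.ndarray:
        # Every row gets the same vector of variant probabilities.
        scores = np.asarray(self.propensity_scores, dtype=float)
        return np.tile(scores, (len(X), 1))

    def predict(self, X) -> np.ndarray:
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


class RuleBasedPropensityModel:
    """Hypothetical: binary propensity computed from the features by a user-supplied rule."""

    def __init__(self, score_fn: Callable[..., np.ndarray]) -> None:
        self.score_fn = score_fn

    def fit(self, X, y):
        self.classes_ = np.unique(np.asarray(y))
        return self

    def predict_proba(self, X) -> np.ndarray:
        # score_fn returns P(treatment) per row; column 0 holds P(control).
        p = np.asarray(self.score_fn(X), dtype=float)
        return np.column_stack([1 - p, p])

    def predict(self, X) -> np.ndarray:
        return np.argmax(self.predict_proba(X), axis=1)


# Toy data (made up for illustration): one feature, three observations.
X = np.array([[-1.0], [0.5], [2.0]])
w = np.array([0, 1, 2])

multi = FixedMultiVariantPropensityModel([0.5, 0.3, 0.2]).fit(X, w)
multi_proba = multi.predict_proba(X)  # shape (3, 3), identical rows

# Illustrative rule: propensity 0.5 if the first feature is positive, else 0.2.
rule = RuleBasedPropensityModel(lambda X: np.where(X[:, 0] > 0, 0.5, 0.2))
rule = rule.fit(X, (w > 0).astype(int))
rule_proba = rule.predict_proba(X)  # shape (3, 2), rows depend on the first feature
```

Both classes keep the contract the notebook's `FixedPropensityModel` relies on: `fit` sets `classes_`, and `predict_proba` returns one row per observation with one column per class.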
1 change: 1 addition & 0 deletions docs/examples/index.rst
@@ -13,3 +13,4 @@ Examples
Tuning hyperparameters of a MetaLearner with MetaLearnerGridSearch <example_gridsearch.ipynb>
Generating data <example_data_generation.ipynb>
Estimating CATEs for survival analysis <example_survival.ipynb>
What if I know the propensity score? <example_propensity.ipynb>
