diff --git a/docs/examples/example_basic.ipynb b/docs/examples/example_basic.ipynb
index f7135f8..33426c0 100644
--- a/docs/examples/example_basic.ipynb
+++ b/docs/examples/example_basic.ipynb
@@ -115,7 +115,9 @@
 "* We need to specify the observed treatment assignment ``w`` in the call to the\n",
 " ``fit`` method.\n",
 "* We need to specify whether we want in-sample or out-of-sample\n",
- " CATE estimates in the {meth}`~metalearners.TLearner.predict` call via ``is_oos``."
+ " CATE estimates in the {meth}`~metalearners.TLearner.predict` call via ``is_oos``. In the\n",
+ " case of in-sample predictions, the data passed to {meth}`~metalearners.TLearner.predict`\n",
+ " must be exactly the same as the data that was used to call {meth}`~metalearners.TLearner.fit`."
 ]
 },
 {
diff --git a/docs/examples/example_feature_importance_shap.ipynb b/docs/examples/example_feature_importance_shap.ipynb
index 6e2a662..4a4a2b8 100644
--- a/docs/examples/example_feature_importance_shap.ipynb
+++ b/docs/examples/example_feature_importance_shap.ipynb
@@ -326,7 +326,8 @@
 "source": [
 "Note that the method {meth}`~metalearners.explainer.Explainer.feature_importances`\n",
 "returns a list of length {math}`n_{variants} - 1` that indicates the feature importance for\n",
- "each variant against control.\n",
+ "each variant against control. Remember that a higher value means that the corresponding\n",
+ "feature is more important for the CATE prediction.\n",
 "\n",
 "### Computing and plotting the SHAP values\n",
 "\n",
@@ -367,7 +368,23 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "For guidelines on how to interpret such SHAP plots please see the [SHAP documentation](https://github.com/shap/shap).\n",
+ "In these SHAP summary plots, the color and position of the plotted values help in understanding\n",
+ "their impact on model predictions.\n",
+ "\n",
+ "Each dot in the plot represents a single instance in the data set for the given feature.\n",
+ "The x-axis conveys the Shapley value, signifying the strength and directionality of the\n",
+ "feature's impact. The y-axis displays a subset of the features in the model.\n",
+ "\n",
+ "The Shapley value, shown on the horizontal axis, is oriented such that values to the\n",
+ "right of the center line (the 0 mark) contribute to a positive shift in the predicted outcome,\n",
+ "while those to the left indicate a negative impact.\n",
+ "\n",
+ "The color coding in these plots is straightforward: red implies a high feature value,\n",
+ "while blue denotes a low feature value. This color scheme assists in identifying whether\n",
+ "high or low values of a certain feature influence the model's output positively or negatively.\n",
+ "Categorical variables are colored in grey.\n",
+ "\n",
+ "For more guidelines on how to interpret such SHAP plots please see the [SHAP documentation](https://github.com/shap/shap).\n",
 "\n",
 "Note that the method {meth}`~metalearners.explainer.Explainer.shap_values`\n",
 "returns a list of length {math}`n_{variants} - 1` that indicates the SHAP values for\n",
diff --git a/docs/examples/example_lime.ipynb b/docs/examples/example_lime.ipynb
index cf22a6f..4a8a516 100644
--- a/docs/examples/example_lime.ipynb
+++ b/docs/examples/example_lime.ipynb
@@ -54,7 +54,7 @@
 "* {math}`f`, the original model -- in our case the MetaLearner\n",
 "* {math}`G`, the class of possible, interpretable surrogate models\n",
 "* {math}`\Omega(g)`, a measure of complexity for {math}`g \in G`\n",
- "* {math}`\pi_x(z)` a proximity measure of {math}`z` with respect to data point {math}`x`\n",
+ "* {math}`\pi_x(z)`, a proximity measure of an instance {math}`z` with respect to data point {math}`x`\n",
 "* {math}`\mathcal{L}(f, g, \pi_x)` a measure of how unfaithful a {math}`g \in G` is to {math}`f` in the locality defined by {math}`\pi_x`\n",
 "\n",
 "Given all of these objects as well as a to be explained data point {math}`x`, the authors suggest that the most appropriate surrogate {math}`g`, also referred to as explanation for {math}`x`, {math}`\xi(x)`, can be expressed as follows:\n",
@@ -74,10 +74,10 @@
 "* showcase the features with highest global importance\n",
 "\n",
 "In line with this ambition, they define a notion of 'coverage' -- to\n",
- "be maximized --as follows:\n",
+ "be maximized -- with respect to a set of explanations {math}`V`, as follows:\n",
 "\n",
 "```{math}\n",
- " c(V, W, \mathcal{I}) = \sum_{j=1}^{d} I[\exists i \in V: W_{i,j} > 0] \mathcal{I}_j\n",
+ " c(V, W, \mathcal{I}) = \sum_{j=1}^{d} \mathbb{I}\{\exists i \in V: W_{i,j} > 0\} \mathcal{I}_j\n",
 "````\n",
 "\n",
 "where\n",
@@ -85,7 +85,8 @@
 "* {math}`d` is the number of features\n",
 "* {math}`V` is the candidate set of explanations to be shown to\n",
 " humans, within a fixed budget -- this is the variable to be optimized\n",
- "* {math}`W` is a {math}`n \times d` local feature importance matrix and\n",
+ "* {math}`W` is a {math}`n \times d` local feature importance matrix that represents\n",
+ " the local importance of each feature for each instance, and\n",
 "* {math}`\mathcal{I}` is a {math}`d`-dimensional vector of global\n",
 " feature importances\n",
 "\n",
@@ -359,7 +360,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "For guidelines on how to interpret such lime plots please see the [lime documentation](https://github.com/marcotcr/lime)."
+ "In these plots, the green bars signify that the corresponding feature, referenced on the\n",
+ "y-axis, increases the CATE estimate. Conversely, the red bars represent\n",
+ "features that reduce the CATE estimate.\n",
+ "Furthermore, the length of these colored bars corresponds to the magnitude of each feature's\n",
+ "contribution towards the model prediction. Therefore, the longer the bar, the more\n",
+ "significant the impact of that feature on the model prediction.\n",
+ "\n",
+ "For more guidelines on how to interpret such lime plots please see the [lime documentation](https://github.com/marcotcr/lime)."
 ]
 }
 ],
diff --git a/docs/faq.rst b/docs/faq.rst
index 13d1de2..b149eab 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -29,7 +29,8 @@ FAQ
 Double machine learning is an ATE estimation technique, pioneered by
 `Chernozhukov et al. (2016) `_.
 It is 'double' in the sense that it relies on two preliminary models: one for the probability of
- receiving treatment given covariates (the propensity score), and one for the outcome given treatment and covariates.
+ receiving treatment given covariates (the propensity score), and one for the outcome given covariates and
+ optionally the treatment.
 Double ML is also referred to as 'debiased' ML, since the propensity score
 model is used to 'debias' a naive estimator that uses the outcome model to predict
 the expected outcome under treatment, and under no treatment,
diff --git a/docs/glossary.rst b/docs/glossary.rst
index d9232cd..cd3bf77 100644
--- a/docs/glossary.rst
+++ b/docs/glossary.rst
@@ -24,8 +24,8 @@ Glossary
 Similar to the R-Learner, the Double Machine Learning blueprint relies
 on estimating two nuisance models in its first stage: a propensity
 model as well as an outcome model. Unlike the
- R-Learner, the last-stage or treatment effect model might not
- be any estimator.
+ R-Learner, the last-stage or treatment effect model might need to be a
+ specific type of estimator.
 See `Chernozhukov et al. (2016) `_.

 Heterogeneous Treatment Effect (HTE)
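
The 'debiasing' mentioned in the ``docs/faq.rst`` hunk above can be illustrated with one common doubly robust (AIPW-style) formulation of a debiased ATE estimator. This is a sketch of the general idea, not necessarily the exact score function used by any particular DML implementation:

```{math}
\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
+ \frac{W_i \left( Y_i - \hat{\mu}_1(X_i) \right)}{\hat{e}(X_i)}
- \frac{(1 - W_i) \left( Y_i - \hat{\mu}_0(X_i) \right)}{1 - \hat{e}(X_i)} \right]
```

where {math}`\hat{e}` is the propensity model, {math}`\hat{\mu}_0` and {math}`\hat{\mu}_1` are the outcome models, and {math}`W_i` is the observed treatment. The two correction terms involving {math}`\hat{e}` are what 'debias' the naive plug-in estimate {math}`\frac{1}{n}\sum_i \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]`.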
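
As a companion to the ``is_oos`` clarification in the ``example_basic.ipynb`` hunk above, here is a minimal, hedged sketch of that workflow on synthetic data. It assumes the ``TLearner`` constructor arguments used in the example notebooks (``nuisance_model_factory``, ``is_classification``, ``n_variants``) and is meant purely as an illustration, not as part of the diff.

```python
import numpy as np
from lightgbm import LGBMRegressor
from metalearners import TLearner

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                      # covariates
w = rng.integers(0, 2, size=1000)                   # observed treatment assignment
y = X[:, 0] + w * X[:, 1] + rng.normal(size=1000)   # outcome with a heterogeneous effect

tlearner = TLearner(
    nuisance_model_factory=LGBMRegressor,
    is_classification=False,
    n_variants=2,
)
# The observed treatment assignment w must be passed to fit.
tlearner.fit(X, y, w)

# In-sample CATE estimates: X must be exactly the data used in fit.
cate_in_sample = tlearner.predict(X, is_oos=False)

# Out-of-sample CATE estimates for previously unseen covariates.
X_new = rng.normal(size=(100, 5))
cate_out_of_sample = tlearner.predict(X_new, is_oos=True)
```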