diff --git a/docs/examples/example_basic.ipynb b/docs/examples/example_basic.ipynb
index f7135f8..33426c0 100644
--- a/docs/examples/example_basic.ipynb
+++ b/docs/examples/example_basic.ipynb
@@ -115,7 +115,9 @@
 "* We need to specify the observed treatment assignment ``w`` in the call to the\n",
 " ``fit`` method.\n",
 "* We need to specify whether we want in-sample or out-of-sample\n",
- " CATE estimates in the {meth}`~metalearners.TLearner.predict` call via ``is_oos``."
+ " CATE estimates in the {meth}`~metalearners.TLearner.predict` call via ``is_oos``. In the\n",
+ " case of in-sample predictions, the data passed to {meth}`~metalearners.TLearner.predict`\n",
+ " must be exactly the same as the data that was used to call {meth}`~metalearners.TLearner.fit`."
 ]
 },
 {
diff --git a/docs/examples/example_feature_importance_shap.ipynb b/docs/examples/example_feature_importance_shap.ipynb
index 6e2a662..4a4a2b8 100644
--- a/docs/examples/example_feature_importance_shap.ipynb
+++ b/docs/examples/example_feature_importance_shap.ipynb
@@ -326,7 +326,8 @@
 "source": [
 "Note that the method {meth}`~metalearners.explainer.Explainer.feature_importances`\n",
 "returns a list of length {math}`n_{variants} - 1` that indicates the feature importance for\n",
- "each variant against control.\n",
+ "each variant against control. Remember that a higher value means that the corresponding\n",
+ "feature is more important for the CATE prediction.\n",
 "\n",
 "### Computing and plotting the SHAP values\n",
 "\n",
@@ -367,7 +368,23 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "For guidelines on how to interpret such SHAP plots please see the [SHAP documentation](https://github.com/shap/shap).\n",
+ "In these SHAP summary plots, the color and position of the plotted values help in understanding\n",
+ "their impact on model predictions.\n",
+ "\n",
+ "Each dot in the plot represents a single instance in the data set for the given feature.\n",
+ "The x-axis conveys the Shapley value, signifying the strength and directionality of the\n",
+ "feature's impact. The y-axis displays a subset of the features in the model.\n",
+ "\n",
+ "The Shapley value, shown on the horizontal axis, is oriented such that values to the\n",
+ "right of the center line (the 0 mark) contribute to a positive shift in the predicted outcome,\n",
+ "while those to the left indicate a negative impact.\n",
+ "\n",
+ "The color coding in these plots is straightforward: red implies a high feature value,\n",
+ "while blue denotes a low feature value. This color scheme assists in identifying whether\n",
+ "high or low values of a certain feature influence the model's output positively or negatively.\n",
+ "Categorical variables are colored in grey.\n",
+ "\n",
+ "For more guidelines on how to interpret such SHAP plots please see the [SHAP documentation](https://github.com/shap/shap).\n",
 "\n",
 "Note that the method {meth}`~metalearners.explainer.Explainer.shap_values`\n",
 "returns a list of length {math}`n_{variants} - 1` that indicates the SHAP values for\n",
diff --git a/docs/examples/example_lime.ipynb b/docs/examples/example_lime.ipynb
index cf22a6f..4a8a516 100644
--- a/docs/examples/example_lime.ipynb
+++ b/docs/examples/example_lime.ipynb
@@ -54,7 +54,7 @@
 "* {math}`f`, the original model -- in our case the MetaLearner\n",
 "* {math}`G`, the class of possible, interpretable surrogate models\n",
 "* {math}`\Omega(g)`, a measure of complexity for {math}`g \in G`\n",
- "* {math}`\pi_x(z)` a proximity measure of {math}`z` with respect to data point {math}`x`\n",
+ "* {math}`\pi_x(z)`, a proximity measure of an instance {math}`z` with respect to data point {math}`x`\n",
 "* {math}`\mathcal{L}(f, g, \pi_x)` a measure of how unfaithful a {math}`g \in G` is to {math}`f` in the locality defined by {math}`\pi_x`\n",
 "\n",
 "Given all of these objects as well as a to be explained data point {math}`x`, the authors suggest that the most appropriate surrogate {math}`g`, also referred to as explanation for {math}`x`, {math}`\xi(x)`, can be expressed as follows:\n",
@@ -74,10 +74,10 @@
 "* showcase the features with highest global importance\n",
 "\n",
 "In line with this ambition, they define a notion of 'coverage' -- to\n",
- "be maximized --as follows:\n",
+ "be maximized -- with respect to a set of explanations {math}`V`, as follows:\n",
 "\n",
 "```{math}\n",
- " c(V, W, \mathcal{I}) = \sum_{j=1}^{d} I[\exists i \in V: W_{i,j} > 0] \mathcal{I}_j\n",
+ " c(V, W, \mathcal{I}) = \sum_{j=1}^{d} \mathbb{I}\{\exists i \in V: W_{i,j} > 0\} \mathcal{I}_j\n",
 "````\n",
 "\n",
 "where\n",
@@ -85,7 +85,8 @@
 "* {math}`d` is the number of features\n",
 "* {math}`V` is the candidate set of explanations to be shown to\n",
 " humans, within a fixed budget -- this is the variable to be optimized\n",
- "* {math}`W` is a {math}`n \times d` local feature importance matrix and\n",
+ "* {math}`W` is a {math}`n \times d` local feature importance matrix that represents\n",
+ " the local importance of each feature for each instance, and\n",
 "* {math}`\mathcal{I}` is a {math}`d`-dimensional vector of global\n",
 " feature importances\n",
 "\n",
@@ -359,7 +360,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "For guidelines on how to interpret such lime plots please see the [lime documentation](https://github.com/marcotcr/lime)."
+ "In these plots, the green bars signify that the corresponding feature, referenced on the\n",
+ "y-axis, increases the CATE estimate. Conversely, the red bars represent\n",
+ "features that reduce the CATE estimate.\n",
+ "Furthermore, the length of these colored bars corresponds to the magnitude of each feature's\n",
+ "contribution towards the model prediction. Therefore, the longer the bar, the more\n",
+ "significant the impact of that feature on the model prediction.\n",
+ "\n",
+ "For more guidelines on how to interpret such lime plots please see the [lime documentation](https://github.com/marcotcr/lime)."
 ]
 }
 ],
diff --git a/docs/faq.rst b/docs/faq.rst
index 13d1de2..b149eab 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -29,7 +29,8 @@ FAQ
 Double machine learning is an ATE estimation technique, pioneered by
 `Chernozhukov et al. (2016) `_.
 It is 'double' in the sense that it relies on two preliminary models: one for the probability of
- receiving treatment given covariates (the propensity score), and one for the outcome given treatment and covariates.
+ receiving treatment given covariates (the propensity score), and one for the outcome given covariates and
+ optionally the treatment.
 Double ML is also referred to as 'debiased' ML, since the propensity score
 model is used to 'debias' a naive estimator that uses the outcome model to predict
 the expected outcome under treatment, and under no treatment,
diff --git a/docs/glossary.rst b/docs/glossary.rst
index d9232cd..cd3bf77 100644
--- a/docs/glossary.rst
+++ b/docs/glossary.rst
@@ -24,8 +24,8 @@ Glossary
 Similar to the R-Learner, the Double Machine Learning blueprint relies
 on estimating two nuisance models in its first stage: a propensity
 model as well as an outcome model. Unlike the
- R-Learner, the last-stage or treatment effect model might not
- be any estimator.
+ R-Learner, the last-stage or treatment effect model might need to be a
+ specific type of estimator.
 See `Chernozhukov et al. (2016) `_.

 Heterogeneous Treatment Effect (HTE)
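
The 'debiasing' mentioned in the ``docs/faq.rst`` hunk above can be illustrated with one common doubly robust (AIPW-style) formulation of a debiased ATE estimator. This is a sketch of the general idea, not necessarily the exact score function used by any particular DML implementation:

```{math}
\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
+ \frac{W_i \left( Y_i - \hat{\mu}_1(X_i) \right)}{\hat{e}(X_i)}
- \frac{(1 - W_i) \left( Y_i - \hat{\mu}_0(X_i) \right)}{1 - \hat{e}(X_i)} \right]
```

where {math}`\hat{e}` is the propensity model, {math}`\hat{\mu}_0` and {math}`\hat{\mu}_1` are the outcome models, and {math}`W_i` is the observed treatment. The two correction terms involving {math}`\hat{e}` are what 'debias' the naive plug-in estimate {math}`\frac{1}{n}\sum_i \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \right]`.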
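
As a companion to the ``is_oos`` clarification in the ``example_basic.ipynb`` hunk above, here is a minimal, hedged sketch of that workflow on synthetic data. It assumes the ``TLearner`` constructor arguments used in the example notebooks (``nuisance_model_factory``, ``is_classification``, ``n_variants``) and is meant purely as an illustration, not as part of the diff.

```python
import numpy as np
from lightgbm import LGBMRegressor
from metalearners import TLearner

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                      # covariates
w = rng.integers(0, 2, size=1000)                   # observed treatment assignment
y = X[:, 0] + w * X[:, 1] + rng.normal(size=1000)   # outcome with a heterogeneous effect

tlearner = TLearner(
    nuisance_model_factory=LGBMRegressor,
    is_classification=False,
    n_variants=2,
)
# The observed treatment assignment w must be passed to fit.
tlearner.fit(X, y, w)

# In-sample CATE estimates: X must be exactly the data used in fit.
cate_in_sample = tlearner.predict(X, is_oos=False)

# Out-of-sample CATE estimates for previously unseen covariates.
X_new = rng.normal(size=(100, 5))
cate_out_of_sample = tlearner.predict(X_new, is_oos=True)
```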