Fix LIME example numpy - pandas conversion #57

Merged · 2 commits · Jul 10, 2024
Changes from 1 commit
41 changes: 14 additions & 27 deletions docs/examples/example_lime.ipynb
@@ -217,17 +217,21 @@
"source": [
"### Generating lime plots\n",
"\n",
"``lime`` will expect a function which consumes an ``X`` and returns\n",
"``lime`` will expect a function which consumes a ``np.ndarray`` ``X`` and returns\n",
"a one-dimensional vector of the same length as ``X``. We'll have to\n",
"adapt the {meth}`~metalearners.rlearner.RLearner.predict` method of\n",
"our {class}`~metalearners.rlearner.RLearner` in two ways:\n",
"our {class}`~metalearners.rlearner.RLearner` in three ways:\n",
"\n",
"* We need to pass a value for the necessary parameter ``is_oos`` to {meth}`~metalearners.rlearner.RLearner.predict`.\n",
"\n",
"* We need to reshape the output of\n",
" {meth}`~metalearners.rlearner.RLearner.predict` to be one-dimensional. This\n",
" we can easily achieve via {func}`metalearners.utils.simplify_output`.\n",
"\n",
"* We need to reconvert the ``np.ndarray`` to a ``pd.DataFrame`` to work with categoricals\n",
" and specify the correct categories so the categorical codes are the same (which are used internally in LightGBM),\n",
" see [this issue](https://github.com/microsoft/LightGBM/issues/5162) for more context.\n",
"\n",
"This we can do as follows:"
]
},
@@ -244,7 +248,11 @@
"from metalearners.utils import simplify_output\n",
"\n",
"def predict(X):\n",
" return simplify_output(rlearner.predict(X, is_oos=True))"
" X_pd = pd.DataFrame(X, copy=True)\n",
" for c in X_pd.columns:\n",
" # This line sets the cat.categories correctly (even if not all are present in X)\n",
" X_pd[c] = X_pd[c].astype(df[feature_columns].iloc[:, c].dtype)\n",
" return simplify_output(rlearner.predict(X_pd, is_oos=True))"
]
},
{
@@ -254,26 +262,7 @@
"where we set ``is_oos=True`` since ``lime`` will call\n",
"{meth}`~metalearners.rlearner.RLearner.predict`\n",
"with various inputs which will not be able to be recognized as\n",
"in-sample data.\n",
"\n",
"Since ``lime`` expects ``numpy`` datastructures, we'll have to\n",
"manually encode the categorical features of our ``pandas`` data\n",
"structure, see [this issue](https://github.com/microsoft/LightGBM/issues/5162) for more context."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"X = df[feature_columns].copy()\n",
"for categorical_feature_column in categorical_feature_columns:\n",
" X[categorical_feature_column] = X[categorical_feature_column].cat.codes"
"in-sample data."
]
},
{
@@ -332,10 +321,8 @@
"from lime.lime_tabular import LimeTabularExplainer\n",
"from lime.submodular_pick import SubmodularPick\n",
"\n",
"X = X.to_numpy()\n",
"\n",
"explainer = LimeTabularExplainer(\n",
" X,\n",
" df[feature_columns].to_numpy(),\n",
" feature_names=feature_columns,\n",
" categorical_features=categorical_feature_indices,\n",
" categorical_names=categorical_names,\n",
@@ -345,7 +332,7 @@
")\n",
"\n",
"sp = SubmodularPick(\n",
" data=X,\n",
" data=df[feature_columns].to_numpy(),\n",
" explainer=explainer,\n",
" predict_fn=predict,\n",
" method=\"sample\",\n",
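For context, here is a small self-contained illustration (not part of the PR) of why the new ``predict`` wrapper re-applies the original categorical dtypes: the integer codes pandas assigns to a categorical column depend on the full set of categories, and LightGBM relies on those codes internally, so rows passed back by ``lime`` that miss some categories would otherwise end up with shifted codes.

import pandas as pd

# Categories as seen at training time vs. in a subset that lime might pass back.
train = pd.Series(["a", "b", "c"], dtype="category")
subset = pd.Series(["a", "c"], dtype="category")  # "b" never appears here

print(list(subset.cat.codes))   # [0, 1] -> "c" is encoded as 1

# Re-casting with the training dtype restores the full category set,
# so "c" keeps the code the model saw during training.
aligned = subset.astype(train.dtype)
print(list(aligned.cat.codes))  # [0, 2] -> "c" is encoded as 2

This is why the updated ``predict`` casts each column of the rebuilt ``pd.DataFrame`` back to the dtype of the corresponding column in ``df[feature_columns]``: every category is registered on the column even when it does not occur in the rows ``lime`` passes in, so the codes stay consistent with training.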