Problem Description

For data scientists in industry, explainability is crucial for building trust with other stakeholders. Marginal explainability with, e.g., `permutation_importance` helps, but stakeholders often ask for personalised, per-prediction feature importances, which help understand what stands out to the model in a given test sample.
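For reference, the marginal (dataset-level) approach is already a short script with scikit-learn; the estimator and dataset below are just placeholders for illustration:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Marginal importances: shuffle each column and measure the drop in score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```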
The two most popular libraries for per-prediction explanations are LIME and SHAP, each with theoretical and practical pros and cons. Additionally, LIME is no longer maintained, although it is fairly simple conceptually (fitting weighted Ridge models on perturbations around the sample of interest; see the sketch below).
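A minimal sketch of that idea, assuming numeric features and a Gaussian perturbation scheme; this is only an illustration of the local-surrogate principle, not the `lime` package's API, and the function name is hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_importance(predict_fn, x, scale, n_samples=500, kernel_width=1.0, seed=None):
    """LIME-style sketch: explain one prediction with a weighted linear surrogate.

    predict_fn : callable returning a 1-d array of model predictions.
    x          : 1-d array, the test sample to explain.
    scale      : per-feature std used to perturb `x`.
    """
    rng = np.random.default_rng(seed)
    # Draw perturbed samples around the test point and get the model's predictions.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    y = predict_fn(Z)
    # Weight each perturbed sample by its proximity to x (RBF kernel).
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # The surrogate's coefficients act as local, per-prediction importances.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```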
This raises several questions:
Would having an easy way to get marginal and prediction-level feature importances make sense, either in the recipe or as a standalone helper in skrub?
If we don't want extra dependencies, should we implement a simple version of prediction-level feature importance in skrub? @glemaitre mentioned a SLEP with a similar objective, IIUC.
We should probably do some benchmarking and literature reading to choose a method that can be interpreted and explained in plain English, with clear documentation and guidelines on what can and cannot be said.
WDYT?
Feature Description
.
Alternative Solutions
.
Additional Context
.
I agree that this also raises the question of scope. Skrub and the recipe already ease hyperparameter tuning, so IMHO it would be valuable for users if they also facilitated model evaluation, with pragmatic guidelines. I'd also be happy to have it somewhere else, though.