Add more model evaluation metrics #99

Open · 3 tasks

Marktus opened this issue Jul 15, 2022 · 0 comments

Marktus commented Jul 15, 2022

Description

BorutaShap currently evaluates feature importance using either Shapley (SHAP) values or "gini". It does not give the user any way to control how shap is run internally, e.g. to set feature_perturbation="interventional", model_output="probability", etc.

Also, a machine learning algorithm that does not produce a "gini"-style importance (i.e. does not expose feature_importances_) cannot be used in BorutaShap with importance_measure='gini'. This is another limitation that could be lifted.
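To illustrate the kind of control being asked for, here is a minimal sketch of a wrapper that accepts explainer keyword arguments and forwards them to shap. The BorutaShapConfigurable class, the explainer_kwargs parameter, and make_explainer are all invented names for this sketch, not part of the current BorutaShap API:

```python
# Hypothetical sketch: forwarding user-supplied shap options through the
# selector. `explainer_kwargs` and `make_explainer` are invented names.

class BorutaShapConfigurable:
    def __init__(self, model, importance_measure='shap', classification=True,
                 explainer_kwargs=None):
        self.model = model
        self.importance_measure = importance_measure
        self.classification = classification
        # Options such as feature_perturbation or model_output would be
        # passed straight through to the shap explainer here.
        self.explainer_kwargs = explainer_kwargs or {}

    def make_explainer(self, explainer_factory):
        # explainer_factory stands in for shap.TreeExplainer in this sketch.
        return explainer_factory(self.model, **self.explainer_kwargs)


# Usage with a stub factory that just records what it was given:
def stub_factory(model, **kwargs):
    return {"model": model, "options": kwargs}

selector = BorutaShapConfigurable(
    model="some_model",
    explainer_kwargs={"feature_perturbation": "interventional",
                      "model_output": "probability"},
)
explainer = selector.make_explainer(stub_factory)
print(explainer["options"]["feature_perturbation"])  # interventional
```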

Reasoning

I have run into situations where the model errored when using "gini" with a random forest. In other instances, the error came from shap itself and could have been avoided if the user were able to control shap's options.

from BorutaShap import BorutaShap
from sklearn.ensemble import RandomForestClassifier

def boruta_shap_simulation(model, X, y):
    Feature_Selector = BorutaShap(model=model, importance_measure='gini', classification=True)
    result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
    return result

model = RandomForestClassifier()
bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled)

AttributeError Traceback (most recent call last)
in <cell line: 16>()
14 return result
15 model = RandomForestClassifier()
---> 16 bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled, num_processors=20)

in boruta_shap_simulation(model, X, y, num_processors)
5 #pool = mp.Pool(num_processors)
6 start = time.time()
----> 7 Feature_Selector = BorutaShap(model=model, importance_measure='gini', classification=True)
8 #result = pool.imap(Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True))
9 result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in __init__(self, model, importance_measure, classification, percentile, pvalue)
61 self.classification = classification
62 self.model = model
---> 63 self.check_model()
64
65
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in check_model(self)
103
104 elif check_feature_importance is False and self.importance_measure == 'gini':
--> 105 raise AttributeError('Model must contain the feature_importances_ method to use Gini try Shap instead')
106
107 else:

AttributeError: Model must contain the feature_importances_ method to use Gini try Shap instead
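The check that raises this error boils down to a hasattr test on the model; roughly (a reconstruction for illustration, not the exact library code):

```python
# Reconstruction of the gini availability check, for illustration only.
class HasGini:
    feature_importances_ = [0.5, 0.5]   # tree ensembles expose this after fit

class NoGini:
    pass                                # e.g. a model without gini importances

def check_gini_support(model):
    if not hasattr(model, "feature_importances_"):
        raise AttributeError(
            "Model must contain the feature_importances_ method to use Gini "
            "try Shap instead")
    return True

print(check_gini_support(HasGini()))  # True
try:
    check_gini_support(NoGini())
except AttributeError as e:
    print("raised:", e)
```

One plausible explanation for the failure above, even though random forests do support gini importances, is that scikit-learn's feature_importances_ is a fitted attribute: accessing it on an unfitted RandomForestClassifier raises, so a hasattr check made before fitting can come back False.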

def boruta_shap_simulation(model, X, y):
    Feature_Selector = BorutaShap(model=model, importance_measure='shap', classification=True)
    result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
    return result

model = RandomForestClassifier()
bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled)


Exception Traceback (most recent call last)
in <cell line: 17>()
15
16 model = RandomForestClassifier()
---> 17 bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled, num_processors=20)

in boruta_shap_simulation(model, X, y, num_processors)
7 Feature_Selector = BorutaShap(model=model, importance_measure='shap', classification=True)
8 #result = pool.imap(Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True))
----> 9 result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
10 #pool.close()
11 #pool.join()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in fit(self, X, y, n_trials, random_state, sample, train_or_test, normalize, verbose, stratify)
361 self.Check_if_chose_train_or_test_and_train_model()
362
--> 363 self.X_feature_import, self.Shadow_feature_import = self.feature_importance(normalize=normalize)
364 self.update_importance_history()
365 hits = self.calculate_hits()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in feature_importance(self, normalize)
606 if self.importance_measure == 'shap':
607
--> 608 self.explain()
609 vals = self.shap_values
610

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in explain(self)
732 if self.classification:
733 # for some reason shap returns values wraped in a list of length 1
--> 734 self.shap_values = np.array(explainer.shap_values(self.X_boruta))
735 if isinstance(self.shap_values, list):
736

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
406 out = self._get_shap_output(phi, flat_output)
407 if check_additivity and self.model.model_output == "raw":
--> 408 self.assert_additivity(out, self.model.predict(X))
409
410 return out

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in assert_additivity(self, phi, model_output)
537 if type(phi) is list:
538 for i in range(len(phi)):
--> 539 check_sum(self.expected_value[i] + phi[i].sum(-1), model_output[:,i])
540 else:
541 check_sum(self.expected_value + phi.sum(-1), model_output)

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in check_sum(sum_val, model_output)
533 " was %f, while the model output was %f. If this difference is acceptable"
534 " you can set check_additivity=False to disable this check." % (sum_val[ind], model_output[ind])
--> 535 raise Exception(err_msg)
536
537 if type(phi) is list:

Exception: Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 34.761224, while the model output was 0.070000. If this difference is acceptable you can set check_additivity=False to disable this check.
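The failing check compares the expected value plus the sum of SHAP values against the model output. In simplified form (mirroring the check_sum logic visible in the traceback, not the exact shap source):

```python
# Simplified version of shap's additivity check from the traceback above.
def check_additivity(expected_value, shap_values_sum, model_output, tol=1e-2):
    total = expected_value + shap_values_sum
    if abs(total - model_output) > tol:
        raise Exception(
            f"Additivity check failed: sum of SHAP values was {total:.6f}, "
            f"while the model output was {model_output:.6f}. If this "
            f"difference is acceptable, set check_additivity=False.")

# Passes: 0.5 + (-0.43) matches the model output 0.07.
check_additivity(0.5, -0.43, 0.07)

# Mirrors the failure above: 34.761224 vs. a model output of 0.070000.
try:
    check_additivity(0.5, 34.261224, 0.07)
except Exception as e:
    print(e)
```

This is exactly why user-controllable options would help: the error message itself suggests feature_perturbation='interventional' or check_additivity=False, but BorutaShap currently exposes neither.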

Implementation

Overview of possible implementations
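One possible direction, sketched here with invented names (custom_importance is not an existing BorutaShap parameter): let the user supply a callable that computes importances, so models without feature_importances_ can still be used:

```python
# Hypothetical sketch: accept a user-supplied importance function.
# `custom_importance` is an invented parameter, not current BorutaShap API.

def select_importance(model, importance_measure, custom_importance=None):
    if callable(custom_importance):
        # User-defined importances take precedence over built-in measures.
        return custom_importance(model)
    if importance_measure == "gini":
        if not hasattr(model, "feature_importances_"):
            raise AttributeError("model has no feature_importances_")
        return list(model.feature_importances_)
    raise ValueError(f"unknown importance_measure: {importance_measure}")

class CoefModel:
    # e.g. a linear model exposing coefficients instead of gini importances
    coef_ = [0.2, -0.7, 0.1]

importances = select_importance(
    CoefModel(), "gini",
    custom_importance=lambda m: [abs(c) for c in m.coef_])
print(importances)  # [0.2, 0.7, 0.1]
```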

Tasks

Concrete tasks to be completed, in the order they need to be done. Include links to the specific lines of code where each task should happen.

  • Task 1
  • Task 2
  • Task 3