Add more model evaluation metrics #99

Open · 3 tasks

Marktus opened this issue Jul 15, 2022 · 0 comments

Marktus commented Jul 15, 2022

Description

BorutaShap currently evaluates feature importance using either Shapley (SHAP) values or "gini". It does not give the user any way to control how shap is run internally, e.g. to set feature_perturbation="interventional", model_output="probability", etc.

Also, a machine learning algorithm that does not produce a "gini"-style importance (i.e. does not expose feature_importances_) cannot be used in BorutaShap with importance_measure='gini'. This is another limitation that could be lifted.
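To illustrate the kind of control being asked for, here is a minimal sketch of a wrapper that accepts explainer keyword arguments and forwards them to shap. The BorutaShapConfigurable class, the explainer_kwargs parameter, and make_explainer are all invented names for this sketch, not part of the current BorutaShap API:

```python
# Hypothetical sketch: forwarding user-supplied shap options through the
# selector. `explainer_kwargs` and `make_explainer` are invented names.

class BorutaShapConfigurable:
    def __init__(self, model, importance_measure='shap', classification=True,
                 explainer_kwargs=None):
        self.model = model
        self.importance_measure = importance_measure
        self.classification = classification
        # Options such as feature_perturbation or model_output would be
        # passed straight through to the shap explainer here.
        self.explainer_kwargs = explainer_kwargs or {}

    def make_explainer(self, explainer_factory):
        # explainer_factory stands in for shap.TreeExplainer in this sketch.
        return explainer_factory(self.model, **self.explainer_kwargs)


# Usage with a stub factory that just records what it was given:
def stub_factory(model, **kwargs):
    return {"model": model, "options": kwargs}

selector = BorutaShapConfigurable(
    model="some_model",
    explainer_kwargs={"feature_perturbation": "interventional",
                      "model_output": "probability"},
)
explainer = selector.make_explainer(stub_factory)
print(explainer["options"]["feature_perturbation"])  # interventional
```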

Reasoning

I have run into situations where the model errored when using "gini" with a random forest. In other instances, the error came from shap itself and could have been avoided if the user were able to control shap's options.

from BorutaShap import BorutaShap
from sklearn.ensemble import RandomForestClassifier

def boruta_shap_simulation(model, X, y):
    Feature_Selector = BorutaShap(model=model, importance_measure='gini', classification=True)
    result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
    return result

model = RandomForestClassifier()
bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled)

AttributeError Traceback (most recent call last)
in <cell line: 16>()
14 return result
15 model = RandomForestClassifier()
---> 16 bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled, num_processors=20)

in boruta_shap_simulation(model, X, y, num_processors)
5 #pool = mp.Pool(num_processors)
6 start = time.time()
----> 7 Feature_Selector = BorutaShap(model=model, importance_measure='gini', classification=True)
8 #result = pool.imap(Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True))
9 result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in __init__(self, model, importance_measure, classification, percentile, pvalue)
61 self.classification = classification
62 self.model = model
---> 63 self.check_model()
64
65
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in check_model(self)
103
104 elif check_feature_importance is False and self.importance_measure == 'gini':
--> 105 raise AttributeError('Model must contain the feature_importances_ method to use Gini try Shap instead')
106
107 else:

AttributeError: Model must contain the feature_importances_ method to use Gini try Shap instead
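The check that raises this error boils down to a hasattr test on the model; roughly (a reconstruction for illustration, not the exact library code):

```python
# Reconstruction of the gini availability check, for illustration only.
class HasGini:
    feature_importances_ = [0.5, 0.5]   # tree ensembles expose this after fit

class NoGini:
    pass                                # e.g. a model without gini importances

def check_gini_support(model):
    if not hasattr(model, "feature_importances_"):
        raise AttributeError(
            "Model must contain the feature_importances_ method to use Gini "
            "try Shap instead")
    return True

print(check_gini_support(HasGini()))  # True
try:
    check_gini_support(NoGini())
except AttributeError as e:
    print("raised:", e)
```

One plausible explanation for the failure above, even though random forests do support gini importances, is that scikit-learn's feature_importances_ is a fitted attribute: accessing it on an unfitted RandomForestClassifier raises, so a hasattr check made before fitting can come back False.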

def boruta_shap_simulation(model, X, y):
    Feature_Selector = BorutaShap(model=model, importance_measure='shap', classification=True)
    result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
    return result

model = RandomForestClassifier()
bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled)


Exception Traceback (most recent call last)
in <cell line: 17>()
15
16 model = RandomForestClassifier()
---> 17 bsa_results = boruta_shap_simulation(model, X_train_sampled, y_train_sampled, num_processors=20)

in boruta_shap_simulation(model, X, y, num_processors)
7 Feature_Selector = BorutaShap(model=model, importance_measure='shap', classification=True)
8 #result = pool.imap(Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True))
----> 9 result = Feature_Selector.fit(X=X, y=y, n_trials=1, random_state=0, train_or_test="train", normalize=True, verbose=True)
10 #pool.close()
11 #pool.join()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in fit(self, X, y, n_trials, random_state, sample, train_or_test, normalize, verbose, stratify)
361 self.Check_if_chose_train_or_test_and_train_model()
362
--> 363 self.X_feature_import, self.Shadow_feature_import = self.feature_importance(normalize=normalize)
364 self.update_importance_history()
365 hits = self.calculate_hits()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in feature_importance(self, normalize)
606 if self.importance_measure == 'shap':
607
--> 608 self.explain()
609 vals = self.shap_values
610

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9cc5b77-4046-405e-a6bd-a84c65420c14/lib/python3.9/site-packages/BorutaShap.py in explain(self)
732 if self.classification:
733 # for some reason shap returns values wraped in a list of length 1
--> 734 self.shap_values = np.array(explainer.shap_values(self.X_boruta))
735 if isinstance(self.shap_values, list):
736

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in shap_values(self, X, y, tree_limit, approximate, check_additivity, from_call)
406 out = self._get_shap_output(phi, flat_output)
407 if check_additivity and self.model.model_output == "raw":
--> 408 self.assert_additivity(out, self.model.predict(X))
409
410 return out

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in assert_additivity(self, phi, model_output)
537 if type(phi) is list:
538 for i in range(len(phi)):
--> 539 check_sum(self.expected_value[i] + phi[i].sum(-1), model_output[:,i])
540 else:
541 check_sum(self.expected_value + phi.sum(-1), model_output)

/databricks/python/lib/python3.9/site-packages/shap/explainers/_tree.py in check_sum(sum_val, model_output)
533 " was %f, while the model output was %f. If this difference is acceptable"
534 " you can set check_additivity=False to disable this check." % (sum_val[ind], model_output[ind])
--> 535 raise Exception(err_msg)
536
537 if type(phi) is list:

Exception: Additivity check failed in TreeExplainer! Please ensure the data matrix you passed to the explainer is the same shape that the model was trained on. If your data shape is correct then please report this on GitHub. Consider retrying with the feature_perturbation='interventional' option. This check failed because for one of the samples the sum of the SHAP values was 34.761224, while the model output was 0.070000. If this difference is acceptable you can set check_additivity=False to disable this check.
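The failing check compares the expected value plus the sum of SHAP values against the model output. In simplified form (mirroring the check_sum logic visible in the traceback, not the exact shap source):

```python
# Simplified version of shap's additivity check from the traceback above.
def check_additivity(expected_value, shap_values_sum, model_output, tol=1e-2):
    total = expected_value + shap_values_sum
    if abs(total - model_output) > tol:
        raise Exception(
            f"Additivity check failed: sum of SHAP values was {total:.6f}, "
            f"while the model output was {model_output:.6f}. If this "
            f"difference is acceptable, set check_additivity=False.")

# Passes: 0.5 + (-0.43) matches the model output 0.07.
check_additivity(0.5, -0.43, 0.07)

# Mirrors the failure above: 34.761224 vs. a model output of 0.070000.
try:
    check_additivity(0.5, 34.261224, 0.07)
except Exception as e:
    print(e)
```

This is exactly why user-controllable options would help: the error message itself suggests feature_perturbation='interventional' or check_additivity=False, but BorutaShap currently exposes neither.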

Implementation

Overview of possible implementations
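One possible direction, sketched here with invented names (custom_importance is not an existing BorutaShap parameter): let the user supply a callable that computes importances, so models without feature_importances_ can still be used:

```python
# Hypothetical sketch: accept a user-supplied importance function.
# `custom_importance` is an invented parameter, not current BorutaShap API.

def select_importance(model, importance_measure, custom_importance=None):
    if callable(custom_importance):
        # User-defined importances take precedence over built-in measures.
        return custom_importance(model)
    if importance_measure == "gini":
        if not hasattr(model, "feature_importances_"):
            raise AttributeError("model has no feature_importances_")
        return list(model.feature_importances_)
    raise ValueError(f"unknown importance_measure: {importance_measure}")

class CoefModel:
    # e.g. a linear model exposing coefficients instead of gini importances
    coef_ = [0.2, -0.7, 0.1]

importances = select_importance(
    CoefModel(), "gini",
    custom_importance=lambda m: [abs(c) for c in m.coef_])
print(importances)  # [0.2, 0.7, 0.1]
```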

Tasks

Concrete tasks to be completed, in the order they need to be done. Include links to the specific lines of code where each task should happen.

  • Task 1
  • Task 2
  • Task 3