Hi @chasedehan,

I think I found an error in the sklearn implementation.

At the moment, a new column is added to `df2` on every iteration, and `df2` is then merged into `df` again. This creates many duplicate columns, which dilute the mean of the feature importances later on. You can see this by printing `df` after every iteration:
```python
try:
    importance = clf.feature_importances_
    df2['fscore' + str(i)] = importance
except ValueError:
    print("this clf doesn't have the feature_importances_ method. Only Sklearn tree based methods allowed")
# importance = sorted(importance.items(), key=operator.itemgetter(1))
# df2 = pd.DataFrame(importance, columns=['feature', 'fscore'+str(i)])
df2['fscore'+str(i)] = df2['fscore'+str(i)] / df2['fscore'+str(i)].sum()
df = pd.merge(df, df2, on='feature', how='outer')
if not silent:
    print("Round: ", this_round, " iteration: ", i)
```
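A minimal sketch of the problem (with made-up stand-in data, not the real pipeline): because `df2` keeps every previously added `fscore` column, each `pd.merge` copies the old columns into `df` again, and pandas disambiguates the clashes with `_x`/`_y` suffixes.

```python
import pandas as pd

# df2 accumulates one fscore column per iteration, so every merge
# re-imports the old columns into df with _x/_y suffixes.
df = pd.DataFrame({'feature': ['a', 'b']})
df2 = pd.DataFrame({'feature': ['a', 'b']})
for i in range(1, 4):
    df2['fscore' + str(i)] = [0.5, 0.5]  # stand-in for the normalized importances
    df = pd.merge(df, df2, on='feature', how='outer')

# After three iterations df holds far more than three fscore columns,
# e.g. fscore1_x, fscore1_y, fscore2_x, ... which later dilute the mean.
print(df.columns.tolist())
```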
Here is a suggestion for how to fix it:
```python
if len(getattr(clf, 'feature_importances_', [])) == 0:
    raise ValueError(
        "this clf doesn't have the feature_importances_ method. Only Sklearn tree based methods allowed"
    )
if i == 1:
    df = pd.DataFrame({'feature': new_x.columns})
# importance = sorted(importance.items(), key=operator.itemgetter(1))
importance = clf.feature_importances_
importance = np.column_stack([new_x.columns, importance])
df2 = pd.DataFrame(importance, columns=['feature', 'fscore'+str(i)])
df2['fscore'+str(i)] = df2['fscore'+str(i)] / df2['fscore'+str(i)].sum()
df = pd.merge(df, df2, on='feature', how='outer')
if not silent:
    print("Round: ", this_round, " iteration: ", i)
```
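To illustrate why this helps, here is a minimal sketch of the fixed loop with stand-in feature names and importances (not the real classifier): rebuilding `df2` from scratch each iteration means every merge adds exactly one new `fscore` column. (Note that `np.column_stack` on mixed strings and floats yields a string array, so the sketch casts the column back to float before normalizing.)

```python
import numpy as np
import pandas as pd

features = np.array(['a', 'b'])          # stand-in for new_x.columns
df = pd.DataFrame({'feature': features})
for i in range(1, 4):
    importance = np.array([0.7, 0.3])    # stand-in for clf.feature_importances_
    # df2 is rebuilt fresh each iteration: only 'feature' plus the new column
    df2 = pd.DataFrame(np.column_stack([features, importance]),
                       columns=['feature', 'fscore' + str(i)])
    # column_stack produced strings, so cast back to float before normalizing
    df2['fscore' + str(i)] = df2['fscore' + str(i)].astype(float)
    df2['fscore' + str(i)] = df2['fscore' + str(i)] / df2['fscore' + str(i)].sum()
    df = pd.merge(df, df2, on='feature', how='outer')

# df ends up with exactly one fscore column per iteration, no _x/_y duplicates
print(df.columns.tolist())
```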