BoostAGroota works wrong with set_config(transform_output="pandas") #18
I figured out why exactly this happens. If you use
How to fix it:

    def get_pandas_cat_codes(X):
        dtypes_dic = create_dtype_dict(X, dic_keys="dtypes")
        obj_feat = dtypes_dic["cat"] + dtypes_dic["time"] + dtypes_dic["unk"]
        if obj_feat:
            for obj_column in obj_feat:
                column = X[obj_column].astype("str").astype("category")
                # performs label encoding
                _, inverse = np.unique(column, return_inverse=True)
                X[obj_column] = inverse
            cat_idx = [X.columns.get_loc(col) for col in obj_feat]
        else:
            obj_feat = None
            cat_idx = None
        return X, obj_feat, cat_idx

This method will not only fix my issue, it will also make your output keep the original column order.
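To illustrate the label-encoding step in the function above, here is a minimal standalone sketch. The toy DataFrame and its column names are made up for the example. It only shows that writing the `np.unique(..., return_inverse=True)` codes back into the same column leaves the column order untouched.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one string column sitting between two numeric ones.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "city": ["x", "y", "x"], "b": [4, 5, 6]})
order_before = list(df.columns)

# Label encoding: `inverse` holds, for each row, the index of its value
# in the sorted array of unique values.
_, inverse = np.unique(df["city"].astype("str").to_numpy(), return_inverse=True)

# Assigning back to the existing column replaces its values in place,
# so the column keeps its original position in the frame.
df["city"] = inverse

print(df["city"].tolist())               # integer codes per row
print(list(df.columns) == order_before)  # column order is unchanged
```

Because the encoded column is assigned back under its existing name, no columns are moved or appended.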
Hello, I've noticed that if you use set_config(transform_output="pandas"), the BoostAGroota.transform method misbehaves: it shuffles the columns of the pandas DataFrame that remain after feature selection.
Here is a code snippet that reproduces the problem.
As you can see, the CRIM column contains the values that were in the AGE column.
Requirements used in the code
I also tried to understand why this happens and found that the behavior is caused by your implementation of the transform method.
As you can see, returning X[self.selected_features_] behaves strangely.
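One plausible mechanism can be sketched with pandas alone. It assumes the output wrapper relabels the returned DataFrame using feature names in the original input order. That assumption, the toy values, and the `selected_features` list are illustrative and not taken from the library code.

```python
import pandas as pd

# Toy frame; the column names echo the issue, the values are made up.
X = pd.DataFrame({"CRIM": [0.1, 0.2], "ZN": [18.0, 0.0], "AGE": [65.2, 78.9]})

# Hypothetical selector state: selected features stored in an order
# that differs from the order of X.columns.
selected_features = ["AGE", "CRIM"]

# What a transform doing `X[self.selected_features_]` would return:
# columns come out as ["AGE", "CRIM"].
out = X[selected_features].copy()

# Assumed wrapper behaviour: overwrite the labels with the selected
# names listed in input order, i.e. ["CRIM", "AGE"].
names_in_input_order = [c for c in X.columns if c in selected_features]
out.columns = names_in_input_order

# The column now called CRIM actually holds the AGE values.
mislabelled = out["CRIM"].tolist()

# Selecting with a boolean mask keeps the input order, so a later
# relabelling in input order would match the data.
safe = X.loc[:, X.columns.isin(selected_features)]
```

Under this assumption, any selection that preserves the input column order avoids the mismatch, which is consistent with the symptom described above.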
Hope this is helpful! If you have any questions, I am open to discussion or to adding more information.