Refactoring the presentation of pandas_categorical section in model files #1201
Comments
@StrikerRUS @wxchan Do you have time to implement this feature?
The two main requirements are:
This issue is not (time-)critical for me. It's more about pointing out a minor style issue/inconsistency in the LightGBM model file format.
I've added this issue to the #960 TODO list.
I also noticed that when using "predict", we need to specify the categorical features on the prediction data too. Is it possible to convert the categorical features in the prediction data to match the training data?
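One way to read "match the training data" is to encode each value by its position in the category list recorded at training time. The snippet below is only a minimal sketch of that idea in plain Python; it is not LightGBM's implementation, and the function name, the sample categories, and the choice to map unseen values to None are all assumptions made for illustration.

```python
def encode_with_training_categories(values, training_categories):
    """Encode values as positions in the training-time category list.

    Hypothetical helper: values that were not seen during training
    are mapped to None (treated as missing in this sketch).
    """
    # Map each training category to its index in the training-time order.
    code_of = {cat: i for i, cat in enumerate(training_categories)}
    return [code_of.get(v) for v in values]


# Made-up categories and prediction values, for illustration only.
training_categories = ["europe", "japan", "usa"]
codes = encode_with_training_categories(["usa", "europe", "mars"], training_categories)
```

Here "mars" was not among the training categories, so it has no code and comes back as None.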
Closing this issue as we have one consolidated issue for pandas refactoring and this one is included there.
If the LightGBM model was trained using a pandas.DataFrame that contains categorical columns, then the last section of the model file is a pandas_categorical section.

The problem is that this section is 1) formatted differently from other feature-related sections (e.g. the feature_importance section), and 2) the current representation (list of lists) is difficult to parse for outside applications.

For example, consider the Auto-MPG dataset.
Current presentation (https://github.com/jpmml/jpmml-lightgbm/blob/master/src/test/resources/lgbm/RegressionAuto.txt#L574):
Refactored presentation:
Another example of a very messy pandas_categorical section (https://github.com/jpmml/jpmml-lightgbm/blob/master/src/test/resources/lgbm/ClassificationAudit.txt#L581):

Would changing the model file data format also necessitate updating the version number?
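To illustrate what an outside application has to do with the current list-of-lists representation, here is a minimal parsing sketch. It assumes the section is a single trailing line of the form pandas_categorical: followed by a JSON list of lists; the function name and the sample text are hypothetical and not copied from a real model file.

```python
import json

PREFIX = "pandas_categorical:"


def parse_pandas_categorical(model_text):
    """Return the category lists from the trailing section, or None if absent.

    Assumption: the section is the last non-empty line of the model file
    and its payload is valid JSON.
    """
    for line in reversed(model_text.splitlines()):
        line = line.strip()
        if not line:
            continue  # skip trailing blank lines
        if line.startswith(PREFIX):
            return json.loads(line[len(PREFIX):])
        return None  # last non-empty line is some other section
    return None


# Made-up model file tail, for illustration only.
sample = "end of trees\n\npandas_categorical:[[3, 4, 5, 6, 8], [70, 71, 72]]\n"
categories = parse_pandas_categorical(sample)
```

Note that the parsed lists carry no feature names, so the caller still has to know which categorical column each inner list belongs to.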