WIP: Load back parameters from saved model file (fixes #2613) #4802

zyxue · 2021-11-14T17:51:34Z

This is still just an idea, posted to collect feedback.

The approach in this PR:

made loaded_parameter_ an attribute of Boosting instead of GBDT so it can be accessed in c_api.cpp.
implemented LGBM_BoosterGetConfig to return loaded_parameter_ as a string.
loaded_parameter_ can be parsed in Python code into proper types for the params dictionary.

The test test_booster_load_params_when_passed_model_str passes.

This PR is related to #2613

A couple of questions:

Why isn't loaded_parameter_ already an attribute of Boosting rather than GBDT? Do the other boosting types not have parameters?
Do I understand correctly that c_api.cpp can only handle gbdt? It appears to be hard-coded in LightGBM/src/c_api.cpp, line 109 at commit 874e635:

boosting_.reset(Boosting::CreateBoosting("gbdt", filename));

jameslamb

Hi @zyxue, sorry for the very long delay without a review! We really appreciate you working on this. I'm ready to review this and help move it forward.

Please remove all commented-out code.
Please merge in the latest changes from master
Please try to get this to a state where at least one test in the Python package is passing for one parameter...then we can iterate from there based on review comments.

At first glance, I have one suggestion about a change to the approach...instead of passing the raw content of the parameters block from a text format back through LGBM_BoosterGetConfig(), I think it would be better to have code on the C/C++ side parse that information and pass back a JSON string.

That way, every interface to LightGBM (for example, the Python and R packages in this repo) doesn't need to have some version of this code (with one line per parameter):

for line in io.StringIO(_params_str):
    if line.startswith('[boosting: '):
        self.params['boosting'] = line.strip().replace("[boosting: ", "").replace("]", "")

and could instead just pass the result of LGBM_BoosterGetConfig() to a JSON parser like json in Python or {jsonlite} in R.
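As a rough illustration of the consumer side, assume LGBM_BoosterGetConfig() hands back a JSON object keyed by parameter name. The sample string below is made up for illustration, not real LightGBM output. Under that assumption, the Python package needs a single call to the standard-library json module instead of one branch per parameter:

```python
import json

# Hypothetical string returned by LGBM_BoosterGetConfig(); keys and values
# are illustrative only, not actual LightGBM output.
config_json = (
    '{"boosting": "gbdt", "learning_rate": 0.1, '
    '"num_leaves": 31, "monotone_constraints": [1, -1]}'
)

# json.loads() restores strings, ints, floats and lists with proper types,
# so no per-parameter parsing code is needed on the Python side.
params = json.loads(config_json)
print(params)
# {'boosting': 'gbdt', 'learning_rate': 0.1, 'num_leaves': 31, 'monotone_constraints': [1, -1]}
```

The same string could be handed unchanged to {jsonlite} in R, which is the point of doing the parsing once on the C/C++ side.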

jameslamb · 2022-01-16T20:28:52Z

@zyxue thanks for coming back to this pull request!

As you work on it, please do not rebase + force push. Use merge commits instead (e.g. git merge master). Overwriting the commit history makes it more difficult for reviewers to understand the changes you've made in response to review comments.

This project squashes all pull request commits into a single commit on merge, so you don't need to worry about having too many commits here.

zyxue · 2022-01-16T22:27:03Z

@jameslamb, gotcha, thank you for the tip.

zyxue · 2022-01-17T00:26:41Z

Is https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/utils/json11.h expected to be the same as https://raw.githubusercontent.com/dropbox/json11/master/json11.hpp ?

The example at https://github.com/dropbox/json11 doesn't seem to work with the json11 in LightGBM...

jameslamb · 2022-01-17T00:33:54Z

Is https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/utils/json11.h expected to be the same as https://raw.githubusercontent.com/dropbox/json11/master/json11.hpp ?

That is where the json11 code in LightGBM is originally from, but we consider it "vendored in", meaning that since it was first added to this project, LightGBM-specific modifications have sometimes been made to it.

zyxue · 2022-01-17T04:41:59Z

@jameslamb, I've reimplemented LGBM_BoosterGetConfig to return parameters as a JSON string. Please let me know what you think (I'm still relatively new to C++).

jameslamb

Thanks very much! I left some suggestions for a reorganization of the code on the C/C++ side. But let's see what other maintainers say.

python-package/lightgbm/basic.py

tests/python_package_test/test_basic.py

jameslamb · 2022-01-18T05:42:18Z

src/c_api.cpp

+  char* out_str
+) {
+  API_BEGIN();
+  Booster* ref_booster = reinterpret_cast<Booster*>(handle);


I'd like to propose a different organization of this code, but please don't make any changes to the organization until another maintainer like @StrikerRUS, @shiyu1994, or @tongwu-msft comments.

Instead of all this logic being in src/c_api.cpp, I think most of the implementation for LGBM_BoosterGetConfig() should live somewhere else. Similar to how LGBM_BoosterSaveModelToString() is defined here in src/c_api.cpp.

LightGBM/src/c_api.cpp, line 2233 at commit cf38071:

int LGBM_BoosterSaveModelToString(BoosterHandle handle,

And then it is just a thin wrapper on GBDT::SaveModelToString().

LightGBM/src/boosting/gbdt_model_text.cpp, line 410 at commit a06fadf:

bool GBDT::SaveModelToFile(int start_iteration, int num_iteration, int feature_importance_type, const char* filename) const {

I also think any code that needs to list every parameter by name should live in the automatically-generated methods in src/io/config_auto.cpp.

So I think this should be approached as follows:

Add a method like GBDT::GetLoadedConfig() which reformats loaded_parameter_ into the appropriate format a parameters string and then initializes a Config object by calling Config::GetMembersFromString()

LightGBM/src/io/config_auto.cpp, line 321 at commit cf38071:

void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {

Add a method in src/io/config_auto.cpp like Config::DumpConfigToJson(), which returns a JSON representation of a Config object.

NOTE: all the code there is generated by https://github.com/microsoft/LightGBM/blob/a06fadfb7ac3fdb26da3d4afc061a8b976070c50/helpers/parameter_generator.py. But if I were working on this, my approach would be to write code directly in src/io/config_auto.cpp first, get the tests into a good state, and then update parameter_generator.py to generate that code.

This will be useful if LightGBM adopts the proposal in [RFC] Unify model format customize string or Json #4887 in the future.

Alter LGBM_BoosterGetConfig() to return the output of Config::DumpConfigToJson().
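To make the data flow of these steps concrete, here is a Python sketch (not C++, and not LightGBM's API). Config, from_string_map(), dump_to_json() and _parse_int_list() are hypothetical stand-ins for the C++ Config class, Config::GetMembersFromString() and the proposed Config::DumpConfigToJson(). The four fields are an illustrative subset, and the comma-separated format assumed for monotone_constraints is also an assumption. The design point is that each parameter's type is declared once, next to the field, so no interface has to re-derive it.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List


def _parse_int_list(raw: str) -> List[int]:
    # Assumed format: "1,-1" -> [1, -1]; an empty string yields an empty list.
    return [int(item) for item in raw.split(",") if item]


@dataclass
class Config:
    """Hypothetical stand-in for a few fields of LightGBM's C++ Config."""

    boosting: str = field(default="gbdt", metadata={"parse": str})
    learning_rate: float = field(default=0.1, metadata={"parse": float})
    num_leaves: int = field(default=31, metadata={"parse": int})
    monotone_constraints: List[int] = field(
        default_factory=list, metadata={"parse": _parse_int_list}
    )

    @classmethod
    def from_string_map(cls, params: Dict[str, str]) -> "Config":
        """Rough analogue of Config::GetMembersFromString(): strings -> typed members."""
        config = cls()
        for fld in fields(cls):
            if fld.name in params:
                # Each field carries its own converter, so there is no per-key branching.
                setattr(config, fld.name, fld.metadata["parse"](params[fld.name]))
        return config

    def dump_to_json(self) -> str:
        """Rough analogue of the proposed Config::DumpConfigToJson()."""
        return json.dumps(asdict(self))


# Usage: string values, as they might be recovered from loaded_parameter_.
loaded = {"boosting": "dart", "num_leaves": "63", "monotone_constraints": "1,-1"}
print(Config.from_string_map(loaded).dump_to_json())
# {"boosting": "dart", "learning_rate": 0.1, "num_leaves": 63, "monotone_constraints": [1, -1]}
```

Parameters missing from the loaded map keep their defaults (learning_rate above), which mirrors how a freshly constructed Config behaves before its members are overwritten.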

Sorry if this is overly complicated. Let's see what other maintainers say. And we are here to help! Thanks again for your help with this feature.

reformats loaded_parameter_ into the appropriate format a parameters string

@jameslamb , do you actually mean reformatting loaded_parameter_ into a std::unordered_map<std::string, std::string> object?

I'm a bit confused by the name GetMembersFromString; it actually gets the members from an unordered_map, right?

I mean doing whatever is necessary to turn loaded_parameter_ into a Config object, yes. You're right that the method Config::GetMembersFromString() takes an unordered map with string keys and string values.

zyxue · 2022-01-22T16:31:15Z

please don't make any changes to the organization until another maintainer like @StrikerRUS, @shiyu1994, or @tongwu-msft comments.
Hey @jameslamb, should I update now or wait until more comments come in?

jameslamb · 2022-01-22T16:38:55Z

As I said, please wait until another maintainer offers their opinion. While you wait on that, you can try working through the other suggestions like #4802 (comment).

guolinke · 2022-03-01T13:00:57Z

src/c_api.cpp

+      } else if (line.rfind("[tree_learner: ", 0) == 0) {
+        obj["tree_learner"] = Json{extract_param("tree_learner", line)};
+      } else if (line.rfind("[verbose: ", 0) == 0) {
+        obj["verbose"] = Json{std::stoi(extract_param("verbose", line))};


I think we can maintain a key-value map and load parameters in a for loop, rather than using many if-else branches.
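A key-value approach along these lines might look roughly like the following, shown in Python rather than C++ for brevity. PARAM_PARSERS, parse_parameters() and the sample lines are hypothetical. The "[name: value]" line shape follows the prefixes matched in the diff above. A real table would cover every parameter, with types taken from LightGBM's Config.

```python
import re
from typing import Any, Callable, Dict, Iterable

# Hypothetical name -> converter table. Supporting a new parameter means
# adding one entry here rather than another else-if branch.
PARAM_PARSERS: Dict[str, Callable[[str], Any]] = {
    "boosting": str,
    "tree_learner": str,
    "learning_rate": float,
    "num_leaves": int,
}

# Matches one "[name: value]" line; the value may be empty.
_PARAM_LINE = re.compile(r"^\[(\w+): (.*)\]$")


def parse_parameters(lines: Iterable[str]) -> Dict[str, Any]:
    """Parse "[name: value]" lines with a single loop over a lookup table."""
    params: Dict[str, Any] = {}
    for line in lines:
        match = _PARAM_LINE.match(line.strip())
        if match is None:
            continue  # not a parameter line, e.g. a section header
        name, raw_value = match.groups()
        parser = PARAM_PARSERS.get(name)
        if parser is not None:  # names missing from the table are skipped
            params[name] = parser(raw_value)
    return params


sample = ["[boosting: gbdt]", "[tree_learner: serial]", "[learning_rate: 0.05]",
          "[num_leaves: 31]", "[some_other_param: x]"]
print(parse_parameters(sample))
# {'boosting': 'gbdt', 'tree_learner': 'serial', 'learning_rate': 0.05, 'num_leaves': 31}
```

Skipping unknown names is one possible choice; keeping them as raw strings would be another.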

Per #5244, I would find extracting the categorical_feature param helpful as well.

would find extracting the categorical_feature param helpful as well

This is now being addressed in #5424. @johnpaulett, if you're interested in influencing the shape that this support takes in LightGBM, you're welcome to leave review comments on that PR.

guolinke · 2022-03-01T13:01:19Z

@shiyu1994 can you take a look for this PR?

jameslamb · 2022-08-31T21:59:55Z

Due to a lack of activity on this pull request, we've decided to move forward with a separate PR for this feature in #5424, so I'm going to close this.

@zyxue thanks very much for your interest in LightGBM and for attempting this! If you have more time to work with us in the future, we'd welcome additional contributions. I'd be happy to suggest some smaller contributions which might not involve as much discussion and require as much of your time and energy.

github-actions · 2023-11-15T00:21:03Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

zyxue requested review from btrotta, chivee, guolinke, henry0312, hzy46, jameslamb, shiyu1994, StrikerRUS and tongwu-sh as code owners November 14, 2021 17:51

jameslamb added the feature label Nov 16, 2021

jameslamb added the in progress label Nov 30, 2021

This was referenced Nov 30, 2021

Autologging functionality for scikit-learn integration with LightGBM (Part 1) mlflow/mlflow#5130

Merged

[python][sklearn] Add model saving / loading to sklearn models #4841

Open

jameslamb requested changes Jan 1, 2022

jameslamb changed the title ~~Load back parameters from saved model file.~~ Load back parameters from saved model file (fixes #2613) Jan 1, 2022

rebase to lightbgm/master

0cd653e

zyxue force-pushed the zx-load-params branch from ef774a3 to 0cd653e January 16, 2022 19:04

Zhuyi Xue added 3 commits January 16, 2022 20:19

made LGBM_BoosterGetConfig generate json

6e1dd40

fixed test_basic.py to ensure all tests pass

f75bd31

cleanup

6607662

jameslamb requested changes Jan 18, 2022

Zhuyi Xue added 2 commits January 24, 2022 08:38

renamed loads_params => _load_params

5413474

added type hints to _load_params

0af5de3

Zhuyi Xue added 3 commits January 24, 2022 08:40

removed type hints in tests

701facc

removed weights in tests

83b931c

tested monotone_constraints, learning_rate and boost_from_average

8830ae4

jameslamb changed the title ~~Load back parameters from saved model file (fixes #2613)~~ WIP: Load back parameters from saved model file (fixes #2613) Feb 11, 2022

guolinke reviewed Mar 1, 2022

jameslamb mentioned this pull request May 31, 2022

predict() requires DataFrame to have category dtype, but should be able to infer which fields are categorical #5244

Open

johnpaulett mentioned this pull request May 31, 2022

[python] Reapply the trained categorical columns when predicting #5246

Closed

jmoralez mentioned this pull request Jun 27, 2022

Categorical Features in Booster object #5321

Closed

jmoralez mentioned this pull request Aug 16, 2022

[python-package][R-package] load parameters from model file (fixes #2613) #5424

Merged

jameslamb closed this Aug 31, 2022

jameslamb removed the in progress label Aug 13, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2023
