[python-package][R-package] load parameters from model file (fixes #2613) #5424
Conversation
@StrikerRUS @jameslamb I'd appreciate your take on this approach. If you agree I can make the required changes to the R-package as well. The main idea is:
I'm afraid some floating point values will not survive the float->string->float round trip...
Yes, this will probably lose precision in some cases. We can address loading the parameters here and maybe have a feature request for making sure we get matches that are as close as possible for floating point values.
Well, I don't think there can be a better solution than the proposed one.
… check interaction constraints are properly loaded
I don't think precision loss in the parameters would be that big of a deal. There's already some loss in the thresholds and leaf values when writing the model file, so the parameters should be ok.
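To make the round-trip concern concrete, here is a small illustration (the value is chosen arbitrarily): formatting a double with limited precision and parsing it back does not recover the original value, while 17 significant digits round-trip exactly.

```python
x = 0.1 + 0.2  # 0.30000000000000004 due to binary floating point

# 17 significant digits are enough to round-trip an IEEE 754 double exactly
assert float(f"{x:.17g}") == x

# fewer digits drop the low-order bits, so the parsed value differs
assert float(f"{x:.6g}") != x
```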
Left one quick comment about testing which might help in your work on this. I'll try to review more thoroughly later this week.
@shiyu1994 does cuda_exp set these parameters?

Yes. Some GPU parameters are hardcoded: Lines 336 to 355 in 83627ff
@jmoralez
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️.
I was able to review this thoroughly today.
I love that you were able to accomplish this with only private changes to the R and Python package, and one new public entrypoint in c_api. Nice work! That makes me confident that we could change this implementation in the future without breaking users.
I don't have any specific suggestions at the moment (tests look awesome!), but I asked two questions that I feel I need a better understanding of before I can approve this.
@@ -2764,6 +2764,7 @@ def __init__(
    ctypes.byref(out_num_class)))
    self.__num_class = out_num_class.value
    self.pandas_categorical = _load_pandas_categorical(file_name=model_file)
    params = self._get_loaded_param()
Suggested change:
- params = self._get_loaded_param()
+ params_from_model_file = self._get_loaded_param()
+ params = {**params_from_model_file, **params}
In this code block, there are two sources of parameter values:

- the params keyword argument from the constructor
- parameters parsed out of the model file

I think that in the R and Python packages, wherever those conflict, the value passed through the keyword argument params should be preferred. But I'd like to hear your and @StrikerRUS's opinions before you make any changes.
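To make the precedence in the suggested merge explicit, here is a tiny illustration (the parameter values are made up): entries from the params keyword argument win wherever the two dicts share a key.

```python
params_from_model_file = {"learning_rate": 0.1, "num_leaves": 31}
params = {"learning_rate": 0.05}

# later entries in the unpacking win, so the keyword argument takes precedence
merged = {**params_from_model_file, **params}
assert merged == {"learning_rate": 0.05, "num_leaves": 31}
```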
Why we might want to support this behavior
That's consistent with how the R and Python packages treat the params argument generally (#4904 (comment)). And it'd be useful for non-Dask distributed training (for example), where some of the parameters like machines might change between the training run that produced an initial model and another one that performs training continuation.
Why we might not want to support this behavior
I believe that the R and Python packages already ignore passed-in params if you're creating a Booster from a model file. So maybe we want to continue to ignore them until a specific user report or reproducible example demonstrates that it's a problem or surprising.
I prefer ignoring them, since they won't be used by the loaded booster and it could cause confusion. We can raise a warning when passing both params and model_file to the Booster constructor to make this more evident. When using the loaded booster as init_model, users can override the parameters for the new iterations in the train function, so that should be enough.
I haven't done much incremental learning so I hadn't given much thought to this haha, please feel free to correct me.
We can raise a warning when passing both params and model_file to the Booster constructor to make this more evident.
I strongly believe we need a warning for users here.
And I think it's better to override the params loaded from the model string with the params passed in via kwargs.
Sure, I can add the warning. What would be an example use case for overriding the parameters from the model file when loading the booster?
Anything around continued training might result in loading a model from a file and then performing additional training with different parameters.

For example:

- my example in this thread, about wanting to change machines (since the IP addresses of the machines used for distributed training might be different from when the model file was created)
- wanting to use a different num_leaves or learning_rate when performing an update like "train a few more iterations on newly-arrived data" (see my explanation in this Stack Overflow answer, and the sketch after this list)
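As a rough sketch of that second scenario (the data, file name, and parameter values are hypothetical), training continuation through lgb.train can pass different parameters for the additional iterations:

```python
import numpy as np
import lightgbm as lgb

# hypothetical newly-arrived data
X_new = np.random.rand(500, 10)
y_new = np.random.rand(500)
new_data = lgb.Dataset(X_new, label=y_new)

# parameters for the extra iterations may differ from those stored in the model file
params = {"objective": "regression", "learning_rate": 0.01, "num_leaves": 63}

booster = lgb.train(
    params,
    new_data,
    num_boost_round=50,
    init_model="model.txt",  # previously saved model to continue from (hypothetical path)
)
```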
But I'm struggling to understand when Booster.update would be called on a model loaded from file. It requires a bit of setup, like defining the training set, and if you pass the training set to the constructor it takes a different path where it calls BoosterCreate with only the training set and the parameters. The other case would be passing it to lgb.train as init_model, and that already allows you to define the parameters for the new iterations.
assert set_params == params
assert bst.params['categorical_feature'] == [1, 2]
These tests are checking that the params attribute on the Python Booster object matches what is in the file... but that being true doesn't necessarily guarantee that if you used this Booster for continued training that it would still actually use those parameters at the C++ side, right?
I'm bringing this up because I'm struggling to understand the relationship between GBDT::loaded_parameter_

LightGBM/src/boosting/gbdt_model_text.cpp, line 600 in 9e89ee7:

loaded_parameter_ = ss.str();
and GBDT::config_

Lines 461 to 462 in 9e89ee7:

/*! \brief Config of gbdt */
std::unique_ptr<Config> config_;
In #2613 (comment), I'd recommended using GBDT::config_ as the source of the parameter information, because it seems to me like the parameters section of the model file is loaded up into that loaded_parameter_ property, but not actually used to configure the GBDT object. But now I'm not sure that that's right either haha.
For example... if you tried continued training after loading a model from a text file like this, would LightGBM respect feature_fraction=0.7? Or would it fall back to the default of 1.0?
that being true doesn't necessarily guarantee that if you used this Booster for continued training that it would still actually use those parameters at the C++ side, right?
I believe the loaded_parameter_ attribute was added in #1495 to have the model parameters stored, but it isn't actually used anywhere other than when writing the parameters back to a file when there's no config.
LightGBM/src/boosting/gbdt_model_text.cpp, lines 393 to 401 in d78b6bc:

if (config_ != nullptr) {
  ss << "\nparameters:" << '\n';
  ss << config_->ToString() << "\n";
  ss << "end of parameters" << '\n';
} else if (!loaded_parameter_.empty()) {
  ss << "\nparameters:" << '\n';
  ss << loaded_parameter_ << "\n";
  ss << "end of parameters" << '\n';
}
So I see these parameters more as "informative" of how the Booster was trained, rather than something to be used to continue training. The original issue wanted objective to be loaded back so it could be used in SHAP, and I think having them may help when trying to replicate previous trainings or when you need categorical_feature for inference. I actually thought of maybe adding an argument to decide whether or not to load them, since they may not be useful and that computation could be skipped.
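For instance, with the loading added in this PR something along these lines should be possible (the file name and the stored parameter values are hypothetical):

```python
import lightgbm as lgb

# booster previously saved with Booster.save_model("model.txt")
bst = lgb.Booster(model_file="model.txt")

# parameters parsed back out of the model file, e.g. for replicating a
# training run or recovering objective / categorical_feature for inference
print(bst.params.get("objective"))
print(bst.params.get("categorical_feature"))
```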
@guolinke Could you please help us understand the connection between config_ and loaded_parameter_?
IMO, loaded_parameter_ is used to check the parameters that were used for the model file. For continued training, the current (new) config_ is used.
I got them from loaded_parameter_ because it seemed easier. If the contents of both attributes are equivalent we can document that, i.e. when loading a model from a file, the contents of the booster's params attribute are also the configuration for continued training.
@jmoralez @jameslamb any further comments about this?
I added a warning about passing parameters to the booster constructor in 9467814, because in order to be able to call Booster.update you need a training set, and if you initialize a booster with the training set, lgb.Booster(model_file=..., train_set=...), the model_file argument gets ignored, so I don't think there's a way to do incremental training based on the Booster object. Thus, I don't think it really matters what the parameters for the loaded booster are on the cpp side, because the incremental training would go through lgb.train, which already supports overriding the parameters for the new iterations. WDYT?
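A minimal sketch of the constructor behaviour described here, using a plain warnings.warn and a made-up helper name (this is not LightGBM's actual implementation, and the warning text and placeholder values are assumptions):

```python
import warnings

def resolve_booster_params_sketch(params=None, train_set=None, model_file=None):
    """Hypothetical helper mirroring the precedence discussed above."""
    params = {} if params is None else dict(params)
    if train_set is not None:
        # a brand-new booster is built from train_set; model_file is not used on this path
        return params
    elif model_file is not None:
        if params:
            warnings.warn("Ignoring params argument, using parameters from model file.")
        # parameters parsed back out of the model file take effect instead
        return {"learning_rate": 0.1}  # placeholder for the parsed values
    return params

# kwargs are ignored (with a warning) when a model file is given
print(resolve_booster_params_sketch(params={"num_leaves": 63}, model_file="model.txt"))
```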
it sounds good to me
Thanks @jmoralez

It's an excellent point that Booster's constructor has logic like
if train_set is not None:
...
elif model_file is not None:
...
Meaning that you can't have both. I agree with the warning you've added and appreciate you adding a test for it as well!
I'm sorry for holding up this PR for so long over this. I'm still confused about the relationship between config_ and loaded_parameter_, but I don't have any more concerns about this specific PR.
I should also add, as I mentioned in #5424 (review)... I think we should merge this because future changes probably wouldn't require any user-facing breaking changes. The fact that the new function you've added in c_api takes in a BoosterHandle means that we could swap it to using a different property of the Booster (e.g. config_ instead of loaded_parameter_) in the future without an API-level breaking change. Very nice!
Thanks for the awesome work, and sorry I held this up for so long. The logic for how parameters flow through LightGBM is fairly complicated, and I wanted to be sure I understood the full implications of this change.
I think we should merge this 😀
@jmoralez You put SO MUCH work into this, it only seems right that you be the one to push the merge button 😊
Given that #4802 seems to have stalled, this PR intends to supersede it, mainly to make #4323 easier by having access to the features that would have to be turned into factors, and also to unblock #5246.
Fixes #2613.