[python-package] add scikit-learn-style API for early stopping #5808
Conversation
Thanks for taking the time to contribute to LightGBM! I haven't reviewed this yet, but I'm leaving a blocking review to make it clear to other maintainers that I'd like the opportunity to review before this is merged. The discussion in #3313 talked about waiting for scikit-learn's interface for `HistGradientBoostingClassifier` to stabilize and mature and then adapting to the choices they made... I'd like the opportunity to look at that interface thoroughly prior to reviewing this.

Until we have a chance to review, you can improve the chances of this being merged by addressing any CI failures you see. You can safely ignore failing
Hey @jameslamb, thanks for picking this up. I have started to take a look at the CI failures, and I think I can solve most of them easily. There is one check for which I need some input. I see that one of the tests checks that the init parameters for the sklearn API and the Dask API have the same arguments (see this one). Early stopping is not available yet for the Dask API, so I do not see how we can easily iron that out. Shall I add some more exceptions to that specific test?
@ClaudioSalvatoreArcidiacono no, please do not do that. This test you've linked to is there exactly to catch such deviations. If you absolutely need to add arguments to the constructors of the scikit-learn estimators in this project, add the same arguments in the same order to the Dask estimators, with default values of

For what it's worth, I am not totally convinced yet that we should take on the exact same interface as

If you are committed to getting the CI passing here, I'm willing to consider it, but I just want to set the right expectation: I expect you to also explain specifically the benefit of adding all this new complexity.
By the way, it looks like you are not signing your commits with an email address tied to your GitHub account. See #5532 (comment) and the comments linked from it for an explanation of what I mean by that and of how to fix it.
It has been about 6 weeks since I last provided a review on this PR, and there has not been any activity on it since then. I'm closing this, assuming it's been abandoned, to make it clear to others interested in this feature that they shouldn't be waiting on this PR. @ClaudioSalvatoreArcidiacono if you have time in the future to work with maintainers here, we'd welcome future contributions.
Hey @jameslamb, I did not have much time to take a look at this lately, but I should be more available now and in the coming weeks. If you also have some time to help me review it, I can pick this PR up again. Regarding your previous comments: thanks for the heads-up on signing commits, I will sign the next commits as you mentioned. About the complexity of the proposed implementation, I am definitely open to feedback from the maintainers and I am willing to change the proposed implementation if necessary. In the proposed implementation I tried to stick to what is written in the FAQ:

So, in the proposed implementation I tried to find a common ground between the convenience of activating early stopping using only init params and the customisability of the splitting strategy.
Just to set the right expectation... I personally will not be able to look at this for at least another week, and I think it's unlikely it'll make it into LightGBM 4.0 (#5952). I'm sorry, but this is quite complex and will require a significant investment of time to review. Some questions I'll be looking to answer when I start reviewing this:
Hey @jameslamb, no problem. Thanks for being transparent on your availability on this PR and for your feedback. I followed it and I made the PR easier to review now. I have removed the functionality to use a custom splitter and I made the changes much smaller.
I have now implemented the same interface as
In this implementation I tried to reuse the splitting function used in
Good observation, I think it is indeed needed to rename the arguments so that they will be more consistent with LightGBM naming conventions.
Correct.
@ClaudioSalvatoreArcidiacono now that LightGBM v4.0 is out, I'd be happy to return to reviewing this.
Can you please update it to the latest `master`? Please also see my next round of comments.
Hey @jameslamb, thanks a lot for your review. I am still interested in solving this issue, but I will be on holiday for the next two weeks. I will take a look at your comments once I am back.
No problem, thanks for letting us know! I'll be happy to continue working on this whenever you have time.
Thank you for picking this up again! We're happy to work with you, but please see my comments... the code, your most recent comments, and the PR title still contradict each other about whether or not this PR is enabling early stopping automatically.
Confusion over this is one of the things that has delayed this PR so long. It's clear to me from @borchero and @jmoralez 's most recent comments, and I agree with them: do not enable early stopping by default in this PR.
I hope that just limiting this to "add scikit-learn-style API for early stopping" will help reduce its complexity and get it to a state where we can merge it faster.
python-package/lightgbm/sklearn.py (Outdated)

```python
if hasattr(self, "_n_rows_train") and self._n_rows_train > 10_000:
    params["early_stopping_round"] = 10
```
In #5808 (comment), you said:
> I have set early stopping to be disabled by default
But this block of code directly contradicts that... it enables early stopping by default if there are more than 10K rows in the training data.
From all the prior discussion on this PR, it seems clear to me that @borchero and @jmoralez both requested you not do that, and instead limit the scope of this PR to just providing an API for early stopping that matches how scikit-learn estimators do it. I agree with them... it shouldn't be enabled by default as part of this PR.
Please do the following:

- change the title of the PR (which will become a commit message and release notes item) to "[python-package] add scikit-learn-style API for early stopping"
- remove this logic about `early_stopping_round = "auto"` (and the corresponding changes further down where you moved parameter processing to after input data validation, so it could reference `self._n_rows_train`)
- match whatever API `HistGradientBoostingClassifier` / `HistGradientBoostingRegressor` currently have for enabling early stopping
Hey @jameslamb, thanks a lot for your review, I highly appreciate it.
You are right with your comment, it was a mistake on my side; I have fixed it in my latest commits. Thank you for spotting it.
In the latest version I have:

- Renamed the PR as you requested.
- Removed the logic about `early_stopping_round = "auto"` and the corresponding changes where I switched parameter processing and input validation.
- I think the APIs of `HistGradientBoostingClassifier` / `HistGradientBoostingRegressor` and the current implementation match, except that `HistGradientBoostingClassifier` / `HistGradientBoostingRegressor` support `early_stopping_round = "auto"` as their default parameter value, but as we discussed above we do not want this behaviour.
- I have tried to support the parameter alias logic of LightGBM while working around the fact that `early_stopping` is an alias for `early_stopping_round(s)` and `n_iter_no_change`. In the current implementation, setting `early_stopping_round(s)` to an integer will override the default value for `n_iter_no_change`, but setting `early_stopping` to an integer will not enable early stopping and will not override the default value for `n_iter_no_change`. I have opted for this implementation to preserve backward compatibility as much as possible. Please let me know if you agree with it or have a different opinion; happy to change it.
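The alias precedence described in the last bullet can be sketched as a small stand-alone function. This is an illustration only: `choose_param_value` is a simplified stand-in for LightGBM's internal `_choose_param_value`, and the alias tuple below is an assumption for the example, not the library's real alias table.

```python
def choose_param_value(main_param_name, params, default_value, aliases=()):
    """Simplified sketch of LightGBM-style alias resolution.

    Precedence: explicit main parameter > first alias found > keyword default.
    Returns a new params dict with the main parameter name populated.
    """
    params = dict(params)  # do not mutate the caller's dict
    if main_param_name in params:
        return params
    for alias in aliases:
        if alias in params:
            # normalize the alias onto the main parameter name
            params[main_param_name] = params.pop(alias)
            return params
    params[main_param_name] = default_value
    return params

# With this precedence, an explicit `early_stopping_rounds` (an alias) still
# overrides the keyword-argument default, as described above.
resolved = choose_param_value(
    "early_stopping_round",
    {"early_stopping_rounds": 25},
    default_value=10,
    aliases=("early_stopping_rounds", "n_iter_no_change"),
)
print(resolved)  # {'early_stopping_round': 25}
```

If no alias is present, the keyword default wins, which is what keeps the estimator's constructor defaults meaningful.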
Thanks for that explanation.
> Removed the logic about `early_stopping_round = "auto"` and the corresponding changes where I switched parameters processing and input validation.
Agreed, thank you for removing it.
> * `early_stopping` is an alias for `early_stopping_round(s)` and `n_iter_no_change`
Thanks for being careful about backwards compatibility!
It's unfortunate that we have `early_stopping` as an alias for `early_stopping_rounds` and that scikit-learn's estimators now assign a different meaning to that :/
LightGBM/include/LightGBM/config.h, lines 394 to 397 in 1090a93:

```cpp
// alias = early_stopping_rounds, early_stopping, n_iter_no_change
// desc = will stop training if one metric of one validation data doesn't improve in last ``early_stopping_round`` rounds
// desc = ``<= 0`` means disable
// desc = can be used to speed up training
```
Seems like that has been in scikit-learn for a long time (since v0.23):
I see your implementation of this comment in `LGBMModel._process_params()`:

```python
params = _choose_param_value("early_stopping_round", params, self.n_iter_no_change)
```
I think I see a path to resolving this. It'll be complex, but it would keep LightGBM in compliance with the rules from https://scikit-learn.org/1.5/developers/develop.html#instantiation.
- remove keyword argument `n_iter_no_change` (and property `self.n_iter_no_change`)... but ensure there are unit tests confirming that passing keyword argument `n_iter_no_change` through the scikit-learn estimators correctly sets the number of early stopping rounds
- make keyword argument `early_stopping` have type `Union[bool, int]`
- handle that difference in `LGBMModel._process_params()`, roughly like this:
rewrite this block:

LightGBM/python-package/lightgbm/sklearn.py, lines 850 to 854 in 1090a93:

```python
# use joblib conventions for negative n_jobs, just like scikit-learn
# at predict time, this is handled later due to the order of parameter updates
if stage == "fit":
    params = _choose_param_value("num_threads", params, self.n_jobs)
    params["num_threads"] = self._process_n_jobs(params["num_threads"])
```
to something like this:

```python
if stage == "fit":
    # use joblib conventions for negative n_jobs, just like scikit-learn
    # at predict time, this is handled later due to the order of parameter updates
    params = _choose_param_value("num_threads", params, self.n_jobs)
    params["num_threads"] = self._process_n_jobs(params["num_threads"])
    if not isinstance(self.early_stopping, bool) and isinstance(self.early_stopping, int):
        _log_warning(
            f"Found 'early_stopping={self.early_stopping}' passed through keyword arguments. "
            "Future versions of 'lightgbm' will not allow this, as scikit-learn expects keyword argument "
            "'early_stopping' to be a boolean indicating whether or not to perform early stopping with "
            "a randomly-sampled validation set. To set the number of early stopping rounds, and suppress "
            f"this warning, pass early_stopping_rounds={self.early_stopping} instead."
        )
        params = _choose_param_value(
            main_param_name="early_stopping_round",
            params=params,
            default_value=self.early_stopping,
        )
```
And then any other places in the code where you have `self.early_stopping is True` should use `isinstance(self.early_stopping, bool) and self.early_stopping is True`, to avoid this `SyntaxWarning`:

```python
1 is True
# <stdin>:1: SyntaxWarning: "is" with 'int' literal. Did you mean "=="?
```
I think that can work and provide backwards compatibility while allowing us to match scikit-learn's API. And then eventually (after several `lightgbm` releases), that warning could be converted to an error and the type of `early_stopping` could be narrowed to just `bool`.
What do you think?
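The `isinstance` dance in that proposal can be isolated into a tiny helper to make the bool-vs-int subtlety concrete. This is a hypothetical sketch (the function name and return shape are invented for illustration; it is not code from the PR):

```python
def resolve_early_stopping(early_stopping):
    """Disambiguate the proposed Union[bool, int] `early_stopping` keyword.

    Returns (enable_sklearn_style, rounds_override). The bool check must come
    first: isinstance(True, int) is True in Python, so testing int first
    would swallow booleans.
    """
    if isinstance(early_stopping, bool):
        # scikit-learn-style meaning: a plain on/off switch
        return early_stopping, None
    if isinstance(early_stopping, int):
        # legacy alias usage: an integer sets the number of early stopping
        # rounds (a real implementation would also emit the warning above)
        return False, early_stopping
    raise TypeError(
        f"early_stopping must be bool or int, got {type(early_stopping).__name__}"
    )

print(resolve_early_stopping(True))   # (True, None)
print(resolve_early_stopping(25))     # (False, 25)
```

The check-order detail is the same reason the proposal writes `not isinstance(x, bool) and isinstance(x, int)` rather than just `isinstance(x, int)`.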
It's a great suggestion! Thanks a lot for it; I have implemented it in the latest version.
Hey @jameslamb, any chance you could take a look at the latest changes?
```python
y_train,
group=q_train,
eval_at=[1, 3],
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],
```
Suggested change: remove

```python
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],
```

Why is modifying the learning rate necessary for this test? If it is unnecessary and just copied from somewhere else, please remove it.
python-package/lightgbm/sklearn.py (Outdated)

```python
early_stopping: bool = False,
n_iter_no_change: int = 10,
validation_fraction: Optional[float] = 0.1,
**kwargs,
```
Suggested change:

```diff
-    **kwargs,
+    **kwargs: Any,
```

Why was this type hint removed? If it was just an accident, please put it back to reduce the size of the diff.
> hey @jameslamb, any chance you could take a look at the latest changes?
I reviewed this again tonight. Please see my suggestions. We're getting closer, but there are still some significant unresolved issues. As you respond and change the code, please make every effort possible to keep your changes minimal and only focused on the narrow goal of this PR.
I also marked "resolved" any threads where it seemed that we'd reached a resolution that was reflected in the code. Let's please keep using threads on the "Files changed" tab in this way and resolving them as we move forward.
Thank you for continuing to work on this. I know it can be difficult to get so much critical feedback. That's partially a function of what you've taken on here... you've chosen a very complex problem on a highly-important piece of the library for your first code contribution to LightGBM. If you're willing to keep working with us, I'm willing to keep reviewing and moving this forward.
python-package/lightgbm/sklearn.py (Outdated)

```python
n_iter_no_change : int, optional (default=10)
    If early stopping is enabled, this parameter specifies the number of iterations with no
    improvement after which training will be stopped.
validation_fraction : float or None, optional (default=0.1)
```
There are not any tests in `test_sklearn.py` that pass `validation_fraction`. Please add some, covering both the default behavior and that passing a non-default value (like `0.4`) works as expected.
I don't know the exact code paths off the top of my head, and would appreciate it if you can investigate... but I think it should be possible to test this by checking the size of the datasets added to `valid_sets` and confirming that they're as expected (e.g. that the automatically-added validation set has 4,000 rows if the input data `X` has 40,000 rows and `validation_fraction=0.1` is passed).
If that's not observable through the public API, try to use mocking/patching to observe it instead of adding any additional properties to the Booster / estimators' public API.
Comment in-thread here if you have questions or need help with that.
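As a concrete illustration of the patching approach suggested above, here is a minimal, self-contained sketch of the pattern. The `train`/`fit` stand-ins are hypothetical (they are not lightgbm's real internals); the point is only the mechanic of patching the training routine and asserting on the validation-set size it receives, without adding anything to the public API.

```python
from unittest.mock import patch

def train(train_rows, valid_rows):
    # Hypothetical stand-in for the low-level training routine that receives
    # the automatically created validation set.
    return {"train": train_rows, "valid": valid_rows}

def fit(n_rows, validation_fraction=0.1):
    # Stand-in for the estimator's fit(): carve out the validation split,
    # then hand both pieces to the trainer.
    n_valid = int(n_rows * validation_fraction)
    return train(n_rows - n_valid, n_valid)

# The pattern: patch the trainer, call fit(), then assert on the sizes
# fit() handed over -- the split is observable without new properties.
with patch(f"{__name__}.train") as mock_train:
    fit(40_000, validation_fraction=0.1)
    observed = mock_train.call_args.args

print(observed)  # (36000, 4000)
```

In a real test the patch target would be whatever internal function the estimator calls with the assembled `valid_sets`.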
Thanks for your comment. I have added a couple of tests for that using patching; let me know if you think there is more that can be improved in those tests.
python-package/lightgbm/sklearn.py (Outdated)

```python
valid_sets.append(valid_set)
if self.early_stopping is True and eval_set is None:
```
Most of this part of the diff appears to only be whitespace changes. When I hide whitespace, it looks like just this:
To make it easier for reviewers and to shrink the total number of lines touched, please change your approach here. On `master`, `lightgbm` currently has this:

```python
if eval_set is not None:
    # (existing code)
```

Change it to:

```python
if eval_set is not None:
    # (existing code)
elif self.early_stopping is True:
    # (new code added in this PR)
```
```diff
@@ -1168,6 +1274,7 @@ def fit_and_check(eval_set_names, metric_names, assumed_iteration, first_metric_
     "verbose": -1,
     "seed": 123,
     "early_stopping_rounds": 5,
+    "early_stopping": True,
```
Was this necessary to make this test pass, or is it just being done for convenience?
If it's necessary, could you explain why? If it was just for convenience... please introduce a new test after this one that tests what you want to test.
This PR should only be adding functionality without breaking any existing user code. I'd be more confident that that's true if the tests showed only additions, with no modifications to existing tests.
Yes, this was a necessary change in order to make the test pass.
With the code in its current state, passing `early_stopping_rounds` while leaving `early_stopping` at its default value (`False`) does not enable early stopping. Before, it did.
If we want to keep the previous behaviour and enable auto early stopping when `early_stopping_rounds` is set to a value and `early_stopping` is left at its default value, we need to set the default value of `early_stopping_rounds` to `None` and change it in `process_parameters` to `10` when `early_stopping` is set to `True`, which will make the actual default value of `early_stopping_rounds` a bit hidden in the code. Let me know what you prefer; both options have their own drawbacks.
Ok, that's a problem. And exactly why I have been so strict in asking that we only see new tests added in this PR.
I think that also answers my questions from #5808 (comment)
If the current state of this branch were merged, with this mix of behaviors:

- `early_stopping` keyword argument defaulting to `False`
- `early_stopping=False` meaning "do not perform any early stopping"

it would turn off early stopping for all existing code using early stopping with `lightgbm.sklearn` estimators.
That's a very big user-facing breaking change, and one I do not support. Please, let's keep the scope of this PR limited to what I've written in the title... "add scikit-learn-style API for early stopping".
Specifically, this is the behavior I'm expecting:

- `early_stopping=True`:
  - adds a new validation set to `eval_set` which is a random sample of `{validation_fraction} * num_rows` rows from the training data
  - enables early stopping, with `early_stopping_rounds` defaulting to 10 if not provided by any other mechanism
  - does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to `eval_set` in `fit(...)`, then that dataset is ALSO used for early stopping
- `early_stopping=False`:
  - everything `lightgbm` does is identical to what it does today
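The expected behavior above can be sketched as a toy resolver. All names here are hypothetical and for illustration only; the function simply codifies the bullet points, it is not lightgbm code:

```python
def resolve_fit_config(early_stopping, eval_set=None,
                       validation_fraction=0.1, early_stopping_rounds=None):
    """Hypothetical sketch of the expected early-stopping behavior."""
    eval_sets = list(eval_set or [])
    if not early_stopping:
        # early_stopping=False: behave exactly as lightgbm does today
        return {"eval_sets": eval_sets, "stopping_rounds": early_stopping_rounds}
    # early_stopping=True: add one randomly-sampled validation set, default
    # the round count to 10, and keep any user-provided eval_set as well
    eval_sets.append(("sampled", validation_fraction))
    rounds = early_stopping_rounds if early_stopping_rounds is not None else 10
    return {"eval_sets": eval_sets, "stopping_rounds": rounds}

cfg = resolve_fit_config(True, eval_set=[("user", 1.0)])
print(cfg)  # {'eval_sets': [('user', 1.0), ('sampled', 0.1)], 'stopping_rounds': 10}
```

Note that a user-provided `eval_set` is kept alongside the sampled one, matching the "that dataset is ALSO used for early stopping" point.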
The docstring for the `early_stopping` keyword argument should also be updated to clearly explain this. For example:

```
early_stopping : bool, optional (default=False)
    Whether to enable scikit-learn-style early stopping. If set to ``True``,
    a new validation set will be created by randomly sampling ``validation_fraction`` rows
    from the training data ``X`` passed to ``fit()``. Training will stop if the validation score
    does not improve for a specific number of rounds (controlled by ``n_iter_no_change``).
    This parameter is here for compatibility with ``scikit-learn``'s ``HistGradientBoosting``
    estimators... it does not affect other ``lightgbm``-specific early stopping mechanisms,
    like passing the ``lgb.early_stopping`` callback and validation sets to the ``eval_set``
    argument of ``fit()``.
```
> If we want to keep the previous behaviour and we want to enable auto early stopping when `early_stopping_rounds` is set to a value and `early_stopping` is left to its default value we need to set the default value of `early_stopping_rounds` to `None`

I think this statement is also coming from you trying to tie this new `.early_stopping` property of the scikit-learn estimators to "whether or not any early stopping happens", and I guess you are referring to this:
LightGBM/python-package/lightgbm/engine.py, lines 275 to 288 in 4feee28:

```python
if callback._should_enable_early_stopping(params.get("early_stopping_round", 0)):
    callbacks_set.add(
        callback.early_stopping(
            stopping_rounds=params["early_stopping_round"],  # type: ignore[arg-type]
            first_metric_only=first_metric_only,
            min_delta=params.get("early_stopping_min_delta", 0.0),
            verbose=_choose_param_value(
                main_param_name="verbosity",
                params=params,
                default_value=1,
            ).pop("verbosity")
            > 0,
        )
    )
```
I don't think we should do that. Let's limit the `early_stopping` keyword argument to the narrow purpose of "whether or not to add one randomly-sampled validation set and perform early stopping on it".
Then the only thing that has to be figured out is how many rounds to train without improvement before stopping. I gave a proposal for doing that here: #5808 (comment)
@ClaudioSalvatoreArcidiacono, I'd like to propose that we take a different approach to getting this PR done.
Could I take over this PR and push to it, to get it into a state that I'd feel comfortable merging?
Then:

- you review the state and tell me if anything that's left concerns you
- after you and I agree on the state, we ask other maintainers to review
This has been taking a really large amount of both your and my time and energy. I think this approach would get us to a resolution faster.
I would only push new commits and merge commits (no force-pushing), so you could see everything I changed and revert any of it.
Can we do that?
Hey @jameslamb, the statement

> If the current state of this branch were merged, with this mix of behaviours:
>
> - early_stopping keyword argument defaulting to False
> - early_stopping=False meaning "do not perform any early stopping"
>
> it would turn off early stopping for all existing code using early stopping with lightgbm.sklearn estimators.

is not correct. If the current state of this branch were merged, it would not enable early stopping automatically. That means passing an `early_stopping` callback while leaving the `early_stopping` keyword argument at its default of `False` will still enable early stopping, as it always has. See for example this other test, which is left unchanged and passes.
The only non-backward-compatible change introduced at this stage is that passing `early_stopping_rounds` is no longer sufficient to enable early stopping (if the `early_stopping` parameter is left as `False`). I am definitely open to changing that.
Regarding your last suggestion, I am extremely thankful for the effort you have invested in this PR and I am sorry to hear that it is taking more effort than you anticipated. I am putting my best intentions into this PR and I definitely do not want to cause any harm to the library or waste any of your precious time unnecessarily.
I would like to give this PR one last attempt, addressing all of your comments and making sure everything is spot on. I will tag you once I think the PR is ready for a last review. If you think that after this last attempt the gap to be filled is still too big, I will step aside and happily let you take over, and I will review your code.
Are you fine with that :)?
@jameslamb, in your previous comment you mentioned:

> - `early_stopping=True`:
>   - adds a new validation set to `eval_set` which is a random sample of `{validation_fraction} * num_rows` rows from the training data
>   - enables early stopping, with `early_stopping_rounds` defaulting to 10 if not provided by any other mechanism
>   - does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to `eval_set` in `fit(...)`, then that dataset is ALSO used for early stopping.
Do you think we should create an extra validation set even if a validation set has already been provided in `fit()`?
I think that if a validation set has been provided in `fit()` and the parameter `early_stopping=True`, then we do not need to create a new validation set for early stopping. What do you think?
```python
q_train = np.loadtxt(str(rank_example_dir / "rank.train.query"))
n_estimators = 5
gbm = lgb.LGBMRanker(n_estimators=n_estimators, random_state=42, verbose=-1)
gbm.fit(X_train, y_train, group=q_train)  # Assuming 10 samples in one group
```
Suggested change:

```diff
-gbm.fit(X_train, y_train, group=q_train)  # Assuming 10 samples in one group
+gbm.fit(X_train, y_train, group=q_train)
```

How does this code comment relate to the code? If it's left over from some earlier debugging, please remove it.
python-package/lightgbm/sklearn.py (Outdated)

```python
random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
n_jobs: Optional[int] = None,
importance_type: str = "split",
early_stopping: bool = False,
```
Suggested change:

```diff
+    *,
     early_stopping: bool = False,
```

I think we should make these keyword-only arguments, as scikit-learn does. Could you please try that (in these estimators and the Dask ones)?
I don't want to do that for other existing parameters, to prevent breaking existing user code, but since these are new parameters, it's safe to be stricter.
python-package/lightgbm/sklearn.py (Outdated)

```python
if self.early_stopping is not True:
    params["early_stopping_round"] = None
```
Suggested change: remove

```python
if self.early_stopping is not True:
    params["early_stopping_round"] = None
```

This looks to me like it might turn off early stopping enabled other ways (like passing `early_stopping_round=15`, or the `lgb.early_stopping()` callback plus some `valid_sets`) if keyword argument `early_stopping=False`. Since `early_stopping=False` is the default, that'd be a backwards-incompatible change.
The `early_stopping` keyword argument in this PR is not intended to control ALL early stopping, right? I think it should be limited to controlling the scikit-learn-style early stopping, but the other mechanisms that people have been using with `lightgbm` for years should continue to work.
python-package/lightgbm/sklearn.py (Outdated)

```python
n_splits = max(int(np.ceil(1 / self.validation_fraction)), 2)
stratified = isinstance(self, LGBMClassifier)
```
I am not a huge fan of how the validation set is created from the train set using the `_make_n_folds` function.
If `1 / validation_fraction` is not an integer, the actual validation set size will not match the validation fraction specified by the user. For example, if the validation fraction is 0.4, the number of splits calculated here will be 2, which results in a fraction of 0.5 instead of 0.4.
Using something like `train_test_split` from scikit-learn would solve the issue for the classification and regression cases, but for ranking tasks our best option is `GroupShuffleSplit`, which will inevitably suffer from the same issue. The options I thought of to solve this are:

1. Leave the code as-is and raise a warning when `1 / validation_fraction` is not an integer.
2. Use `train_test_split` for creating the validation set in the classification and regression cases, and raise a warning when `1 / validation_fraction` is not an integer in the ranking case.

I would lean more towards option 2, but it will make the PR bigger.
@jameslamb I would like to hear your opinion on it; do you perhaps already have something else in mind?
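A quick check of the arithmetic illustrates the mismatch. This sketch mirrors the quoted `n_splits` line (using `math.ceil` in place of `np.ceil`); the exact fold count depends on floating-point rounding of `1 / validation_fraction`, but either way the realized fraction can deviate from the requested one:

```python
import math

def realized_validation_fraction(validation_fraction):
    # Mirrors the quoted code: the validation set is one fold out of
    # n_splits, so the realized share of the data is 1 / n_splits.
    n_splits = max(int(math.ceil(1 / validation_fraction)), 2)
    return 1 / n_splits

print(realized_validation_fraction(0.1))  # 1/validation_fraction is an integer: exact
print(realized_validation_fraction(0.4))  # not an integer: deviates from the request
```

A split API that takes a `test_size` directly (like `train_test_split`) avoids this, because it sizes the validation set as `round(n * test_size)` instead of deriving a whole number of folds.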
```diff
@@ -507,6 +507,9 @@ def __init__(
     random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
     n_jobs: Optional[int] = None,
     importance_type: str = "split",
+    *,
+    early_stopping: Union[bool, int] = False,
+    validation_fraction: float = 0.1,
```
@jameslamb, I have removed the option of having `validation_fraction=None` and performing early stopping on the train set.
Early stopping in lightgbm is disabled when the validation set equals the train set, and I did not want to alter that behavior in this PR, so I have decided to deviate a little from the scikit-learn implementation of auto early stopping and not allow `validation_fraction=None`.
Let me know your thoughts on it.
Fixes #3313
Implements a scikit-learn-like interface for early stopping.