
[python-package] add scikit-learn-style API for early stopping #5808

Open

wants to merge 33 commits into master

Conversation

ClaudioSalvatoreArcidiacono
Contributor

Fixes #3313

Implements a scikit-learn-like interface for early stopping.

@jameslamb jameslamb changed the title 3313 enable auto early stopping [python-package] enable early stopping automatically in scikit-learn interface (fixes #3313) Mar 26, 2023
Collaborator

@jameslamb jameslamb left a comment


Thanks for taking the time to contribute to LightGBM! I haven't reviewed this yet, but just leaving a blocking review to make it clear to other maintainers that I'd like the opportunity to review before this is merged. The discussion in #3313 talked about waiting for scikit-learn's interface for HistGradientBoostingClassifier to stabilize and mature and then adapting to the choices they made... I'd like the opportunity to look at that interface thoroughly prior to reviewing this.

@jameslamb
Collaborator

Until we have a chance to review, you can improve the chances of this being merged by addressing any CI failures you see. You can safely ignore failing R-package jobs; there are some known issues with those (fixed in #5807).

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, thanks for picking this up. I have started to take a look at the CI failures; I think I can solve most of them easily.

There is one check for which I need some input.

I see that one of the tests checks that the init parameters for the sklearn API and the Dask API have the same arguments (see this one). Early stopping is not available yet for the Dask API, so I do not see how we can easily iron that out. Shall I add some more exceptions to that specific test?

@jameslamb
Collaborator

Shall I add some more exceptions to that specific test?

@ClaudioSalvatoreArcidiacono no, please do not do that. This test you've linked to is there exactly to catch such deviations.

If you absolutely need to add arguments to the constructors of the scikit-learn estimators in this project, add the same arguments in the same order to the Dask estimators, with default values of None or something, and raise NotImplementedError in the Dask interface when any non-None values are passed to those arguments.
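A minimal sketch of that pattern (illustrative only; the argument names early_stopping and validation_fraction and this simplified constructor are assumptions, not lightgbm's actual Dask code):

from lightgbm import LGBMClassifier

class DaskLGBMClassifier(LGBMClassifier):
    def __init__(self, *, early_stopping=None, validation_fraction=None, **kwargs):
        # accept the new arguments only so the signature stays in sync with the
        # scikit-learn estimators; the Dask interface does not implement them yet
        if early_stopping is not None or validation_fraction is not None:
            raise NotImplementedError(
                "early stopping via constructor arguments is not yet supported in the Dask interface"
            )
        super().__init__(**kwargs)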

For what it's worth, I am not totally convinced yet that we should take on the exact same interface as scikit-learn (especially the large added complexity of this new validation_set_split_strategy argument). #3313 was primarily about whether or not to enable early stopping by default... not explicitly about changing the signature of the lightgbm.sklearn estimators to match HistGradientBoostingClassifier.

If you are committed to getting the CI passing here I'm willing to consider it, but just want to set the right expectation that I expect you to also explain specifically the benefit of adding all this new complexity.

@jameslamb
Collaborator

By the way, it looks like you are not signing your commits with an email address tied to your GitHub account.

(screenshot showing commits marked as unverified, not linked to a GitHub account)

See #5532 (comment) and the comments linked from it for an explanation of what I mean by that and an explanation of how to fix it.

@jameslamb
Collaborator

It has been about 6 weeks since I last provided a review on this PR, and there has not been any activity on it since then.

I'm closing this, assuming it's been abandoned, to make it clear to others interested in this feature that they shouldn't be waiting on this PR.

@ClaudioSalvatoreArcidiacono if you have time in the future to work with maintainers here, we'd welcome future contributions.

@jameslamb jameslamb closed this May 30, 2023
@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, I have not had much time to look at this lately, but I should be more available now and in the coming weeks.

If you also have some time to help me review it, I can pick this PR up again.

Regarding your previous comments: thanks for the heads-up on signing commits, I will sign the next commits as you mentioned.

About the complexity of the proposed implementation, I am definitely open to feedback from the maintainers and willing to change the proposed implementation if necessary.

In the proposed implementation I tried to stick to what is written in the FAQ:

The appropriate splitting strategy depends on the task and domain of the data, information that a modeler has but which LightGBM as a general-purpose tool does not.

So, in the proposed implementation I tried to find a common ground between the convenience of activating early stopping using only init params and the customisability of the splitting strategy.

@jameslamb
Collaborator

Just to set the right expectation...I personally will not be able to look at this for at least another week, and I think it's unlikely it'll make it into LightGBM 4.0 (#5952).

I'm sorry, but this is quite complex and will require a significant investment of time to review. Some questions I'll be looking to answer when I start reviewing this:

  • what does "Scikit-learn like interface" mean, precisely?
    • does it mean you've implemented exactly the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor ? If so can you please link to code and docs showing that?
    • and more broadly, why does this need to change ANYTHING about the public interface of lightgbm? And couldn't the existing mechanisms used inside lightgbm.cv() be used instead of adding all this new code for splitting? e.g.
      If object, it should be one of the scikit-learn splitter classes
      (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
      and have ``split`` method.
  • what has to happen to make these changes consistent with how lightgbm currently works? For example...
    • what happens when early_stopping_rounds is passed to the estimator constructor via **kwargs and n_iter_no_change is set to a non-default value in the constructor... which value wins?
    • What happens if early_stopping=True is passed but valid_sets are also passed to .fit()? Does that disable the automatic splitting and just use the provided validation sets?

@ClaudioSalvatoreArcidiacono
Contributor Author

ClaudioSalvatoreArcidiacono commented Jul 1, 2023

Hey @jameslamb, no problem. Thanks for being transparent about your availability for this PR and for your feedback. I have followed it and made the PR easier to review.

I have removed the functionality to use a custom splitter and made the changes much smaller.

  • what does "Scikit-learn like interface" mean, precisely?

    • does it mean you've implemented exactly the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor ? If so can you please link to code and docs showing that?

I have now implemented the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor.

  • and more broadly, why does this need to change ANYTHING about the public interface of lightgbm? And couldn't the existing mechanisms used inside lightgbm.cv() be used instead of adding all this new code for splitting? e.g.
    If object, it should be one of the scikit-learn splitter classes
    (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
    and have ``split`` method.

In this implementation I tried to reuse the splitting function used in lightgbm.cv(). Thanks for the tip.

  • what has to happen to make these changes consistent with how lightgbm currently works? For example...

    • what happens when early_stopping_rounds is passed to the estimator constructor via **kwargs and n_iter_no_change is set to a non-default value in the constructor... which value wins?

Good observation; I think the arguments do indeed need to be renamed so that they are more consistent with LightGBM naming conventions.

  • What happens if early_stopping=True is passed but valid_sets are also passed to .fit()? Does that disable the automatic splitting and just use the provided validation sets?

Correct.
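To illustrate the behavior being confirmed here, a hedged usage sketch assuming the early_stopping constructor argument proposed in this PR:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# no eval_set: a validation set is sampled from the training data automatically
lgb.LGBMRegressor(early_stopping=True).fit(X_train, y_train)

# eval_set provided: automatic splitting is skipped, the given set is used instead
lgb.LGBMRegressor(early_stopping=True).fit(X_train, y_train, eval_set=[(X_valid, y_valid)])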

Collaborator

@jameslamb jameslamb left a comment


@ClaudioSalvatoreArcidiacono now that LightGBM v4.0 is out, I'd be happy to return to reviewing this.

Can you please update it to the latest master? Please also see my next round of comments.

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, thanks a lot for your review. I am still interested in solving this issue, but I will be on holiday for the next two weeks. I will take a look at your comments once I am back.

@jameslamb
Collaborator

No problem, thanks for letting us know! I'll be happy to continue working on this whenever you have time.

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jmoralez and @borchero, I have set early stopping to be disabled by default and updated the tests and docs. Could you please review this PR when you have a moment?

Do you maybe think I should mention the early stopping parameter in the docstring of the lightgbm scikit-learn classes?

Collaborator

@jameslamb jameslamb left a comment


Thank you for picking this up again! We're happy to work with you but please see my comments... the code, your most recent comments, and the PR title still contradict each other about whether or not this PR is enabling early stopping automatically.

Confusion over this is one of the things that has delayed this PR for so long. It's clear to me from @borchero's and @jmoralez's most recent comments, and I agree with them: do not enable early stopping by default in this PR.

I hope that limiting this to "provide a scikit-learn-style API for early stopping" will help reduce its complexity and get it to a state where we can merge it faster.

Comment on lines 834 to 835
if hasattr(self, "_n_rows_train") and self._n_rows_train > 10_000:
    params["early_stopping_round"] = 10
Collaborator


In #5808 (comment), you said:

I have set early stopping to be disabled by default

But this block of code directly contradicts that... it enables early stopping by default if there are more than 10K rows in the training data.

From all the prior discussion on this PR, it seems clear to me that @borchero and @jmoralez both requested you not do that, and instead limit the scope of this PR to just providing an API for early stopping that matches how scikit-learn estimators do it. I agree with them... it shouldn't be enabled by default as part of this PR.

Please, do the following:

  • change the title of the PR (which will become a commit message and release notes item) to [python-package] add scikit-learn-style API for early stopping
  • remove this logic about early_stopping_round = "auto" (and the corresponding changes further down, where you moved parameter processing to after input data validation so it could reference self._n_rows_train)
  • match whatever API HistGradientBoostingClassifier / HistGradientBoostingRegressor currently have for enabling early stopping

Contributor Author


Hey @jameslamb, thanks a lot for your review, I highly appreciate that.

You are right with your comment, it was a mistake on my side, I have fixed it in my latest commits. Thank you for spotting it.

In the latest version I have:

  • Renamed the PR as you requested.
  • Removed the logic about early_stopping_round = "auto" and the corresponding changes where I switched parameters processing and input validation.
  • I think the APIs of HistGradientBoostingClassifier / HistGradientBoostingRegressor and the current implementation match, except that:
    • HistGradientBoostingClassifier / HistGradientBoostingRegressor support early_stopping_round = "auto" as this is their default parameter value, but as we discussed above we do not want this behaviour.
    • I have tried to support LightGBM's parameter-alias logic while working around the fact that early_stopping is an alias for early_stopping_round(s) and n_iter_no_change. In the current implementation, setting early_stopping_round(s) to an integer will override the default value for n_iter_no_change, but setting early_stopping to an integer will not enable early stopping and will not override the default value for n_iter_no_change. I have opted for this implementation to preserve backward compatibility as much as possible. Please let me know if you agree with it or have a different opinion; happy to change it.

Collaborator


Thanks for that explanation.

Removed the logic about early_stopping_round = "auto" and the corresponding changes where I switched parameters processing and input validation.

Agreed, thank you for removing it.

*early_stopping is an alias for early_stopping_round(s) and n_iter_no_change

Thanks for being careful about backwards compatibility!

It's unfortunate that we have early_stopping as an alias for early_stopping_rounds and that scikit-learn's estimators now assign a different meaning to that :/

// alias = early_stopping_rounds, early_stopping, n_iter_no_change
// desc = will stop training if one metric of one validation data doesn't improve in last ``early_stopping_round`` rounds
// desc = ``<= 0`` means disable
// desc = can be used to speed up training

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1572-L1575

Seems like that has been in scikit-learn for a long time (since v0.23):

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1577

I see your implementation of this in LGBMModel._process_params():

params = _choose_param_value("early_stopping_round", params, self.n_iter_no_change)
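For context, _choose_param_value() is an internal lightgbm helper that resolves parameter aliases: roughly, it returns a copy of params with the main parameter name set from any alias found there, falling back to the given default. A sketch of the behavior relied on above:

from lightgbm.basic import _choose_param_value

# an alias present in params wins over the default
params = _choose_param_value("early_stopping_round", {"early_stopping_rounds": 5}, 10)
# -> {"early_stopping_round": 5}

# with no alias present, the default (here, what would be self.n_iter_no_change) is used
params = _choose_param_value("early_stopping_round", {}, 10)
# -> {"early_stopping_round": 10}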

I think I see a path to resolving this. It'll be complex, but it would keep LightGBM in compliance with the rules from https://scikit-learn.org/1.5/developers/develop.html#instantiation.

  • remove keyword argument n_iter_no_change (and property self.n_iter_no_change)... but ensure there are unit tests confirming that passing keyword argument n_iter_no_change through the scikit-learn estimators correctly sets the number of early stopping rounds
  • make keyword argument early_stopping have type Union[bool, int]
  • handle that difference in LGBMModel._process_params(), roughly like this:

rewrite this block:

# use joblib conventions for negative n_jobs, just like scikit-learn
# at predict time, this is handled later due to the order of parameter updates
if stage == "fit":
    params = _choose_param_value("num_threads", params, self.n_jobs)
    params["num_threads"] = self._process_n_jobs(params["num_threads"])

to something like this:

        if stage == "fit":
            # use joblib conventions for negative n_jobs, just like scikit-learn
            # at predict time, this is handled later due to the order of parameter updates
            params = _choose_param_value("num_threads", params, self.n_jobs)
            params["num_threads"] = self._process_n_jobs(params["num_threads"])

            if not isinstance(self.early_stopping, bool) and isinstance(self.early_stopping, int):
                _log_warning(
                    f"Found 'early_stopping={self.early_stopping}' passed through keyword arguments. "
                    "Future versions of 'lightgbm' will not allow this, as scikit-learn expects keyword argument "
                    "'early_stopping' to be a boolean indicating whether or not to perform early stopping with "
                    "a randomly-sampled validation set. To set the number of early stopping rounds, and suppress "
                    f"this warning, pass early_stopping_rounds={self.early_stopping} instead."
                )
                params = _choose_param_value(
                    main_param_name="early_stopping_round",
                    params=params,
                    default_value=self.early_stopping,
                )

And then any other places in the code where you have self.early_stopping is True should use isinstance(self.early_stopping, bool) and self.early_stopping is True, to avoid this SyntaxWarning:

1 is True
# <stdin>:1: SyntaxWarning: "is" with 'int' literal. Did you mean "=="?

I think that can work and provide backwards compatibility while allowing us to match scikit-learn's API. And then eventually (after several lightgbm releases), that warning could be converted to an error and the type of early_stopping could be narrowed to just bool.

What do you think?

Contributor Author


It's a great suggestion! Thanks a lot for it; I have implemented it in the latest version.

@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono changed the title [python-package] enable early stopping automatically in scikit-learn interface (fixes #3313) [python-package] add scikit-learn-style API for early stopping (fixes #3313) Nov 29, 2024
@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono changed the title [python-package] add scikit-learn-style API for early stopping (fixes #3313) [python-package] add scikit-learn-style API for early stopping Nov 29, 2024
@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, any chance you could take a look at the latest changes?

y_train,
group=q_train,
eval_at=[1, 3],
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],
Collaborator


Suggested change
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],

Why is modifying the learning rate necessary for this test? If this is unnecessary and just copied from somewhere else, please remove it.

early_stopping: bool = False,
n_iter_no_change: int = 10,
validation_fraction: Optional[float] = 0.1,
**kwargs,
Collaborator


Suggested change
**kwargs,
**kwargs: Any,

Why was this type hint removed? If it was just an accident, please put it back to reduce the size of the diff.

@jameslamb jameslamb self-requested a review December 15, 2024 08:13
Collaborator

@jameslamb jameslamb left a comment


Hey @jameslamb, any chance you could take a look at the latest changes?

I reviewed this again tonight. Please see my suggestions. We're getting closer, but there are still some significant unresolved issues. As you respond and change the code, please make every effort possible to keep your changes minimal and only focused on the narrow goal of this PR.

I also marked "resolved" any threads where it seemed that we'd reached a resolution that was reflected in the code. Let's please keep using threads on the "Files changed" tab in this way and resolving them as we move forward.

Thank you for continuing to work on this. I know it can be difficult to get so much critical feedback. That's partially a function of what you've taken on here... you've chosen a very complex problem on a highly-important piece of the library for your first code contribution to LightGBM. If you're willing to keep working with us, I'm willing to keep reviewing and moving this forward.

n_iter_no_change : int, optional (default=10)
    If early stopping is enabled, this parameter specifies the number of iterations with no
    improvement after which training will be stopped.
validation_fraction : float or None, optional (default=0.1)
Collaborator


There are not any tests in test_sklearn.py that pass validation_fraction. Please add some, covering both the default behavior and that passing a non-default value (like 0.4) works as expected.

I don't know the exact code paths off the top of my head, and I would appreciate it if you could investigate... but I think it should be possible to test this by checking the size of the datasets added to valid_sets and confirming that they're as expected (e.g. that the automatically-added validation set has 4,000 rows if the input data X has 40,000 rows and validation_fraction=0.1 is passed).

If that's not observable through the public API, try to use mocking/patching to observe it instead of adding any additional properties to the Booster / estimators' public API.

Comment in-thread here if you have questions or need help with that.
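One possible shape for such a test, sketched with unittest.mock — the patch target lightgbm.sklearn.train, the early_stopping/validation_fraction keyword arguments, and the assumption that the auto-created validation set ends up in valid_sets all come from this discussion, not from merged code:

import numpy as np
import lightgbm as lgb
from unittest import mock

def test_validation_fraction_controls_validation_set_size():
    X = np.random.rand(40_000, 5)
    y = np.random.rand(40_000)
    # wrap the real train() so fitting still works, but record its arguments
    with mock.patch("lightgbm.sklearn.train", side_effect=lgb.train) as mocked_train:
        lgb.LGBMRegressor(early_stopping=True, validation_fraction=0.1, n_estimators=5).fit(X, y)
    valid_sets = mocked_train.call_args.kwargs["valid_sets"]
    # 10% of 40,000 rows should land in the automatically-added validation set
    assert valid_sets[-1].num_data() == 4_000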

Contributor Author


Thanks for your comment. I have added a couple of tests for that using patching; let me know if you think there is more that can be improved in those tests.

)

valid_sets.append(valid_set)
if self.early_stopping is True and eval_set is None:
Collaborator


Most of this part of the diff appears to only be whitespace changes. When I hide whitespace, it looks like just this:

(screenshot of the diff with whitespace changes hidden)

To make it easier for reviewers and to shrink the total number of lines touched, please change your approach here. On master, lightgbm currently has this:

if eval_set is not None:
    # (existing code)

Change it to:

if eval_set is not None:
    # (existing code)
elif self.early_stopping is True:
    # (new code added in this PR)

@@ -1168,6 +1274,7 @@ def fit_and_check(eval_set_names, metric_names, assumed_iteration, first_metric_
"verbose": -1,
"seed": 123,
"early_stopping_rounds": 5,
"early_stopping": True,
Collaborator


Was this necessary to make this test pass, or is it just being done for convenience?

If it's necessary, could you explain why? If it was just for convenience... please introduce a new test after this one that tests what you want to test.

This PR should only be adding functionality without breaking any existing user code. I'd be more confident that that's true if the tests showed only additions, with no modifications to existing tests.

Contributor Author


Yes, this was a necessary change in order to make the test pass.

With the code in its current state, passing early_stopping_rounds while leaving early_stopping at its default value (False) does not enable early stopping. Before, it did.

If we want to keep the previous behaviour and enable auto early stopping when early_stopping_rounds is set while early_stopping is left at its default value, we need to set the default value of early_stopping_rounds to None and change it in process_parameters to 10 when early_stopping is set to True. That would make the actual default value of early_stopping_rounds a bit hidden in the code. Let me know what you prefer; both options have their own drawbacks.

Collaborator


Ok, that's a problem. And exactly why I have been so strict in asking that we only see new tests added in this PR.

I think that also answers my questions from #5808 (comment)

If the current state of this branch were merged, with this mix of behaviors:

  • early_stopping keyword argument defaulting to False
  • early_stopping=False meaning "do not perform any early stopping"

it would turn off early stopping for all existing code using early stopping with lightgbm.sklearn estimators.

That's a very big user-facing breaking change, and one I do not support. Please, let's keep the scope of this PR limited to what I've written in the title... "add scikit-learn-style API for early stopping".

Specifically, this is the behavior I'm expecting:

  • early_stopping=True:
    • adds a new validation set to eval_set which is a random sample of {validation_fraction} * num_rows rows from the training data
    • enables early stopping, with early_stopping_rounds defaulting to 10 if not provided by any other mechanism
    • does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to eval_set in fit(...), then that dataset is ALSO used for early stopping
  • early_stopping=False
    • everything lightgbm does is identical to what it does today

The docstring for the early_stopping keyword argument should also be updated to clearly explain this. For example:

        early_stopping : bool, optional (default=False)
            Whether to enable scikit-learn-style early stopping. If set to ``True``,
            a new validation set will be created by randomly sampling ``validation_fraction`` rows
            from the training data ``X`` passed to ``fit()``. Training will stop if the validation score
            does not improve for a specific number of rounds (controlled by ``n_iter_no_change``).
            This parameter is here for compatibility with ``scikit-learn``'s ``HistGradientBoosting``
            estimators... it does not affect other ``lightgbm``-specific early stopping mechanisms,
            like passing the ``lgb.early_stopping`` callback and validation sets to the ``eval_set``
            argument of ``fit()``.

If we want to keep the previous behaviour and enable auto early stopping when early_stopping_rounds is set while early_stopping is left at its default value, we need to set the default value of early_stopping_rounds to None

I think this statement is also coming from you trying to tie this new .early_stopping property of the scikit-learn estimators to "whether or not any early stopping happens", and I guess you are referring to this:

if callback._should_enable_early_stopping(params.get("early_stopping_round", 0)):
    callbacks_set.add(
        callback.early_stopping(
            stopping_rounds=params["early_stopping_round"],  # type: ignore[arg-type]
            first_metric_only=first_metric_only,
            min_delta=params.get("early_stopping_min_delta", 0.0),
            verbose=_choose_param_value(
                main_param_name="verbosity",
                params=params,
                default_value=1,
            ).pop("verbosity")
            > 0,
        )
    )

I don't think we should do that. Let's limit the early_stopping keyword argument to the narrow purpose of "whether or not to add 1 randomly-sampled validation set and perform early stopping on it".

Then, the only thing that has to be figured out is how many rounds to train without improvement before stopping. I gave a proposal for doing that here: #5808 (comment)

Collaborator


@ClaudioSalvatoreArcidiacono, I'd like to propose that we take a different approach to getting this PR done.

Could I take over this PR and push to it, to get it into a state that I'd feel comfortable merging?

Then:

  1. you review the state and tell me if anything that's left concerns you
  2. after you and I agree on the state, we ask other maintainers to review

This has been taking a really large amount of both your and my time and energy. I think this approach would get us to a resolution faster.

I would only push new commits and merge commits (no force-pushing), so you could see everything I changed and revert any of it.

Can we do that?

Contributor Author


Hey @jameslamb, the statement

If the current state of this branch were merged, with this mix of behaviours:

  • early_stopping keyword argument defaulting to False
  • early_stopping=False meaning "do not perform any early stopping"

it would turn off early stopping for all existing code using early stopping with lightgbm.sklearn estimators.

is not correct. If the current state of this branch were merged, it would not enable early stopping automatically. That is, passing an early_stopping callback while leaving the early_stopping keyword argument at its default of False will still enable early stopping, as it always has. See for example this other test, which is left unchanged and passes.

The only backwards-incompatible change introduced at this stage is that passing early_stopping_rounds is no longer sufficient on its own to enable early stopping (if the early_stopping parameter is left as False). I am definitely open to changing that.

Regarding your last suggestion, I am extremely thankful for the effort you have invested in this PR and I am sorry to hear that it is taking more effort than you anticipated. I am putting my best intentions into this PR and I definitely do not want to cause any harm to the library or waste any of your precious time unnecessarily.

I would like to give this PR one last attempt, addressing all of your comments and making sure everything is spot on. I will tag you once I think the PR is ready for a last review. If after this last attempt you think the gap to be filled is still too big, I will step aside and happily let you take over, and I will review your code.

Are you fine with that? :)

Contributor Author


@jameslamb, In your previous comment you mentioned

  • early_stopping=True:
    • adds a new validation set to eval_set which is a random sample of {validation_fraction} * num_rows rows from the training data
    • enables early stopping, with early_stopping_rounds defaulting to 10 if not provided by any other mechanism
    • does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to eval_set in fit(...), then that dataset is ALSO used for early stopping.

Do you think that we should create an extra validation set even if a validation set has already been provided in the fit()?

I think that if a validation set has been provided in the fit() and the parameter early_stopping=True then we do not need to create a new validation set for early stopping. What do you think?

q_train = np.loadtxt(str(rank_example_dir / "rank.train.query"))
n_estimators = 5
gbm = lgb.LGBMRanker(n_estimators=n_estimators, random_state=42, verbose=-1)
gbm.fit(X_train, y_train, group=q_train) # Assuming 10 samples in one group
Collaborator


Suggested change
gbm.fit(X_train, y_train, group=q_train) # Assuming 10 samples in one group
gbm.fit(X_train, y_train, group=q_train)

How does this code comment relate to the code? If it's left over from some earlier debugging, please remove it.

@@ -507,7 +507,10 @@ def __init__(
random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
n_jobs: Optional[int] = None,
importance_type: str = "split",
**kwargs: Any,
early_stopping: bool = False,
Collaborator


Suggested change
early_stopping: bool = False,
*,
early_stopping: bool = False,

I think we should make these keyword-only arguments, as scikit-learn does:

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1689

Could you please try that (in these estimators and the Dask ones)?

I don't want to do that for other existing parameters, to prevent breaking existing user code, but since these are new parameters, it's safe to be stricter.

Comment on lines 845 to 846
if self.early_stopping is not True:
    params["early_stopping_round"] = None
Collaborator


Suggested change
if self.early_stopping is not True:
    params["early_stopping_round"] = None

This looks to me like it might turn off early stopping enabled in other ways (like passing early_stopping_round=15 or the lgb.early_stopping() callback + some valid_sets) if the keyword argument early_stopping=False. Since early_stopping=False is the default, that'd be a backwards-incompatible change.

The early_stopping keyword argument in this PR is not intended to control ALL early stopping, right? I think it should be limited to controlling the scikit-learn-style early stopping, but that the other mechanisms that people have been using with lightgbm for years should continue to work.


Comment on lines 1035 to 1036
n_splits = max(int(np.ceil(1 / self.validation_fraction)), 2)
stratified = isinstance(self, LGBMClassifier)
Contributor Author


I am not a huge fan of how the validation set is created from the train set using the _make_n_folds function.

If 1/validation_fraction is not an integer, the actual validation set size will not match the validation fraction specified by the user.

For example, if the validation fraction is 0.4, the number of splits calculated here will be 3, which will result in an actual validation fraction of about 0.33 instead of 0.4.

Using something like train_test_split from scikit-learn would solve the issue for the classification and the regression case, but for ranking tasks our best option is GroupShuffleSplit, which will inevitably suffer from the same issue expressed above. The options that I thought to solve this issue are:

  1. Leave the code as-is and raise a warning when 1/validation_fraction is not an integer.
  2. Use train_test_split for creating the validation set in the classification and regression cases; Raise a warning when 1/validation_fraction is not an integer in the ranking case.

I would lean more towards option 2, but this will make the PR bigger.

@jameslamb I would like to hear your opinion on it; do you perhaps already have something else in mind?
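A quick illustration of the rounding issue with plain Python (just the arithmetic from the snippet above):

import numpy as np

for validation_fraction in [0.1, 0.2, 0.25, 0.4]:
    n_splits = max(int(np.ceil(1 / validation_fraction)), 2)
    print(f"requested {validation_fraction:.2f}, actual {1 / n_splits:.2f} (n_splits={n_splits})")

# requested 0.10, actual 0.10 (n_splits=10)
# requested 0.20, actual 0.20 (n_splits=5)
# requested 0.25, actual 0.25 (n_splits=4)
# requested 0.40, actual 0.33 (n_splits=3)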

@@ -507,6 +507,9 @@ def __init__(
random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
n_jobs: Optional[int] = None,
importance_type: str = "split",
*,
early_stopping: Union[bool, int] = False,
validation_fraction: float = 0.1,
Contributor Author


@jameslamb, I have removed the option of having validation_fraction=None and performing early stopping on the train set.

Early stopping in lightgbm is disabled when the validation set equals the train set and I did not want to alter that behavior in this PR, so I have decided to deviate a little from the scikit-learn implementation of auto early stopping and to not allow validation_fraction=None.

Let me know your thoughts on it.
