
[python-package] add scikit-learn-style API for early stopping #5808

Open

wants to merge 33 commits into master

Conversation

ClaudioSalvatoreArcidiacono
Contributor

Fixes #3313

Implements a scikit-learn-like interface for early stopping.

@jameslamb jameslamb changed the title 3313 enable auto early stopping [python-package] enable early stopping automatically in scikit-learn interface (fixes #3313) Mar 26, 2023
Collaborator

@jameslamb jameslamb left a comment


Thanks for taking the time to contribute to LightGBM! I haven't reviewed this yet, but just leaving a blocking review to make it clear to other maintainers that I'd like the opportunity to review before this is merged. The discussion in #3313 talked about waiting for scikit-learn's interface for HistGradientBoostingClassifier to stabilize and mature and then adapting to the choices they made... I'd like the opportunity to look at that interface thoroughly prior to reviewing this.

@jameslamb
Collaborator

Until we have a chance to review, you can improve the chances of this being merged by addressing any CI failures you see. You can safely ignore failing R-package jobs; there are some known issues with those (fixed in #5807).

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, thanks for picking this up. I have started to take a look at the CI failures; I think I can solve most of them easily.

There is one check for which I need some input.

I see that one of the tests checks that the init parameters for the sklearn API and the Dask API have the same arguments (see this one). Early stopping is not available yet for the Dask API, so I do not see how we can easily iron that out. Shall I add some more exceptions to that specific test?

@jameslamb
Collaborator

Shall I add some more exceptions to that specific test?

@ClaudioSalvatoreArcidiacono no, please do not do that. This test you've linked to is there exactly to catch such deviations.

If you absolutely need to add arguments to the constructors of the scikit-learn estimators in this project, add the same arguments in the same order to the Dask estimators, with default values of None or something, and raise NotImplementedError in the Dask interface when any non-None values are passed to those arguments.
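A minimal sketch of that pattern (illustrative only; the argument names early_stopping and validation_fraction and this simplified constructor are assumptions, not lightgbm's actual Dask code):

from lightgbm import LGBMClassifier

class DaskLGBMClassifier(LGBMClassifier):
    def __init__(self, *, early_stopping=None, validation_fraction=None, **kwargs):
        # accept the new arguments only so the signature stays in sync with the
        # scikit-learn estimators; the Dask interface does not implement them yet
        if early_stopping is not None or validation_fraction is not None:
            raise NotImplementedError(
                "early stopping via constructor arguments is not yet supported in the Dask interface"
            )
        super().__init__(**kwargs)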

For what it's worth, I am not totally convinced yet that we should take on the exact same interface as scikit-learn (especially the large added complexity of this new validation_set_split_strategy argument). #3313 was primarily about whether or not to enable early stopping by default... not explicitly about changing the signature of the lightgbm.sklearn estimators to match HistGradientBoostingClassifier.

If you are committed to getting the CI passing here I'm willing to consider it, but just want to set the right expectation that I expect you to also explain specifically the benefit of adding all this new complexity.

@jameslamb
Collaborator

By the way, it looks like you are not signing your commits with an email address tied to your GitHub account.

(screenshot showing commits marked as unverified, not linked to a GitHub account)

See #5532 (comment) and the comments linked from it for an explanation of what I mean by that and an explanation of how to fix it.

@jameslamb
Collaborator

It has been about 6 weeks since I last provided a review on this PR, and there has not been any activity on it since then.

I'm closing this, assuming it's been abandoned, to make it clear to others interested in this feature that they shouldn't be waiting on this PR.

@ClaudioSalvatoreArcidiacono if you have time in the future to work with maintainers here, we'd welcome future contributions.

@jameslamb jameslamb closed this May 30, 2023
@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, I have not had much time to look at this lately, but I should be more available now and in the coming weeks.

If you also have some time to help me review it, I can pick this PR up again.

Regarding your previous comments: thanks for the heads-up on signing commits, I will sign the next commits as you mentioned.

About the complexity of the proposed implementation, I am definitely open to feedback from the maintainers and willing to change the proposed implementation if necessary.

In the proposed implementation I tried to stick to what is written in the FAQ:

The appropriate splitting strategy depends on the task and domain of the data, information that a modeler has but which LightGBM as a general-purpose tool does not.

So, in the proposed implementation I tried to find a common ground between the convenience of activating early stopping using only init params and the customisability of the splitting strategy.

@jameslamb
Collaborator

Just to set the right expectation...I personally will not be able to look at this for at least another week, and I think it's unlikely it'll make it into LightGBM 4.0 (#5952).

I'm sorry, but this is quite complex and will require a significant investment of time to review. Some questions I'll be looking to answer when I start reviewing this:

  • what does "Scikit-learn like interface" mean, precisely?
    • does it mean you've implemented exactly the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor ? If so can you please link to code and docs showing that?
    • and more broadly, why does this need to change ANYTHING about the public interface of lightgbm? And couldn't the existing mechanisms used inside lightgbm.cv() be used instead of adding all this new code for splitting? e.g.
      If object, it should be one of the scikit-learn splitter classes
      (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
      and have ``split`` method.
  • what has to happen to make these changes consistent with how lightgbm currently works? For example...
    • what happens when early_stopping_rounds is passed to the estimator constructor via **kwargs and n_iter_no_change is set to a non-default value in the constructor... which value wins?
    • What happens if early_stopping=True is passed but valid_sets are also passed to .fit()? Does that disable the automatic splitting and just use the provided validation sets?

@ClaudioSalvatoreArcidiacono
Contributor Author

ClaudioSalvatoreArcidiacono commented Jul 1, 2023

Hey @jameslamb, no problem. Thanks for being transparent about your availability for this PR and for your feedback. I have followed it and made the PR easier to review.

I have removed the functionality to use a custom splitter and made the changes much smaller.

  • what does "Scikit-learn like interface" mean, precisely?

    • does it mean you've implemented exactly the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor ? If so can you please link to code and docs showing that?

I have now implemented the same interface as HistGradientBoostingClassifier and HistGradientBoostingRegressor.

  • and more broadly, why does this need to change ANYTHING about the public interface of lightgbm? And couldn't the existing mechanisms used inside lightgbm.cv() be used instead of adding all this new code for splitting? e.g.
    If object, it should be one of the scikit-learn splitter classes
    (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
    and have ``split`` method.

In this implementation I tried to reuse the splitting function used in lightgbm.cv(). Thanks for the tip.

  • what has to happen to make these changes consistent with how lightgbm currently works? For example...

    • what happens when early_stopping_rounds is passed to the estimator constructor via **kwargs and n_iter_no_change is set to a non-default value in the constructor... which value wins?

Good observation; I think the arguments do indeed need to be renamed so that they are more consistent with LightGBM naming conventions.

  • What happens if early_stopping=True is passed but valid_sets are also passed to .fit()? Does that disable the automatic splitting and just use the provided validation sets?

Correct.
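To illustrate the behavior being confirmed here, a hedged usage sketch assuming the early_stopping constructor argument proposed in this PR:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1_000, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# no eval_set: a validation set is sampled from the training data automatically
lgb.LGBMRegressor(early_stopping=True).fit(X_train, y_train)

# eval_set provided: automatic splitting is skipped, the given set is used instead
lgb.LGBMRegressor(early_stopping=True).fit(X_train, y_train, eval_set=[(X_valid, y_valid)])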

Collaborator

@jameslamb jameslamb left a comment


@ClaudioSalvatoreArcidiacono now that LightGBM v4.0 is out, I'd be happy to return to reviewing this.

Can you please update it to the latest master? Please also see my next round of comments.

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, thanks a lot for your review. I am still interested in solving this issue, but I will be on holiday for the next two weeks. I will take a look at your comments once I am back.

@jameslamb
Collaborator

No problem, thanks for letting us know! I'll be happy to continue working on this whenever you have time.

@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jmoralez and @borchero, I have set early stopping to be disabled by default and updated the tests and docs. Could you please review this PR when you have a moment?

Do you maybe think I should mention the early stopping parameter in the docstring of the lightgbm scikit-learn classes?

Collaborator

@jameslamb jameslamb left a comment


Thank you for picking this up again! We're happy to work with you but please see my comments... the code, your most recent comments, and the PR title still contradict each other about whether or not this PR is enabling early stopping automatically.

Confusion over this is one of the things that has delayed this PR for so long. It's clear to me from @borchero's and @jmoralez's most recent comments, and I agree with them: do not enable early stopping by default in this PR.

I hope that limiting this to "provide a scikit-learn-style API for early stopping" will help reduce its complexity and get it to a state where we can merge it faster.

Comment on lines 834 to 835
if hasattr(self, "_n_rows_train") and self._n_rows_train > 10_000:
    params["early_stopping_round"] = 10
Collaborator


In #5808 (comment), you said:

I have set early stopping to be disabled by default

But this block of code directly contradicts that... it enables early stopping by default if there are more than 10K rows in the training data.

From all the prior discussion on this PR, it seems clear to me that @borchero and @jmoralez both requested you not do that, and instead limit the scope of this PR to just providing an API for early stopping that matches how scikit-learn estimators do it. I agree with them... it shouldn't be enabled by default as part of this PR.

Please, do the following:

  • change the title of the PR (which will become a commit message and release notes item) to [python-package] add scikit-learn-style API for early stopping
  • remove this logic about early_stopping_round = "auto" (and the corresponding changes further down, where you moved parameter processing to after input data validation so it could reference self._n_rows_train)
  • match whatever API HistGradientBoostingClassifier / HistGradientBoostingRegressor currently have for enabling early stopping

Contributor Author


Hey @jameslamb, thanks a lot for your review, I highly appreciate that.

You are right with your comment, it was a mistake on my side, I have fixed it in my latest commits. Thank you for spotting it.

In the latest version I have:

  • Renamed the PR as you requested.
  • Removed the logic about early_stopping_round = "auto" and the corresponding changes where I switched parameters processing and input validation.
  • I think the APIs of HistGradientBoostingClassifier / HistGradientBoostingRegressor and the current implementation match, except that:
    • HistGradientBoostingClassifier / HistGradientBoostingRegressor support early_stopping_round = "auto" as this is their default parameter value, but as we discussed above we do not want this behaviour.
    • I have tried to support LightGBM's parameter-alias logic while working around the fact that early_stopping is an alias for early_stopping_round(s) and n_iter_no_change. In the current implementation, setting early_stopping_round(s) to an integer will override the default value for n_iter_no_change, but setting early_stopping to an integer will not enable early stopping and will not override the default value for n_iter_no_change. I have opted for this implementation to preserve backward compatibility as much as possible. Please let me know if you agree with it or have a different opinion; happy to change it.

Collaborator


Thanks for that explanation.

Removed the logic about early_stopping_round = "auto" and the corresponding changes where I switched parameters processing and input validation.

Agreed, thank you for removing it.

*early_stopping is an alias for early_stopping_round(s) and n_iter_no_change

Thanks for being careful about backwards compatibility!

It's unfortunate that we have early_stopping as an alias for early_stopping_rounds and that scikit-learn's estimators now assign a different meaning to that :/

// alias = early_stopping_rounds, early_stopping, n_iter_no_change
// desc = will stop training if one metric of one validation data doesn't improve in last ``early_stopping_round`` rounds
// desc = ``<= 0`` means disable
// desc = can be used to speed up training

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1572-L1575

Seems like that has been in scikit-learn for a long time (since v0.23):

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1577

I see your implementation of this in LGBMModel._process_params():

params = _choose_param_value("early_stopping_round", params, self.n_iter_no_change)
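For context, _choose_param_value() is an internal lightgbm helper that resolves parameter aliases: roughly, it returns a copy of params with the main parameter name set from any alias found there, falling back to the given default. A sketch of the behavior relied on above:

from lightgbm.basic import _choose_param_value

# an alias present in params wins over the default
params = _choose_param_value("early_stopping_round", {"early_stopping_rounds": 5}, 10)
# -> {"early_stopping_round": 5}

# with no alias present, the default (here, what would be self.n_iter_no_change) is used
params = _choose_param_value("early_stopping_round", {}, 10)
# -> {"early_stopping_round": 10}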

I think I see a path to resolving this. It'll be complex, but it would keep LightGBM in compliance with the rules from https://scikit-learn.org/1.5/developers/develop.html#instantiation.

  • remove keyword argument n_iter_no_change (and property self.n_iter_no_change)... but ensure there are unit tests confirming that passing keyword argument n_iter_no_change through the scikit-learn estimators correctly sets the number of early stopping rounds
  • make keyword argument early_stopping have type Union[bool, int]
  • handle that difference in LGBMModel._process_params(), roughly like this:

rewrite this block:

# use joblib conventions for negative n_jobs, just like scikit-learn
# at predict time, this is handled later due to the order of parameter updates
if stage == "fit":
    params = _choose_param_value("num_threads", params, self.n_jobs)
    params["num_threads"] = self._process_n_jobs(params["num_threads"])

to something like this:

        if stage == "fit":
            # use joblib conventions for negative n_jobs, just like scikit-learn
            # at predict time, this is handled later due to the order of parameter updates
            params = _choose_param_value("num_threads", params, self.n_jobs)
            params["num_threads"] = self._process_n_jobs(params["num_threads"])

            if not isinstance(self.early_stopping, bool) and isinstance(self.early_stopping, int):
                _log_warning(
                    f"Found 'early_stopping={self.early_stopping}' passed through keyword arguments. "
                    "Future versions of 'lightgbm' will not allow this, as scikit-learn expects keyword argument "
                    "'early_stopping' to be a boolean indicating whether or not to perform early stopping with "
                    "a randomly-sampled validation set. To set the number of early stopping rounds, and suppress "
                    f"this warning, pass early_stopping_rounds={self.early_stopping} instead."
                )
                params = _choose_param_value(
                    main_param_name="early_stopping_round",
                    params=params,
                    default_value=self.early_stopping,
                )

And then any other places in the code where you have self.early_stopping is True should use isinstance(self.early_stopping, bool) and self.early_stopping is True, to avoid this SyntaxWarning:

1 is True
# <stdin>:1: SyntaxWarning: "is" with 'int' literal. Did you mean "=="?

I think that can work and provide backwards compatibility while allowing us to match scikit-learn's API. And then eventually (after several lightgbm releases), that warning could be converted to an error and the type of early_stopping could be narrowed to just bool.

What do you think?

Contributor Author


It's a great suggestion! Thanks a lot for it; I have implemented it in the latest version.

@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono changed the title [python-package] enable early stopping automatically in scikit-learn interface (fixes #3313) [python-package] add scikit-learn-style API for early stopping (fixes #3313) Nov 29, 2024
@ClaudioSalvatoreArcidiacono ClaudioSalvatoreArcidiacono changed the title [python-package] add scikit-learn-style API for early stopping (fixes #3313) [python-package] add scikit-learn-style API for early stopping Nov 29, 2024
@ClaudioSalvatoreArcidiacono
Contributor Author

Hey @jameslamb, any chance you could take a look at the latest changes?

y_train,
group=q_train,
eval_at=[1, 3],
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],
Collaborator


Suggested change
callbacks=[lgb.reset_parameter(learning_rate=lambda x: max(0.01, 0.1 - 0.01 * x))],

Why is modifying the learning rate necessary for this test? If this is unnecessary and just copied from somewhere else, please remove it.

early_stopping: bool = False,
n_iter_no_change: int = 10,
validation_fraction: Optional[float] = 0.1,
**kwargs,
Collaborator


Suggested change
**kwargs,
**kwargs: Any,

Why was this type hint removed? If it was just an accident, please put it back to reduce the size of the diff.

@jameslamb jameslamb self-requested a review December 15, 2024 08:13
Collaborator

@jameslamb jameslamb left a comment


Hey @jameslamb, any chance you could take a look at the latest changes?

I reviewed this again tonight. Please see my suggestions. We're getting closer, but there are still some significant unresolved issues. As you respond and change the code, please make every effort possible to keep your changes minimal and only focused on the narrow goal of this PR.

I also marked "resolved" any threads where it seemed that we'd reached a resolution that was reflected in the code. Let's please keep using threads on the "Files changed" tab in this way and resolving them as we move forward.

Thank you for continuing to work on this. I know it can be difficult to get so much critical feedback. That's partially a function of what you've taken on here... you've chosen a very complex problem on a highly-important piece of the library for your first code contribution to LightGBM. If you're willing to keep working with us, I'm willing to keep reviewing and moving this forward.

n_iter_no_change : int, optional (default=10)
    If early stopping is enabled, this parameter specifies the number of iterations with no
    improvement after which training will be stopped.
validation_fraction : float or None, optional (default=0.1)
Collaborator


There are not any tests in test_sklearn.py that pass validation_fraction. Please add some, covering both the default behavior and that passing a non-default value (like 0.4) works as expected.

I don't know the exact code paths off the top of my head, and I would appreciate it if you could investigate... but I think it should be possible to test this by checking the size of the datasets added to valid_sets and confirming that they're as expected (e.g. that the automatically-added validation set has 4,000 rows if the input data X has 40,000 rows and validation_fraction=0.1 is passed).

If that's not observable through the public API, try to use mocking/patching to observe it instead of adding any additional properties to the Booster / estimators' public API.

Comment in-thread here if you have questions or need help with that.
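One possible shape for such a test, sketched with unittest.mock — the patch target lightgbm.sklearn.train, the early_stopping/validation_fraction keyword arguments, and the assumption that the auto-created validation set ends up in valid_sets all come from this discussion, not from merged code:

import numpy as np
import lightgbm as lgb
from unittest import mock

def test_validation_fraction_controls_validation_set_size():
    X = np.random.rand(40_000, 5)
    y = np.random.rand(40_000)
    # wrap the real train() so fitting still works, but record its arguments
    with mock.patch("lightgbm.sklearn.train", side_effect=lgb.train) as mocked_train:
        lgb.LGBMRegressor(early_stopping=True, validation_fraction=0.1, n_estimators=5).fit(X, y)
    valid_sets = mocked_train.call_args.kwargs["valid_sets"]
    # 10% of 40,000 rows should land in the automatically-added validation set
    assert valid_sets[-1].num_data() == 4_000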

Contributor Author


Thanks for your comment. I have added a couple of tests for that using patching; let me know if you think there is more that can be improved in those tests.

)

valid_sets.append(valid_set)
if self.early_stopping is True and eval_set is None:
Collaborator


Most of this part of the diff appears to only be whitespace changes. When I hide whitespace, it looks like just this:

(screenshot of the diff with whitespace changes hidden)

To make it easier for reviewers and to shrink the total number of lines touched, please change your approach here. On master, lightgbm currently has this:

if eval_set is not None:
    # (existing code)

Change it to:

if eval_set is not None:
    # (existing code)
elif self.early_stopping is True:
    # (new code added in this PR)

@@ -1168,6 +1274,7 @@ def fit_and_check(eval_set_names, metric_names, assumed_iteration, first_metric_
"verbose": -1,
"seed": 123,
"early_stopping_rounds": 5,
"early_stopping": True,
Collaborator


Was this necessary to make this test pass, or is it just being done for convenience?

If it's necessary, could you explain why? If it was just for convenience... please introduce a new test after this one that tests what you want to test.

This PR should only be adding functionality without breaking any existing user code. I'd be more confident that that's true if the tests showed only additions, with no modifications to existing tests.

Contributor Author


Yes, this was a necessary change in order to make the test pass.

With the code in its current state, passing early_stopping_rounds while leaving early_stopping at its default value (False) does not enable early stopping. Before, it did.

If we want to keep the previous behaviour and enable auto early stopping when early_stopping_rounds is set while early_stopping is left at its default value, we need to set the default value of early_stopping_rounds to None and change it in process_parameters to 10 when early_stopping is set to True. That would make the actual default value of early_stopping_rounds a bit hidden in the code. Let me know what you prefer; both options have their own drawbacks.

Collaborator


Ok, that's a problem. And exactly why I have been so strict in asking that we only see new tests added in this PR.

I think that also answers my questions from #5808 (comment)

If the current state of this branch were merged, with this mix of behaviors:

  • early_stopping keyword argument defaulting to False
  • early_stopping=False meaning "do not perform any early stopping"

it would turn off early stopping for all existing code using early stopping with lightgbm.sklearn estimators.

That's a very big user-facing breaking change, and one I do not support. Please, let's keep the scope of this PR limited to what I've written in the title... "add scikit-learn-style API for early stopping".

Specifically, this is the behavior I'm expecting:

  • early_stopping=True:
    • adds a new validation set to eval_set which is a random sample of {validation_fraction} * num_rows rows from the training data
    • enables early stopping, with early_stopping_rounds defaulting to 10 if not provided by any other mechanism
    • does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to eval_set in fit(...), then that dataset is ALSO used for early stopping
  • early_stopping=False
    • everything lightgbm does is identical to what it does today

The docstring for the early_stopping keyword argument should also be updated to clearly explain this. For example:

        early_stopping : bool, optional (default=False)
            Whether to enable scikit-learn-style early stopping. If set to ``True``,
            a new validation set will be created by randomly sampling ``validation_fraction`` rows
            from the training data ``X`` passed to ``fit()``. Training will stop if the validation score
            does not improve for a specific number of rounds (controlled by ``n_iter_no_change``).
            This parameter is here for compatibility with ``scikit-learn``'s ``HistGradientBoosting``
            estimators... it does not affect other ``lightgbm``-specific early stopping mechanisms,
            like passing the ``lgb.early_stopping`` callback and validation sets to the ``eval_set``
            argument of ``fit()``.

If we want to keep the previous behaviour and enable auto early stopping when early_stopping_rounds is set while early_stopping is left at its default value, we need to set the default value of early_stopping_rounds to None

I think this statement is also coming from you trying to tie this new .early_stopping property of the scikit-learn estimators to "whether or not any early stopping happens", and I guess you are referring to this:

if callback._should_enable_early_stopping(params.get("early_stopping_round", 0)):
    callbacks_set.add(
        callback.early_stopping(
            stopping_rounds=params["early_stopping_round"],  # type: ignore[arg-type]
            first_metric_only=first_metric_only,
            min_delta=params.get("early_stopping_min_delta", 0.0),
            verbose=_choose_param_value(
                main_param_name="verbosity",
                params=params,
                default_value=1,
            ).pop("verbosity")
            > 0,
        )
    )

I don't think we should do that. Let's limit the early_stopping keyword argument to the narrow purpose of "whether or not to add 1 randomly-sampled validation set and perform early stopping on it".

Then, the only thing that has to be figured out is how many rounds to train without improvement before stopping. I gave a proposal for doing that here: #5808 (comment)

Collaborator


@ClaudioSalvatoreArcidiacono, I'd like to propose that we take a different approach to getting this PR done.

Could I take over this PR and push to it, to get it into a state that I'd feel comfortable merging?

Then:

  1. you review the state and tell me if anything that's left concerns you
  2. after you and I agree on the state, we ask other maintainers to review

This has been taking a really large amount of both your and my time and energy. I think this approach would get us to a resolution faster.

I would only push new commits and merge commits (no force-pushing), so you could see everything I changed and revert any of it.

Can we do that?

Contributor Author


Hey @jameslamb, the statement

If the current state of this branch were merged, with this mix of behaviours:

  • early_stopping keyword argument defaulting to False
  • early_stopping=False meaning "do not perform any early stopping"

it would turn off early stopping for all existing code using early stopping with lightgbm.sklearn estimators.

is not correct. If the current state of this branch were merged, it would not enable early stopping automatically. That is, passing an early_stopping callback while leaving the early_stopping keyword argument at its default of False will still enable early stopping, as it always has. See for example this other test, which is left unchanged and passes.

The only backwards-incompatible change introduced at this stage is that passing early_stopping_rounds is no longer sufficient on its own to enable early stopping (if the early_stopping parameter is left as False). I am definitely open to changing that.

Regarding your last suggestion, I am extremely thankful for the effort you have invested in this PR and I am sorry to hear that it is taking more effort than you anticipated. I am putting my best intentions into this PR and I definitely do not want to cause any harm to the library or waste any of your precious time unnecessarily.

I would like to give this PR one last attempt, addressing all of your comments and making sure everything is spot on. I will tag you once I think the PR is ready for a last review. If after this last attempt you think the gap to be filled is still too big, I will step aside and happily let you take over, and I will review your code.

Are you fine with that? :)

Contributor Author


@jameslamb, In your previous comment you mentioned

  • early_stopping=True:
    • adds a new validation set to eval_set which is a random sample of {validation_fraction} * num_rows rows from the training data
    • enables early stopping, with early_stopping_rounds defaulting to 10 if not provided by any other mechanism
    • does not affect any other relevant early stopping behavior... e.g. if you also passed a dataset to eval_set in fit(...), then that dataset is ALSO used for early stopping.

Do you think that we should create an extra validation set even if a validation set has already been provided in the fit()?

I think that if a validation set has been provided in the fit() and the parameter early_stopping=True then we do not need to create a new validation set for early stopping. What do you think?

q_train = np.loadtxt(str(rank_example_dir / "rank.train.query"))
n_estimators = 5
gbm = lgb.LGBMRanker(n_estimators=n_estimators, random_state=42, verbose=-1)
gbm.fit(X_train, y_train, group=q_train) # Assuming 10 samples in one group
Collaborator


Suggested change
gbm.fit(X_train, y_train, group=q_train) # Assuming 10 samples in one group
gbm.fit(X_train, y_train, group=q_train)

How does this code comment relate to the code? If it's left over from some earlier debugging, please remove it.

@@ -507,7 +507,10 @@ def __init__(
random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
n_jobs: Optional[int] = None,
importance_type: str = "split",
**kwargs: Any,
early_stopping: bool = False,
Collaborator


Suggested change
early_stopping: bool = False,
*,
early_stopping: bool = False,

I think we should make these keyword-only arguments, as scikit-learn does:

https://github.com/scikit-learn/scikit-learn/blob/6cccd99aee3483eb0f7562afdd3179ccccab0b1d/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py#L1689

Could you please try that (in these estimators and the Dask ones)?

I don't want to do that for other existing parameters, to prevent breaking existing user code, but since these are new parameters, it's safe to be stricter.

Comment on lines 845 to 846
if self.early_stopping is not True:
    params["early_stopping_round"] = None
Collaborator


Suggested change
if self.early_stopping is not True:
    params["early_stopping_round"] = None

This looks to me like it might turn off early stopping enabled in other ways (like passing early_stopping_round=15 or the lgb.early_stopping() callback + some valid_sets) if the keyword argument early_stopping=False. Since early_stopping=False is the default, that'd be a backwards-incompatible change.

The early_stopping keyword argument in this PR is not intended to control ALL early stopping, right? I think it should be limited to controlling the scikit-learn-style early stopping, but that the other mechanisms that people have been using with lightgbm for years should continue to work.


Comment on lines 1035 to 1036
n_splits = max(int(np.ceil(1 / self.validation_fraction)), 2)
stratified = isinstance(self, LGBMClassifier)
Contributor Author


I am not a huge fan of how the validation set is created from the train set using the _make_n_folds function.

If 1/validation_fraction is not an integer, the actual validation set size will not match the validation fraction specified by the user.

For example, if the validation fraction is 0.4, the number of splits calculated here will be 3, which will result in an actual validation fraction of about 0.33 instead of 0.4.

Using something like train_test_split from scikit-learn would solve the issue for the classification and the regression case, but for ranking tasks our best option is GroupShuffleSplit, which will inevitably suffer from the same issue expressed above. The options that I thought to solve this issue are:

  1. Leave the code as-is and raise a warning when 1/validation_fraction is not an integer.
  2. Use train_test_split for creating the validation set in the classification and regression cases; Raise a warning when 1/validation_fraction is not an integer in the ranking case.

I would lean more towards option 2, but this will make the PR bigger.

@jameslamb I would like to hear your opinion on it; do you perhaps already have something else in mind?
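A quick illustration of the rounding issue with plain Python (just the arithmetic from the snippet above):

import numpy as np

for validation_fraction in [0.1, 0.2, 0.25, 0.4]:
    n_splits = max(int(np.ceil(1 / validation_fraction)), 2)
    print(f"requested {validation_fraction:.2f}, actual {1 / n_splits:.2f} (n_splits={n_splits})")

# requested 0.10, actual 0.10 (n_splits=10)
# requested 0.20, actual 0.20 (n_splits=5)
# requested 0.25, actual 0.25 (n_splits=4)
# requested 0.40, actual 0.33 (n_splits=3)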

@@ -507,6 +507,9 @@ def __init__(
random_state: Optional[Union[int, np.random.RandomState, np.random.Generator]] = None,
n_jobs: Optional[int] = None,
importance_type: str = "split",
*,
early_stopping: Union[bool, int] = False,
validation_fraction: float = 0.1,
Contributor Author


@jameslamb, I have removed the option of having validation_fraction=None and performing early stopping on the train set.

Early stopping in lightgbm is disabled when the validation set equals the train set and I did not want to alter that behavior in this PR, so I have decided to deviate a little from the scikit-learn implementation of auto early stopping and to not allow validation_fraction=None.

Let me know your thoughts on it.
