[dask] Add support for early stopping in Dask interface #3712
Comments
Closing in favor of being in #2302 with other feature requests. Please leave a comment here if you'd like to work on this.
Sure, thank you! I'm really close to having a small reproducible example for the random "cannot bind to port XXXX" issue, and I'll link it here once I've written it up.
Omigod lifesaver, thank you!
…twork (fixes #3753) (#3766)
* starting work
* fixed port-binding issue on localhost
* minor cleanup
* updates
* getting closer
* definitely working for LocalCluster
* it works, it works
* docs
* add tests
* removing testing-only files
* linting
* Apply suggestions from code review
Co-authored-by: Nikita Titov <[email protected]>
* remove duplicated code
* remove unnecessary listen()
Co-authored-by: Nikita Titov <[email protected]>
@ffineis do you think you'll have time to work on this this week? We're planning to do a 3.2.0 release of LightGBM in the next week or two. I didn't include this in my list of must-have Dask features for the next release (#3872 (comment)), but I'd love to try to get this change in if we can since it can have such a big impact on training runtime. If you don't have time this week, could I take this back from you and try it out? Thanks so much for all your help with the Dask module so far!!
Hey! Yes, sorry, I've started planning for this; gimme another week? The only tricky part of matching xgboost.dask's implementation is the id(training data) check. So yeah, sorry I haven't made any progress yet, but I was planning to make some commits this week.
(In reply to James Lamb, Sat, Feb 6, 2021 at 5:19 PM.)
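The id(training data) check mentioned above comes down to noticing when an eval_set entry is the very same object as the training data, so that whatever was already prepared for training can be reused instead of being built a second time. A minimal pure-Python sketch, not xgboost.dask's actual code: build_eval_sets and prepare are hypothetical names, and prepare stands in for whatever expensive step (for example, gathering Dask partitions) a real implementation would run.

```python
from typing import Any, Callable, List, Sequence, Tuple


def build_eval_sets(
    X: Any,
    y: Any,
    eval_set: Sequence[Tuple[Any, Any]],
    prepare: Callable[[Any, Any], Any],
) -> Tuple[Any, List[Any]]:
    """Prepare the training data once and reuse it for matching eval_set entries.

    An eval_set entry counts as "the training data" only when both its X and
    its y are the very same objects as the training X and y (an identity
    check, equivalent to comparing id() values), not merely equal copies.
    """
    train = prepare(X, y)
    evals: List[Any] = []
    for valid_X, valid_y in eval_set:
        if valid_X is X and valid_y is y:
            evals.append(train)  # reuse: skip a second, expensive preparation
        else:
            evals.append(prepare(valid_X, valid_y))
    return train, evals


if __name__ == "__main__":
    calls = []

    def prepare(a, b):
        # hypothetical stand-in for an expensive step, e.g. gathering partitions
        calls.append((a, b))
        return (tuple(a), tuple(b))

    X, y = [1, 2, 3], [0, 1, 0]
    X_val, y_val = [4, 5], [1, 1]
    train, evals = build_eval_sets(X, y, [(X, y), (X_val, y_val)], prepare)
    print(len(calls))          # 2: training data once, validation data once
    print(evals[0] is train)   # True
```

Using identity rather than equality keeps the check cheap: comparing two large distributed collections for equality would itself require computing them.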
Oh yeah, no problem, thanks!
Hi. I've been playing around with this and have a working (although horrible) implementation. The approach I took was to use the futures of the persisted collections instead of turning the collections into lists of delayeds, which avoids recomputing the training set in case it is in the …
Interesting! I'll leave it to @ffineis to comment on that. Right now, I think our highest priority is supporting early stopping, and it would be ok if the first implementation of that merged to …
Hey @jmoralez, thanks for the ideation! Honestly, don't feel bad: I think any implementation of early stopping (ES) will be pretty hairy, given that we're using lists of delayed partitions instead of distributed lgbm.Datasets. I'm a fan of how xgboost.dask attempts to accomplish what you've mentioned via … This method works if the entirety of an eval …
Yeah, that sounds fair. Is there a plan to create an …
Yes, but I haven't written it up. Will do that right now. I like what …
Closing for now due to the lack of active work and open PRs for this feature.
Summary
DaskLGBMClassifier and DaskLGBMRegressor in the Python package should support early stopping.
Motivation
Early stopping is generally useful with gradient boosting algorithms, to avoid wasted training iterations or unnecessary growth in the model size once desirable performance has been achieved. This feature is available in the non-Dask interfaces for LightGBM, and should be available with the Dask one.
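As a rough illustration of what early stopping does (a sketch, not LightGBM's own implementation), training tracks a validation score after every boosting round and stops once that score has not improved for a fixed number of rounds, keeping the best round seen so far. The function name early_stopping and its stopping_rounds and higher_is_better parameters are hypothetical here; stopping_rounds plays the same role as the "patience" value LightGBM exposes as early_stopping_rounds.

```python
from typing import Optional, Sequence, Tuple


def early_stopping(
    scores: Sequence[float],
    stopping_rounds: int,
    higher_is_better: bool = False,
) -> Tuple[int, Optional[int]]:
    """Simulate early stopping over per-round validation scores.

    Returns (best_round, stopped_at), both 0-based indices into scores.
    stopped_at is None when every round ran without triggering a stop.
    """
    if stopping_rounds < 1:
        raise ValueError("stopping_rounds must be >= 1")
    best_round = 0
    for i, score in enumerate(scores):
        best = scores[best_round]
        improved = score > best if higher_is_better else score < best
        if improved:
            best_round = i
        elif i - best_round >= stopping_rounds:
            # no improvement for stopping_rounds consecutive rounds: stop here
            return best_round, i
    return best_round, None


if __name__ == "__main__":
    # validation loss per boosting round (lower is better)
    losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
    print(early_stopping(losses, stopping_rounds=3))  # (2, 5)
```

In the usage example the final 0.5 is never reached: training stops after three rounds without improvement. That trade-off (saved iterations versus a possibly missed later improvement) is inherent to patience-based stopping.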
Description
This should mimic the approach XGBoost took (https://github.com/dmlc/xgboost/blob/516a93d25c3b6899558700430ffc99a29ea21e1a/python-package/xgboost/dask.py#L1386), where eval_set contains Dask collections (Dask Array or Dask DataFrame).
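With eval_set holding Dask collections, each (X, y) pair would have to be split into matching chunks before the pieces can be handed to workers, in the same way the Dask module already works with lists of delayed partitions for training data. The sketch below uses plain Python lists as stand-ins for the per-partition objects one might get from a Dask collection (for example via .to_delayed()); pair_eval_partitions is a hypothetical helper, and a real implementation would also have to decide which worker evaluates which parts.

```python
from typing import Any, List, Sequence, Tuple

Partition = Sequence[Any]  # stand-in for one chunk of a Dask collection


def pair_eval_partitions(
    eval_set: Sequence[Tuple[Sequence[Partition], Sequence[Partition]]],
) -> List[List[Tuple[Partition, Partition]]]:
    """Zip each eval set's data partitions with its label partitions.

    Each entry of eval_set is (X_parts, y_parts). Both must be chunked the
    same way (same number of partitions, same rows per partition), otherwise
    rows and labels would not line up.
    """
    paired: List[List[Tuple[Partition, Partition]]] = []
    for i, (X_parts, y_parts) in enumerate(eval_set):
        if len(X_parts) != len(y_parts):
            raise ValueError(
                f"eval_set[{i}]: {len(X_parts)} data partitions but "
                f"{len(y_parts)} label partitions"
            )
        for j, (xp, yp) in enumerate(zip(X_parts, y_parts)):
            if len(xp) != len(yp):
                raise ValueError(
                    f"eval_set[{i}], partition {j}: {len(xp)} rows but {len(yp)} labels"
                )
        paired.append(list(zip(X_parts, y_parts)))
    return paired


if __name__ == "__main__":
    X_val_parts = [[[1.0], [2.0]], [[3.0]]]  # 2 partitions: 2 rows, then 1 row
    y_val_parts = [[0, 1], [1]]
    pairs = pair_eval_partitions([(X_val_parts, y_val_parts)])
    print(len(pairs[0]))  # 2 (data, label) partition pairs
```

Checking partition and row counts up front turns a silent misalignment between features and labels into an immediate, descriptive error.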