[python-package] Early stopping not reproducible when nthreads>1 #5758
Comments
Thanks for using LightGBM, and for your detailed report. Sorry it took so long for someone to respond here. I don't support changing the behavior of early stopping in the Python package in the way you're proposing, which I believe is:
In my opinion, the risk of bugs and the maintenance burden from the added complexity that type of change would introduce, in an already-complex part of the codebase, isn't worth it in exchange for improving reproducibility here. The behavior you've observed comes from small floating-point differences in multithreaded evaluation, not from a bug in the early-stopping logic. If you want close-to-reproducible behavior, train with a single thread (num_threads=1).
If you want to treat very small improvements in eval metrics as "not actually an improvement", for the purpose of early stopping, use the
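The sentence above is truncated (the inline code was lost when this page was captured); it very likely refers to the `min_delta` argument of LightGBM's `early_stopping` callback, which is available in recent versions. Since the exact wording is lost, here is a pure-Python sketch of that semantics, assuming a lower-is-better metric; this is illustrative, not LightGBM's implementation:

```python
# Illustrative sketch of a "minimum improvement" rule for early stopping,
# analogous in spirit to LightGBM's min_delta option. Not LightGBM's code.
def stops_early(scores, stopping_rounds, min_delta=0.0):
    """Return the 0-based iteration at which training would stop, or None.

    Assumes lower scores are better (e.g. a loss metric): an iteration only
    counts as an improvement if it beats the best score by more than min_delta.
    """
    best, best_iter = float("inf"), -1
    for i, s in enumerate(scores):
        if s < best - min_delta:            # improvement must exceed min_delta
            best, best_iter = s, i
        elif i - best_iter >= stopping_rounds:
            return i                        # patience exhausted
    return None

scores = [1.0, 0.9, 0.8999999, 0.8999998, 0.8999997, 0.8999996]
print(stops_early(scores, 3))                   # every tiny gain counts
print(stops_early(scores, 3, min_delta=0.001))  # tiny gains ignored, stops
```

With a nonzero `min_delta`, noise-level "improvements" like the ones caused by multithreaded evaluation no longer reset the early-stopping counter.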
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
This issue has been automatically locked because there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this one.
Description
Training models with exactly the same settings (including the seed) can produce different numbers of trees when nthreads>1 and early_stopping_rounds>0 are set. I found a workaround by modifying the Python package, but I am not sure whether this problem occurs only in Python.
Reproducible example
The example is in Python. The result is not 100% reproducible because the randomness comes from multithreading, but runs that produce models with different numbers of trees happen frequently.
Code:
Output:
The irreproducibility problem goes away when setting nthread=1.
Output:
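For reference, the workaround as a params fragment (the other fields here are placeholders; `nthread` is a documented alias of LightGBM's `num_threads` parameter):

```python
# Hypothetical params dict; only the threading setting is the point here.
params = {
    "objective": "regression",
    "seed": 1,
    "nthread": 1,  # single-threaded evaluation -> reproducible early stopping
}
```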
Environment info
LightGBM version or commit hash: 3.3.5
Command(s) you used to install LightGBM
Additional Comments
After a closer look at the trees built in each model, we found why early stopping happens at different numbers of trees. At some iteration, the training function may fail to grow a tree because it cannot find any split. In theory, the evaluation result of that iteration should be the same as the previous iteration's, because the model is unchanged. However, because the evaluation is done with multithreading, the evaluation result may change by some numeric error. As a result, an early stop that should have happened may fail to happen, because the callback sees an "improvement" in the evaluation result that is actually just numeric error.
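The effect described above can be demonstrated without LightGBM at all: floating-point addition is not associative, so summing the same values in a different order, as a multithreaded reduction does, can change the total slightly.

```python
# Pure-Python illustration (no LightGBM involved): the same numbers summed
# in a different order give different totals, because float addition is not
# associative. A multithreaded metric reduction changes the order of addition.
vals = [0.1] * 10 + [1e16, -1e16]

# "Single-threaded": one left-to-right pass.
sequential = 0.0
for v in vals:
    sequential += v

# "Two threads": each sums half of the data, then the partials are combined.
half = len(vals) // 2
combined = sum(vals[:half]) + sum(vals[half:])

print(sequential, combined)  # the two totals differ
```

The metric difference is tiny, but early stopping only asks "did the score improve?", so even a last-bit difference can flip the decision.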
This problem can be fixed by checking for both an improvement in the evaluation result and an increase in the number of trees when checking for early stopping. An iteration should be considered better than a previous one only if the evaluation improves and the number of trees increases. For example, I fixed it by modifying the early-stopping callback https://github.com/microsoft/LightGBM/blob/v3.3.5/python-package/lightgbm/callback.py#L254
to the following, where at the beginning I set current_iter = [-1]:
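The modified callback itself was not preserved in this capture. Below is a hedged, standalone sketch of the comparison the report describes; the real change lives inside LightGBM's early-stopping callback closure, and the names here are illustrative, not LightGBM's:

```python
# Standalone sketch of the proposed rule: an iteration counts as an
# improvement only if BOTH the eval metric improves AND the model actually
# grew (mirroring the report's `current_iter = [-1]` initialization).
# Illustrative only -- this is not the actual LightGBM callback code.
def make_improvement_check(greater_is_better: bool = False):
    state = {"best_score": None, "best_num_trees": -1}

    def improved(score: float, num_trees: int) -> bool:
        best = state["best_score"]
        better = best is None or (
            score > best if greater_is_better else score < best
        )
        grew = num_trees > state["best_num_trees"]  # a new tree was added
        if better and grew:
            state["best_score"] = score
            state["best_num_trees"] = num_trees
            return True
        return False  # no new tree or no improvement: don't reset patience

    return improved
```

With this rule, an iteration whose tree count did not increase cannot reset the early-stopping counter, even if multithreaded evaluation reports a microscopically "better" metric.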