[python-package] Can't retrieve best_iteration after Optuna optimization #6384
Thanks for using LightGBM. We'd be happy to help you, but would really appreciate it if you could reduce this to a smaller, self-contained example that demonstrates the issue. Consider the strategies in this guide: https://stackoverflow.com/help/minimal-reproducible-example.

For example, create an example that could be copied and pasted with 0 modification by someone trying to help you. For binary classification, start from this:

```python
import lightgbm as lgb
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=10_000, centers=[[-4, -4], [-4, 4]])
```

Then fill in the least additional code necessary to show the problem you're asking for help with. |
@jameslamb Really appreciate your useful advice! I have updated my issue and removed the unnecessary code; the newly updated code can be run directly without any modification. |
Thank you so much for that! One of us will try to look into this soon and help. If you find anything else while investigating, please post it here. |
Thanks for the help! I have found something that might be helpful. When I added a print statement:

Code:

```python
def objective():
    # other code
    print('Inner best iteration', model.best_iteration)
    # other code

trial = study.best_trial
best_model = study.user_attrs['best_booster']
print('Outer best iteration', best_model.best_iteration)
```

Output:
|
Interesting! I might be able to provide more information on that later. Also, I just noticed you've double-posted this on Stack Overflow as well: https://stackoverflow.com/questions/78223783/cant-retrieve-best-iteration-in-lightgbm. Please don't do that. Maintainers here also monitor the `[lightgbm]` tag there. |
Oops. Sorry for the inconvenience! I will delete the double-posted Stack Overflow question right away. |
Since this is done at the end of training (LightGBM/python-package/lightgbm/engine.py Lines 299 to 300 in 501ce1c),
I believe you can use `current_iteration()` instead. |
@jmoralez Thanks for your reply. Sorry, I might not fully understand. Did you mean that the model returned from the objective function should still have the best iteration available, assuming that `best_iteration` is set at the end of training? |
I meant that I didn't know why that was removed, but that you could use `current_iteration()` instead. Looking a bit closer at your example, you try to get the attribute from the study, not the trial. Can you try the following instead?

```python
best_model = study.best_trial.user_attrs['best_booster']
```
|
I tried, but it still doesn't work.

```python
best_model_1 = study.best_trial.user_attrs['best_booster']
print('===BEST MODEL 1===')
print('Best iteration', best_model_1.best_iteration)
print('Current iteration', best_model_1.current_iteration())
print('Memory ID with best_model_1:', id(best_model_1))

best_model_2 = study.user_attrs['best_booster']
print('===BEST MODEL 2===')
print('Best iteration', best_model_2.best_iteration)
print('Current iteration', best_model_2.current_iteration())
print('Memory ID with best_model_2:', id(best_model_2))
```

Output:
It shows that both boosters return `-1` for `best_iteration`. |
I think this is a question for the Optuna folks. The only place I see where we set best iteration to -1 is in the Booster constructor (LightGBM/python-package/lightgbm/basic.py Line 3583 in 28536a0).
I don't know what they do to the user attributes that would result in that line being run. I'd still suggest using `current_iteration()` in the meantime. |
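The suggestion above can be sketched as follows: record the scalar `best_iteration` inside the objective, instead of relying on the copied Booster to carry it. `FakeTrial` here is a hypothetical stand-in for an Optuna `Trial` (which offers the same `set_user_attr` method), so the sketch runs without Optuna or LightGBM installed.

```python
class FakeTrial:
    """Hypothetical stand-in for optuna.Trial, for illustration only."""
    def __init__(self):
        self.user_attrs = {}

    def set_user_attr(self, key, value):
        self.user_attrs[key] = value

def objective(trial):
    # ... train the model here; pretend early stopping picked round 30
    best_iteration = 30  # would be model.best_iteration after lgb.train
    trial.set_user_attr("best_iteration", best_iteration)
    return 0.95  # the validation AUC

trial = FakeTrial()
objective(trial)
print(trial.user_attrs["best_iteration"])  # 30
```

With the real API, the stored value would then be read back via `study.best_trial.user_attrs["best_iteration"]`, which survives even when the booster object itself loses the attribute.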
Thank you very much for your patience. Setting the best iteration inside the objective function was a great idea! Retrieving the exact value of the best iteration is no longer a problem for me. I will report this problem to Optuna later and keep updating here. |
Hi from the Optuna community. When I ran the following (after installing LightGBM with `!conda install lightgbm`):

```python
import copy

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
from lightgbm import early_stopping
import optuna
import optuna_integration

dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)

def objective(trial, train_set, valid_set, num_iterations):
    params = {
        'objective': 'binary',
        'metric': ['auc'],
        'verbosity': -1,
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5)
    }
    pruning_callback = optuna_integration.LightGBMPruningCallback(trial, 'auc', valid_name='valid_set')
    model = lgb.train(
        params,
        num_boost_round=num_iterations,
        train_set=train_set,
        valid_sets=[train_set, valid_set],
        valid_names=['train_set', 'valid_set'],
        callbacks=[pruning_callback, early_stopping(50)]
    )
    print(model.best_iteration, copy.copy(model).best_iteration, copy.deepcopy(model).best_iteration)
    prob_pred = model.predict(x_test, num_iteration=model.best_iteration)
    return roc_auc_score(y_test, prob_pred, labels=[0, 1])

train_set = lgb.Dataset(x_train, label=y_train)
valid_set = lgb.Dataset(x_test, label=y_test)
num_iterations = 100
func = lambda trial: objective(trial=trial, train_set=train_set, valid_set=valid_set, num_iterations=num_iterations)
study = optuna.create_study(
    pruner=optuna.pruners.HyperbandPruner(),
    direction='maximize'
)
study.optimize(func, n_trials=1)
```

The output looks like
If you suspect this is a LightGBM issue, and if you're familiar with |
Sorry for interrupting your conversation. I think I might have found a possible reproducible example without using Optuna.

```python
import copy

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from lightgbm import early_stopping

dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
train_set = lgb.Dataset(x_train, label=y_train)
valid_set = lgb.Dataset(x_test, label=y_test)
params = {
    'objective': 'binary',
    'metric': ['auc'],
    'verbosity': -1,
    'num_iteration': 100
}
model = lgb.train(
    params,
    train_set=train_set,
    valid_sets=[train_set, valid_set],
    valid_names=['train_set', 'valid_set'],
    callbacks=[early_stopping(50)]
)
print(model.best_iteration, copy.copy(model).best_iteration, copy.deepcopy(model).best_iteration)
```

Output:
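The behaviour above can be mimicked without LightGBM at all. A minimal sketch, assuming the copy is implemented by rebuilding the object from its serialized string form (`ToyBooster` and its methods are hypothetical stand-ins, not the real LightGBM classes):

```python
import copy

class ToyBooster:
    """Hypothetical stand-in for a booster whose deepcopy round-trips
    through a serialized string, losing attributes set on the live object."""

    def __init__(self, model_str=""):
        self.model_str = model_str
        self.best_iteration = -1  # default for a freshly constructed model

    def model_to_string(self):
        # The serialized form contains only the trees, not best_iteration.
        return self.model_str

    def __deepcopy__(self, memo):
        # Rebuild from the serialized form: best_iteration resets to -1.
        return ToyBooster(model_str=self.model_to_string())

model = ToyBooster("tree ...")
model.best_iteration = 30  # as the early_stopping callback would set it
clone = copy.deepcopy(model)
print(model.best_iteration, clone.best_iteration)  # 30 -1
```

The original keeps its `best_iteration`, while the clone falls back to the constructor default, matching the `-1` seen in the output above.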
|
I just checked the source code of the Booster's copy methods (LightGBM/python-package/lightgbm/basic.py Lines 2841 to 2847 in 0c0eb2a).
Seems like the copy goes through `model_to_string` (LightGBM/python-package/lightgbm/basic.py Lines 3587 to 3643 in 0c0eb2a)
and the Booster is rebuilt from that string (LightGBM/python-package/lightgbm/basic.py Lines 2705 to 2733 in 0c0eb2a), so attributes like `best_iteration` that are set on the live object are not carried over.
Pardon me if I am wrong. |
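Until that is addressed upstream, one possible workaround is a small helper that re-attaches the dropped attributes after the deepcopy. This is only a sketch: `copy_with_attrs` is a hypothetical name, and the `_Stub` class exists solely to make the example self-contained and runnable without LightGBM.

```python
import copy

def copy_with_attrs(model, attrs=("best_iteration", "best_score")):
    """Hypothetical helper: deepcopy a booster-like object, then re-attach
    attributes that a serialization round-trip would otherwise drop."""
    clone = copy.deepcopy(model)
    for name in attrs:
        if hasattr(model, name):
            setattr(clone, name, getattr(model, name))
    return clone

# Self-contained demo with a stub object standing in for a Booster.
class _Stub:
    def __deepcopy__(self, memo):
        return _Stub()  # a fresh object: attributes are not copied over

stub = _Stub()
stub.best_iteration = 30
clone = copy_with_attrs(stub)
print(clone.best_iteration)  # 30
```

A plain `copy.deepcopy(stub)` would return an object without `best_iteration` at all; the helper restores it from the source object after copying.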
Oh I forgot about the copy. Linking #5539, which is similar. |
Thank you for your information. Perhaps the model's attributes (e.g. `best_iteration`) should be preserved when the model is copied. |
Environment info
LightGBM version or commit hash: 4.1.0
Optuna version: 3.6.0
Optuna_Integration version: 3.6.0
Command(s) you used to install LightGBM
Description
Hi there, I am new to LightGBM, and currently I can't find any useful solutions on Google/Stack Overflow/GitHub issues, so I wonder if posting a new issue would be helpful. Pardon me for the inappropriate behavior, since I'm using an 'issue' to ask a 'question'.
Here's my problem:
I was using Optuna to optimize my LightGBM model. At the same time, I was using the LightGBM callback `early_stopping(50)` to stop the iterations early. I set the best model in the optimization loop, and retrieved the best model (best booster) from the `user_attr`. Since the `early_stopping` callback was set, the training output logs showed some content like this:

```
[30]	valid_set's auc: 0.874471
```

Assuming that the auc value above was indeed the best value from all iterations, the best_iteration should be `[30]` as shown above. However, I got `-1` from invoking `best_model.best_iteration`.

My question is: how can I get the correct `best_iteration` value from the best model retrieved from the `study` object?

Thanks to whomever may solve my problem!
Looking forward to your reply :)
Reproducible example