-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] respect 'verbose' argument in lgb.cv() (fixes #4667) #4903
Merged
Merged
Changes from 8 commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
ad6a293
fixes
jameslamb ff5200b
revert debugging code
jameslamb 0bbbbe6
add test
jameslamb cfad4e5
check for LightGBM explicitly
jameslamb b3e456f
empty commit
jameslamb e56aff3
Merge branch 'master' into fix/r-lgb-cv-verbose
jameslamb aa06119
merge master
jameslamb 5bac0a5
revert unnecessary line deletion
jameslamb 3fdcfb4
Merge branch 'master' into fix/r-lgb-cv-verbose
jameslamb 619eaf4
respect verbose everywhere and update params for constructted dataset
jameslamb c307809
Update R-package/R/lgb.Dataset.R
jameslamb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to reviewers: I don't totally understand why this change fixes #4667. At first I was worried that this function might suffer from a larger problem like "
lgb.cv()
does not respect values passed toparams
", but that is not true.Consider the following example. Setting
min_gain_to_split
to a large number for the binary objective results in no splits, as expected.The logs are full of the following
Changing to
min_gain_to_split = 0.0
, those warnings go away and training produces trees with multiple leaves.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this related to the lack of corresponding code lines in R-package?
LightGBM/python-package/lightgbm/engine.py
Lines 222 to 225 in 2ef3cb8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? I'm not really sure. The code in
lgb.train()
andlgb.cv()
in the R package is very similar, andlgb.train()
doesn't have this issue with the parameterverbose
.Anyway, I think this fix is still worth merging even if it isn't fully understood...I don't think adding another
$update_params()
with the passed-in params introduces any risk.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. But this introduces performance degradation because this method calls C API function under the hood and is being executed for each fold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but I think that in this case, the cost of calling this method once per fold (one time at set up, not on each iteration, just to be clear) is worth it in exchange for ensuring that
lgb.cv()
's behavior is correct.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm letting you to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok looking into this more closely tonight, I think I figured out:
verbose
is not being respectedverbose
is not being respected and all other parameters are working correctlyShort Summary
Two bugs in the R package:
Dataset$slice()
callsLGBM_DatasetGetSubset()
, which will globally set the log level to INFO ifverbosity
isn't in the parameters passed to it. Andverbosity
is never passed through.Dataset$update_params(new_params)
is called on aDataset
that has already been constructed, it doesn't storenew_params
in memory in the R object (private$params
)I just pushed 619eaf4 which reverts the addition of
booster$reset_parameter()
and adds fixes for these two bugs.Much Longer Summary
I was able to reduce the internals of
lgb.cv()
to the following code that helps narrow down the issue.sample code (click me)
root cause: The use of
slice()
on aDataset
resets LightGBM's log level globally to the default value (INFO).More details:
Dataset$slice()
gets params fromDataset$get_params()
LightGBM/R-package/R/lgb.Dataset.R
Line 538 in af5b40e
Dataset$get_params()
filters down to only Dataset-specific parameters.LightGBM/R-package/R/lgb.Dataset.R
Lines 583 to 592 in af5b40e
verbose
is not in the Dataset-specific parameters.LightGBM/R-package/R/aliases.R
Line 9 in af5b40e
Dataset$slice()
eventually callsDataset$construct()
which callsLGBM_DatasetGetSubset()
, which runsconfig.Set()
LightGBM/src/c_api.cpp
Line 1407 in af5b40e
config.Set()
callsLog::ResetLogLevel()
which resets log level globally for the current process.LightGBM/src/io/config.cpp
Lines 241 to 249 in af5b40e
verbosity
isn't found inparams
, it defaults to INFO-levelLightGBM/include/LightGBM/config.h
Line 543 in af5b40e
Dataset.subset()
in the Python package doesn't suffer from this issue because it referencesself.params
(the dictionary of parameters stored as an attribute on theDataset
instance) instead of usingDataset.get_params()
, soverbose
is not filtered out.LightGBM/python-package/lightgbm/basic.py
Line 1860 in af5b40e
At first, I tried to just fix this by adding
"verbosity"
to the list of parameters returned byDataset$get_params()
. Unfortunately, that can lead to a new type of problem where the value ofverbosity
stored in a constructed Dataset will take precedence over values passed tolgb.train()
/lgb.cv()
if that Dataset is re-used multiple times.LightGBM/R-package/R/lgb.Booster.R
Line 42 in af5b40e
Next, I tried just changing the use of
Dataset$get_params()
to useDataset$private$params
inDataset$slice()
. That uncovered another bug. That still suffered from the problem that once the training Dataset was constructed, verbosity was locked in to the value from the last time it was constructed.Dataset$construct()
, if it's already been constructed then R callsLGBM_DatasetUpdateParamChecking()
LightGBM/R-package/R/lgb.Dataset.R
Line 562 in af5b40e
Dataset$construct()
does the right thing based on whether or not raw data is still available. However, if that call does NOT fail,private$params
is not updated!LightGBM/R-package/R/lgb.Dataset.R
Lines 559 to 577 in af5b40e