Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear split info buffer in cost efficient gradient boosting before every iteration (fix partially #3679) #5164

Merged
merged 9 commits into from
Jun 8, 2022

Conversation

shiyu1994
Copy link
Collaborator

@shiyu1994 shiyu1994 commented Apr 21, 2022

This is to fix the case provided by @mshivers in #3679. It turns out that the bug is not related to the core algorithm of LightGBM, but due to the cost-efficient gradient boosting module.

In the code snippet below, the best split of SerialTreeLearner can be updated by CostEfficientGradientBoosting,

void UpdateLeafBestSplits(Tree* tree, int best_leaf,
const SplitInfo* best_split_info,
std::vector<SplitInfo>* best_split_per_leaf) {
auto config = tree_learner_->config_;
auto train_data = tree_learner_->train_data_;
const int inner_feature_index =
train_data->InnerFeatureIndex(best_split_info->feature);
auto& ref_best_split_per_leaf = *best_split_per_leaf;
if (!config->cegb_penalty_feature_coupled.empty() &&
!is_feature_used_in_split_[inner_feature_index]) {
is_feature_used_in_split_[inner_feature_index] = true;
for (int i = 0; i < tree->num_leaves(); ++i) {
if (i == best_leaf) continue;
auto split = &splits_per_leaf_[static_cast<size_t>(i) *
train_data->num_features() +
inner_feature_index];
split->gain +=
config->cegb_tradeoff *
config->cegb_penalty_feature_coupled[best_split_info->feature];
// Avoid to update the leaf that cannot split
if (ref_best_split_per_leaf[i].gain > kMinScore &&
*split > ref_best_split_per_leaf[i]) {
ref_best_split_per_leaf[i] = *split;
}
}
}

However, CostEfficientGradientBoosting did not clear its buffer of splits splits_per_leaf_ before a new boosting iteration starts, which causes it to contain splits from previous trees. And these splits with wrong split information (sum of gradients and hessians, gain, etc.) will be mixed into the current tree.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you accidentally tag the wrong issue in the description? This doesn't seem related to #4969 at all.

Also, is it possible to create a test that reproduces the bug this PR addresses, to ensure this fix is working and prevent it from being accidentally re-introduced in the future?

@shiyu1994 shiyu1994 changed the title Clear split info buffer in cost efficient gradient boosting before every iteration (fix #4969) Clear split info buffer in cost efficient gradient boosting before every iteration (fix #4946) Apr 22, 2022
@shiyu1994
Copy link
Collaborator Author

Did you accidentally tag the wrong issue in the description? This doesn't seem related to #4969 at all.

Sorry, it should be #4946. Corrected.

@bluesummerv
Copy link

bluesummerv commented Apr 24, 2022

n1 = 800000
n2 = 2200
x = np.random.random((n1,n2))
y= np.random.random((n1))
model = lgb.LGBMRegressor(num_leaves=100, n_estimators=20, device='gpu')
model.fix(x,y)

Segmentation fault (core dumped)

Thanks! @shiyu1994

@shiyu1994
Copy link
Collaborator Author

@bluesummerv Thanks for using LightGBM. It seems that your example is not related with this PR. Could you please open a new issue for your bug report?

@shiyu1994 shiyu1994 changed the title Clear split info buffer in cost efficient gradient boosting before every iteration (fix #4946) Clear split info buffer in cost efficient gradient boosting before every iteration (partially fix #3679) May 5, 2022
@shiyu1994
Copy link
Collaborator Author

Sorry, it should be #4946. Corrected.

More accurately, this PR only fixes the case provided by @mshivers in #3679.

@shiyu1994
Copy link
Collaborator Author

shiyu1994 commented May 5, 2022

Also, is it possible to create a test that reproduces the bug this PR addresses, to ensure this fix is working and prevent it from being accidentally re-introduced in the future?

@jameslamb Thanks for the suggestion. But TBH I don't think we need a test case for this fix. It just fixes a logical mistake in the code of CEGB, which is not a corner case that is likely to be introduced again by future modifications. I think it makes sense to add test cases to guarantee the code to work in some tricky scenarios. But perhaps it is not necessary to add a test case for obvious logical mistakes in programming. WDYT?

@jameslamb
Copy link
Collaborator

WDYT?

I believe a test should be added in this PR.

I strongly believe that PRs which fix bugs in software should also introduce tests confirming that the software no longer exhibits those bugs, unless the effort involved in creating the tests or the cost of running the tests is too large.

Given that we have already have a clear reproducible example of a bug in LightGBM (#3679 (comment)), the effort required to convert it to a test case should not be too much, and I think it is worth it for the increased confidence that this bug won't be reintroduced. That test would also give us some additional code coverage of using CEGB, which isn't tested thoroughly in the project today.

In my opinion, tests should not just be used for "corner cases" or "tricky scenarios". Tests should try to cover the ways that users might use the software, and check that the software behaves as those uses would expect it to. Testing gives us confidence that once a bug is fixed, it won't be reintroduced.

This project is very large and there are many, many possible code paths based on e.g. different combinations of training parameters, compilation options, data characteristics, operating systems, etc. That set of combinations is far too large for us to just rely on maintainers' knowledge and pull request reviews to catch regressions.

@shiyu1994
Copy link
Collaborator Author

@jameslamb Done with adding the test case. Please check.

@StrikerRUS
Copy link
Collaborator

@guolinke This is very important bugfix for our long living bug. Could you please help to review it?

@shiyu1994
Copy link
Collaborator Author

bugfix for our long living bug

Note that this only fixes the case when CEGB is used (that's why the word partially is used in the title of this PR). I need further verification to see if the same bug (Check failed: (best_split_info.left_count) > (0)) still happens in normal cases.

@StrikerRUS
Copy link
Collaborator

Note that this only fixes the case when CEGB is used (that's why the word partially is used in the title of this PR).

Thanks for the clarification!

However, GitHub autoclosing mechanism doesn't understand this and will close #3679 after merging this PR.
https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
I propose changing the title to "fix partially #3679" to not close that issue automatically.

@StrikerRUS StrikerRUS changed the title Clear split info buffer in cost efficient gradient boosting before every iteration (partially fix #3679) Clear split info buffer in cost efficient gradient boosting before every iteration (fix partially #3679) May 10, 2022
tests/python_package_test/test_basic.py Outdated Show resolved Hide resolved
tests/python_package_test/test_basic.py Outdated Show resolved Hide resolved
tests/python_package_test/test_basic.py Outdated Show resolved Hide resolved
tests/python_package_test/test_basic.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
@shiyu1994 shiyu1994 requested a review from StrikerRUS June 1, 2022 14:15
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@shiyu1994
Copy link
Collaborator Author

@StrikerRUS

LGTM, thanks!

Could you please remove the change requests so that we can merge this RP? Currently it is blocked by the change requests. Thanks!

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove the change requests

Looks like that's from my initial review.

image

I've removed it here. Thanks for the fix!

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants