[R-package] Booster teardown appears to interfere with other Boosters in process #6741

Open
jameslamb opened this issue Dec 9, 2024 · 1 comment

Comments

@jameslamb
Collaborator

Description

While trying to set up a CI job to test {lightgbm} against its reverse dependencies (#6734), I found that a {misspi} example using {lightgbm} was persistently and reproducibly failing.

I'm not sure exactly why yet, but to me it looks like either a bug in {lightgbm} or at least something we should try to understand better.

Reproducible example

Using {lightgbm} 4.5.0 or the latest development version (d4d6c87) and {misspi} 0.1.0 (the latest version on CRAN as of this writing):

library(misspi)

data(toxicity, package = "misspi")
set.seed(0)
toxicity.miss <- missar(toxicity, 0.4, 0.2)
out <- misspi(toxicity.miss, viselect = 128, ncore = 1)

This is a minimized code snippet taken directly from {misspi}'s examples (catstats/misspi/R/misspi.R).

This fails like this:

task 97 failed - "Forced splits file includes feature index 0, but maximum feature index in dataset is -1"

That comes from here:

void GBDT::CheckForcedSplitFeatures() {
  std::queue<Json> forced_split_nodes;
  forced_split_nodes.push(forced_splits_json_);
  while (!forced_split_nodes.empty()) {
    Json node = forced_split_nodes.front();
    forced_split_nodes.pop();
    const int feature_index = node["feature"].int_value();
    if (feature_index > max_feature_idx_) {
      Log::Fatal("Forced splits file includes feature index %d, but maximum feature index in dataset is %d",
                 feature_index, max_feature_idx_);
    }
    // ... (excerpt ends here)

... but {misspi} is not using forced splits at all.

That error is happening somewhere around here: https://github.com/catstats/misspi/blob/6cd240dfda151e682866f3221ba30cafe1943c49/R/misspi.R#L388

In a block that (in pseudocode) looks like this:

cl <- makeCluster(ncores)
doSNOW::registerDoSNOW(cl)
x.imputed.new.tmp <- foreach(i = column_indices, .combine = "cbind") %dopar% {
  # train a LightGBM model predicting each column as a function of all other columns
  obs.data.lgb <- lightgbm::lgb.Dataset(data = x.imputed[-miss.row.id, active.set], label = x.imputed[-miss.row.id, i])
  model <- lightgbm::lgb.train(params = params, data = obs.data.lgb, verbose = -1, ...)
  # the value of this last expression is what foreach() combines with cbind
  predict.miss <- predict(model, x.imputed[miss.row.id, active.set, drop = FALSE])
}

It's trying to train one model per column in an input dataset, with that column as the target and all others as features, to be used in missing-value imputation.
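To make that pattern concrete, here is a minimal sketch in plain Python (standard library only) of the "one model per column" plan. It is not {misspi}'s code: `plan_column_models` and the `active_sets` mapping are hypothetical names introduced for illustration. It only builds the list of (target, features) pairs and refuses to plan a model that would end up with zero features, which could be a useful thing to log when reducing the failure.

```python
from typing import Dict, List, Sequence, Tuple


def plan_column_models(
    columns: Sequence[str],
    active_sets: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Return one (target, features) pair per column to impute.

    `active_sets` is a hypothetical mapping from a target column to the
    feature columns selected for it. A column with no entry falls back to
    "all other columns".
    """
    plan = []
    for target in columns:
        default_features = [c for c in columns if c != target]
        features = active_sets.get(target, default_features)
        # The target must never be used as its own feature.
        features = [c for c in features if c != target]
        if not features:
            # A model trained on an (n, 0) matrix is almost certainly a bug.
            raise ValueError(f"column {target!r} would be trained on 0 features")
        plan.append((target, features))
    return plan


if __name__ == "__main__":
    for target, features in plan_column_models(["a", "b", "c"], {"c": ["a"]}):
        print(target, "<-", features)
```

In the real loop, each pair would correspond to one `lgb.Dataset` / `lgb.train` call inside the `%dopar%` block.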

I strongly suspect that this error about forced splits is actually a symptom of a problem like "when one of the Boosters is torn down, it tears down other process-global state and that affects one of the other training processes".

Environment info

LightGBM version or commit hash: 4.5.0 or the latest development version (d4d6c87)

Command(s) you used to install LightGBM

sh build-cran-package.sh --no-build-vignettes
R CMD INSTALL --with-keep.source ./lightgbm_*.tar.gz

Additional Comments

cc @catstats for awareness

Will this be a problem for CRAN submission?

It shouldn't be. Notice that this is reproducible with {lightgbm} 4.5.0 (the latest version on CRAN, on CRAN since July 2024) and {misspi} 0.1.0 (latest version on CRAN, on CRAN since October 2023). So CRAN must not be checking this example in its checks.

How to close this

  1. reduce this to an example that doesn't use {misspi} and figure out the root cause
  2. decide what (if any) next steps {lightgbm} should take to support this pattern
  3. implement those next steps
@jmoralez
Collaborator

jmoralez commented Dec 9, 2024

I've had this happen to me in Python when the features are somehow left empty. Here's a minimal example:

import lightgbm as lgb
import numpy as np

X = np.array(100 * [[]])  # has shape (100, 0)
y = np.random.rand(100)
ds = lgb.Dataset(X, y)
lgb.train({}, ds)
# LightGBMError: Forced splits file includes feature index 0, but maximum feature index in dataset is -1

Maybe that's what's happening in {misspi}.
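One possible reading of why a forced-splits message can appear when no forced splits are configured is sketched below in plain Python. This is not LightGBM's code. It mimics the check quoted earlier in this issue under two assumptions that should be verified against the C++ source: that reading a missing `"feature"` key yields 0, and that the loop continues into `"left"` and `"right"` children (the quoted excerpt is cut off before that point). The function name `check_forced_split_features` is hypothetical.

```python
from collections import deque
from typing import Any, Dict, Optional


def check_forced_split_features(
    forced_splits: Dict[str, Any], max_feature_idx: int
) -> Optional[str]:
    """Return the error message the check would produce, or None if it passes."""
    queue = deque([forced_splits])
    while queue:
        node = queue.popleft()
        # Assumption: a missing "feature" key reads as 0.
        feature_index = int(node.get("feature", 0))
        if feature_index > max_feature_idx:
            return (
                "Forced splits file includes feature index %d, "
                "but maximum feature index in dataset is %d"
                % (feature_index, max_feature_idx)
            )
        # Assumption: child nodes live under "left" and "right".
        for side in ("left", "right"):
            child = node.get(side)
            if isinstance(child, dict):
                queue.append(child)
    return None


if __name__ == "__main__":
    n_features = 0  # e.g. a dataset of shape (100, 0)
    print(check_forced_split_features({}, n_features - 1))
```

Under those assumptions, an empty forced-splits object passes for any dataset with at least one feature (maximum index 0 or higher) and fails only when the maximum feature index is -1, which would match the message seen here.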
