
python api can't continue train with binary file data #4311

Closed
papillonyi opened this issue May 21, 2021 · 4 comments
Labels
bug

Comments

@papillonyi
Description

When I tried to continue training with binary file data, it raised an error:
[screenshot of the error message]
I trained on a binary file for 10 iterations first, and it succeeded.
Then I tried to train with init_model=last_train_result, and it raised that error.

When I tried to do the same thing with the LightGBM CLI, it worked.

Reproducible example

Here is my code:
[screenshot of the code]

Environment info

The version of LightGBM I used is 3.1.1, on Linux.

Command(s) you used to install LightGBM

Additional Comments

@jameslamb jameslamb added the bug label May 24, 2021
@StrikerRUS StrikerRUS mentioned this issue Jul 12, 2021
@jameslamb
Collaborator

jameslamb commented Jan 2, 2022

@papillonyi thanks very much for using LightGBM, and sorry for the very long delay in responding to this!

In the future, please do not post logs, error messages, or code as screenshots. Post such things as text instead, so others facing the same challenges can find this discussion from search engines.

Providing a minimal, reproducible example that can be easily copied and run by others would also make it much more likely that issues will be addressed quickly.


I tried to create a minimal, reproducible example based on the provided code snippet, using the Python package as of the latest commit on master (af5b40e).

The code below produces the reported error.

import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1_000, n_informative=5)

data_file_name = "output.bin"
model_file_name = "model-temp.text"
params = {
    "boosting_type": "gbdt",
    "verbose": 1,
    "deterministic": True,
    "objective": "regression"
}

# create dataset and save it to binary file
lgb.Dataset(data=X, label=y).save_binary(data_file_name)

[LightGBM] [Info] Saving data to binary file output.bin

# train for 10 iterations on data from file
dtrain = lgb.Dataset(data_file_name)
booster = lgb.train(
    params=params,
    train_set=dtrain,
    num_boost_round=10
)
booster.save_model(model_file_name)

[LightGBM] [Info] Load from binary file output.bin
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002147 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 25500
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 100
[LightGBM] [Info] Start training from score 0.505056

# clear booster and dataset
del booster
del dtrain

# try to continue training for 10 more rounds on
# model and Dataset from file
dtrain = lgb.Dataset(data_file_name)

booster = lgb.train(
    params=params,
    train_set=dtrain,
    init_model=model_file_name,
    num_boost_round=10
)

[LightGBM] [Info] Load from binary file output.bin
[LightGBM] [Fatal] Unknown format of training data. Only CSV, TSV, and LibSVM (zero-based) formatted text files are supported.

stack trace:
---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
/tmp/ipykernel_91/4239506626.py in <module>
----> 1 booster = lgb.train(
      2     params=params,
      3     train_set=dtrain.construct(),
      4     init_model=model_file_name,
      5     num_boost_round=10

/opt/conda/lib/python3.8/site-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, keep_training_booster, callbacks)
    159         raise TypeError("Training only accepts Dataset object")
    160 
--> 161     train_set._update_params(params) \
    162              ._set_predictor(predictor) \
    163              .set_feature_name(feature_name) \

/opt/conda/lib/python3.8/site-packages/lightgbm/basic.py in _set_predictor(self, predictor)
   2068         elif self.data is not None:
   2069             self._predictor = predictor
-> 2070             self._set_init_score_by_predictor(self._predictor, self.data)
   2071         elif self.used_indices is not None and self.reference is not None and self.reference.data is not None:
   2072             self._predictor = predictor

/opt/conda/lib/python3.8/site-packages/lightgbm/basic.py in _set_init_score_by_predictor(self, predictor, data, used_indices)
   1385         num_data = self.num_data()
   1386         if predictor is not None:
-> 1387             init_score = predictor.predict(data,
   1388                                            raw_score=True,
   1389                                            data_has_header=data_has_header,

/opt/conda/lib/python3.8/site-packages/lightgbm/basic.py in predict(self, data, start_iteration, num_iteration, raw_score, pred_leaf, pred_contrib, data_has_header, is_reshape)
    782         if isinstance(data, (str, Path)):
    783             with _TempFile() as f:
--> 784                 _safe_call(_LIB.LGBM_BoosterPredictForFile(
    785                     self.handle,
    786                     c_str(str(data)),

/opt/conda/lib/python3.8/site-packages/lightgbm/basic.py in _safe_call(ret)
    125     """
    126     if ret != 0:
--> 127         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    128 
    129 

LightGBMError: Unknown format of training data. Only CSV, TSV, and LibSVM (zero-based) formatted text files are supported.

@jameslamb
Collaborator

Looking at the stack trace, this error comes from a call to LGBM_BoosterPredictForFile() inside `Booster.predict()`, here:

_safe_call(_LIB.LGBM_BoosterPredictForFile(

LGBM_BoosterPredictForFile() cannot be used with LightGBM Dataset binary files at the moment.

Related conversations:


I tried to do the same thing with the LightGBM CLI, and it worked.

I was able to replicate that behavior as well, and can confirm that the CLI does support training continuation using a Dataset binary file.

First, I installed the Python package and built the CLI.

cd python-package
python setup.py install
cd ..

mkdir build
cd build
cmake ..
make -j2
cd ..

Then I ran the following Python code to generate the Dataset binary file and initial model.

import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1_000, n_informative=5)

data_file_name = "output.bin"
model_file_name = "model-temp.text"
params = {
    "boosting_type": "gbdt",
    "verbose": 1,
    "deterministic": True,
    "objective": "regression"
}

# create dataset and save it to binary file
lgb.Dataset(data=X, label=y).save_binary(data_file_name)

# train for 10 iterations on data from file
dtrain = lgb.Dataset(data_file_name)
booster = lgb.train(
    params=params,
    train_set=dtrain,
    num_boost_round=10
)
booster.save_model(model_file_name)

Then I checked that the produced model had exactly 10 trees.

cat model-temp.text | grep 'Tree=' | tail -1

Tree=9

Next, I created a file train.conf with configuration for the CLI.

task = train
objective = regression
data = output.bin
num_trees = 7
output_model = model-from-cli.txt
input_model = model-temp.text

Next, I ran training with the CLI and checked that a new model file was produced with 17 total trees (10 from the initial training, plus 7 from the continuation).

./lightgbm config=train.conf
cat model-from-cli.txt | grep 'Tree=' | tail -1

Tree=16
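The same tree count can be checked without grep. A small helper for this (hypothetical, not part of LightGBM) just counts the Tree= section headers in a saved text model dump:

```python
def count_trees(model_text: str) -> int:
    """Count tree sections in a LightGBM text model dump."""
    return sum(1 for line in model_text.splitlines() if line.startswith("Tree="))

# usage, assuming the CLI run above produced model-from-cli.txt:
# with open("model-from-cli.txt") as f:
#     print(count_trees(f.read()))  # expected 17 after continuation
```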

@jameslamb
Collaborator

I think at this point we should convert this issue into a feature request like "[python] support training continuation using a text model file and binary Dataset file" (and probably [R-package]). What do you think, @shiyu1994 @StrikerRUS?

Solutions for supporting that might be one of the following:

@jameslamb
Collaborator

Going through old issues today, I realize that this and #6144 describe exactly the same problem.

Since #6144 is more recent, I'm cross-linking these two and closing this one.
