cannot construct a valid set refer to a train set with max_bin != 255 #6159
Comments
Thanks for using LightGBM. The reproducible example you provided isn't reproducible. For example:
You could help reduce the effort required to answer this question by addressing those concerns and providing a minimal, reproducible example. If you haven't done that before and are unsure where to start, see:
Can you please provide such details or explain why that's not possible?
Sorry for providing unreproducible pseudocode. Here is a reproducible example:
import numpy as np
import lightgbm as lgb
lgb_random = lgb.Dataset(np.random.rand(10000, 100), np.random.rand(10000, 1), params={'max_bin':15})
lgb_random.save_binary('random.bin')
del lgb_random
val = lgb.Dataset(np.random.rand(1000, 100), np.random.rand(1000, 1), reference=lgb.Dataset('random.bin'))
val.save_binary('val.bin')
In other words, any random data reproduces my problem. I wonder why the
@aslongaspossible Thanks for reporting this issue. It seems that by default the … A quick fix would be to add … Still, I agree that this is not convenient, since the binary file should ideally contain all the information needed to reconstruct the preprocessed dataset. Will look into how to fix this later.
Thanks @aslongaspossible for providing a reproducible example. Given that, I see the issue and agree with @shiyu1994's recommendation. To make this easier in the future, until LightGBM provides more convenient behavior, consider storing Dataset parameters alongside wherever you store the
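The suggestion above, keeping the Dataset parameters next to the saved binary file, can be sketched with the standard library alone. The helper names (`params_path_for`, `save_dataset_params`, `load_dataset_params`) and the `.params.json` sidecar naming are assumptions made for illustration; they are not part of LightGBM.

```python
import json
from pathlib import Path


def params_path_for(bin_path):
    """Return the sidecar JSON path for a Dataset binary file.

    Hypothetical convention: 'random.bin' -> 'random.bin.params.json'.
    """
    bin_path = Path(bin_path)
    return bin_path.with_name(bin_path.name + ".params.json")


def save_dataset_params(bin_path, params):
    """Write the params used to build the Dataset next to its binary file."""
    path = params_path_for(bin_path)
    path.write_text(json.dumps(params, indent=2, sort_keys=True))
    return path


def load_dataset_params(bin_path):
    """Read the params back; return an empty dict if no sidecar file exists."""
    path = params_path_for(bin_path)
    if not path.exists():
        return {}
    return json.loads(path.read_text())
```

With the example from this thread, one would call `save_dataset_params('random.bin', {'max_bin': 15})` right after `lgb_random.save_binary('random.bin')`, and later pass `params=load_dataset_params('random.bin')` to `lgb.Dataset(...)` when reloading the file, so that `max_bin` matches the value stored in the binary.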
@shiyu1994 @aslongaspossible please see #4904 (comment) where I described this exact issue in detail. I think we have a path forward, but haven't as yet had anyone take up implementing it: #4904 (comment)
This works. Thank you!
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Description
When I construct a validation set that references a training set built with max_bin=15, LightGBM raises "Dataset max_bin 15 != config 255". It seems that I can never create a validation set with max_bin != 255?
Reproducible example
Where val_dataframe holds the features of the validation set, val_label holds the labels, and 'train_bin' is the training set saved as an lgb.Dataset binary file with max_bin=15.
Environment info
LightGBM version or commit hash: 3.3.5
Command(s) you used to install LightGBM