[R-package] predict() breaks when using a Dataset stored in a file #4034
Comments
Thanks very much for using For Linux example, could you try changing uses of It will be another day or two before I'm able to look at this in depth, apologies. |
Thanks for the quick response!! The description above now uses a permanent file and shows the error message, which is still there after updating the example. |
I started looking into this tonight. I think the two issues might be unrelated but not sure yet, so it's ok to leave them here as one thing for now. I was able to reproduce the "Data file doesn't exist" bug on my Mac, with slightly simpler sample code.
library(lightgbm)
# set up training data
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
# set up scoring data
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(
    dataset = dtrain
    , data = test$data
    , label = test$label
)
test_file <- file.path(getwd(), "test.bin")
if (file.exists(test_file)) {
    file.remove(test_file)
}
lgb.Dataset.save(
    dataset = dtest
    , fname = test_file
)
model <- lgb.train(
    params = list(
        objective = "regression"
        , metric = "l2"
    )
    , data = dtrain
    , nrounds = 5L
    , learning_rate = 1.0
)
model$predict(test_file)
I saw this behavior on |
@ticarki sorry for the delay in getting back to you. For the Windows half of this issue, I'm confident now that it's the same as #4045. I just submitted a fix for that issue (#4155). I tried your Windows example above on the branch for #4155 and no longer see a crash. Could you please try it out? You can follow the steps at #4045 (comment) to install from that feature branch. I haven't tested yet if the problem you saw on Linux is related. I suspect that it isn't. So for now, I'm going to change the name of this issue to just describe that problem. Let me know if you disagree with how I've rephrased the title. |
Ok, I came back to look at this tonight. I think that now, thanks to #4252, the reproducible examples above will produce a more informative error message.
I realize now that the examples are trying to predict on a saved LightGBM Dataset. I don't think that is supported. As @shiyu1994 said in #4210 (comment)
I believe that LightGBM/src/application/predictor.hpp Line 169 in f831808
Lines 232 to 239 in f831808
Line 177 in f831808
@shiyu1994 am I right about that? If I am, I can update the documentation to clarify the supported file types. @ticarki if you want to get predictions from a trained model and want to do that on data stored in a file, you'll have to use raw data in one of those formats for now. Adding this to the end of the code from #4034 (comment) worked for me.
test_csv <- file.path(getwd(), "test.csv")
# write the raw features as a headerless, comma-separated text file
write.table(
    x = as.matrix(test$data)
    , file = test_csv
    , row.names = FALSE
    , col.names = FALSE
    , sep = ","
)
preds_from_file <- model$predict(test_csv, header = FALSE)
# predict on the same features held in memory, for comparison
preds_in_mem <- model$predict(test$data)
identical(preds_from_file, preds_in_mem)
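A headerless CSV like the one written above can be sanity-checked before it is passed to predict. The sketch below uses only base R and a toy dense matrix. `write_features_csv` is a hypothetical helper name for illustration, not part of lightgbm, and the toy data is made up.

```r
# Hypothetical helper (not a lightgbm function): write a numeric feature
# matrix as a comma-separated text file with no header and no row names.
write_features_csv <- function(x, path) {
    stopifnot(is.matrix(x), is.numeric(x))
    write.table(
        x = x
        , file = path
        , row.names = FALSE
        , col.names = FALSE
        , sep = ","
    )
    invisible(path)
}

# toy 3 x 2 feature matrix (made-up values)
toy <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3L, ncol = 2L)
toy_csv <- tempfile(fileext = ".csv")
write_features_csv(toy, toy_csv)

# read the file back and confirm the shape and values survived
round_trip <- as.matrix(read.csv(toy_csv, header = FALSE))
dimnames(round_trip) <- NULL
stopifnot(identical(dim(round_trip), dim(toy)))
stopifnot(isTRUE(all.equal(round_trip, toy)))
```

With real-valued features, a text round trip may not preserve every bit of each number, so all.equal() can be a more forgiving check than identical() when comparing predictions from the file with predictions from memory.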
@jameslamb Yes. Currently a
This claim is wrong as pointed out by @StrikerRUS in #4210 (comment). We can use the |
In #4545, I've proposed some documentation changes and an error message change to try to make it a bit clearer that only text files are supported in predict().
For anyone finding this issue, you can try the following sample code with the R package to evaluate a constructed Dataset.
library(lightgbm)
# set up training data
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
# set up scoring data
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(
    dataset = dtrain
    , data = test$data
    , label = test$label
)
dtest$construct()
test_file <- file.path(getwd(), "test.bin")
if (file.exists(test_file)) {
    file.remove(test_file)
}
lgb.Dataset.save(
    dataset = dtest
    , fname = test_file
)
model <- lgb.train(
    params = list(
        objective = "regression"
        , metric = "l2"
        , learning_rate = 1.0
    )
    , data = dtrain
    , nrounds = 5L
)
# evaluate constructed dataset
model$eval(
    data = lgb.Dataset(
        data = test_file
    )$construct()
    , name = "test_set"
)
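The distinction discussed in this thread (raw text files go to predict, saved Datasets go through lgb.Dataset() and eval) can be sketched as a small dispatcher. This is only an illustration: `choose_file_workflow` is a hypothetical helper, not part of lightgbm, and deciding by file extension is an assumption, since a file's extension does not guarantee its contents.

```r
# Hypothetical helper (not a lightgbm function): pick a workflow for a data
# file based only on its extension. The extension lists are assumptions.
choose_file_workflow <- function(path) {
    ext <- tolower(tools::file_ext(path))
    if (ext %in% c("csv", "tsv", "txt", "svm", "libsvm")) {
        # raw text data: pass the path to model$predict()
        "predict"
    } else if (ext == "bin") {
        # saved Dataset: load with lgb.Dataset() and use model$eval()
        "eval"
    } else {
        "unknown"
    }
}

choose_file_workflow("test.csv")
choose_file_workflow("test.bin")
```

A check like this could run before calling the model, so that a saved Dataset is routed to eval instead of failing inside predict.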
…ed Datasets (fixes #4034) (#4545)
* documentation changes
* add list of supported formats to error message
* add unit tests
* Apply suggestions from code review
* update per review comments
* make references consistent
Co-authored-by: Nikita Titov
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Description
On Windows, R crashes without an error message when using lgb.Dataset.save.
On Linux, I am able to save the Dataset, but predict() cannot find the saved file.
Reproducible example
For the Windows bug (the example given by lightgbm::lgb.Dataset.save)
For the Linux bug (Example given by lightgbm::lgb.load + predict using a file as input)
The error:
Environment info
LightGBM version or commit hash:
lightgbm_3.1.1
Command(s) you used to install LightGBM
install.packages('lightgbm')