Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) #4545

Merged
merged 10 commits into from
Aug 25, 2021

Conversation

jameslamb
Copy link
Collaborator

Fixes #4034.
Contributes to #4310.

predict() methods in LightGBM can be used with data stored in a text file (CSV, TSV, or LibSVM). Those methods do not support creating predictions on other formats like LightGBM Dataset objects saved in a binary file.

This PR proposes the following changes to fix #4034:

  • clarifies the supported file formats for predict methods in the R and Python package documentation
  • adds a unit test in the R package on the behavior of calling Booster$eval() on a constructed Dataset stored in a file ([R-package] predict() breaks when using a Dataset stored in a file #4034 (comment))
  • updates the relevant error message in the C++ code used to determine file type when predicting on a file, to make it clearer that only certain formats of text file are supported

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking this up! I have some comments below:

R-package/R/lgb.Dataset.R Outdated Show resolved Hide resolved
R-package/R/lgb.Dataset.R Outdated Show resolved Hide resolved
python-package/lightgbm/basic.py Outdated Show resolved Hide resolved
src/io/parser.cpp Outdated Show resolved Hide resolved
@jameslamb jameslamb requested a review from StrikerRUS August 23, 2021 14:07
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! LGTM, except one minor unifying suggestion.

Comment on lines 714 to 715
#' a character representing a path to a text file (CSV, TSV, or LibSVM)
#' or a LightGBM Dataset binary file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make identical descriptions really identical without paraphrasing. Users might spend their time trying to find a difference where there is actually no any difference. Also, it will help us to not miss all occurrences of identical parameters during possible future updates.

Suggested change
#' a character representing a path to a text file (CSV, TSV, or LibSVM)
#' or a LightGBM Dataset binary file
#' a character representing a path to a text file (CSV, TSV, or LibSVM),
#' or a character representing a path to a binary \code{Dataset} file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sure, seems fine to me!

I'll do this one manually instead of applying in the browser, since the corresponding lgb.Dataset.Rd will also have to be regenerated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in de189a6, thanks as always for being thorough! You're right, that will make it easier to catch all such phrases in the future.

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R-package] predict() breaks when using a Dataset stored in a file
2 participants