[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) #4545

jameslamb · 2021-08-22T03:52:34Z

Fixes #4034.
Contributes to #4310.

predict() methods in LightGBM can be used with data stored in a text file (CSV, TSV, or LibSVM). Those methods do not support creating predictions on other formats like LightGBM Dataset objects saved in a binary file.

This PR proposes the following changes to fix #4034:

clarifies the supported file formats for predict methods in the R and Python package documentation
adds a unit test in the R package on the behavior of calling Booster$eval() on a constructed Dataset stored in a file (see the comment on #4034, "[R-package] predict() breaks when using a Dataset stored in a file")
updates the relevant error message in the C++ code used to determine file type when predicting on a file, to make it clearer that only certain formats of text file are supported
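To illustrate the kind of check that the error-message change touches, here is a minimal Python sketch of a pre-flight test a caller could run before predicting on a file. The helper name `guess_text_format` and its heuristics (tab, comma, `index:value` tokens, NUL bytes) are assumptions made for illustration. They are not the logic in src/io/parser.cpp, which may differ.

```python
import os
import tempfile

SUPPORTED_FORMATS = ("CSV", "TSV", "LibSVM")


def guess_text_format(path):
    """Guess whether a file is CSV, TSV, or LibSVM from its first line.

    Hypothetical helper for illustration; it is not LightGBM's parser logic.
    Raises ValueError for anything that does not look like a supported text file.
    """
    with open(path, "rb") as f:
        first_line = f.readline(4096)

    # A NUL byte is a strong hint that the file is binary (for example, a
    # saved Dataset) rather than delimited text.
    if b"\x00" in first_line:
        raise ValueError(
            f"{path} looks like a binary file; supported text formats: "
            + ", ".join(SUPPORTED_FORMATS)
        )

    text = first_line.decode("utf-8", errors="replace").strip()
    if "\t" in text:
        return "TSV"
    if "," in text:
        return "CSV"
    # LibSVM rows look like "<label> <index>:<value> <index>:<value> ..."
    if any(":" in token for token in text.split()[1:]):
        return "LibSVM"
    raise ValueError(
        f"could not recognize {path}; supported text formats: "
        + ", ".join(SUPPORTED_FORMATS)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "rows.csv")
        with open(csv_path, "w") as f:
            f.write("1,0.5,2.0\n")
        print(guess_text_format(csv_path))
```

Listing the supported formats in the exception text mirrors the intent of the third change above: a user who passes a saved Dataset file sees which formats would have worked.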

StrikerRUS

Thank you for picking this up! I have some comments below:

R-package/R/lgb.Dataset.R

python-package/lightgbm/basic.py

src/io/parser.cpp


StrikerRUS

Thank you! LGTM, except one minor unifying suggestion.

StrikerRUS · 2021-08-23T23:36:55Z

R-package/R/lgb.Dataset.R

+#'             a character representing a path to a text file (CSV, TSV, or LibSVM)
+#'             or a LightGBM Dataset binary file


Let's make identical descriptions truly identical, without paraphrasing. Otherwise users might spend time looking for a difference where there isn't one. It will also help us avoid missing any occurrence of an identical parameter during possible future updates.

Suggested change

-#' a character representing a path to a text file (CSV, TSV, or LibSVM)
-#' or a LightGBM Dataset binary file
+#' a character representing a path to a text file (CSV, TSV, or LibSVM),
+#' or a character representing a path to a binary \code{Dataset} file

jameslamb · 2021-08-24

oh sure, seems fine to me!

I'll do this one manually instead of applying it in the browser, since the corresponding lgb.Dataset.Rd will also have to be regenerated.

jameslamb · 2021-08-24

updated in de189a6, thanks as always for being thorough! You're right, that will make it easier to catch all such phrases in the future.
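As a rough illustration of why identical wording makes such phrases easier to catch, the following Python sketch groups documentation snippets by their normalized text. The function names and the `param_a` / `param_b` labels are hypothetical. The two wordings are the ones quoted in this thread.

```python
import re
from collections import defaultdict


def normalize(block):
    """Drop roxygen "#'" prefixes (and a leading diff "+") and collapse whitespace."""
    stripped = [re.sub(r"^\s*\+?\s*#'\s*", "", line) for line in block.splitlines()]
    return " ".join(" ".join(stripped).split())


def group_by_wording(blocks):
    """Map each distinct normalized wording to the names of the blocks that use it."""
    groups = defaultdict(list)
    for name, block in blocks.items():
        groups[normalize(block)].append(name)
    return dict(groups)


# Hypothetical parameter names; the two wordings are the ones quoted in this thread.
blocks = {
    "param_a": r"""#' a character representing a path to a text file (CSV, TSV, or LibSVM)
#' or a LightGBM Dataset binary file""",
    "param_b": r"""#' a character representing a path to a text file (CSV, TSV, or LibSVM),
#' or a character representing a path to a binary \code{Dataset} file""",
}

for wording, names in group_by_wording(blocks).items():
    print(names, "->", wording)
```

More than one group means the descriptions have drifted apart. With a single canonical wording, a plain text search is enough to find every occurrence.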

github-actions · 2023-08-23T16:27:36Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added 4 commits August 19, 2021 22:09

documentation changes

fc45eb5

Merge branch 'master' into fix/predict-from-file

fc3e7ba

add list of supported formats to error message

9a0cc23

add unit tests

582362a

jameslamb requested review from shiyu1994 and StrikerRUS August 22, 2021 03:52

jameslamb requested review from btrotta, chivee, guolinke, henry0312 and Laurae2 as code owners August 22, 2021 03:52

This was referenced Aug 22, 2021

release 3.3.0 #4310

Closed

[R-package] predict() breaks when using a Dataset stored in a file #4034

Closed

jameslamb added doc maintenance and removed doc labels Aug 22, 2021

StrikerRUS requested changes Aug 22, 2021

R-package/R/lgb.Dataset.R (outdated, resolved)

R-package/R/lgb.Dataset.R (outdated, resolved)

python-package/lightgbm/basic.py (outdated, resolved)

src/io/parser.cpp (outdated, resolved)

jameslamb and others added 4 commits August 22, 2021 15:53

Apply suggestions from code review

4dcca6a

Co-authored-by: Nikita Titov <[email protected]>

Merge branch 'master' into fix/predict-from-file

f14ec66

Merge branch 'fix/predict-from-file' of github.com:microsoft/LightGBM…

4bd8841

… into fix/predict-from-file

update per review comments

33448b6

jameslamb requested a review from StrikerRUS August 23, 2021 14:07

StrikerRUS approved these changes Aug 23, 2021


jameslamb added 2 commits August 24, 2021 17:06

Merge branch 'master' into fix/predict-from-file

8458753

make references consistent

de189a6

jameslamb merged commit 417ba19 into master Aug 25, 2021

jameslamb deleted the fix/predict-from-file branch August 25, 2021 02:33

jameslamb mentioned this pull request Oct 14, 2021

[R-package] [ci] Test failure on CRAN's r-devel-windows-x86_64-gcc10-UCRT check #4680

Closed

jameslamb mentioned this pull request Jan 2, 2022

python api can't continue train with binary file data #4311

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
