`LGBM_BoosterPredictForCSRSingleRowFastInit` behavior differs from documentation #4980

david-cortes · 2022-01-26T00:14:16Z

Calling C function LGBM_BoosterPredictForCSRSingleRowFastInit has undocumented side effects in the booster object that it takes as input. According to the documentation, this function should produce a FastConfigHandle object which is to then be passed to the prediction function LGBM_BoosterPredictForCSRSingleRowFast, but additionally:

Calling this function will modify the BoosterHandle input that it is passed, changing its behavior also in calls to LGBM_BoosterPredictForCSRSingleRow (which do not use the produced FastConfigHandle).
Calling this function twice reusing the same BoosterHandle but with different parameters (such as start_iteration) will in the second call produce an object which still has the configuration from the earlier object.
Calling LGBM_BoosterPredictForCSRSingleRow with a different predict_type will affect the configuration from the FastConfigHandle object in further calls to LGBM_BoosterPredictForCSRSingleRowFast, even though FastConfigHandle is not passed to LGBM_BoosterPredictForCSRSingleRow.

The text was updated successfully, but these errors were encountered:

StrikerRUS · 2022-01-26T14:14:14Z

@AlbertoEAF Could you please take a look?

AlbertoEAF · 2022-02-14T12:37:54Z

Hello,

I'm not available this month and probably not the next either.

I believe @david-cortes is likely right. If I recall correctly, there might be some Predict operations that still have some internal coupling with the Booster itself, as that's the way the prediction functions of old (which the Fast ones reuse) are implemented. Doing predictions with certain changed parameters can modify the Booster itself, thus afecting posterior calls to Predict functions of any kind, including the Fast variants.

This being the case it won't be a trivial fix.

Maybe as a first step we should document to what side-effects Fast and FastInit prediction functions are susceptible to first. We should state in their docs that currently, prediction functions of any kind (fast & non-fast) can modify the Booster itself, indirectly affecting posterior use of the Fast methods that one assumes depend only on their init params (as they should). We must first review which parameters do this though.

Note: In particular, and by design, to minimize calls at every FastPredict call, threading parameters are only set in FastInit and assume no one is using other threading parameter values other than those set in the FastInit call. I know it's not the best design, but as long as it's documented, this gives a better performance, and in case one wants to reset OpenMP's parallel processing settings he can call the FastInit again.
Maybe we could add a FastInitReset method or something, to force again the FastInit'ed set of parameters passed in FastInit for that FastConfigHandle, if there's some possibility of contamination from calls with other parameters.

Maybe this approach of having a FastInitReset or FastInitUpdate to modify the FastConfigHandle settings / force them again can be used as a stop-gap instead of having to modify all the internal Predict+Booster machinery in LGBM which has some coupling and will take some time to refactor.

I also don't know yet if it's viable to refactor LGBM to fix this without sacrificing performance for the non-Fast methods, which should never be an option in my view.

StrikerRUS · 2022-02-15T21:58:38Z

@AlbertoEAF

I'm not available this month and probably not the next either.

No problem! Thank you very much for your detailed input!

david-cortes · 2022-08-23T18:17:15Z

Once this gets sorted out, the R docs need to be updated to reflect the correct behavior: #5312 (comment)

david-cortes mentioned this issue Jan 26, 2022

[R-package] Add missing prediction functions to R interface #4982

Closed

AlbertoEAF mentioned this issue Jun 2, 2022

bug in tests: Java surefire forking is causing LightGBM to fail with serious C++ issues feedzai/feedzai-openml-java#122

Open

david-cortes mentioned this issue Jun 20, 2022

[R-package] Add remainder of prediction funtions #5312

Merged

jameslamb added the bug label Jun 1, 2023

Ten0 mentioned this issue Oct 11, 2023

Fix single row prediction performance in a multi-threaded environment #6024

Merged

stonebrakert6 mentioned this issue Oct 13, 2023

Closed - Opened by Mistake #6141

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`LGBM_BoosterPredictForCSRSingleRowFastInit` behavior differs from documentation #4980

`LGBM_BoosterPredictForCSRSingleRowFastInit` behavior differs from documentation #4980

david-cortes commented Jan 26, 2022 •

edited

Loading

StrikerRUS commented Jan 26, 2022

AlbertoEAF commented Feb 14, 2022 •

edited

Loading

StrikerRUS commented Feb 15, 2022

david-cortes commented Aug 23, 2022

LGBM_BoosterPredictForCSRSingleRowFastInit behavior differs from documentation #4980

LGBM_BoosterPredictForCSRSingleRowFastInit behavior differs from documentation #4980

Comments

david-cortes commented Jan 26, 2022 • edited Loading

StrikerRUS commented Jan 26, 2022

AlbertoEAF commented Feb 14, 2022 • edited Loading

StrikerRUS commented Feb 15, 2022

david-cortes commented Aug 23, 2022

`LGBM_BoosterPredictForCSRSingleRowFastInit` behavior differs from documentation #4980

`LGBM_BoosterPredictForCSRSingleRowFastInit` behavior differs from documentation #4980

david-cortes commented Jan 26, 2022 •

edited

Loading

AlbertoEAF commented Feb 14, 2022 •

edited

Loading