Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

tarunreddy1018 · 2020-12-24T12:03:17Z

Recently I have been using LightGBM 2.1.0 c_api library in production. And It was working fine. And now we have decided to update the version to 3.1.0 and check the performance gain. But we faced an issue where we were getting different results on different runs of predict function for single row (LGBM_BoosterPredictForMatSingleRowFast) method for the same input. But this problem occurs only when this function is being called in multithreaded environment. That means we can have different threads calling this function for different inputs. Could this mean that the predict function is no longer a thread safe?

Sample Code:
Config Setup: LGBM_BoosterPredictForMatSingleRowFastInit(m.model, C.C_API_PREDICT_NORMAL, C.int(0), C.int(0),
C.C_API_DTYPE_FLOAT64, C.int(len(m.featureNames)),
C.CString("num_threads=1"), &m.config)

Predict Call: LGBM_BoosterPredictForMatSingleRowFast(m.config, cinputArray, &outLen, cresult)

The above methods are being called using cgo in golang

guolinke · 2021-01-11T04:47:26Z

cc @AlbertoEAF

AlbertoEAF · 2021-01-13T15:15:23Z

Hello @tarunreddy1018 yes, as far as I recall, neither LGBM_BoosterPredictForMatSingleRowFast nor the non-fast variant LGBM_BoosterPredictForMatSingleRow were thread-safe in my use case, even in v2.3.1. In my use case we had to protect the predict calls with a mutex+lock scheme in the "user" code.

Did you test multi-threading with pre-v3.0.0 code? I recall that there were some changes in 3.0.0 and beyond regarding threading but I don't think there was a regression in thread-safety vs pre-v3.0.0 code. I say this because of my past experience using the library as well where I had the same issues as you.

More details

LGBM_BoosterPredictForMatSingleRowFast differs from the LGBM_BoosterPredictForMatSingleRow in the sense that it keeps your "prediction settings" and booster in your FastConfig (m.config) which is only used "read-only" by LGBM_BoosterPredictForMatSingleRowFast. Besides that, it internally uses the same machinery as the non-fast version:

int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle,
                                           const void* data,
                                           int64_t* out_len,
                                           double* out_result) {
  API_BEGIN();
  FastConfig *fastConfig = reinterpret_cast<FastConfig*>(fastConfig_handle);
  // Single row in row-major format:
  auto get_row_fun = RowPairFunctionFromDenseMatric(data, 1, fastConfig->ncol, fastConfig->data_type, 1);
  fastConfig->booster->PredictSingleRow(fastConfig->predict_type, fastConfig->ncol,
                                        get_row_fun, fastConfig->config,
                                        out_result, out_len);
  API_END();
}

As this "fast" prediction method is simpler, it implies that the non-thread safe code lives in the core: PredictSingleRow and/or RowPairFunctionFromDenseMatric functions, which are used as well in the non-fast variant LGBM_BoosterPredictForMatSingleRow, which is not thread-safe either.

tarunreddy1018 · 2021-01-13T18:33:43Z

Hi @AlbertoEAF , I have tested with multi-threading (from golang wrapper code which uses c api) with pre-v3.0.0 code. In fact we used v2.3.1 in production and all the predictions were as expected and consistent. This is expected because v2.3.1 had a lock guard in the beginning of predict function call. So, even if it is called in multi-threading way, only one thread executes the predict function at a time. So, there was no issue. But starting with v3.0.0, there was an update in locking strategy where it is using "shared_locks" and "unique_locks". So, now predict function calls have a "shared_lock". This means multiple threads can run predict function simultaneously. This is where we started seeing issues. Now, the predictions were not consistent and as expected. This made us to think if the LightGbm predict function calls are no longer thread safe. I was thinking if the intent was not to have the predict function calls thread-safe, then having a "shared_lock" in the predict function would not serve any purpose right?

AlbertoEAF · 2021-01-13T19:10:44Z

Interesting @tarunreddy1018. Recalling better, I used 2.3.150 from lightgbmlib (Microsoft's Java bindings), which corresponds to some commit between tags v2.3.1 and v2.3.2, and already had random scores unlike you when not using locks on my user code, which makes me believe the issue is here for longer than expected.

However, ignoring my results, it seems from your observations there was a regression in the LightGBM thread-safety with those locking strategy changes in v3.
If that really is the case, I'm not sure that was intentional and would almost bet it is not desired and should be considered a bug, right @guolinke?

I was thinking if the intent was not to have the predict function calls thread-safe, then having a "shared_lock" in the predict function would not serve any purpose right?

Correct, this really seems more like a bug with thread-safety than anything else.

AlbertoEAF · 2021-01-27T17:01:41Z

@tarunreddy1018 can we close this issue?

If you want to see improvements regarding threading maybe open a new issue like a feature request and mention this issue and #3751 so we know all the details we discussed.

StrikerRUS · 2021-02-06T21:49:10Z

Fixed via #3771.

github-actions · 2023-08-23T16:23:39Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

This was referenced Jan 13, 2021

Issue with call to c_api Predict functions in multi threaded way from golang #3751

Closed

Fix thread-safety in C API's PredictSingleRow #3771

Merged

StrikerRUS closed this as completed Feb 6, 2021

StrikerRUS mentioned this issue May 15, 2021

Lock-free Inference #4290

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

tarunreddy1018 commented Dec 24, 2020

guolinke commented Jan 11, 2021

AlbertoEAF commented Jan 13, 2021 •

edited

Loading

tarunreddy1018 commented Jan 13, 2021

AlbertoEAF commented Jan 13, 2021 •

edited

Loading

AlbertoEAF commented Jan 27, 2021

StrikerRUS commented Feb 6, 2021

github-actions bot commented Aug 23, 2023

Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

Comments

tarunreddy1018 commented Dec 24, 2020

guolinke commented Jan 11, 2021

AlbertoEAF commented Jan 13, 2021 • edited Loading

More details

tarunreddy1018 commented Jan 13, 2021

AlbertoEAF commented Jan 13, 2021 • edited Loading

AlbertoEAF commented Jan 27, 2021

StrikerRUS commented Feb 6, 2021

github-actions bot commented Aug 23, 2023

AlbertoEAF commented Jan 13, 2021 •

edited

Loading

AlbertoEAF commented Jan 13, 2021 •

edited

Loading