Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting random outputs when calling LightGBM Predict for single row function in a multi threaded environment (LGBM_BoosterPredictForMatSingleRowFast) method #3675

Closed
tarunreddy1018 opened this issue Dec 24, 2020 · 7 comments

Comments

@tarunreddy1018
Copy link

Recently I have been using LightGBM 2.1.0 c_api library in production. And It was working fine. And now we have decided to update the version to 3.1.0 and check the performance gain. But we faced an issue where we were getting different results on different runs of predict function for single row (LGBM_BoosterPredictForMatSingleRowFast) method for the same input. But this problem occurs only when this function is being called in multithreaded environment. That means we can have different threads calling this function for different inputs. Could this mean that the predict function is no longer a thread safe?

Sample Code:
Config Setup: LGBM_BoosterPredictForMatSingleRowFastInit(m.model, C.C_API_PREDICT_NORMAL, C.int(0), C.int(0),
C.C_API_DTYPE_FLOAT64, C.int(len(m.featureNames)),
C.CString("num_threads=1"), &m.config)

Predict Call: LGBM_BoosterPredictForMatSingleRowFast(m.config, cinputArray, &outLen, cresult)

The above methods are being called using cgo in golang

@guolinke
Copy link
Collaborator

cc @AlbertoEAF

@AlbertoEAF
Copy link
Contributor

AlbertoEAF commented Jan 13, 2021

Hello @tarunreddy1018 yes, as far as I recall, neither LGBM_BoosterPredictForMatSingleRowFast nor the non-fast variant LGBM_BoosterPredictForMatSingleRow were thread-safe in my use case, even in v2.3.1. In my use case we had to protect the predict calls with a mutex+lock scheme in the "user" code.

Did you test multi-threading with pre-v3.0.0 code? I recall that there were some changes in 3.0.0 and beyond regarding threading but I don't think there was a regression in thread-safety vs pre-v3.0.0 code. I say this because of my past experience using the library as well where I had the same issues as you.

More details

LGBM_BoosterPredictForMatSingleRowFast differs from the LGBM_BoosterPredictForMatSingleRow in the sense that it keeps your "prediction settings" and booster in your FastConfig (m.config) which is only used "read-only" by LGBM_BoosterPredictForMatSingleRowFast. Besides that, it internally uses the same machinery as the non-fast version:

int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle,
                                           const void* data,
                                           int64_t* out_len,
                                           double* out_result) {
  API_BEGIN();
  FastConfig *fastConfig = reinterpret_cast<FastConfig*>(fastConfig_handle);
  // Single row in row-major format:
  auto get_row_fun = RowPairFunctionFromDenseMatric(data, 1, fastConfig->ncol, fastConfig->data_type, 1);
  fastConfig->booster->PredictSingleRow(fastConfig->predict_type, fastConfig->ncol,
                                        get_row_fun, fastConfig->config,
                                        out_result, out_len);
  API_END();
}

As this "fast" prediction method is simpler, it implies that the non-thread safe code lives in the core: PredictSingleRow and/or RowPairFunctionFromDenseMatric functions, which are used as well in the non-fast variant LGBM_BoosterPredictForMatSingleRow, which is not thread-safe either.

@tarunreddy1018
Copy link
Author

Hi @AlbertoEAF , I have tested with multi-threading (from golang wrapper code which uses c api) with pre-v3.0.0 code. In fact we used v2.3.1 in production and all the predictions were as expected and consistent. This is expected because v2.3.1 had a lock guard in the beginning of predict function call. So, even if it is called in multi-threading way, only one thread executes the predict function at a time. So, there was no issue. But starting with v3.0.0, there was an update in locking strategy where it is using "shared_locks" and "unique_locks". So, now predict function calls have a "shared_lock". This means multiple threads can run predict function simultaneously. This is where we started seeing issues. Now, the predictions were not consistent and as expected. This made us to think if the LightGbm predict function calls are no longer thread safe. I was thinking if the intent was not to have the predict function calls thread-safe, then having a "shared_lock" in the predict function would not serve any purpose right?

@AlbertoEAF
Copy link
Contributor

AlbertoEAF commented Jan 13, 2021

Interesting @tarunreddy1018. Recalling better, I used 2.3.150 from lightgbmlib (Microsoft's Java bindings), which corresponds to some commit between tags v2.3.1 and v2.3.2, and already had random scores unlike you when not using locks on my user code, which makes me believe the issue is here for longer than expected.

However, ignoring my results, it seems from your observations there was a regression in the LightGBM thread-safety with those locking strategy changes in v3.
If that really is the case, I'm not sure that was intentional and would almost bet it is not desired and should be considered a bug, right @guolinke?


I was thinking if the intent was not to have the predict function calls thread-safe, then having a "shared_lock" in the predict function would not serve any purpose right?

Correct, this really seems more like a bug with thread-safety than anything else.

@AlbertoEAF
Copy link
Contributor

@tarunreddy1018 can we close this issue?

If you want to see improvements regarding threading maybe open a new issue like a feature request and mention this issue and #3751 so we know all the details we discussed.

@StrikerRUS
Copy link
Collaborator

Fixed via #3771.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants