Getting random outputs when calling the LightGBM single-row predict function (LGBM_BoosterPredictForMatSingleRowFast) in a multi-threaded environment #3675
cc @AlbertoEAF
Hello @tarunreddy1018, yes, as far as I recall, neither version is thread-safe. Did you test multi-threading with pre-v3.0.0 code? I recall that there were some changes in 3.0.0 and beyond regarding threading, but I don't think there was a regression in thread-safety vs pre-v3.0.0 code. I say this because of my past experience using the library as well, where I had the same issues as you. More details:
int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle,
const void* data,
int64_t* out_len,
double* out_result) {
API_BEGIN();
FastConfig *fastConfig = reinterpret_cast<FastConfig*>(fastConfig_handle);
// Single row in row-major format:
auto get_row_fun = RowPairFunctionFromDenseMatric(data, 1, fastConfig->ncol, fastConfig->data_type, 1);
fastConfig->booster->PredictSingleRow(fastConfig->predict_type, fastConfig->ncol,
get_row_fun, fastConfig->config,
out_result, out_len);
API_END();
}
As this "fast" prediction method is simpler, it implies that the non-thread-safe code lives in the core.
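To make the "non-thread-safe code lives in the core" idea concrete, here is a minimal, purely hypothetical sketch (not LightGBM's actual internals) of how a predictor that reuses one internal scratch buffer across calls can return a wrong score when two threads predict at the same time:

#include <cstdio>
#include <thread>

// Toy stand-in for a booster whose single-row path reuses internal state.
// The data race below is deliberate, to illustrate the symptom.
struct ToyPredictor {
  double scratch = 0.0;  // shared mutable state, no synchronization

  double Predict(double feature) {
    scratch = feature;     // another thread can overwrite this value
    return scratch * 2.0;  // before we read it back here
  }
};

int main() {
  ToyPredictor p;
  int mismatches = 0;
  std::thread other([&] {
    for (int i = 0; i < 1000000; ++i) p.Predict(1000.0);
  });
  for (int i = 0; i < 1000000; ++i) {
    if (p.Predict(1.0) != 2.0) ++mismatches;  // read the other thread's row
  }
  other.join();
  std::printf("wrong predictions: %d\n", mismatches);
  return 0;
}

With a single thread this always reports zero; with the second thread running, some calls return a score computed from the other thread's row, which matches the "random outputs" symptom reported here.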
Hi @AlbertoEAF, I have tested with multi-threading (from Golang wrapper code which uses the C API) with pre-v3.0.0 code. In fact, we used v2.3.1 in production and all the predictions were as expected and consistent. This is expected because v2.3.1 had a lock guard at the beginning of the predict function call, so even when it is called from multiple threads, only one thread executes the predict function at a time, and there was no issue.

But starting with v3.0.0 the locking strategy changed to use "shared_locks" and "unique_locks", and the predict function calls now take a "shared_lock". This means multiple threads can run the predict function simultaneously, and this is where we started seeing issues: the predictions were no longer consistent or as expected. This made us think that the LightGBM predict function calls are no longer thread-safe. If the intent was not to make the predict function calls thread-safe, then having a "shared_lock" in the predict function would not serve any purpose, right?
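For reference, here is a minimal sketch of the two locking strategies being contrasted, using the standard C++ primitives named above (illustrative only, not LightGBM's actual code):

#include <mutex>
#include <shared_mutex>

std::mutex predict_mutex;         // v2.3.1-style exclusive lock
std::shared_mutex booster_mutex;  // v3.x-style reader/writer lock

// Exclusive: only one thread at a time can be inside the predict path,
// so any internal buffers are never shared between concurrent callers.
void PredictExclusive(/* row, out */) {
  std::lock_guard<std::mutex> guard(predict_mutex);
  // ... run the prediction ...
}

// Shared: the lock only excludes unique_lock holders (e.g. model updates);
// any number of predicting threads run concurrently, which is only safe
// if the prediction path touches no shared mutable state.
void PredictShared(/* row, out */) {
  std::shared_lock<std::shared_mutex> lock(booster_mutex);
  // ... run the prediction ...
}

So a shared_lock in the predict path does serve a purpose, but only to coordinate with writers holding a unique_lock; it does nothing to protect concurrent predict calls from each other.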
Interesting, @tarunreddy1018. Recalling better, I used 2.3.150 from lightgbmlib (Microsoft's Java bindings), which corresponds to some commit between tags v2.3.1 and v2.3.2, and unlike you I already got random scores when not using locks in my user code, which makes me believe the issue has been here longer than expected. However, ignoring my results, your observations suggest there was a regression in LightGBM's thread-safety with those locking-strategy changes in v3.
Correct, this really seems more like a bug with thread-safety than anything else.
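In the meantime, the workaround @AlbertoEAF describes, locking in user code, can be applied around the C API call itself. Below is a minimal sketch against the public C API; it assumes the FastConfigHandle was already prepared with LGBM_BoosterPredictForMatSingleRowFastInit as in this issue's sample code:

#include <LightGBM/c_api.h>
#include <cstdint>
#include <mutex>

// One process-wide mutex that serializes every fast single-row prediction,
// restoring the exclusive behaviour v2.3.1 had via its internal lock guard.
std::mutex predict_call_mutex;

int GuardedPredictSingleRow(FastConfigHandle fast_config,
                            const double* row,
                            int64_t* out_len,
                            double* out_result) {
  std::lock_guard<std::mutex> guard(predict_call_mutex);
  return LGBM_BoosterPredictForMatSingleRowFast(fast_config, row,
                                                out_len, out_result);
}

The obvious cost is that predictions are serialized again, so multi-threaded throughput is lost; whether per-thread boosters or per-thread FastConfigHandles are a safe alternative depends on where the shared state lives, which is exactly what this thread is trying to pin down. From the Go wrapper, the same idea is a sync.Mutex held around the cgo call.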
@tarunreddy1018, can we close this issue? If you want to see improvements regarding threading, maybe open a new issue as a feature request and mention this issue and #3751 so we know all the details we discussed.
Fixed via #3771.
Recently I have been using the LightGBM 2.1.0 C API library in production, and it was working fine. Now we have decided to update to version 3.1.0 and check the performance gain, but we faced an issue: we were getting different results on different runs of the single-row predict function (LGBM_BoosterPredictForMatSingleRowFast) for the same input. This problem occurs only when the function is called in a multi-threaded environment, that is, when different threads may call it for different inputs. Could this mean that the predict function is no longer thread-safe?
Sample code:

Config setup:
LGBM_BoosterPredictForMatSingleRowFastInit(m.model, C.C_API_PREDICT_NORMAL, C.int(0), C.int(0),
                                           C.C_API_DTYPE_FLOAT64, C.int(len(m.featureNames)),
                                           C.CString("num_threads=1"), &m.config)

Predict call:
LGBM_BoosterPredictForMatSingleRowFast(m.config, cinputArray, &outLen, cresult)

The above methods are called via cgo from Go.