Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PushRows APIs are not thread safe for sparse data #5383

Closed
svotaw opened this issue Jul 19, 2022 · 3 comments
Closed

PushRows APIs are not thread safe for sparse data #5383

svotaw opened this issue Jul 19, 2022 · 3 comments
Labels

Comments

@svotaw
Copy link
Contributor

svotaw commented Jul 19, 2022

Description

The API LGBM_DatasetPushRows (and CSR version) are not thread safe for sparse data. Internally, there is some thread safety for the OpenMP parallelism, since each thread gets its own buffer in a SparseBin. However, if multiple threads each call LGBM_DatasetPushRows, OpenMP will return the same set of tid's (i.e. 0 to num_omp_threads-1) for each external thread. Hence, external thread A and external thread B will both get an internal tid of 0, and eventually collide when appending to the same sparse buffer.

Reproducible example

Simply call LGBM_DatasetPushRows on a Dataset with sparse bins from 2 different threads, and you will eventually get an access violation when 2 threads both try to add a record to the same element of a SparseBin push_buffers_ array.

Environment info

N/A

Additional Comments

A fix for this issue has been added to the PR #5299, since we require thread safety to stream between multiple Spark Tasks (this issue was discovered from sporadic test failures with the existing PR). The fix as first pushed only applies to the new streaming push APIs in that PR (e.g. LGBM_DatasetPushRowsWithMetadata), but it could be added to the existing PushRows APIs as well if so desired. The fix (to be approved still) involves initializing the bins with the number of external threads that will be used, thus allowing the SparseBin to add extra push buffers for thread isolation.

@jameslamb jameslamb added the bug label Jul 19, 2022
@shiyu1994
Copy link
Collaborator

@svotaw Thanks for working on this. Yes. I think LGBM_DatasetPushRows wasn't designed for multithreading from outside the API before. I'll review the PR this weekend.

@svotaw
Copy link
Contributor Author

svotaw commented Nov 30, 2022

Fixed with #5551

@svotaw svotaw closed this as completed Nov 30, 2022
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants