[ci] [python-package] temporarily stop testing against scikit-learn nightlies, load lib_lightgbm earlier #6654

Merged: 10 commits, Sep 24, 2024
2 changes: 1 addition & 1 deletion .ci/conda-envs/ci-core.txt
@@ -23,7 +23,7 @@ joblib>=1.3.2
matplotlib-base>=3.7.3
numpy>=1.24.4
pandas>2.0
-pyarrow>=6.0
+pyarrow-core>=6.0
python-graphviz>=0.20.3
scikit-learn>=1.3.2
scipy>=1.1
2 changes: 1 addition & 1 deletion .ci/test-python-latest.sh
@@ -22,7 +22,7 @@ python -m pip install \
'numpy>=2.0.0.dev0' \
'matplotlib>=3.10.0.dev0' \
'pandas>=3.0.0.dev0' \
-'scikit-learn>=1.6.dev0' \
+'scikit-learn==1.5.*' \
'scipy>=1.15.0.dev0'

python -m pip install \
45 changes: 45 additions & 0 deletions python-package/lightgbm/__init__.py
@@ -4,6 +4,51 @@
Contributors: https://github.com/microsoft/LightGBM/graphs/contributors.
"""

import platform

# gcc's libgomp tries to allocate a small amount of aligned static thread-local storage ("TLS")
# when it's dynamically loaded.
#
# If it's not able to find a block of aligned memory large enough, loading fails like this:
#
# > ../lib/libgomp.so.1: cannot allocate memory in static TLS block
#
# On aarch64 Linux, processes and loaded libraries share the same pool of static TLS,
# which makes such failures more likely on that architecture.
# (ref: https://bugzilla.redhat.com/show_bug.cgi?id=1722181#c6)
#
# Therefore, the later in a process libgomp.so is loaded, the greater the risk that loading
# it will fail in this way... so lightgbm tries to dlopen() it immediately, before any
# other imports or computation.
#
# This should generally be safe to do ... many other dynamically-loaded libraries have fallbacks
# that allow successful loading if there isn't sufficient static TLS available.
#
# libgomp.so absolutely needing it, by design, makes it a special case
# (ref: https://gcc.gcc.gnu.narkive.com/vOXMQqLA/failure-to-dlopen-libgomp-due-to-static-tls-data).
#
# other references:
#
# * https://github.com/microsoft/LightGBM/pull/6654#issuecomment-2352014275
# * https://github.com/microsoft/LightGBM/issues/6509
# * https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
# * https://bugzilla.redhat.com/show_bug.cgi?id=1722181#c6
#
if platform.system().lower() == "linux" and platform.processor().lower() == "aarch64":
    import ctypes

    try:
        # this issue seems specific to libgomp, so no need to attempt e.g. libomp or libiomp
        _ = ctypes.CDLL("libgomp.so.1", ctypes.RTLD_GLOBAL)
Collaborator Author


This did seem to fix the issues observed in CI!

But it would probably be simpler to just load lib_lightgbm.{dylib,dll,so} earlier instead. I will test that and push some changes in a few hours. Let's not merge this yet, please.

    except:  # noqa: E722
        # this needs to be try-catched, to handle these situations:
        #
        #   * LightGBM built without OpenMP (-DUSE_OPENMP=OFF)
        #   * non-gcc OpenMP used (e.g. clang/libomp, icc/libiomp)
        #   * no file "libgomp.so" available to the linker (e.g. maybe only "libgomp.so.1")
        #
        pass

from pathlib import Path

from .basic import Booster, Dataset, Sequence, register_logger
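
For readers who want to try the eager-load guard from this hunk outside of `__init__.py`, here is a minimal sketch of it as a standalone helper. The function names `should_preload` and `try_preload` are hypothetical and not part of lightgbm. The sketch also assumes `platform.machine()` for the architecture check, where the diff uses `platform.processor()`; the latter is documented to return an empty string when the value cannot be determined. It catches `OSError` rather than using a bare `except`, which is a narrower choice than the one in the diff.

```python
import ctypes
import platform


def should_preload(system: str, machine: str) -> bool:
    """Return True only for the platform the hunk targets (aarch64 Linux)."""
    return system.lower() == "linux" and machine.lower() == "aarch64"


def try_preload(name: str, force: bool = False) -> bool:
    """Attempt to dlopen() `name` early; return True if it was loaded.

    `force=True` skips the platform check (hypothetical convenience for testing).
    """
    if not force and not should_preload(platform.system(), platform.machine()):
        return False
    try:
        # RTLD_GLOBAL mirrors the mode used in the diff
        ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # library missing, or not loadable in this process
        return False
    return True


if __name__ == "__main__":
    # On aarch64 Linux with gcc's OpenMP runtime installed, this may print True.
    print(try_preload("libgomp.so.1"))
```

Keeping the platform check in its own function makes it possible to exercise the aarch64 branch on any machine, and a failed load is reported as `False` instead of raising, which matches the "best effort" intent described in the comment block above.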