[ci] [python-package] temporarily stop testing against scikit-learn nightlies, load lib_lightgbm earlier #6654
😫 QEMU aarch64 job is failing with this error again:
For lots of prior context on this: #6509
Back in #6509 (comment), I found that much of the static TLS was being used by libraries for cloud providers (AWS / Azure / GCP). I just pushed 0229097 switching the CI environments here from conda-forge's That
Switching to I'm able to reproduce it locally (in Docker on my M2 Mac, which importantly is also arm64):
environment setup in docker similar to CI
docker run \
--rm \
-v $(pwd):/opt/LightGBM \
-w /opt/LightGBM \
-it lightgbm/vsts-agent:manylinux2014_aarch64 \
bash
curl \
-sL \
-o miniforge.sh \
https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh
sh miniforge.sh -b -p "${HOME}/miniconda3"
export PATH="${HOME}/miniconda3/bin:${PATH}"
conda create \
-y \
-c conda-forge \
-n test-env \
--file ./.ci/conda-envs/ci-core.txt \
"python=3.12"
source activate test-env
This fails with the same error seen in CI:
source activate test-env
sh build-python.sh bdist_wheel
pip install --no-deps \
./dist/lightgbm-4.5.0.99-py3-none-linux_aarch64.whl
python -c "import lightgbm; print(lightgbm.__version__)"
# OSError: /root/miniconda3/envs/test-env/bin/../lib/libgomp.so.1: cannot allocate memory in static TLS block
Uninstalling
conda uninstall --yes scikit-learn
python -c "import lightgbm; print(lightgbm.__version__)"
# 4.5.0.99
That doesn't mean anything I suspected that maybe the issue is from mixing I'll try to investigate more tomorrow 😫
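When debugging this kind of failure, it can help to see which OpenMP runtimes a Python process has actually mapped, and from which paths. The following is a minimal, Linux-only sketch that parses `/proc/self/maps`; the helper names and the file-name pattern are hypothetical and not part of lightgbm or its CI scripts.

```python
import os
import re

# Assumption: OpenMP runtimes are recognisable by file name (libgomp, libomp, libiomp5),
# optionally with a vendoring suffix such as "libgomp-<hash>.so.1.0.0".
_OPENMP_PATTERN = re.compile(r"lib(gomp|omp|iomp5)[\w.-]*\.so[\w.]*$")


def openmp_libs_from_maps(maps_text):
    """Return sorted, de-duplicated OpenMP library paths found in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # Only file-backed mappings have a 6th field (the path of the mapped file).
        if len(fields) == 6:
            path = fields[5].strip()
            if _OPENMP_PATTERN.search(path):
                found.add(path)
    return sorted(found)


def loaded_openmp_libs():
    """Inspect the current process; returns [] where /proc/self/maps does not exist."""
    maps_path = "/proc/self/maps"
    if not os.path.exists(maps_path):
        return []
    with open(maps_path) as f:
        return openmp_libs_from_maps(f.read())


print(loaded_openmp_libs())
```

Calling `loaded_openmp_libs()` before and after importing a package shows whether that import pulled in an OpenMP runtime, and whether more than one copy ended up in the process.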
python-package/lightgbm/__init__.py
try:
    # this issue seems specific to libgomp, so no need to attempt e.g. libomp or libiomp
    _ = ctypes.CDLL("libgomp.so.1", ctypes.RTLD_GLOBAL)
This did seem to fix the issues observed in CI!
But it would probably be simpler to just load lib_lightgbm.{dylib,dll,so}
earlier instead. I will test that and push some changes in a few hours. Let's not merge this yet, please.
@StrikerRUS I've changed this significantly since your first review, in response to this issue: #6654 (comment) Could you please review again whenever you have time?
Wow! Thanks a lot for fixing yet another problem from our old friend OpenMP!
Haha yep, thanks very much! I appreciate you reviewing this, I know the evidence supporting this fix is very dense. At least this one feels like it will make the experience permanently better and isn't just a workaround... and I think it's getting better with every release as we learn more.
@jameslamb
ugh, sorry I keep forgetting to do that! Thank you for reminding me. I just removed that version from RTD.
As described in #6653, lightgbm is currently failing scikit-learn compatibility tests against the latest scikit-learn nightlies (1.6.0dev0). That's being worked on in #6651.
This PR proposes temporarily dropping scikit-learn from the list of projects whose nightlies lightgbm is tested against, to unblock CI here.
Update (Sep 22)
While working on this, CI for the Python package started failing in another way:
This proposes a permanent fix for that as well... trying to dlopen() lib_lightgbm as early as possible when running import lightgbm.
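The "load it as early as possible" idea can be sketched with `ctypes`. This is only an illustration under stated assumptions, not the code merged in this PR: the helper name `preload_first_available` is hypothetical, and the bare file names below stand in for whatever full path the package actually resolves.

```python
import ctypes


def preload_first_available(candidates):
    """Try to dlopen() each candidate in order; return the first handle that loads, else None."""
    for name in candidates:
        try:
            # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
            return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            # Not found or not loadable on this platform; try the next candidate.
            continue
    return None


# Assumption: in a real package this would run near the top of __init__.py,
# before any other import that might bring in its own OpenMP runtime.
_handle = preload_first_available(
    ["lib_lightgbm.so", "lib_lightgbm.dylib", "lib_lightgbm.dll"]
)
```

The sketch returns `None` instead of raising, so a failed preload leaves the regular loading path (and its error message) untouched.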