-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ci] [python-package] check distributions with pydistcheck #5838
Conversation
.ci/check_python_dists.sh
Outdated
@@ -17,4 +18,12 @@ if { test "${TASK}" = "bdist" || test "${METHOD}" = "wheel"; }; then | |||
check-wheel-contents ${DIST_DIR}/*.whl || exit -1 | |||
fi | |||
|
|||
echo "pydistcheck..." | |||
pydistcheck \ | |||
--inspect \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument --inspect
prints a summary like the following:
----- package inspection summary -----
file size
* compressed size: 1.3M
* uncompressed size: 4.3M
* compression space saving: 69.5%
contents
* directories: 0
* files: 16 (1 compiled)
size by extension
* .so - 3963.7K (90.5%)
* .py - 396.7K (9.1%)
* no-extension - 17.6K (0.4%)
* .txt - 0.0K (0.0%)
* .typed - 0.0K (0.0%)
------------ check results -----------
errors found while checking: 0
Which I think would have been useful in situations like this: #5252 (review)
@jameslamb This is interesting! Let me try pydistcheck on XGBoost. Right now, I often inspect the content of Python wheels by opening it as a ZIP archive. An automated check will be quite useful for me. |
Thanks @hcho3 , hope it helps you in the projects you work on 😊 |
.ci/check_python_dists.sh
Outdated
if [ $TASK == "CUDA" ] && [ $METHOD == "wheel" ]; then | ||
pydistcheck \ | ||
--inspect \ | ||
--ignore 'compiled-objects-have-debug-symbols,max-allowed-size-compressed' \ | ||
--max-allowed-size-uncompressed '60M' \ | ||
--max-allowed-files 800 \ | ||
${DIST_DIR}/* || exit -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
branch accounts for the fact that the CUDA 11.x wheel currently has the following warnings from pydistcheck
:
1. [compiled-objects-have-debug-symbols] Found compiled object containing debug symbols. For details, extract the distribution contents and run 'nm --debug-syms "lightgbm/lib_lightgbm.so"'.
2. [distro-too-large-compressed] Compressed size 35.1M is larger than the allowed size (5.0M).
3. [distro-too-large-uncompressed] Uncompressed size 54.8M is larger than the allowed size (15.0M).
errors found while checking: 3
I'm not sure why the CUDA build is so much larger than the non-CUDA one or where those debug symbols are coming from...maybe we are statically linking to CUDA libraries? Maybe those debug symbols are coming from the use off -lineinfo
compiler flag?
Line 218 in f74875e
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3 -lineinfo") |
Either way, I'm proposing not spending time investigating that right now, and instead just at least adding this check to detect if future changes in this project make the wheel significantly larger (which could be a problem in storage-sensitive environments like AWS Lambda).
.ci/check_python_dists.sh
Outdated
elif [ $ARCH == "aarch64" ]; then | ||
pydistcheck \ | ||
--inspect \ | ||
--ignore 'compiled-objects-have-debug-symbols' \ | ||
--max-allowed-size-compressed '5M' \ | ||
--max-allowed-size-uncompressed '15M' \ | ||
--max-allowed-files 800 \ | ||
${DIST_DIR}/* || exit -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aarch64 wheel we produces one error from pydistcheck
:
1. [compiled-objects-have-debug-symbols] Found compiled object containing debug symbols. For details, extract the distribution contents and run 'nm --debug-syms "lightgbm/lib_lightgbm.so"'.
errors found while checking: 1
I'm not sure where those are coming from, but I'm proposing ignoring them for now. The aarch64 linux integrated OpenCL wheel has a very similar compressed size (2.3MB) to the arm64 linux wheels (1.6M).
This pull request has been automatically locked since there has not been any recent activity since it was closed. |
Contributes to #5061.
Proposes adding
pydistcheck
to the set of checks run on Python sdists and wheels (the artifacts this project publishes to PyPI).pydistcheck
is a linter for Python distributions, similar withcheck-wheel-contents
but with a very different set of checks, including:lib_lightgbm.so
/lib_lightgbm.dll
I believe adding this will improve our confidence in releases of the Python package.
Notes for Reviewers
Full disclosure,
pydistcheck
is my own personal side-project: https://github.com/jameslamb/pydistcheck. I started it after a conference talk I gave last year. I've resisted introducing it into LightGBM until now because I didn't think it was appropriate to use my position as a maintainer here to promote my own project.I'm proposing it now because I think that project is at a level of maturity that it could genuinely be helpful here in LightGBM. I used it, for example, to catch two small packaging issues in
pandas
recently:And because it's been very helpful during my active development on changing the packaging strategy here for #5061 (e.g. in #5837 and #5759 ). It has helped me catch several mistakes during that development already.
Thanks very much for considering it.