Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] [python-package] check distributions with pydistcheck #5838

Merged
merged 12 commits into from
May 4, 2023

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Apr 19, 2023

Contributes to #5061.

Proposes adding pydistcheck to the set of checks run on Python sdists and wheels (the artifacts this project publishes to PyPI).

pydistcheck is a linter for Python distributions, similar with check-wheel-contents but with a very different set of checks, including:

I believe adding this will improve our confidence in releases of the Python package.

Notes for Reviewers

Full disclosure, pydistcheck is my own personal side-project: https://github.com/jameslamb/pydistcheck. I started it after a conference talk I gave last year. I've resisted introducing it into LightGBM until now because I didn't think it was appropriate to use my position as a maintainer here to promote my own project.

I'm proposing it now because I think that project is at a level of maturity that it could genuinely be helpful here in LightGBM. I used it, for example, to catch two small packaging issues in pandas recently:

And because it's been very helpful during my active development on changing the packaging strategy here for #5061 (e.g. in #5837 and #5759 ). It has helped me catch several mistakes during that development already.

Thanks very much for considering it.

@@ -17,4 +18,12 @@ if { test "${TASK}" = "bdist" || test "${METHOD}" = "wheel"; }; then
check-wheel-contents ${DIST_DIR}/*.whl || exit -1
fi

echo "pydistcheck..."
pydistcheck \
--inspect \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument --inspect prints a summary like the following:

----- package inspection summary -----
file size
  * compressed size: 1.3M
  * uncompressed size: 4.3M
  * compression space saving: 69.5%
contents
  * directories: 0
  * files: 16 (1 compiled)
size by extension
  * .so - 3963.7K (90.5%)
  * .py - 396.7K (9.1%)
  * no-extension - 17.6K (0.4%)
  * .txt - 0.0K (0.0%)
  * .typed - 0.0K (0.0%)
------------ check results -----------
errors found while checking: 0

(macOS bdist build link)

Which I think would have been useful in situations like this: #5252 (review)

@jameslamb jameslamb changed the title WIP: [ci] [python-package] check distributions with pydistcheck [ci] [python-package] check distributions with pydistcheck Apr 19, 2023
@jameslamb jameslamb marked this pull request as ready for review April 19, 2023 02:29
@jameslamb jameslamb requested a review from StrikerRUS as a code owner April 19, 2023 02:29
@jameslamb jameslamb marked this pull request as draft April 19, 2023 03:12
@jameslamb jameslamb changed the title [ci] [python-package] check distributions with pydistcheck WIP: [ci] [python-package] check distributions with pydistcheck Apr 19, 2023
@hcho3
Copy link
Contributor

hcho3 commented Apr 19, 2023

@jameslamb This is interesting! Let me try pydistcheck on XGBoost. Right now, I often inspect the content of Python wheels by opening it as a ZIP archive. An automated check will be quite useful for me.

@jameslamb
Copy link
Collaborator Author

Thanks @hcho3 , hope it helps you in the projects you work on 😊

Comment on lines 24 to 30
if [ $TASK == "CUDA" ] && [ $METHOD == "wheel" ]; then
pydistcheck \
--inspect \
--ignore 'compiled-objects-have-debug-symbols,max-allowed-size-compressed' \
--max-allowed-size-uncompressed '60M' \
--max-allowed-files 800 \
${DIST_DIR}/* || exit -1
Copy link
Collaborator Author

@jameslamb jameslamb Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if branch accounts for the fact that the CUDA 11.x wheel currently has the following warnings from pydistcheck:

1. [compiled-objects-have-debug-symbols] Found compiled object containing debug symbols. For details, extract the distribution contents and run 'nm --debug-syms "lightgbm/lib_lightgbm.so"'.
2. [distro-too-large-compressed] Compressed size 35.1M is larger than the allowed size (5.0M).
3. [distro-too-large-uncompressed] Uncompressed size 54.8M is larger than the allowed size (15.0M).
errors found while checking: 3

(build link)

I'm not sure why the CUDA build is so much larger than the non-CUDA one or where those debug symbols are coming from...maybe we are statically linking to CUDA libraries? Maybe those debug symbols are coming from the use off -lineinfo compiler flag?

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3 -lineinfo")

Either way, I'm proposing not spending time investigating that right now, and instead just at least adding this check to detect if future changes in this project make the wheel significantly larger (which could be a problem in storage-sensitive environments like AWS Lambda).

Comment on lines 31 to 38
elif [ $ARCH == "aarch64" ]; then
pydistcheck \
--inspect \
--ignore 'compiled-objects-have-debug-symbols' \
--max-allowed-size-compressed '5M' \
--max-allowed-size-uncompressed '15M' \
--max-allowed-files 800 \
${DIST_DIR}/* || exit -1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aarch64 wheel we produces one error from pydistcheck:

1. [compiled-objects-have-debug-symbols] Found compiled object containing debug symbols. For details, extract the distribution contents and run 'nm --debug-syms "lightgbm/lib_lightgbm.so"'.
errors found while checking: 1

(build link)

I'm not sure where those are coming from, but I'm proposing ignoring them for now. The aarch64 linux integrated OpenCL wheel has a very similar compressed size (2.3MB) to the arm64 linux wheels (1.6M).

@jameslamb jameslamb changed the title WIP: [ci] [python-package] check distributions with pydistcheck [ci] [python-package] check distributions with pydistcheck Apr 21, 2023
@jameslamb jameslamb marked this pull request as ready for review April 21, 2023 05:20
@jameslamb jameslamb merged commit 11e17f3 into master May 4, 2023
@jameslamb jameslamb deleted the ci/use-pydistcheck branch May 4, 2023 17:02
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants