enforce wheel size limits, README formatting in CI #6136

Merged
merged 3 commits into from
Nov 14, 2024

jameslamb
Member

Description

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

@jameslamb added the `5 - DO NOT MERGE`, `improvement`, and `non-breaking` labels Nov 13, 2024
@github-actions bot added the `Cython / Python` and `ci` labels Nov 13, 2024
@jameslamb changed the title from "WIP: [DO NOT MERGE] enforce wheel size limits, README formatting in CI" to "enforce wheel size limits, README formatting in CI" Nov 13, 2024
@jameslamb requested a review from bdice November 13, 2024 18:18
@jameslamb marked this pull request as ready for review November 13, 2024 18:19
@jameslamb requested review from a team as code owners November 13, 2024 18:19
@jameslamb removed the `5 - DO NOT MERGE` label Nov 13, 2024
]

# detect when package size grows significantly
max_allowed_size_compressed = '1.5G'
Member

Is this value coming from the existing wheel size, or coming from somewhere else?

Contributor

This seems too big. cuML wheels appear to be closer to 550MB. Source: https://anaconda.org/rapidsai-wheels-nightly/cuml-cu12/files

Maybe set the threshold at 600MB.

Member Author

This is the existing wheel size + a buffer. It varies by CPU architecture, Python version (because of Cython stuff), and CUDA version (because, for example, we don't use CUDA math lib wheels for CUDA 11).

The largest one I've seen clicking through logs on this PR was CUDA 11.8.0, Python 3.10, amd64:

checking 'final_dist/cuml_cu11-24.12.0a38-cp310-cp310-manylinux_2_28_x86_64.whl'
----- package inspection summary -----
file size
  * compressed size: 1.3G
  * uncompressed size: 2.2G
  * compression space saving: 42.2%
contents
  * directories: 72
  * files: 432 (86 compiled)
size by extension
  * .so - 2.2G (99.9%)
  * .py - 1.4M (0.1%)
  * .pyx - 1.2M (0.1%)
  * .0 - 0.2M (0.0%)
  * .ipynb - 0.1M (0.0%)
  * no-extension - 57.2K (0.0%)
  * .png - 51.3K (0.0%)
  * .pxd - 34.0K (0.0%)
  * .txt - 25.7K (0.0%)
  * .md - 10.3K (0.0%)
  * .h - 2.1K (0.0%)
  * .ini - 0.8K (0.0%)
largest files
  * (2.2G) cuml/libcuml++.so
  * (3.0M) cuml/experimental/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (2.9M) cuml/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/cluster/hdbscan/hdbscan.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/svm/linear.cpython-310-x86_64-linux-gnu.so
------------ check results -----------
errors found while checking: 0

(build link)

So proposing setting this to around 200MB above that size, so we'd be notified if the binary size increased above that level.

There's nothing special about 1.5GB... it's already way way too big to be on PyPI. But proposing putting some limit so that we can get automated feedback from CI about binary size growth, and make informed decisions about whether to do something about it... similar to setting a coverage threshold for tests.
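As a rough illustration of the kind of check being discussed (largest observed wheel of 1.3G plus a buffer of about 200MB gives the 1.5G limit), here is a minimal Python sketch. It is not the CI tool's implementation: the helper names `parse_size`, `wheel_sizes`, and `check_wheel_size` are hypothetical, and it assumes that `K`/`M`/`G` mean binary multiples and that "compressed size" is the size of the wheel file on disk.

```python
import os
import re
import zipfile

# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as '1.5G' or '600MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?)B?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def wheel_sizes(path: str) -> tuple[int, int]:
    """Return (compressed, uncompressed) sizes in bytes for a wheel (a zip archive)."""
    compressed = os.path.getsize(path)
    with zipfile.ZipFile(path) as wheel:
        uncompressed = sum(info.file_size for info in wheel.infolist())
    return compressed, uncompressed


def check_wheel_size(path: str, max_compressed: str) -> list[str]:
    """Return a list of error messages; empty if the wheel is within the limit."""
    compressed, _ = wheel_sizes(path)
    limit = parse_size(max_compressed)
    if compressed > limit:
        return [f"{path}: compressed size {compressed} bytes exceeds limit {max_compressed}"]
    return []
```

With a limit like this in place, a CI job can fail as soon as a wheel crosses the threshold, e.g. `check_wheel_size("final_dist/some.whl", "1.5G")` returning a non-empty list.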

Contributor

Aaahhh, but CUDA 11 is huge. We only did CUDA wheels work for CUDA 12. https://anaconda.org/rapidsai-wheels-nightly/cuml-cu11/files

Member Author

Yeah exactly:

# Link to the CUDA wheels with shared libraries for CUDA 12+
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 12.0)
  set(CUDA_STATIC_MATH_LIBRARIES OFF)
else()
  if(USE_CUDA_MATH_WHEELS)
    message(FATAL_ERROR "Cannot use CUDA math wheels with CUDA < 12.0")
  endif()
  set(CUDA_STATIC_MATH_LIBRARIES ON)
endif()

Contributor

I’m indifferent as well. Let’s stick to the single definition for now.

Member Author

alright sounds good, thanks for considering it. I'm glad these changes are helping to expose these differences and leading to these conversations 😊

Member

I'd be a fan of having two different limits, mostly because for "not CUDA 11" the limit of 1.5GB might as well be "infinity". As in, if we ever reach it, it will be way too late to course correct.

Should I make a PR that uses James' suggestion for two limits?

Member Author

@betatim sure! Go for it.

Member Author

@betatim I've put up PRs in other repos following this suggestion, if you'd like something to copy from here in cuml:

Contributor

@bdice bdice left a comment

Accepting the large threshold for now. Maybe we can make it depend on CUDA version? Or just wait until we drop CUDA 11.
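One way a limit could depend on CUDA version, sketched in Python under stated assumptions: the CUDA major version is read from the `cuNN` suffix of the wheel's distribution name (as in `cuml_cu11-...whl`), and the limits in the table are illustrative values taken from numbers mentioned in this thread, not a merged configuration. The names `LIMITS_BY_CUDA_MAJOR`, `cuda_major_from_wheel`, and `limit_for_wheel` are hypothetical.

```python
import re

# Illustrative only: 1.5G is the limit proposed in this PR, 600M is the
# value suggested in review for CUDA 12 wheels. Neither is a merged setting.
LIMITS_BY_CUDA_MAJOR = {11: "1.5G", 12: "600M"}


def cuda_major_from_wheel(filename: str) -> int:
    """Extract the CUDA major version from a name like 'cuml_cu11-24.12.0a38-...whl'."""
    # The distribution name is everything before the first '-' in a wheel filename.
    dist_name = filename.split("-", 1)[0]
    match = re.search(r"cu(\d+)$", dist_name)
    if match is None:
        raise ValueError(f"no CUDA suffix found in {filename!r}")
    return int(match.group(1))


def limit_for_wheel(filename: str) -> str:
    """Look up the size limit that applies to this wheel's CUDA major version."""
    major = cuda_major_from_wheel(filename)
    if major not in LIMITS_BY_CUDA_MAJOR:
        raise ValueError(f"no size limit configured for CUDA {major}")
    return LIMITS_BY_CUDA_MAJOR[major]
```

Raising on an unknown CUDA major version, rather than falling back to the largest limit, means that adding a new CUDA version forces an explicit decision about its threshold.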

@jameslamb
Member Author

/merge

@rapids-bot bot merged commit 8711e44 into rapidsai:branch-24.12 Nov 14, 2024
63 checks passed
@jameslamb deleted the wheel-validation branch November 14, 2024 01:17
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this pull request Nov 15, 2024
`cuvs-cu11` wheels are significantly larger than `cuvs-cu12` wheels, because (among other reasons) they are not able to dynamically link to CUDA math library wheels.

In #464, I proposed a size limit for CI checks of "max CUDA 11 wheel size + a buffer".

This PR proposes using different thresholds based on CUDA major version, following these discussions:

* rapidsai/cugraph#4754 (comment)
* rapidsai/cuml#6136 (comment)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)

URL: #469