enforce wheel size limits, README formatting in CI #6136

jameslamb · 2024-11-13T15:27:03Z

Description

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

- checks on wheel size (compressed),
  - to be sure they're under PyPI limits
  - and to prompt discussion on PRs that significantly increase wheel sizes
- checks on README formatting
  - to ensure they'll render properly as the PyPI project homepages
  - e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/
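For the README half of the description, a rough sketch of a pre-flight check is below. It is not the check this PR adds and it does not render anything. It only confirms that the wheel's `METADATA` file declares a `Description-Content-Type` and carries a non-empty long description, which are two common reasons a PyPI project page shows up blank or as raw text. The function names `read_long_description` and `readme_problems` are made up for illustration.

```python
import zipfile
from email.parser import Parser


def read_long_description(wheel_path):
    """Return (content_type, description) from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as whl:
        # Core metadata lives at <name>-<version>.dist-info/METADATA.
        names = [n for n in whl.namelist() if n.endswith(".dist-info/METADATA")]
        if len(names) != 1:
            raise ValueError(f"expected one METADATA file, found {len(names)}")
        raw = whl.read(names[0]).decode("utf-8")

    # METADATA uses an email-style header format, so the stdlib parser works.
    msg = Parser().parsestr(raw)
    content_type = msg.get("Description-Content-Type")
    # Newer metadata versions put the description in the message body;
    # older ones may use a "Description" header instead.
    description = msg.get_payload() or msg.get("Description", "")
    return content_type, description


def readme_problems(wheel_path):
    """List obvious problems; an empty list means none were spotted."""
    content_type, description = read_long_description(wheel_path)
    problems = []
    if not content_type:
        problems.append("missing Description-Content-Type")
    if not description.strip():
        problems.append("empty long description")
    return problems
```

An empty list here only means nothing obviously wrong was found. Whether the markup actually renders still needs a real renderer.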

dantegd · 2024-11-13T23:59:43Z

python/cuml/pyproject.toml

+]
+
+# detect when package size grows significantly
+max_allowed_size_compressed = '1.5G'


Is this value coming from the existing wheel size, or coming from somewhere else?

This seems too big. cuML wheels appear to be closer to 550MB. Source: https://anaconda.org/rapidsai-wheels-nightly/cuml-cu12/files

Maybe set the threshold at 600MB.

This is the existing wheel size + a buffer. It varies by CPU architecture, Python version (because of Cython stuff), and CUDA version (because, for example, we don't use CUDA math lib wheels for CUDA 11).

The largest one I've seen clicking through logs on this PR was CUDA 11.8.0, Python 3.10, amd64:

checking 'final_dist/cuml_cu11-24.12.0a38-cp310-cp310-manylinux_2_28_x86_64.whl'
----- package inspection summary -----
file size
  * compressed size: 1.3G
  * uncompressed size: 2.2G
  * compression space saving: 42.2%
contents
  * directories: 72
  * files: 432 (86 compiled)
size by extension
  * .so - 2.2G (99.9%)
  * .py - 1.4M (0.1%)
  * .pyx - 1.2M (0.1%)
  * .0 - 0.2M (0.0%)
  * .ipynb - 0.1M (0.0%)
  * no-extension - 57.2K (0.0%)
  * .png - 51.3K (0.0%)
  * .pxd - 34.0K (0.0%)
  * .txt - 25.7K (0.0%)
  * .md - 10.3K (0.0%)
  * .h - 2.1K (0.0%)
  * .ini - 0.8K (0.0%)
largest files
  * (2.2G) cuml/libcuml++.so
  * (3.0M) cuml/experimental/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (2.9M) cuml/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/cluster/hdbscan/hdbscan.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/svm/linear.cpython-310-x86_64-linux-gnu.so
------------ check results -----------
errors found while checking: 0

(build link)
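For anyone who wants to reproduce numbers like these locally: a wheel is a zip archive, so a rough version of the summary can be computed with the standard library alone. This is a sketch under that assumption, not the tool that produced the log above, and `wheel_size_summary` is a hypothetical helper name. It treats "compressed size" as the archive's size on disk, which may differ slightly from how the CI tool measures it.

```python
import os
import zipfile


def wheel_size_summary(wheel_path, top_n=5):
    """Summarize compressed/uncompressed size of a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as whl:
        infos = whl.infolist()

    # Size of the archive on disk, i.e. what an upload limit would apply to.
    compressed = os.path.getsize(wheel_path)
    # Sum of the members' sizes once extracted.
    uncompressed = sum(info.file_size for info in infos)
    # Fraction of space saved by compression (0.0 for an empty archive).
    saving = 1 - compressed / uncompressed if uncompressed else 0.0
    largest = sorted(infos, key=lambda info: info.file_size, reverse=True)[:top_n]
    return {
        "compressed": compressed,
        "uncompressed": uncompressed,
        "space_saving": saving,
        "largest": [(info.filename, info.file_size) for info in largest],
    }
```

Plugging the rounded figures from the log into the same formula gives 1 - 1.3/2.2, or about 41%. The reported 42.2% is presumably computed from the unrounded byte counts.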

So proposing setting this to around 200MB above that size, so we'd be notified if the binary size increased above that level.

There's nothing special about 1.5GB... it's already way way too big to be on PyPI. But proposing putting some limit so that we can get automated feedback from CI about binary size growth, and make informed decisions about whether to do something about it... similar to setting a coverage threshold for tests.
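To make the "largest observed size + a buffer" idea concrete, here is a minimal sketch of what such a threshold check boils down to. It is illustrative only: `parse_size` and `exceeds_limit` are hypothetical helpers, and reading `G` as 1024³ bytes is an assumption, since the tool that consumes `max_allowed_size_compressed` may interpret units differently.

```python
import os
import re

# Assumption: binary units (1K = 1024 bytes). The real tool may differ.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a string such as '1.5G' or '600M' into a number of bytes."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([BKMG])\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def exceeds_limit(path, max_allowed):
    """True if the file at `path` is larger than `max_allowed` (e.g. '1.5G')."""
    return os.path.getsize(path) > parse_size(max_allowed)
```

With these definitions a 1.3G wheel passes a `'1.5G'` limit and would fail a `'600M'` one, which is the trade-off being discussed in this thread.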

Aaahhh, but CUDA 11 is huge. We only did CUDA wheels work for CUDA 12. https://anaconda.org/rapidsai-wheels-nightly/cuml-cu11/files

Yeah exactly:

cuml/python/cuml/CMakeLists.txt

Lines 96 to 104 in 56e5e62

# Link to the CUDA wheels with shared libraries for CUDA 12+
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 12.0)
  set(CUDA_STATIC_MATH_LIBRARIES OFF)
else()
  if(USE_CUDA_MATH_WHEELS)
    message(FATAL_ERROR "Cannot use CUDA math wheels with CUDA < 12.0")
  endif()
  set(CUDA_STATIC_MATH_LIBRARIES ON)
endif()

I’m indifferent as well. Let’s stick to the single definition for now.

alright sounds good, thanks for considering it. I'm glad these changes are helping to expose these differences and leading to these conversations 😊

I'd be a fan of having two different limits. Mostly because for "not CUDA 11" the limit of 1.5GB might as well be "infinity". As in, if we ever reach it, it will be way too late to course correct.

Should I make a PR that uses James' suggestion for two limits?

@betatim sure! Go for it.

@betatim I've put up PRs in other repos following this suggestion, if you'd like something to copy from here in cuml:

use different wheel-size thresholds based on CUDA version cuvs#469

enforce wheel size limits, README formatting in CI cugraph#4754
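As a rough illustration of the "two limits" idea, the sketch below picks a threshold from the CUDA major version embedded in the wheel filename. It is not how the linked PRs implement it. The helper names are hypothetical, and the limits are example values taken from numbers mentioned in this thread (1.5G for CUDA 11, the 600MB suggestion for CUDA 12).

```python
import re

# Illustrative values only, taken from figures mentioned in this thread.
LIMITS_BY_CUDA_MAJOR = {11: "1.5G", 12: "600M"}


def cuda_major_from_wheel(filename):
    """Extract the CUDA major version from a name like 'cuml_cu11-...whl'."""
    match = re.search(r"[_-]cu(\d+)-", filename)
    if match is None:
        raise ValueError(f"no CUDA suffix found in {filename!r}")
    return int(match.group(1))


def limit_for_wheel(filename, limits=LIMITS_BY_CUDA_MAJOR):
    """Return the size limit string that applies to this wheel."""
    major = cuda_major_from_wheel(filename)
    if major not in limits:
        raise ValueError(f"no limit configured for CUDA {major}")
    return limits[major]
```

Keeping the limits in one mapping means that dropping CUDA 11 later is a one-line change, which fits the "or just wait until we drop CUDA 11" option raised in review.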

bdice

Accepting the large threshold for now. Maybe we can make it depend on CUDA version? Or just wait until we drop CUDA 11.

jameslamb · 2024-11-14T01:17:05Z

/merge

`cuvs-cu11` wheels are significantly larger than `cuvs-cu12` wheels, because (among other reasons) they are not able to dynamically link to CUDA math library wheels.

In #464, I proposed a size limit for CI checks of "max CUDA 11 wheel size + a buffer". This PR proposes using different thresholds based on CUDA major version, following these discussions:

* rapidsai/cugraph#4754 (comment)
* rapidsai/cuml#6136 (comment)

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Mike Sarahan (https://github.com/msarahan)

URL: #469

enforce wheel size limits, README formatting in CI

293b5f7

jameslamb added the 5 - DO NOT MERGE (Hold off on merging; see PR for details), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Nov 13, 2024

github-actions bot added the Cython / Python (Cython or Python issue) and ci labels Nov 13, 2024

jameslamb mentioned this pull request Nov 13, 2024

Add validation on wheel file characteristics in CI rapidsai/build-planning#110

Closed

jameslamb added 2 commits November 13, 2024 09:48

arm64 packages are around 1.1GB

60bbe65

some CUDA 11 wheels can be 1.3G (because no CUDA math lib wheels)

42989fa

jameslamb changed the title ~~WIP: [DO NOT MERGE] enforce wheel size limits, README formatting in CI~~ enforce wheel size limits, README formatting in CI Nov 13, 2024

jameslamb requested a review from bdice November 13, 2024 18:18

jameslamb marked this pull request as ready for review November 13, 2024 18:19

jameslamb requested review from a team as code owners November 13, 2024 18:19

jameslamb removed the 5 - DO NOT MERGE (Hold off on merging; see PR for details) label Nov 13, 2024

dantegd reviewed Nov 13, 2024

bdice approved these changes Nov 14, 2024

rapids-bot bot merged commit 8711e44 into rapidsai:branch-24.12 Nov 14, 2024
63 checks passed

jameslamb deleted the wheel-validation branch November 14, 2024 01:17

jameslamb mentioned this pull request Nov 14, 2024

use different wheel-size thresholds based on CUDA version rapidsai/cuvs#469

Merged
