Rework how C++ wheels are handled in devcontainer builds #119

Open
vyasr opened this issue Nov 16, 2024 · 4 comments

@vyasr
Contributor

vyasr commented Nov 16, 2024

Currently, C++ wheels are part of the devcontainer manifest and are therefore among the libraries built in those environments. However, in these environments the wheel build is designed to be a no-op because the library is already installed. The implementation involves 1) the CMake for the C++ library exiting early if a find_package call finds that the library has already been built elsewhere, 2) the Python load_library logic tolerating various cases where no library is found, under the assumption that the Python extension modules will have suitable RPATHs baked in to find libraries, and 3) various dependency shenanigans to ensure that we get the desired dependency lists in different cases (e.g. DLFW vs. pip devcontainers vs. conda devcontainers). These changes force us to jump through a number of hoops downstream because we have to consider cases where a C++ wheel package may be installed but not actually contain any DSO. With the benefit of some experience, it's time to reimagine how we handle these various cases.

Rather than doing what we currently do, we can satisfy all of these requirements by simply not installing the C++ wheels at all in environments where the system library is always preferred. The difficulty with accomplishing this so far in devcontainers and DLFW has been that it conflicts with the declared dependency lists, which state that e.g. pylibcudf depends on libcudf. Removing the libcudf installation makes pip consider the dependency unsatisfied and install it again. We can fix this by changing our dependencies.yaml files so that when building with certain settings (config-settings for rapids-build-backend, RBB, that are passed to the dependency generator of rapids-dependency-file-generator, dfg) the C++ wheels are filtered out of the dependency list altogether. This solution should ensure that we never install C++ wheels except in environments where they are needed.

As a consequence of this, we will always have C++ wheels that contain DSOs (no more empty shells) and they can always default to loading the library from the wheel. They can also be consistently configured to allow loading system libraries (e.g. using an environment variable).
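To make the intended loading behavior concrete, here is a minimal sketch, assuming a POSIX system. The environment variable name `EXAMPLE_PREFER_SYSTEM_LIBRARY` and both function names are hypothetical and chosen only for illustration; this is not the existing load_library implementation of any RAPIDS package.

```python
import ctypes
import os
from pathlib import Path

# Hypothetical variable name: the proposal only says "an environment variable".
PREFER_SYSTEM_ENV_VAR = "EXAMPLE_PREFER_SYSTEM_LIBRARY"


def candidate_library_paths(soname, wheel_lib_dir, environ=None):
    """Return the locations to try, in order, when loading `soname`."""
    environ = os.environ if environ is None else environ
    wheel_path = str(Path(wheel_lib_dir) / soname)
    flag = environ.get(PREFER_SYSTEM_ENV_VAR, "").lower()
    prefer_system = flag in ("1", "true", "on")
    # A bare soname lets the dynamic loader search the system paths.
    if prefer_system:
        return [soname, wheel_path]
    return [wheel_path, soname]


def load_library(soname, wheel_lib_dir, environ=None):
    """Load the first candidate that the dynamic loader can open."""
    last_error = None
    for path in candidate_library_paths(soname, wheel_lib_dir, environ):
        try:
            return ctypes.CDLL(path, mode=ctypes.RTLD_LOCAL)
        except OSError as exc:
            last_error = exc
    raise OSError(f"could not load {soname}") from last_error
```

Because the wheel always ships a DSO under this proposal, the default order can put the wheel copy first, and the system copy only takes precedence when the user opts in.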

@vyasr
Contributor Author

vyasr commented Nov 16, 2024

@trxcllnt pointed out that we could just reuse the use_cuda_wheels flag since that already pretty much covers this (we may want to rename it to capture the more general nature though).

@jameslamb
Member

I broadly support this.

Here's the set of dependencies I see this being immediately relevant for:

  • RAPIDS lib{project} wheels (e.g. libcudf, libcuspatial, etc.)
  • CUDA Toolkit libraries, math libraries, etc. (e.g. nvidia-nccl-cu{11,12}, nvidia-cublas-cu{11,12})
  • UCX wheels (i.e. libucx-cu{11,12})

Doing this would also allow us to drop some patches in DLFW, like the ones used to patch out the libucx-cu{11,12} dependency from ucxx.

I agree that re-using the use_cuda_wheels mechanism (docs for those finding this from search) is one path forward, and that if we do that we should change the name to something more general (use_shared_lib_wheels?).

But what if...

I can think of another implementation though. What if we added filtering support to rapids-dependency-file-generator?

Right now, rapids-make-pip-dependencies from the rapids-build-utils devcontainer feature does stuff like this:

cat "${requirement[@]}" "${pip_reqs_txts[@]}"                                                           \
| (grep -v '^#' || [ "$?" == "1" ])                                                                       \
| (grep -v -E '^$' || [ "$?" == "1" ])                                                                    \
| ( if test -n "${no_dedupe-}"; then cat -; else tr -s "[:blank:]" | LC_ALL=C sort -u; fi )               \
| (grep -v -P "^($(tr -d '[:blank:]' <<< "${pip_noinstall[@]/%/|}"))(=.*|>.*|<.*)?$" || [ "$?" == "1" ]) 

(code link)
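For readers less comfortable with shell pipelines, the filtering above can be approximated in Python as follows. This is a sketch of the same steps (drop comments and empty lines, optionally squeeze blanks and sort uniquely, then drop excluded names), not the actual rapids-make-pip-dependencies code. The function name `filter_requirements` is hypothetical, and unlike the shell version it escapes the excluded names before building the regular expression.

```python
import re


def filter_requirements(lines, noinstall, dedupe=True):
    """Approximate the grep/sort/grep pipeline on a list of requirement lines."""
    # grep -v '^#' and grep -v -E '^$': drop comment lines and empty lines.
    kept = [ln for ln in lines if ln != "" and not ln.startswith("#")]
    if dedupe:
        # tr -s "[:blank:]" squeezes runs of the same blank character;
        # sort -u then sorts and removes duplicates.
        squeezed = {re.sub(r"([ \t])\1+", r"\1", ln) for ln in kept}
        kept = sorted(squeezed)
    if not noinstall:
        return kept
    # Match an excluded name alone or followed by a version specifier
    # beginning with '=', '>' or '<', as the final grep -P does.
    names = "|".join(re.escape(name) for name in noinstall)
    pattern = re.compile(r"^(?:%s)(?:[=<>].*)?$" % names)
    return [ln for ln in kept if not pattern.match(ln)]
```

Note that the match is anchored on the whole name, so excluding libcudf-cu12 does not also remove pylibcudf-cu12.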

What if, instead, rapids-dependency-file-generator picked up an --exclude argument similar to auditwheel repair that was applied as a filter on the resolved set of dependency names?

Like for creating an environment:

matrix_selectors="cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};cuda_suffixed=true"

rapids-dependency-file-generator \
  --output requirements \
  --file-key "py_run_cugraph" \
  --matrix "${matrix_selectors}" \
  --exclude 'libucx-cu12' \
  --exclude 'nvidia-nccl-cu12' \
> requirements.txt

And for builds, it could be provided to rapids-build-backend through --config-settings, like

pip wheel \
  --no-build-isolation \
  --no-deps \
  --config-settings rapidsai.exclude-deps="libucx-cu12;nvidia-nccl-cu12" \
  .

We could bikeshed about exactly how to do that (wildcards? globs? regular expression? actually listing out literal distribution names?) and other implementation details.
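As one data point for that bikeshedding, here is a minimal sketch of glob-style matching on normalized distribution names. Everything in it is an assumption made for illustration: neither `--exclude` nor `rapidsai.exclude-deps` exists today, the function names are hypothetical, and the `;` separator simply follows the example above. Patterns are assumed to use only `*` and `?`.

```python
import fnmatch
import re

# Leading distribution name of a requirement string such as "numpy>=1.23".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def normalize(name):
    """Normalize a name the way Python packaging compares project names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract and normalize the distribution name from a requirement."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def parse_exclude_setting(value):
    """Split a hypothetical 'a;b;c' config-settings value into patterns."""
    return [part.strip() for part in value.split(";") if part.strip()]


def exclude_dependencies(requirements, exclude_patterns):
    """Drop requirements whose name matches any glob pattern."""
    patterns = [normalize(pattern) for pattern in exclude_patterns]
    return [
        req
        for req in requirements
        if not any(
            fnmatch.fnmatchcase(requirement_name(req), pattern)
            for pattern in patterns
        )
    ]
```

Literal names work unchanged with this approach, since a pattern without wildcards only matches itself, so globs could be added later without breaking callers that list exact distribution names.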

But in general... I think there's something to this? It'd mean that the cudf, cuml, cugraph, etc. dependencies.yaml files don't have to encode all this information. They can just be more focused on what's necessary for their CI and published releases.

@vyasr
Contributor Author

vyasr commented Dec 5, 2024

That seems like a good approach for this problem. I think we have enough data points illustrating what we need that it makes sense to search for ways to stop stuffing more complexity into dependencies.yaml and implement this sort of helpful feature in dfg/rbb.

@jameslamb
Member

Don't have a clear idea yet of how (if at all) it could be connected, but linking because it seems related... I just learned more fully about how rapids-make-{conda,pip}-dependencies can be customized to exclude packages: rapidsai/devcontainers#432 (comment)
