Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELEASE] raft v24.12 #2505

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open

[RELEASE] raft v24.12 #2505

wants to merge 31 commits into from

Conversation

raydouglass
Copy link
Member

❄️ Code freeze for branch-24.12 and v24.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.12 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-24.12 into main for the release

raydouglass and others added 30 commits September 19, 2024 12:04
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
Forward-merge branch-24.10 into branch-24.12
This PR updates all the RMM imports to use pylibrmm/librmm now that `rmm._lib` is deprecated . It should be merged after [rmm/1676](rapidsai/rmm#1676).

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: #2451
Forward-merge branch-24.10 into branch-24.12
Contributes to rapidsai/build-planning#106

Proposes specifying the RAPIDS version in `conda install` calls in CI that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases.

Also proposes combining together successive `pip install` calls. `pip install AB` is safer than `pip install A; pip install B` because `pip` doesn't take the current set of installed packages into consideration when it installs new packages.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #2467
Follow-up PR to address feedback: #2474 (comment)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #2475
The 12.6.1 CUDA compiler has issues with enable_if inside the template arguments of some kernels. We can simplify kernel logic and remove the usage of enable_if.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #2469
This PR is replacing the `VAULT_HOST` variable with `AWS_ROLE_ARN`. This is required to use the new token service to get AWS credentials.

Authors:
  - Jordan Jacobelli (https://github.com/jjacobelli)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Bradley Dice (https://github.com/bdice)

URL: #2472
Contributes to rapidsai/build-planning#111

Proposes some small packaging/CI changes, matching similar changes being made across RAPIDS.

* printing `sccache` stats to CI logs
* reducing `pip`'s verbosity in wheel building scripts
* updating to the latest `rapids-dependency-file-generator` (v1.16.0)
* always explicitly specifying `cpp` / `python` in calls to `rapids-upload-wheels-to-s3`

## Notes for Reviewers

This originally also ran wheel builds with `--no-build-isolation`, but I reverted that based on rapidsai/build-planning#108 (comment).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #2470
`thrust::host_vector` initializes its elements at creation and requires the element type be default-constructible.
This translates to `raft::pinned_mdarray` and makes the mdarray unusable for non-default-constructible objects, like `cuda::atomic<>` (and many user-defined types). This is against all other mdarray types in raft, which are based on `rmm::device_uvector` and are not initialized at construction time.
The PR changes the underlying container to a plain pointer + cudaMallocHost/cudaFreeHost.

**Breaking change**: if anyone relies on the `pinned_mdarray` to initialize itself, the code will break (but mdarrays should not initialize at construction in raft anyway). The affected classes have different private members now, so the ABI changes as well.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: #2478
Here is the results of looking at the cudaPointerGetAttributes of different allocation types on Grace + Hopper. Allocations of `malloc` are still usable on the GPU.
```
ccudaPointerGetAttributes attributes malloc ptr
  is_dev_ptr  -> 1
  is_host_ptr -> 1
  memory loc  -> unregistered

cudaPointerGetAttributes attributes cudaMalloc ptr
  is_dev_ptr  -> 1
  is_host_ptr -> 0
  memory loc  -> device

cudaPointerGetAttributes attributes cudaMallocManaged cudaMemAttachGlobal ptr
  is_dev_ptr  -> 1
  is_host_ptr -> 1
  memory loc  -> managed

```

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Micka (https://github.com/lowener)

URL: #2480
This project is incompatible with newer versions of `cuda-python`. This puts ceilings of `<=11.8.3` (CUDA 11) and `<=12.6.0` (CUDA 12) on that library.

Those ceilings should be removed and replaced with `!=` constraints once new releases of `cuda-python` are up that this project is compatible with.

See rapidsai/build-planning#116 for more information.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #2486
…on Cython dependency (#2490)

Contributes to rapidsai/build-planning#110

Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI.

* checks on wheel size (compressed),
  - *to be sure they're under PyPI limits*
  - *and to prompt discussion on PRs that significantly increase wheel sizes*
* checks on README formatting
  - *to ensure they'll render properly as the PyPI project homepages*
  - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/*

Also puts a ceiling on Cython to its latest stable release (`<=3.0.11`), to fix #2490 (comment). Work to relax that is tracked in (#2491).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #2490
A simple PR for pinning the FAISS version to fetch the compatible tag for raft-ann-bench

Authors:
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2496
This PR removes raft-ann-bench from the conda packages, build system, and documentation. This removal was previously announced for the 24.12 release in #2448.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Bradley Dice (https://github.com/bdice)

URL: #2497
We are keeping random ball cover headers in RAFT for 24.12, and random ball cover depends on distances and brute-force. Because of this, we're going to leave all of the VSS headers in RAFT for the time being, and will remove them all in a future PR once RBC is formally migrated to cuVS. The tests, benchmarks, and instantiations for all of these APIs will be removed, though, so while the actual headers can still be used, they are no longer being tested and could fail without warning. 

I've also included a note to users in the README about this, stating to use at their own risk.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Bradley Dice (https://github.com/bdice)

URL: #2498
…ng ignored in headers (#2501)

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #2501
Follow-up to #2498 (comment).

This removes an extraneous file in `cpp/template` that resulted from an incorrect merge conflict resolution and a leftover reference to the template project in `update-version.sh`.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #2500
I unfortunately don't have permissions to push on @aamijar branch for the previous Lanczos solver PR (#2416) so I kept his commits and continued it here.

## Lanczos Solver for Sparse Eigen Decomposition
We propose a new lanczos solver in raft that fixes the issues present in the previous solver `raft::sparse::solver::detail::computeSmallestEigenvectors`.

Specifically we address the following issues:
1. Numerical Stability for both float32 and float64 datatypes
2. Efficiency and Speed of Convergence

This new implementation is taken from the cupy library `cupyx.scipy.sparse.linalg.eigsh` where the thick-restart and full reorthogonalzation methods are used.

Additionally this PR exposes a python api for raft lanczos solver with an interface similar to `scipy.sparse.linalg.eigsh` and `cupyx.scipy.sparse.linalg.eigsh`.

```py3
from pylibraft.solver import eigsh
```

Authors:
  - Micka (https://github.com/lowener)
  - Anupam (https://github.com/aamijar)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2481
This upgrade is important because cutlass 3.4 fixed issues with kernel visibility that would otherwise lead to global kernel symbols from raft kernels using cutlass.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #2503
@raydouglass raydouglass requested review from a team as code owners November 21, 2024 20:50
@raydouglass raydouglass requested review from msarahan and removed request for a team November 21, 2024 20:50
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.