Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate cudf.isclose #17351

Conversation

enoch-gives-thanks
Copy link

@enoch-gives-thanks enoch-gives-thanks commented Nov 17, 2024

Description

This PR adds deprecation warning to cudf.isclose and direct user to an alternative solution.
closes #13593

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

msarahan and others added 4 commits November 15, 2024 20:54
This is a prototype implementation of rapidsai/build-infra#139

The work that this builds on:
* rapidsai/gha-tools#118, which adds a shell wrapper that automatically creates spans for the commands that it wraps. It also uses the `opentelemetry-instrument` command to set up monkeypatching for supported Python libraries, if the command is python-based
* https://github.com/rapidsai/shared-workflows/tree/add-telemetry, which installs the gha-tools work from above and sets necessary environment variables. This is only done for the conda-cpp-build.yaml shared workflow at the time of submitting this PR.

The goal of this PR is to observe telemetry data sent from a GitHub Actions build triggered by this PR as a proof of concept. Once it all works, the remaining work is:

* merge rapidsai/gha-tools#118
* Move the opentelemetry-related install stuff in https://github.com/rapidsai/shared-workflows/compare/add-telemetry?expand=1#diff-ca6188672785b5d214aaac2bf77ce0528a48481b2a16b35aeb78ea877b2567bcR118-R125 into https://github.com/rapidsai/ci-imgs, and rebuild ci-imgs
* expand coverage to other shared workflows
* Incorporate the changes from this PR to other jobs and to other repos

Authors:
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#16924
Updates cmake to 3.28.6 in the JNI Dockerfile used to build the cudf jar.  This helps avoid a bug in older cmake where FindCUDAToolkit can fail to find cufile libraries.

Authors:
  - Jason Lowe (https://github.com/jlowe)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Gera Shegalov (https://github.com/gerashegalov)

URL: rapidsai#17342
Copy link

copy-pr-bot bot commented Nov 17, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. CMake CMake build issue Java Affects Java cuDF API. cudf.pandas Issues specific to cudf.pandas cudf.polars Issues specific to cudf.polars pylibcudf Issues specific to the pylibcudf package labels Nov 17, 2024
@@ -5,17 +5,17 @@
"args": {
"CUDA": "11.8",
"PYTHON_PACKAGE_MANAGER": "conda",
"BASE": "rapidsai/devcontainers:24.12-cpp-cuda11.8-mambaforge-ubuntu22.04"
"BASE": "rapidsai/devcontainers:25.02-cpp-cuda11.8-mambaforge-ubuntu22.04"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR targets 24.12. Why have all the dependencies been changed to 25.02?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR's title, name, description, and actual changes do not make sense to me.
Recommend closing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be an attempt at #13593, an issue I filed a while back. @the-gates-of-Zion I can help work with you on this, if you are interested. This should target branch-25.02, so I will retarget the PR.

@bdice bdice changed the base branch from branch-24.12 to branch-25.02 November 18, 2024 15:43
@bdice bdice removed request for a team, Matt711 and AyodeAwe November 18, 2024 15:43
@@ -5322,6 +5322,29 @@ def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
5 False
dtype: bool
"""
warnings.warn(
"`cudf.close` is deprecated and will be removed in a future version of cudf. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"`cudf.close` is deprecated and will be removed in a future version of cudf. "
"`cudf.isclose` is deprecated and will be removed in a future version of cudf. "

Comment on lines +5327 to +5345
'''
import cupy as cp
import pandas as pd
from cudf.core.column import (
as_column,
)

a = pd.array([1.0, 2.0, None])
b = pd.array([1.0, 2.1, None])

a_col = as_column(a)
a_array = cupy.asarray(a_col.data_array_view(mode="read"))

b_col = as_column(b)
b_array = cupy.asarray(b_col.data_array_view(mode="read"))

result = cp.isclose(a, b, equal_nan=True)
print(result) # Output: [ True False True]
''',
Copy link
Contributor

@bdice bdice Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We noted in #13593: "As part of the deprecation, we decided to add a warning message indicating how to support nulls while using cupy.isclose." However, I am not sure if this is the right code snippet.

The code we want to give to the user as a replacement for this deprecated code path is essentially the current implementation in this method. Basically it has two steps:

  • Call cupy.isclose to see if the values are "close" to each other. This does not account for null values in the input.
  • Compare the null masks of each input. If equal_nan is False, we mark all null values as False ("not close") in the result. If equal_nan is True, we set the values where one column's input is null as False and the values where both columns' input are null as True.

I would suggest we add this snippet to the page "Working with missing data": https://docs.rapids.ai/api/cudf/stable/user_guide/missing-data/

Then we can link to this snippet in the docs from the deprecated method's docstring and in the deprecation warning.

Does that make sense? Let me know if you have questions.

@bdice bdice changed the title Pull request/#13593 Deprecate cudf.isclose Nov 18, 2024
davidwendt and others added 2 commits November 18, 2024 16:13
Moves `cpp/benchmarks/string/translate.cpp` implementation from google-bench to nvbench.
This is benchmark for the `cudf::strings::translate` API.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#17325
mroeschke and others added 16 commits December 17, 2024 01:38
…, ...)` type (rapidsai#17604)

From an offline discussion, a pandas object with an `category[interval[...]]` type would be incorrectly be interpreted as a `category[struct[...]]` type. This can cause further problems with `cudf.pandas` as a `category[struct[...]]` type cannot be properly interpreted by pandas.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17604
Clang-tidy does not like `[[nodiscard]]` after `__device__` and I don't like red squigly lines.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: rapidsai#17608
Recent changes in dask and dask-expr have broken `dask_cudf.read_csv` (dask/dask-expr#1178, dask/dask#11603). Fortunately, the breaking changes help us avoid legacy CSV code in the long run.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#17612
…i#17611)

A recent nightly failure discovered by @davidwendt here: https://github.com/rapidsai/cudf/actions/runs/12367794950/job/34543121050 indicates an environment cannot be created with `pytorch>=2.4.0` and `pyarrow==14.0.0 & 14.0.1`. Thus this bump to `14.0.2`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17611
This PR has two fixes:
- Since we're pinning to a commit, a shallow clone will start failing as soon as HEAD gets bumped on the main branch (which will happen next when cuml/raft logging features are merged). We need to stop using shallow clones.
- The CMake code for setting the default logging levels was setting the wrong macro name.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17588
…apidsai#17610)

Fixes memcheck error found in nightly build checks in the STREAM_REPLACE_TEST's `ReplaceTest.NormalizeNansAndZerosMutable` gtest. The mutable-view passed to the `cudf::normalize_nans_and_zeros` API was pointing to invalidated data.

The following line created the invalid view
```
cudf::mutable_column_view mutable_view = cudf::column(input, cudf::test::get_default_stream());
```
The temporary `cudf::column` is destroyed once the `mutable_view` is created so this view would now point to a freed column. The view must be created from a non-temporary column and also must be non-temporary itself so that it is not implicitly converted to a `column_view`.

Error introduced by rapidsai#17436

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#17610
Return a `cudf::detail::host_vector` from `flatten_single_pass_aggs` because this vector is eventually copied to the device and we might want to use pinned memory.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Karthikeyan (https://github.com/karthikeyann)
  - MithunR (https://github.com/mythrocks)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17605
…apidsai#17618)

We stopped running clang-tidy on test files in rapidsai#17078 so no reason to carry around these patches anymore.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17618
Simplifies telemetry a bit. More details at rapidsai/shared-actions#28.

Telemetry will still not be collected until @ajschmidt8 enables the TELEMETRY_ENABLED environment variable for this repo.

Authors:
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17615
@isVoid isVoid requested review from a team as code owners December 20, 2024 09:05
@isVoid isVoid requested review from harrism and nvdbaranec December 20, 2024 09:05
@github-actions github-actions bot removed libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue Java Affects Java cuDF API. cudf.pandas Issues specific to cudf.pandas cudf.polars Issues specific to cudf.polars pylibcudf Issues specific to the pylibcudf package labels Dec 20, 2024
@isVoid
Copy link
Contributor

isVoid commented Dec 20, 2024

Closing the PR due to I accidentally rewrote the history of the previous commits.

@isVoid isVoid closed this Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Deprecate and remove cudf.isclose.