Add public APIs to Access Underlying cudf and pandas Objects from cudf.pandas Proxy Objects #17629

Open
wants to merge 1 commit into
base: branch-25.02
18 changes: 18 additions & 0 deletions docs/cudf/source/cudf_pandas/faq.md
@@ -142,6 +142,24 @@ cuDF (learn more in [this
blog](https://medium.com/rapids-ai/easy-cpu-gpu-arrays-and-dataframes-run-your-dask-code-where-youd-like-e349d92351d)) and the [RAPIDS Accelerator for Apache Spark](https://nvidia.github.io/spark-rapids/)
provides a similar configuration-based plugin for Spark.


## Recommendation for libraries that are type aware.

When working with `cudf.pandas` proxy objects, it is important to access the real underlying objects to ensure compatibility with libraries that are `cudf` or `pandas` aware. You can use the following methods to retrieve the actual `cudf` or `pandas` objects:

- `get_cudf_pandas_fast_object()`: This method returns the fast `cudf` object from the proxy.
- `get_cudf_pandas_slow_object()`: This method returns the slow `pandas` object from the proxy.

Here is an example of how to use these methods:

```python
# Assuming `proxy_obj` is a cudf.pandas proxy object
fast_obj = proxy_obj.get_cudf_pandas_fast_object()
slow_obj = proxy_obj.get_cudf_pandas_slow_object()

# Now you can use `fast_obj` and `slow_obj` with libraries that are cudf or pandas aware
```

(are-there-any-known-limitations)=
## Are there any known limitations?

8 changes: 8 additions & 0 deletions python/cudf/cudf/pandas/fast_slow_proxy.py
@@ -204,6 +204,12 @@ def _fsproxy_fast_to_slow(self):
            return fast_to_slow(self._fsproxy_wrapped)
        return self._fsproxy_wrapped

    def get_cudf_pandas_fast_object(self):
Contributor

I would recommend that we avoid fast/slow names in the public API. I think `get_cudf_object()` is sufficient. Or possibly `as_cudf()`?

Contributor

Or should we use reserved names like `__as_cudf__()` to make it more obvious that this is a protocol for library developers and not intended for users?

Contributor Author

We proxy numpy objects too, so we will also have to keep that in mind when naming these APIs. How about these names:

  1. `__as_fast_object__()`
  2. `__as_gpu_object__()`

Contributor

I like (2). Are there any other public APIs involving words like fast/slow or GPU/CPU?

Contributor

I don't think a dunder name is appropriate. Those are typically for protocols that may be implemented across libraries as a standard, not something that a single library decides on for itself. I do agree with avoiding fast/slow names, although we may eventually have to rework this if we rip the proxy out of cudf to make it easier for libraries like cuml to reuse. For now `as_gpu_object` seems good to me.
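
To make the naming trade-off concrete, here is a minimal sketch of how a type-aware library might consume a plain-method accessor through duck typing. It is only an illustration: `as_gpu_object` is the name floated in this thread and has not been merged, while `as_cpu_object`, `FakeProxy`, and `unwrap` are hypothetical names invented for the example.

```python
from typing import Any


class FakeProxy:
    """Stand-in for a cudf.pandas proxy object; NOT the real class."""

    def __init__(self, gpu_obj: Any, cpu_obj: Any) -> None:
        self._gpu_obj = gpu_obj
        self._cpu_obj = cpu_obj

    # Accessor name proposed in the review thread (not a merged API).
    def as_gpu_object(self) -> Any:
        return self._gpu_obj

    # Assumed CPU counterpart; this name does not appear in the thread.
    def as_cpu_object(self) -> Any:
        return self._cpu_obj


def unwrap(obj: Any, prefer_gpu: bool = True) -> Any:
    """Return the underlying object if `obj` exposes the accessor, else `obj`."""
    name = "as_gpu_object" if prefer_gpu else "as_cpu_object"
    accessor = getattr(obj, name, None)
    # Objects without the accessor (plain pandas, lists, ...) pass through.
    return accessor() if callable(accessor) else obj
```

Because the lookup goes through `getattr`, a library written this way needs no import of `cudf.pandas` and keeps working when it receives objects that are not proxies.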

Contributor

The dunder idea was that maybe this should be reserved for other libraries as a documented “protocol” and not presented as a user API. Closer to what IPython does with fancy reprs than the Array API.

Contributor

If you're thinking of IPython's `_repr_html_`, note that that is not a dunder but rather uses single underscores on either side (e.g. `HTML._repr_html_`). Conversely, the `__html__` protocol is cross-library and implemented by other libraries (although I don't know exactly how wide the adoption is, I have seen it in a few other places).
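
The distinction between the two hooks can be sketched with a toy class that implements both. `Table` and `_render` are hypothetical names for illustration; the only claims relied on are that IPython looks up `_repr_html_` for rich display and that libraries such as MarkupSafe read `__html__`.

```python
import html


class Table:
    """Toy object exposing both HTML hooks mentioned in the comment above."""

    def __init__(self, rows):
        self.rows = rows

    def _render(self) -> str:
        # Escape each value so the generated markup stays well-formed.
        cells = "".join(
            f"<tr><td>{html.escape(str(row))}</td></tr>" for row in self.rows
        )
        return f"<table>{cells}</table>"

    # Single underscores: the hook IPython looks up for rich display.
    def _repr_html_(self) -> str:
        return self._render()

    # Double underscores: the cross-library convention read by, e.g., MarkupSafe.
    def __html__(self) -> str:
        return self._render()
```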

Contributor

Good catch! Thanks for the correction.

        return self._fsproxy_slow_to_fast()

    def get_cudf_pandas_slow_object(self):
        return self._fsproxy_fast_to_slow()

    @property  # type: ignore
    def _fsproxy_state(self) -> _State:
        return (
@@ -221,6 +227,8 @@ def _fsproxy_state(self) -> _State:
        "_fsproxy_slow_type": slow_type,
        "_fsproxy_slow_to_fast": _fsproxy_slow_to_fast,
        "_fsproxy_fast_to_slow": _fsproxy_fast_to_slow,
        "get_cudf_pandas_fast_object": get_cudf_pandas_fast_object,
        "get_cudf_pandas_slow_object": get_cudf_pandas_slow_object,
        "_fsproxy_state": _fsproxy_state,
    }
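
The hunk above registers the two new functions in a dictionary that becomes the namespace of a dynamically built proxy class. A rough, GPU-free sketch of that pattern follows. It assumes `types.new_class` as the construction mechanism, which this hunk does not show, and `make_proxy_type`, `get_fast`, `get_slow`, and `ListOrTuple` are hypothetical names, with `tuple` and `list` standing in for the fast and slow types.

```python
import types


def make_proxy_type(name: str, fast_type: type, slow_type: type) -> type:
    """Build a class whose methods come from a plain dict (illustrative only)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def get_fast(self):
        # Toy conversion: rebuild the wrapped value with the "fast" type.
        return fast_type(self._wrapped)

    def get_slow(self):
        # Toy conversion: rebuild the wrapped value with the "slow" type.
        return slow_type(self._wrapped)

    cls_dict = {
        "__init__": __init__,
        "get_fast": get_fast,
        "get_slow": get_slow,
    }
    # types.new_class populates the class namespace through this callback.
    return types.new_class(name, (), {}, lambda ns: ns.update(cls_dict))


# `tuple` and `list` are placeholders for a GPU type and a CPU type.
ListOrTuple = make_proxy_type("ListOrTuple", fast_type=tuple, slow_type=list)
obj = ListOrTuple([1, 2, 3])
```

Any function added to the dictionary becomes a method on every generated proxy type, which is why the patch only needs two new dictionary entries to expose the accessors.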

6 changes: 6 additions & 0 deletions python/cudf/cudf_pandas_tests/test_cudf_pandas.py
@@ -1885,3 +1885,9 @@ def test_dataframe_setitem():
    new_df = df + 1
    df[df.columns] = new_df
    tm.assert_equal(df, new_df)


def test_dataframe_get_fast_slow_methods():
    df = xpd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
    assert isinstance(df.get_cudf_pandas_fast_object(), cudf.DataFrame)
    assert isinstance(df.get_cudf_pandas_slow_object(), pd.DataFrame)