Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Python 3.12, update Kafka dependencies to 2.5.x #16745

Merged
merged 13 commits into from
Sep 6, 2024

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Sep 4, 2024

Description

Contributes to rapidsai/build-planning#40

This PR adds support for Python 3.12.

Other changes required to add that support:

  • updating librdkafka / python-confluent-kafka, 1.9.* -> 2.5.* (link to thread)
  • removing use of ast.Num in syntax tree parsing, in favor of checking the .value of an ast.Constant against a hard-coded list of builtin types (link to thread)
  • ignoring deprecation warnings about datetime.datetime.utcnow() (link to thread)
  • skipping doctests that end up running repr() on an OrderedDict (link to thread)

Notes for Reviewers

This is part of ongoing work to add Python 3.12 support across RAPIDS.
It temporarily introduces a build/test matrix including Python 3.12, from rapidsai/shared-workflows#213.

A follow-up PR will revert back to pointing at the branch-24.10 branch of shared-workflows once all
RAPIDS repos have added Python 3.12 support.

This will fail until all dependencies have been updates to Python 3.12

CI here is expected to fail until all of this project's upstream dependencies support Python 3.12.

This can be merged whenever all CI jobs are passing.

@jameslamb jameslamb added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Sep 4, 2024
@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas cudf.polars Issues specific to cudf.polars labels Sep 4, 2024
@jameslamb jameslamb changed the title Add support for Python 3.12 WIP: Add support for Python 3.12 Sep 4, 2024
@@ -39,7 +39,7 @@ requirements:
- python
- rapids-build-backend >=0.3.0,<0.4.0.dev0
- setuptools
- python-confluent-kafka >=1.9.0,<1.10.0a0
- python-confluent-kafka >=2.5.0,<2.6.0a0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There aren't any python-confluent-kafka releases in the 1.x series that have Python 3.12 packages on conda-forge (https://anaconda.org/conda-forge/python-confluent-kafka).

Resulting in build errors like this:

  LibMambaUnsatisfiableError: Encountered problems while solving:
    - package python-confluent-kafka-1.9.0-py310h5764c6d_0 requires python_abi 3.10.* *_cp310, but none of the providers can be installed
  
  Could not solve for environment specs
  The following packages are incompatible
  ├─ python-confluent-kafka >=1.9.0,<1.10.0a0  is installable with the potential options
  │  ├─ python-confluent-kafka [1.9.0|1.9.2] would require
  │  │  └─ python_abi 3.10.* *_cp310, which can be installed;
  │  ├─ python-confluent-kafka [1.9.0|1.9.2] would require
  │  │  └─ python_abi 3.7.* *_cp37m, which can be installed;
  │  ├─ python-confluent-kafka [1.9.0|1.9.2] would require
  │  │  └─ python_abi 3.8.* *_cp38, which can be installed;
  │  ├─ python-confluent-kafka [1.9.0|1.9.2] would require
  │  │  └─ python_abi 3.9.* *_cp39, which can be installed;
  │  └─ python-confluent-kafka 1.9.2 would require
  │     └─ python_abi 3.11.* *_cp311, which can be installed;
  └─ python 3.12**  is not installable because there are no viable options
     ├─ python [3.12.0|3.12.1|...|3.12.5] would require
     │  └─ python_abi 3.12.* *_cp312, which conflicts with any installable versions previously reported;
     └─ python 3.12.0rc3 would require
        └─ _python_rc, which does not exist (perhaps a missing channel).

(build link)

Pushed a commit (8c206d7) which updates python-confluent-kafka and librdkafka to the latest minor version (the 2.5.x series) and continues the practice to pinning build and runtime dependencies within a single minor version of Kafka libraries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the Kafka dependencies is fine with me.

cc: @jdye64 for awareness.

@@ -120,7 +120,7 @@ def visit_Name(self, node):
self.stack.append(ColumnReference(col_id))

def visit_Constant(self, node):
if not isinstance(node, (ast.Num, ast.Str)):
if not isinstance(node.value, (float, int, str, complex)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of ast.Num here resulted in a deprecation warning that was treated as an error in tests (thanks @mroeschke !).

DeprecationWarning: ast.Num is deprecated and will be removed in Python 3.14; use ast.Constant instead

(build link)

I looked into this... seems ast.Num and other specialized constants have been deprecated since Python 3.8 and are set to be removed in Python 3.14.

This proposes checking the value of the ast.Constant being passed in here against a hard-coded set of builtins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. Thanks.

@jameslamb jameslamb changed the title WIP: Add support for Python 3.12 WIP: Add support for Python 3.12, update Kafka dependencies to 2.5.x Sep 5, 2024
@@ -2359,7 +2359,7 @@ def to_dict(
You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
>>> df.to_dict(into=OrderedDict) # doctest: +SKIP
Copy link
Member Author

@jameslamb jameslamb Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting with Python 3.12, the repr() representation for an OrderedDict has changed:

(thanks @bdice for pointing me to that changelog item!)

That's causing doctest failures like this:

E       AssertionError: 1 of 13 doctests failed for DataFrame.to_dict:
E         **********************************************************************
E         File "/opt/conda/envs/test/lib/python3.12/site-packages/cudf/core/dataframe.py", line 2362, in DataFrame.to_dict
E         Failed example:
E             df.to_dict(into=OrderedDict)
E         Expected:
E             OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
E                          ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
E         Got:
E             OrderedDict({'col1': OrderedDict({'row1': 1, 'row2': 2}), 'col2': OrderedDict({'row1': 0.5, 'row2': 0.75})})
E         **********************************************************************
E         1 items had failures:
E            1 of  13 in DataFrame.to_dict
E         ***Test Failed*** 1 failures.
E         
E       assert not 1
E        +  where 1 = TestResults(failed=1, attempted=13).failed

/__w/cudf/cudf/python/cudf/cudf/tests/test_doctests.py:125: AssertionError

(build link)

This proposes just skipping those tests, here and in the docs for Series.to_dict().

@@ -8,6 +8,8 @@ filterwarnings =
error
ignore:::.*xdist.*
ignore:::.*pytest.*
# some third-party dependencies (e.g. 'boto3') still using datetime.datetime.utcnow()
ignore:.*datetime.*utcnow.*scheduled for removal.*:DeprecationWarning
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the cudf and dask_cudf tests on Python 3.12, there are a few dozen cases where DeprecationWarning like the following are treated as errors

DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

(build link)

Looking through the logs, those all appear to be coming from code in boto3 / botocore that we don't control. There are no direct uses of datetime.datetime.utcnow() in this repo.

git grep utcnow

So this is proposing not failing tests when those warnings are raised.

Context on the deprecation of datetime.datetime.utcnow():

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to scope this more narrowly? I haven't looked at the error logs to see how widespread this problem is, but I wonder if we can use something like a local context manager to wrap the places where this is occurring.

Let's not treat this as a blocker for this PR -- but we might want to make it a follow-up task.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It's coming from this codepath in botocore

self = <botocore.auth.S3SigV4Auth object at 0x7f6feb62f380>
request = <botocore.awsrequest.AWSRequest object at 0x7f700a3b3800>

    def add_auth(self, request):
        if self.credentials is None:
            raise NoCredentialsError()
>       datetime_now = datetime.datetime.utcnow()
E       DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).

https://github.com/rapidsai/cudf/actions/runs/10724599183/job/29742782194#step:7:4279

If it's possible to ignore warnings from specific dependencies, that'd be useful. I couldn't figure out how to do that in a few minutes of googling and docs reading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible to ignore warnings from specific dependencies, that'd be useful. I couldn't figure out how to do that in a few minutes of googling and docs reading.

In a warning filter, after specifying the warning category you can also specify :module:line https://docs.python.org/3/library/warnings.html#describing-warning-filters

So in a follow up PR, you can add :botocore at the end to just ignore this warning for that module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mroeschke , I finally went back and did this. You suggestion worked perfectly 😁

#16896

@jameslamb jameslamb changed the title WIP: Add support for Python 3.12, update Kafka dependencies to 2.5.x Add support for Python 3.12, update Kafka dependencies to 2.5.x Sep 5, 2024
@jameslamb jameslamb marked this pull request as ready for review September 5, 2024 22:42
@jameslamb jameslamb requested review from a team as code owners September 5, 2024 22:42
@jameslamb jameslamb requested a review from msarahan September 5, 2024 22:42
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments -- none are blocking. Let's merge this if CI passes and decide whether to pursue any follow-up tasks.

@@ -39,7 +39,7 @@ requirements:
- python
- rapids-build-backend >=0.3.0,<0.4.0.dev0
- setuptools
- python-confluent-kafka >=1.9.0,<1.10.0a0
- python-confluent-kafka >=2.5.0,<2.6.0a0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the Kafka dependencies is fine with me.

cc: @jdye64 for awareness.

@@ -120,7 +120,7 @@ def visit_Name(self, node):
self.stack.append(ColumnReference(col_id))

def visit_Constant(self, node):
if not isinstance(node, (ast.Num, ast.Str)):
if not isinstance(node.value, (float, int, str, complex)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. Thanks.

@@ -8,6 +8,8 @@ filterwarnings =
error
ignore:::.*xdist.*
ignore:::.*pytest.*
# some third-party dependencies (e.g. 'boto3') still using datetime.datetime.utcnow()
ignore:.*datetime.*utcnow.*scheduled for removal.*:DeprecationWarning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to scope this more narrowly? I haven't looked at the error logs to see how widespread this problem is, but I wonder if we can use something like a local context manager to wrap the places where this is occurring.

Let's not treat this as a blocker for this PR -- but we might want to make it a follow-up task.

@jameslamb
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 7018a33 into rapidsai:branch-24.10 Sep 6, 2024
104 checks passed
@jameslamb jameslamb deleted the python-3.12 branch September 6, 2024 04:09
rapids-bot bot pushed a commit that referenced this pull request Sep 6, 2024
#16745 added support for Python 3.12 in this project.

When that was merged, nightly `unit-tests-cudf-pandas` jobs on Python 3.12 started failing, with errors from compiling `pandas`: ([build link](https://github.com/rapidsai/cudf/actions/runs/10733915866/job/29768130164))

That's only happening because we're running `pip install pandas==2.1` in those jobs, which matches exactly `pandas==2.1.0`, which does not have Python 3.12 wheels on PyPI (https://pypi.org/project/pandas/2.1.0/#files).

```text
Collecting pandas==2.1
  Downloading pandas-2.1.0.tar.gz (4.3 MB)
```

`pandas==2.1.1` DOES have Python 3.12 wheels on PyPI (https://pypi.org/project/pandas/2.1.1/#files).

To fix those jobs, this proposes allowing the patch version of `pandas` installed in those CI jobs to float:

* before: `pip install pandas==2.1`
* after: `pip install pandas==2.1.*`

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)
  - Bradley Dice (https://github.com/bdice)

URL: #16763
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Sep 9, 2024
Follow-up to #1380.

Now that both `cudf` (rapidsai/cudf#16745) and `ucxx` (rapidsai/ucxx#276) have Python 3.12 wheels available, it should be possible to test `dask-cuda` against Python 3.12 in CI.

This proposes that.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1382
rapids-bot bot pushed a commit that referenced this pull request Sep 27, 2024
Proposes some small changes I've taken as follow-ups from previous work here.

* #16745 filtered out all linter warnings about uses of `datetime.utcnow()` ... this PR limits that to only the warnings observed from `botocore` (so that the linter will helpfully warn us about such uses directly in `cudf`)
  - ref #16745 (comment)
* reduces the verbosity of logs for wheel builds (`-vvv` to `-v`)
  - similar to rapidsai/cugraph#4651

## Notes for Reviewers

This is intentionally targeted at `24.12`. No need to rush this into 24.10 before code freeze.

### How I tested this

<details><summary>locally in docker (click me)</summary>

```shell
docker run \
    --rm \
    --gpus 1 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:latest \
    bash

pip install \
    --prefer-binary \
    'cudf-cu12[test]==24.10.*,>=0.0.0a0' \
    'flask' \
    'flask-cors' \
    'moto>=4.0.8' \
    'boto3' \
    's3fs>=2022.3.0'

cd ./python/cudf

pytest \
    cudf/tests/test_s3.py
```

</details>

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #16896
copy-pr-bot bot pushed a commit that referenced this pull request Sep 28, 2024
Proposes some small changes I've taken as follow-ups from previous work here.

* #16745 filtered out all linter warnings about uses of `datetime.utcnow()` ... this PR limits that to only the warnings observed from `botocore` (so that the linter will helpfully warn us about such uses directly in `cudf`)
  - ref #16745 (comment)
* reduces the verbosity of logs for wheel builds (`-vvv` to `-v`)
  - similar to rapidsai/cugraph#4651

## Notes for Reviewers

This is intentionally targeted at `24.12`. No need to rush this into 24.10 before code freeze.

### How I tested this

<details><summary>locally in docker (click me)</summary>

```shell
docker run \
    --rm \
    --gpus 1 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:latest \
    bash

pip install \
    --prefer-binary \
    'cudf-cu12[test]==24.10.*,>=0.0.0a0' \
    'flask' \
    'flask-cors' \
    'moto>=4.0.8' \
    'boto3' \
    's3fs>=2022.3.0'

cd ./python/cudf

pytest \
    cudf/tests/test_s3.py
```

</details>

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #16896
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.pandas Issues specific to cudf.pandas cudf.polars Issues specific to cudf.polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants