-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix type of empty Index
and raise warning in Series
constructor
#14116
Conversation
@@ -2840,7 +2841,7 @@ def test_series_all_null(num_elements, null_type): | |||
@pytest.mark.parametrize("num_elements", [0, 2, 10, 100]) | |||
def test_series_all_valid_nan(num_elements): | |||
data = [np.nan] * num_elements | |||
sr = cudf.Series(data, nan_as_null=False) | |||
sr = _create_cudf_series(data, nan_as_null=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed?
For num_elements == 0
, the null count should still be zero even if it interprets the dtype as string/object (so the test is still valid, though it might warn?).
For num_elements > 0
, the data should be non-empty and thus be correctly interpreted as a floating type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For
num_elements == 0
, the null count should still be zero even if it interprets the dtype as string/object (so the test is still valid, though it might warn?).
Yup, this is the reason for this change.
@@ -4073,28 +4074,28 @@ def test_empty_dataframe_describe(): | |||
|
|||
|
|||
def test_as_column_types(): | |||
col = column.as_column(cudf.Series([])) | |||
col = column.as_column(_create_cudf_series([])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here and below -- I'm not sure if I understand why these helper functions are needed. It seems like the test construction might need subtle changes but I don't think this casting helper function is the right answer. It's not announcing what it is actually doing -- the differences between cudf.Series
and _create_cudf_series
should be in the name, if this is needed (like _create_cudf_series_float64_default
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, reverted the change here and passed explicit dtypes. Renamed the helper to _create_cudf_series_float64_default
and _create_pandas_series_float64_default
. These will be removed in pandas-2.0
upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving -- but let's make sure to remove these testing helper functions early in the pandas 2.0 migration.
/merge |
Calling `cudf.Index([])` results in `str` dtype `Index`. This PR fixes an issue with a pytorch related pytest by explicitly passing a `float64` dtype. xref: #14116 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - https://github.com/brandon-b-miller URL: #14198
Description
Fixes: #14091
This PR fixes empty inputs dtype in
Index
to default tostr
instead offloat64
. Another change is there is a deprecation warning forSeries
constructor to match pandas.Checklist