Setting the dtype in all_null_like fails in polars #1238

Open
rcap107 opened this issue Feb 12, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@rcap107
Contributor

rcap107 commented Feb 12, 2025

Describe the bug

polars.Series does not accept a string as the value of its dtype argument (it expects a Polars data type such as pl.Float32), whereas pandas.Series accepts the dtype as a string. As a result, my tests fail when I call all_null_like on a Polars column and set the dtype as a string.

This case isn't covered in test_common.py either.

Steps/Code to Reproduce

import polars as pl
import pandas as pd
import skrub._dataframe as sbd

col = pd.Series([1, 2, 3])

# this works: pandas accepts the dtype as a string
sbd.all_null_like(col, dtype="float32")

col_pl = pl.from_pandas(col)

# this doesn't: polars raises a TypeError for the string dtype
sbd.all_null_like(col_pl, dtype="float32")

Expected Results

0   NaN
1   NaN
2   NaN
dtype: float32

Actual Results

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /home/rcappuzz/Projects/skrub/bug_allnull.py:3
      1 # %%
      2 col_pl = pl.from_pandas(col)
----> 3 sdb.all_null_like(col_pl, dtype="float32")

File ~/.local/share/uv/python/cpython-3.10.15-linux-x86_64-gnu/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/Projects/skrub/skrub/_dataframe/_common.py:323, in _all_null_like_polars(col, length, dtype, name)
    321 if name is None:
    322     name = col.name
--> 323 return pl.Series(name, [None] * length, dtype=dtype)

File ~/Projects/skrub/skrenv/lib/python3.10/site-packages/polars/series/series.py:272, in Series.__init__(self, name, values, dtype, strict, nan_to_null)
    270     dtype = None
    271 elif dtype is not None and not is_polars_dtype(dtype):
--> 272     dtype = parse_into_dtype(dtype)
    274 # Handle case where values are passed as the first argument
    275 original_name: str | None = None

File ~/Projects/skrub/skrenv/lib/python3.10/site-packages/polars/datatypes/_parse.py:57, in parse_into_dtype(input)
     55     return _parse_union_type_into_dtype(input)
     56 else:
---> 57     return parse_py_type_into_dtype(input)

File ~/Projects/skrub/skrenv/lib/python3.10/site-packages/polars/datatypes/_parse.py:103, in parse_py_type_into_dtype(input)
    101     return _parse_generic_into_dtype(input)
    102 else:
--> 103     _raise_on_invalid_dtype(input)

File ~/Projects/skrub/skrenv/lib/python3.10/site-packages/polars/datatypes/_parse.py:181, in _raise_on_invalid_dtype(input)
    179 input_detail = "" if type(input) is type else f" (given: {input!r})"
    180 msg = f"cannot parse input {input_type} into Polars data type{input_detail}"
--> 181 raise TypeError(msg) from None

TypeError: cannot parse input of type 'str' into Polars data type (given: 'float32')

Versions

System:
    python: 3.10.15 (main, Sep  9 2024, 22:15:21) [Clang 18.1.8 ]
executable: /home/rcappuzz/Projects/skrub/skrenv/bin/python
   machine: Linux-6.8.0-51-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.6.1
          pip: None
   setuptools: 75.6.0
        numpy: 2.2.2
        scipy: 1.15.1
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.10.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libscipy_openblas
       filepath: /home/rcappuzz/Projects/skrub/skrenv/lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so
        version: 0.3.28
threading_layer: pthreads
   architecture: SkylakeX

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libscipy_openblas
       filepath: /home/rcappuzz/Projects/skrub/skrenv/lib/python3.10/site-packages/scipy.libs/libscipy_openblas-68440149.so
        version: 0.3.28
threading_layer: pthreads
   architecture: SkylakeX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libgomp
       filepath: /home/rcappuzz/Projects/skrub/skrenv/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
0.6.dev0
rcap107 added the bug label Feb 12, 2025
@rcap107
Contributor Author

rcap107 commented Feb 12, 2025

I tried a few things to fix the issue, but what fixes one problem breaks another. I wonder if it would be easier to have some function in _common.py that returns/casts based on the string rather than on the specific dtype, similar to what is done in the conftest for the df_module 🤔
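
To make the idea concrete, here is a minimal sketch of what such a string-based helper could look like. It is not existing skrub code: the name polars_dtype_from_string, the lookup table, and the namespace parameter are all hypothetical, and only a handful of numeric dtype strings are covered. Polars is imported lazily inside the function, on the assumption that it should stay an optional dependency.

```python
# Hypothetical lookup table: NumPy/pandas-style dtype strings mapped to the
# attribute names of the corresponding Polars dtypes (pl.Float32, pl.Int64, ...).
_POLARS_DTYPE_NAMES = {
    "float32": "Float32",
    "float64": "Float64",
    "int8": "Int8",
    "int16": "Int16",
    "int32": "Int32",
    "int64": "Int64",
    "uint8": "UInt8",
    "uint16": "UInt16",
    "uint32": "UInt32",
    "uint64": "UInt64",
    "bool": "Boolean",
}


def polars_dtype_from_string(dtype, namespace=None):
    """Translate a dtype string such as "float32" into a Polars dtype.

    Non-string values are returned unchanged, so a caller can still pass a
    Polars dtype (or None) directly. ``namespace`` is a hypothetical hook
    that defaults to the ``polars`` module and mainly exists for testing.
    """
    if not isinstance(dtype, str):
        return dtype
    try:
        attr_name = _POLARS_DTYPE_NAMES[dtype.lower()]
    except KeyError:
        raise ValueError(f"unsupported dtype string: {dtype!r}") from None
    if namespace is None:
        # Imported lazily so that this module can be imported without polars.
        import polars as namespace
    return getattr(namespace, attr_name)
```

With a helper along these lines, the polars branch of all_null_like could convert the argument before building the column, for example pl.Series(name, [None] * length, dtype=polars_dtype_from_string(dtype)). Whether that conversion belongs in _common.py or in the caller is a separate design question.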

@jeromedockes
Member

Why do you need to pass a string?
