TableVectorizer currently accepts both dataframes and numpy arrays as inputs. It outputs numpy arrays.
We suggest dropping numpy array input support for several reasons:
Data scientists mainly work with dataframes. They seldom manipulate numpy arrays within models.
TableVectorizer is designed to dispatch encoders based on column dtypes. As a numpy array has a single dtype for all of its columns, supporting arrays sends the wrong message and defeats the purpose of the TableVectorizer.
Handling both dataframes and numpy array inputs obfuscates the logic.
Looking ahead, it's a good step toward using only dataframe operations within TableVectorizer, without numpy conversions that might make copies and break laziness.
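To illustrate the single-dtype point, here is a small sketch using plain pandas and numpy (no skrub involved). The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# A dataframe keeps one dtype per column, which is the information
# a dtype-based dispatcher needs.
df = pd.DataFrame({"age": [31, 45], "city": ["Paris", "Oslo"]})
print(df.dtypes)

# Converting to a numpy array collapses every column to one common dtype.
# With mixed numeric and string columns, that common dtype is `object`,
# so the per-column type information is lost.
arr = df.to_numpy()
print(arr.dtype)
```

Once the data is an `object` array, the numeric column can no longer be told apart from the string column by dtype alone.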
Feature Description
Raise errors when numpy arrays are passed to the TableVectorizer.
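A minimal sketch of what such a check could look like. `reject_numpy_input` is a hypothetical helper written for this issue, not an existing skrub function, and the choice of `TypeError` and the message wording are assumptions.

```python
import numpy as np
import pandas as pd


def reject_numpy_input(X):
    """Hypothetical input check: refuse numpy arrays, pass anything else through."""
    if isinstance(X, np.ndarray):
        raise TypeError(
            "TableVectorizer expects a dataframe, got a numpy array. "
            "Wrap the array first, e.g. pd.DataFrame(X)."
        )
    return X


# A dataframe passes through unchanged.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
checked = reject_numpy_input(df)

# A numpy array is rejected with an explicit error.
try:
    reject_numpy_input(df.to_numpy())
except TypeError as exc:
    print(f"Rejected: {exc}")
```

Users who only have an array could still convert it with `pd.DataFrame(X)` before calling the vectorizer, which makes the loss of per-column dtypes explicit on their side.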
Alternative Solutions
No response
Additional Context
No response
Problem Description
As suggested in https://github.com/skrub-data/skrub/pull/786/files#r1376567227.