Drop numpy array input support for TableVectorizer #830

Closed
Vincent-Maladiere opened this issue Nov 21, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@Vincent-Maladiere
Member

Vincent-Maladiere commented Nov 21, 2023

Problem Description

As suggested in https://github.com/skrub-data/skrub/pull/786/files#r1376567227.

TableVectorizer currently accepts both dataframes and numpy arrays as inputs. It outputs numpy arrays.
We suggest dropping numpy array input support for several reasons:

  • Data scientists mainly work with dataframes. They seldom manipulate numpy arrays within models.
  • TableVectorizer is designed to dispatch encoders based on column dtypes. Since a numpy array has a single dtype for all its columns, supporting arrays sends the wrong message and defeats the purpose of the TableVectorizer.
  • Handling both dataframe and numpy array inputs obfuscates the logic.
  • Looking ahead, it's a good step toward using only dataframe operations within TableVectorizer, avoiding numpy conversions that might make copies and break laziness.
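
To illustrate the dtype point, here is a minimal sketch using only pandas and numpy (it does not touch skrub itself; the column names and values are made up for illustration). A dataframe keeps one dtype per column, while converting it to a numpy array collapses everything into a single dtype:

```python
import numpy as np
import pandas as pd

# Illustrative data: one text column and one numeric column.
df = pd.DataFrame({"city": ["Paris", "Rome"], "population": [2.1, 2.8]})

# A dataframe exposes one dtype per column, which is the information
# a dtype-based dispatch needs.
per_column_dtypes = df.dtypes.to_dict()

# Converting to numpy yields a single dtype for the whole array.
# With mixed columns, numpy falls back to the generic object dtype.
arr = df.to_numpy()
single_dtype = arr.dtype

print(per_column_dtypes)
print(single_dtype)
```

Once the data is an array, the numeric column can no longer be told apart from the text column by dtype alone.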

Feature Description

Raise errors when numpy arrays are passed to the TableVectorizer.
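
As a rough sketch of what such a check could look like: the helper name `check_is_dataframe` and the error messages below are hypothetical, not skrub's actual implementation, and the sketch assumes pandas dataframes only:

```python
import numpy as np
import pandas as pd


def check_is_dataframe(X):
    """Hypothetical input check: accept pandas dataframes, reject arrays."""
    if isinstance(X, np.ndarray):
        # Explicit message for the case this issue is about.
        raise TypeError(
            "numpy arrays are not supported as input; "
            "pass a dataframe instead, e.g. pd.DataFrame(X)."
        )
    if not isinstance(X, pd.DataFrame):
        raise TypeError(
            f"Expected a dataframe, got {type(X).__name__}."
        )
    return X


# A dataframe passes through unchanged.
df = check_is_dataframe(pd.DataFrame({"a": [1, 2]}))

# A numpy array is rejected with a TypeError.
try:
    check_is_dataframe(np.zeros((2, 2)))
except TypeError as exc:
    print(exc)
```

A dedicated branch for numpy arrays lets the error message tell users how to migrate, rather than failing with a generic type error.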

Alternative Solutions

No response

Additional Context

No response

@TheooJ
Contributor

TheooJ commented May 28, 2024

Closing because this has been addressed in #902 with CheckInputDataFrame

@TheooJ TheooJ closed this as completed May 28, 2024