-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add parsers #848
[WIP] Add parsers #848
Conversation
af4c152
to
01618d7
Compare
I'll close this PR and open a new one to help reviewing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this, @Vincent-Maladiere !
could you merge the main branch? now that #819 has been merged it should reduce the diff quite a bit
I'm not sure about the term "parsers/parsing". it doesn't apply to categories nor to non-string input columns, and for numbers and even dates the "parsing" is rather simple. maybe "CategoryConverter", "ToCategory", "ToCategoryTransformer" instead of "CategoryParser"?
return pd.to_numeric(column, errors=self.errors) | ||
|
||
|
||
def _union_category(column, dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be best if transform didn't modify the state of the transformer and if the output type of transform was always the same (rather than different categorical dtypes). maybe an "unknown" category could be systematically added and unseen categories would go in there?
closing this as it has been superseded by #902 and other PRs. thanks @Vincent-Maladiere ! |
Introduce _DatetimeParser, _NumericParser, _CategoryParser