Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add parsers #848

Closed
wants to merge 23 commits into from

Conversation

Vincent-Maladiere
Copy link
Member

Introduce _DatetimeParser, _NumericParser, _CategoryParser

@Vincent-Maladiere
Copy link
Member Author

I'll close this PR and open a new one to help reviewing

Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this, @Vincent-Maladiere !

could you merge the main branch? now that #819 has been merged it should reduce the diff quite a bit

I'm not sure about the term "parsers/parsing". it doesn't apply to categories nor to non-string input columns, and for numbers and even dates the "parsing" is rather simple. maybe "CategoryConverter", "ToCategory", "ToCategoryTransformer" instead of "CategoryParser"?

return pd.to_numeric(column, errors=self.errors)


def _union_category(column, dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be best if transform didn't modify the state of the transformer and if the output type of transform was always the same (rather than different categorical dtypes). maybe an "unknown" category could be systematically added and unseen categories would go in there?

@jeromedockes
Copy link
Member

closing this as it has been superseded by #902 and other PRs. thanks @Vincent-Maladiere !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants