[ENH] dummy supervised regressor with polars support #440

julian-fong opened this issue Aug 1, 2024 · 4 comments
Labels
feature request New feature or request module:regression probabilistic regression module

Comments

@julian-fong
Contributor

Implement a version of DummyProbaRegressor with complete end-to-end polars support in skpro.

Some current limitations:

fit inside DummyProbaRegressor uses skpro.distributions, which only supports pandas DataFrames, so it needs a workaround.

predict_proba also uses skpro.distributions, which leads to the same issue and will need a workaround as well.

@fkiraly any suggestions on how to implement?

@julian-fong julian-fong added the feature request New feature or request label Aug 1, 2024
@fkiraly fkiraly added the module:regression probabilistic regression module label Aug 2, 2024
@julian-fong
Contributor Author

julian-fong commented Aug 11, 2024

@fkiraly I've run into a problem with the current implementation of polars support in skpro.

If an estimator specifies

"X_inner_mtype": "polars_eager_table",
"y_inner_mtype": "polars_eager_table",

then, during the tests, pandas DataFrames get converted into polars DataFrames via check_X in the boilerplate code in regression.base, but they lose their index in the process.

Because check_X has already dropped the index, it cannot be recovered inside the private methods, which only receive the polars DataFrame without it. Once the output is converted back into a pandas DataFrame via the convert function, the subsequent index asserts in the test files fail.

@fkiraly
Collaborator

fkiraly commented Aug 12, 2024

Interesting - I thought it saved the index as a variable __index__ if it was not a range index.

Or, is that only in the sktime implementation by @pranavvp16 ?

@julian-fong
Contributor Author

I think that would be in the sktime implementation. Currently, the skpro boilerplate does not save the index anywhere if the incoming mtype is in polars format.

@fkiraly
Collaborator

fkiraly commented Aug 13, 2024

May I suggest syncing the two implementations? I think the sktime type by @pranavvp16 stores a non-range index as a reserved variable.

fkiraly pushed a commit that referenced this issue Aug 18, 2024
Adds index support as part of #440 and syncs up the polars
conversion utilities between skpro and sktime.

The corresponding sktime PR for the polars conversion utilities is
sktime/sktime#6455.

In this PR:

If the `from_type` is a pandas DataFrame and the `to_type` is a polars
frame, the conversion saves the index (assumed never to be a
MultiIndex) by inserting it as a separate column named `__index__`.
The resulting pandas DataFrame is then converted to a polars DataFrame.

In the inverse function, when converting from a polars DataFrame to a
pandas DataFrame, if the column `__index__` exists in the pandas
DataFrame after conversion, that column is set as the index before the
pandas DataFrame is returned.

After this is merged, #447 will be implemented as a `polars`-only
estimator. Tests will also be written to check polars input end to end,
as well as pandas input and output through the polars estimator (i.e.
pandas input into polars estimator -> polars predictions -> pandas output).