[ENH] further work on GLMRegressor interfacing statsmodels GLM #230

Closed
fkiraly opened this issue Mar 31, 2024 · 3 comments · Fixed by #384
Labels
feature request New feature or request good first issue Good for newcomers interfacing algorithms Interfacing existing algorithms/estimators from third party packages module:regression probabilistic regression module

Comments

@fkiraly
Collaborator

fkiraly commented Mar 31, 2024

From #222, remaining work items to interface statsmodels GLM:

  • Currently, only the Gaussian family is implemented. Further families should be interfaced from statsmodels, in particular Gamma and Tweedie (see [ENH] Multiple link function support for GLMs #383).
  • We should try to cover as many parameters as we can in get_test_params; currently, coverage is low.
  • The docstring should say, for every parameter, what the possible values are. E.g., what are the possible values for cov_type and method, what are the expected sizes for start_params, etc.
  • some of the parameters of statsmodels GLM are not exposed to the user, as they require array-like input which is unavailable in predict, e.g., offset, exposure. It should be investigated how these could be interfaced, with a sensible treatment in predict where applicable.
@fkiraly fkiraly added good first issue Good for newcomers module:regression probabilistic regression module interfacing algorithms Interfacing existing algorithms/estimators from third party packages feature request New feature or request labels Mar 31, 2024
@fkiraly
Collaborator Author

fkiraly commented Mar 31, 2024

FYI @julian-fong, I've moved the work items to here, just to keep track.

@julian-fong
Contributor

Thank you Franz, I'll look into adding these features when I get a spare moment (working on my GSoC proposal!). I'm not too familiar with the Tweedie distribution; does it need its own probability distribution in skpro as well?

@fkiraly
Collaborator Author

fkiraly commented Apr 1, 2024

I'm not too familiar with the Tweedie distribution; does it need its own probability distribution in skpro as well?

Yes, although it does not seem too easy: it is not available in scipy or tensorflow_probability, and it is quite tedious to implement.

sklearn has a TweedieRegressor, which is a GLM with Tweedie family, but it does not return distribution parameters or a distribution, so it's not clear why sklearn would have that in the first place.

Might be too much work for too little benefit. Anyway, Gamma is a special case of Tweedie, and it is more straightforward.
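
To make the "special case" remark concrete: the Tweedie family is characterised by the variance function Var(Y) = dispersion * mean^power, and power = 2 gives Gamma (power 0, 1, 3 give Gaussian, Poisson, inverse Gaussian). A small sketch checking this against the Gamma shape/scale formulas; the function names are made up for illustration and are not from skpro or statsmodels.

```python
# Tweedie variance function: Var(Y) = dispersion * mean ** power.
# Well-known named members of the family, keyed by variance power.
NAMED_POWERS = {0: "Gaussian", 1: "Poisson", 2: "Gamma", 3: "inverse Gaussian"}


def tweedie_variance(mean, power, dispersion=1.0):
    """Variance implied by the Tweedie mean-variance relationship."""
    return dispersion * mean ** power


def gamma_variance(shape, scale):
    """Variance of a Gamma(shape, scale) distribution: shape * scale^2."""
    return shape * scale ** 2


# Gamma(shape, scale) has mean = shape * scale and dispersion = 1 / shape.
shape, scale = 3.0, 2.0
mean = shape * scale
via_tweedie = tweedie_variance(mean, power=2, dispersion=1.0 / shape)
via_gamma = gamma_variance(shape, scale)
print(NAMED_POWERS[2], via_tweedie, via_gamma)
```

So a Gamma GLM only needs a Gamma distribution with mean and dispersion mapped to shape and scale, whereas general Tweedie powers (e.g., between 1 and 2) have no closed-form density, which is where the implementation effort would go.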
