[ENH] interface `TweedieRegressor` from `sklearn` as `skpro` regressor #423

fkiraly · 2024-07-11T21:12:15Z

We should try to interface TweedieRegressor from sklearn as an skpro regressor.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.TweedieRegressor.html

Notes on implementation:

The current adapter will not work because TweedieRegressor does not follow the return_std interface, but we can use _prep_skl_df.
We would need a Tweedie distribution in skpro, which is currently not implemented.
Tweedie has three parameters: power, location, scale. Power is fixed in the sklearn TweedieRegressor and location is returned by predict, but it is unclear whether scale can be obtained from it. Perhaps @fsaforo1 has insight on this point.

FYI @ShreeshaM07, this is very similar to your previous work on statsmodels GLM!

ShreeshaM07 · 2024-07-15T12:47:19Z

Some points regarding this:

return_std is not available in the predict method of sklearn's TweedieRegressor, so we may not be able to obtain the scale in cases where the underlying distribution requires it, e.g., Normal.
Since this is just an extension of GLMRegressor, why can we not just interface the Tweedie distribution and then add it to the family parameter of GLMRegressor? I am not sure where we could interface the distribution from, though.

A question regarding the TweedieRegressor: is it not just an interface to regressors for different families, e.g., Poisson, Gaussian, Gamma? If so, is there any difference in implementing the TweedieRegressor if it is just going to expose these different regressors?

fkiraly · 2024-07-15T14:31:35Z

To answer these:

I do not think this would be an extension of GLMRegressor, since that interfaces the GLM from statsmodels. The sklearn TweedieRegressor is a completely different object. Of course it would be nice to add support for the Tweedie in statsmodels, but that is a different, useful issue, and may meet the use case of @fsafaro1.
This scipy issue discusses the Tweedie distribution: "Add Tweedie distributions to scipy.stats", scipy/scipy#11291 (comment). It concludes that the scipy interface is not general enough because the Tweedie distribution is of mixed type. skpro is general enough, so with the pointers in there we could implement it, either entirely from scratch, or interfacing some of the component functions such as Bessel.
For the sklearn Tweedie regressor, the remaining question is still where to get the scale from. It would not be much of a Tweedie regressor if that would be impossible to obtain...

is it not just an interface to possible regressors for different families for ex Poisson,Gaussian,Gamma

Yes, but for non-integer values of the p parameter these are very specific families that are also not available yet. It is a good question whether the distribution should internally decompose into these case distinctions.

ShreeshaM07 · 2024-07-16T19:04:22Z

this scipy issue discusses the Tweedie distribution: scipy/scipy#11291 (comment) and concludes that the scipy interface is not general enough because it is mixed type. skpro is general enough, so with the pointers in there we could implement it, either entirely from scratch, or interfacing some of the component functions such as Bessel.

From the conversation I can infer that we can implement this in skpro, as it allows for mixed type distributions with pdf and pmf in different intervals. This post seems very informative in explaining the Tweedie distribution: https://lorentzen.ch/index.php/2024/06/17/a-tweedie-trilogy-part-iii-from-wrights-generalized-bessel-function-to-tweedies-compound-poisson-distribution/ It also gives a code snippet for the pdf and the point mass at zero of the compound Poisson-gamma distribution.

import numpy as np
from scipy.special import wright_bessel


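def cpg_pmf(mu, phi, p):
    """Compound Poisson Gamma point mass at zero."""
    # P(Y = 0) = exp(-lambda), with Poisson rate lambda = mu^(2-p) / (phi * (2-p)).
    return np.exp(-np.power(mu, 2 - p) / (phi * (2 - p)))


def cpg_pdf(x, mu, phi, p):
    """Compound Poisson Gamma pdf for x > 0."""
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    # Canonical parameter and cumulant function of the exponential dispersion family.
    theta = np.power(mu, 1 - p) / (1 - p)
    kappa = np.power(mu, 2 - p) / (2 - p)
    # alpha < 0 for 1 < p < 2, so -alpha is a valid (non-negative) first argument below.
    alpha = (2 - p) / (1 - p)
    t = ((p - 1) * phi / x)**alpha
    t /= (2 - p) * phi
    # Normalizing term, expressed through Wright's generalized Bessel function.
    a = 1 / x * wright_bessel(-alpha, 0, t)
    return a * np.exp((x * theta - kappa) / phi)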

This can be used as is, since the wright_bessel function is available in scipy.special.
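As a quick sanity check of the snippet, the point mass at zero and the integral of the density over x > 0 should add up to one. The sketch below repeats the two functions so that it runs on its own; the parameter values and the integration cutoff of 60 are arbitrary choices for illustration and are not taken from the post.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import wright_bessel


def cpg_pmf(mu, phi, p):
    """Compound Poisson Gamma point mass at zero."""
    return np.exp(-np.power(mu, 2 - p) / (phi * (2 - p)))


def cpg_pdf(x, mu, phi, p):
    """Compound Poisson Gamma pdf for x > 0."""
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    theta = np.power(mu, 1 - p) / (1 - p)
    kappa = np.power(mu, 2 - p) / (2 - p)
    alpha = (2 - p) / (1 - p)
    t = ((p - 1) * phi / x) ** alpha
    t /= (2 - p) * phi
    a = 1 / x * wright_bessel(-alpha, 0, t)
    return a * np.exp((x * theta - kappa) / phi)


# Illustrative parameter values (assumption, chosen only for this check).
mu, phi, p = 2.0, 1.0, 1.5

mass_at_zero = cpg_pmf(mu, phi, p)
# Ad hoc finite cutoff: for these parameters the density is negligible beyond x = 60.
density_mass, _ = quad(cpg_pdf, 0, 60, args=(mu, phi, p))
total = mass_at_zero + density_mass
print(mass_at_zero, density_mass, total)
```

If the two pieces are consistent, `total` should be very close to 1.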

for the sklearn Tweedie regressor, the remaining question is still where to get the scale from. It would not be much of a Tweedie regressor if that would be impossible to obtain...

I think there is a very roundabout way to do this: pass the x value to PoissonRegressor and GammaRegressor separately and find the values of lambda, a and b.
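For reference, if mu, phi and p are known, the compound Poisson-gamma parameters follow directly from the standard relation between the two parameterizations for 1 < p < 2. A minimal sketch, where the function name `cpg_params` is hypothetical and lam, a, b stand for the Poisson rate, the gamma shape and the gamma scale mentioned above:

```python
def cpg_params(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to compound Poisson-gamma (lam, a, b)."""
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    lam = mu ** (2 - p) / (phi * (2 - p))  # Poisson rate of the number of summands
    a = (2 - p) / (p - 1)                  # gamma shape of each summand
    b = phi * (p - 1) * mu ** (p - 1)      # gamma scale of each summand
    return lam, a, b


# Illustrative values (assumption, not from the thread).
lam, a, b = cpg_params(mu=2.0, phi=1.5, p=1.5)

# Moments of a Poisson sum of i.i.d. gamma variables.
mean = lam * a * b                 # should recover mu
var = lam * a * b ** 2 * (1 + a)   # should recover phi * mu**p
print(lam, a, b, mean, var)
```

The mean and variance recovered at the end match mu and phi * mu**p, which is a convenient check that the mapping is wired up correctly. Note that this still needs phi as an input, so it does not by itself answer where the scale comes from.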

We know the mean (it is the return value of predict), and the power parameter p is fixed. We can calculate phi, i.e., the scale, using the formula below. Is it not possible that way?
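Assuming the relation meant here is the Tweedie variance function Var(Y) = phi * mu^p, one standard way to turn it into an estimate is the Pearson dispersion estimator used for GLMs. The sketch below is only an illustration: `pearson_dispersion` is a hypothetical helper, `mu` would be the output of predict on the training data, and `n_params` would be the number of fitted coefficients including the intercept.

```python
import numpy as np


def pearson_dispersion(y, mu, power, n_params=0):
    """Pearson estimate of phi under Var(Y) = phi * mu**power."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    dof = y.size - n_params  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    # Squared residuals, each scaled by the unit variance function mu**power.
    return float(np.sum((y - mu) ** 2 / mu ** power) / dof)


# Toy numbers standing in for observed targets and predicted means.
y_obs = np.array([2.0, 4.0])
mu_hat = np.array([1.0, 2.0])
phi_hat = pearson_dispersion(y_obs, mu_hat, power=1)
print(phi_hat)
```

This yields a single global phi, so the resulting predictive distribution would have a constant dispersion across samples.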

ShreeshaM07 · 2024-07-16T19:51:27Z

Some thoughts on the Tweedie distribution:

Since the type of distribution is determined by the power parameter itself, we can just call the pdf of Normal when pw=0 (where pw is the power parameter), the pmf of Poisson when pw=1, the pdf of Gamma when pw=2, and the code snippet in the above comment when pw is in (1, 2).
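A rough sketch of this case distinction for the integer powers, using scipy.stats as a stand-in for the skpro distributions. The function name `tweedie_special_case` is hypothetical, and the parameter mapping assumes the usual Tweedie parameterization with mean mu and variance phi * mu**p.

```python
import numpy as np
from scipy import stats


def tweedie_special_case(mu, phi, p):
    """Return a frozen scipy.stats distribution for the Tweedie powers 0, 1 and 2."""
    if p == 0:
        # Normal: variance phi, hence standard deviation sqrt(phi).
        return stats.norm(loc=mu, scale=np.sqrt(phi))
    if p == 1:
        # Plain Poisson only for phi == 1; otherwise it is an over-dispersed Poisson.
        if phi != 1:
            raise ValueError("p = 1 is a plain Poisson only when phi == 1")
        return stats.poisson(mu)
    if p == 2:
        # Gamma with shape 1/phi and scale phi*mu: mean mu, variance phi * mu**2.
        return stats.gamma(a=1 / phi, scale=phi * mu)
    raise NotImplementedError(f"p={p} is not covered by this sketch")


# Illustrative values (assumption, not from the thread).
gamma_case = tweedie_special_case(mu=3.0, phi=0.5, p=2)
normal_case = tweedie_special_case(mu=1.0, phi=4.0, p=0)
print(gamma_case.mean(), gamma_case.var(), normal_case.var())
```

The p = 1 branch shows where the dispatch is not a pure lookup: a scale different from 1 does not map to an existing standard family.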

fkiraly · 2024-07-17T20:21:10Z

From the conversation I can infer that we can implement this in skpro as it allows for mixed type distributions with pdf and pmf in different intervals.

Yes, assuming you mean the p parameter. Where the distribution is entirely discrete, the pdf will return zero, and where it is entirely continuous, the pmf will return zero.

Further, here's an interesting option, since multiple already implemented distributions figure as special cases:

we could implement the individual families separately, e.g., compound Poisson-Gamma
define Tweedie as a _DelegatedDistribution and delegate to one of the Tweedie ED families depending on p.
as you say, we need to ensure that the parameters are mapped correctly, e.g., Tweedie being parameterized by mu, sigma, and Gamma by alpha, beta.
probably we also want to change the _DelegatedDistribution to delegate private, not public methods. This could be done in a separate PR - the current delegator delegates public methods

Here is an illustration of the suggested delegator approach:

(Tweedie is a delegator compound of Tweedie ED families)
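In plain Python, the delegator idea could look roughly like the toy sketch below. It does not use the skpro `_DelegatedDistribution` API; the classes `_Normal`, `_Gamma` and `Tweedie` are hypothetical stand-ins, and only two families are wired up to keep it short. The point is the parameter mapping in the constructor and the forwarding of method calls to the chosen family.

```python
import math


class _Normal:
    """Hypothetical p = 0 family, parameterized by mean and standard deviation."""

    def __init__(self, mean, sd):
        self._mean, self._sd = mean, sd

    def mean(self):
        return self._mean

    def var(self):
        return self._sd ** 2


class _Gamma:
    """Hypothetical p = 2 family, parameterized by shape alpha and rate beta."""

    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta

    def mean(self):
        return self.alpha / self.beta

    def var(self):
        return self.alpha / self.beta ** 2


class Tweedie:
    """Toy delegator: picks one family from p and forwards method calls to it."""

    def __init__(self, mu, phi, p):
        self.mu, self.phi, self.p = mu, phi, p
        # Map the Tweedie parameters (mu, phi) to the parameters of the family.
        if p == 0:
            self._delegate = _Normal(mean=mu, sd=math.sqrt(phi))
        elif p == 2:
            self._delegate = _Gamma(alpha=1 / phi, beta=1 / (phi * mu))
        else:
            raise NotImplementedError(f"no family wired up for p={p} in this sketch")

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name == "_delegate":  # avoid infinite recursion if the delegate is not set
            raise AttributeError(name)
        return getattr(self._delegate, name)


# Illustrative values (assumption, not from the thread).
t = Tweedie(mu=3.0, phi=0.5, p=2)
print(t.mean(), t.var())
```

Here the public methods are forwarded, which mirrors the current behaviour described above; delegating private methods instead would mean forwarding the underscore-prefixed implementations and keeping the public layer in the delegator.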

fkiraly · 2024-07-18T22:27:55Z

Opened a new issue on the Tweedie distribution for further discussion, as that does not seem too straightforward: #429

fkiraly added the labels module:regression (probabilistic regression module), interfacing algorithms (Interfacing existing algorithms/estimators from third party packages), and feature request (New feature or request) on Jul 11, 2024

fkiraly mentioned this issue Jul 11, 2024

Intervals/quantiles can be negative for models that can only make non-negative predictions #422

Open

fkiraly mentioned this issue Jul 15, 2024

[ENH] interface GLM models from glum #424

Open

ShreeshaM07 mentioned this issue Jul 17, 2024

[ENH] Tweedie Distribution #428

Draft

5 tasks

fkiraly mentioned this issue Jul 18, 2024

[ENH] Tweedie distribution, incl mathematics and design #429

Open
